Fingerprint Analyzer

The fingerprint analyzer implements a fingerprinting algorithm that the OpenRefine project uses to assist in clustering.

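Input text is lowercased, normalized to remove extended characters, sorted, deduplicated, and concatenated into a single token. If a stop word list is configured, stop words are also removed.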
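
The steps in the paragraph above can be sketched in plain Python. This is only an illustration, not Elasticsearch's implementation: the function names `fold_to_ascii` and `fingerprint` are hypothetical, the `\w+` regular expression is a rough stand-in for the Standard Tokenizer, and stripping combining marks after NFKD decomposition only approximates ASCII folding.

```python
import re
import unicodedata


def fold_to_ascii(token):
    """Strip combining marks after NFKD decomposition (rough stand-in for ASCII folding)."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fingerprint(text, stop_words=frozenset(), separator=" "):
    """Approximate the fingerprint pipeline: tokenize, lowercase, fold, filter, dedupe, sort, join."""
    # Compose characters first so an accented letter stays inside one token.
    text = unicodedata.normalize("NFC", text)
    # Assumption: \w+ is used here in place of the Standard Tokenizer.
    tokens = re.findall(r"\w+", text)
    tokens = [fold_to_ascii(t.lower()) for t in tokens]
    # Stop word removal is a no-op unless a stop word set is supplied.
    tokens = [t for t in tokens if t not in stop_words]
    # Deduplicate, sort, and concatenate into a single string.
    return separator.join(sorted(set(tokens)))


print(fingerprint("Yes yes, Gödel said this sentence is consistent and."))
# prints: and consistent godel is said sentence this yes
```

Because the result is a single sorted, deduplicated string, two inputs that differ only in case, accents, word order, or repeated words map to the same value, which is what makes it useful as a clustering key.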

Definition

It consists of:

Tokenizer

Standard Tokenizer

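Token Filters (in order)

Lower Case Token Filter
ASCII Folding Token Filter
Stop Token Filter (disabled by default)
Fingerprint Token Filter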

Example output

POST _analyze
{
  "analyzer": "fingerprint",
  "text": "Yes yes, Gödel said this sentence is consistent and."
}

The above sentence produces the following single term:

[ and consistent godel is said sentence this yes ]

Configuration

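The fingerprint analyzer accepts the following parameters:

separator: The character used to concatenate the terms. Defaults to a space.
max_output_size: The maximum token size to emit. Defaults to 255. Tokens larger than this size are discarded.
stopwords: A pre-defined stop word list such as _english_, or an array containing a list of stop words. Defaults to _none_.
stopwords_path: The path to a file containing stop words.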

See the Stop Token Filter for more information about stop word configuration.

Example configuration

In this example, we configure the fingerprint analyzer to use the pre-defined list of English stop words:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_fingerprint_analyzer": {
          "type": "fingerprint",
          "stopwords": "_english_"
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_fingerprint_analyzer",
  "text": "Yes yes, Gödel said this sentence is consistent and."
}

The above example produces the following term:

[ consistent godel said sentence yes ]
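
To see why three of the eight terms disappear, the stop word step can be sketched on its own in Python. The set below is an assumption: it is the default English stop word list as commonly documented for Lucene, reproduced from memory, so check it against your Elasticsearch version. The name `drop_stop_words` is hypothetical.

```python
# Assumption: this mirrors the list behind "_english_"; verify against your version.
ENGLISH_STOP_WORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)


def drop_stop_words(terms, stop_words=ENGLISH_STOP_WORDS):
    """Return the terms that are not in the stop word set, preserving order."""
    return [t for t in terms if t not in stop_words]


# Terms from the unconfigured fingerprint analyzer's output for the same sentence.
terms = "and consistent godel is said sentence this yes".split()
print(" ".join(drop_stop_words(terms)))
# prints: consistent godel said sentence yes
```

Here "and", "is", and "this" are in the stop word set, so only the remaining five terms are joined into the final token.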