synonym
(同义词)词元过滤器允许在分析过程中轻松处理同义词。 同义词使用配置文件配置。示例如下:
PUT /test_index
{
"settings": {
"index" : {
"analysis" : {
"analyzer" : {
"synonym" : {
"tokenizer" : "whitespace",
"filter" : ["synonym"]
}
},
"filter" : {
"synonym" : {
"type" : "synonym",
"synonyms_path" : "analysis/synonym.txt"
}
}
}
}
}
}
以上配置一个 synonym
(同义词)过滤器,其中包含一个路径 analysis/synonym.txt(相对于 config 的位置)。 然后使用过滤器配置 synonym
同义词分析器。 其他设置有:ignore_case(默认为 false),和 expand
(默认为 true)。
tokenizer 参数控制将用于标记同义词的分词器,并且默认为 whitespace
分词器。
支持两种同义词格式:Solr,WordNet。
以下是文件的示例格式:
# Blank lines and lines starting with pound are comments.
# Explicit mappings match any token sequence on the LHS of "=>"
# and replace with all alternatives on the RHS. These types of mappings
# ignore the expand parameter in the schema.
# Examples:
i-pod, i pod => ipod,
sea biscuit, sea biscit => seabiscuit
# Equivalent synonyms may be separated with commas and give
# no explicit mapping. In this case the mapping behavior will
# be taken from the expand parameter in the schema. This allows
# the same synonym file to be used in different synonym handling strategies.
# Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos
lol, laughing out loud
# If expand==true, "ipod, i-pod, i pod" is equivalent
# to the explicit mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
# If expand==false, "ipod, i-pod, i pod" is equivalent
# to the explicit mapping:
ipod, i-pod, i pod => ipod
# Multiple synonym mapping entries are merged.
foo => foo bar
foo => baz
# is equivalent to
foo => foo bar, baz
您也可以在配置文件中直接给过滤器定义同义词(请注意使用 synonyms 而不是 synonyms_path ):
PUT /test_index
{
"settings": {
"index" : {
"analysis" : {
"filter" : {
"synonym" : {
"type" : "synonym",
"synonyms" : [
"i-pod, i pod => ipod",
"universe, cosmos"
]
}
}
}
}
}
}
但是,建议使用 synonyms_path 在文件中定义大型同义词集,因为内联指定会不必要地增加群集大小。
基于 WordNet 格式的 同义词 可以如下使用格式声明:
PUT /test_index
{
"settings": {
"index" : {
"analysis" : {
"filter" : {
"synonym" : {
"type" : "synonym",
"format" : "wordnet",
"synonyms" : [
"s(100000001,1,'abstain',v,1,0).",
"s(100000001,2,'refrain',v,1,0).",
"s(100000001,3,'desist',v,1,0)."
]
}
}
}
}
}
}
同时支持使用 synonyms_path
在文本中定义 WordNet synonyms。