Index Modules - Synonym Token Filter
Source: Internet  Published under: US economic review data release  Editor: 程序博客网  Time: 2024/06/06 10:00
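Synonym Token Filter
The synonym token filter makes it easy to handle synonyms during the analysis process. Synonyms are configured using a configuration file. Here is an example: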
{
  "index" : {
    "analysis" : {
      "analyzer" : {
        "synonym" : {
          "tokenizer" : "whitespace",
          "filter" : ["synonym"]
        }
      },
      "filter" : {
        "synonym" : {
          "type" : "synonym",
          "synonyms_path" : "analysis/synonym.txt"
        }
      }
    }
  }
}
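If the index is created from a script, the same settings can be built as an ordinary data structure and serialized to JSON. The Python sketch below only reproduces the example above; the function name synonym_index_settings is hypothetical, and sending the body to the cluster (HTTP client, endpoint) is left out because it depends on the Elasticsearch version in use.

```python
import json


def synonym_index_settings(synonyms_path):
    """Hypothetical helper: build the index settings shown above.

    Both the analyzer and the filter are named "synonym", as in the example.
    """
    return {
        "index": {
            "analysis": {
                "analyzer": {
                    "synonym": {
                        "tokenizer": "whitespace",
                        "filter": ["synonym"],
                    }
                },
                "filter": {
                    "synonym": {
                        "type": "synonym",
                        # Resolved relative to the config location.
                        "synonyms_path": synonyms_path,
                    }
                },
            }
        }
    }


settings = synonym_index_settings("analysis/synonym.txt")
print(json.dumps(settings, indent=2))
```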
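The above configures a synonym filter with a path of analysis/synonym.txt (relative to the config location). The synonym analyzer is then configured with that filter. Additional settings are ignore_case (defaults to false) and expand (defaults to true).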
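To make the two settings concrete, here is a toy model in Python, not Elasticsearch's implementation. It assumes single-word equivalence groups only (no "=>" rules, no multi-word synonyms, no offsets), and models ignore_case simply as case-folding both the rules and the incoming tokens before matching. The expand=false branch follows the rule stated in the Solr sample file below: every term maps to the first term of its group. The helper name apply_synonyms is hypothetical.

```python
def apply_synonyms(tokens, groups, expand=True, ignore_case=False):
    """Toy model of a synonym filter over single-word equivalence groups.

    Returns one list of terms per input position. Hypothetical helper;
    it ignores multi-word synonyms, explicit "=>" rules and token offsets.
    """
    fold = str.lower if ignore_case else (lambda s: s)
    # Map each (possibly case-folded) synonym to the group it belongs to.
    lookup = {}
    for group in groups:
        for term in group:
            lookup[fold(term)] = group

    positions = []
    for token in tokens:
        group = lookup.get(fold(token))
        if group is None:
            positions.append([token])       # no synonym: pass the token through
        elif expand:
            positions.append(list(group))   # expand: emit every synonym
        else:
            positions.append([group[0]])    # no expand: first synonym only
    return positions


groups = [["universe", "cosmos"]]
print(apply_synonyms(["The", "Cosmos"], groups, ignore_case=True))
# [['The'], ['universe', 'cosmos']]
```

With the default ignore_case=false, the capitalized token "Cosmos" would not match the lower-case rule and would pass through unchanged, since the whitespace tokenizer does not lower-case anything.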
The tokenizer parameter controls the tokenizer that will be used to tokenize the synonyms, and defaults to the whitespace tokenizer.
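The choice of tokenizer matters because a multi-word synonym becomes a sequence of tokens, and punctuation may or may not split a term. The Python sketch below uses str.split as a rough stand-in for the whitespace tokenizer and a hypothetical word_tokenize (split on non-word characters) as an alternative; neither is Elasticsearch code.

```python
import re


def whitespace_tokenize(text):
    # Rough stand-in for the whitespace tokenizer: split on runs of whitespace.
    return text.split()


def word_tokenize(text):
    # Hypothetical alternative that also splits on punctuation such as "-".
    return re.findall(r"\w+", text)


def tokenize_terms(terms, tokenizer=whitespace_tokenize):
    """Tokenize each synonym term; a multi-word term becomes a token sequence."""
    return [tuple(tokenizer(term)) for term in terms]


terms = ["i-pod", "i pod", "ipod"]
print(tokenize_terms(terms))
# [('i-pod',), ('i', 'pod'), ('ipod',)]
print(tokenize_terms(terms, word_tokenize))
# [('i', 'pod'), ('i', 'pod'), ('ipod',)]
```

With the second tokenizer, "i-pod" and "i pod" collapse into the same token sequence, so the rules presumably ought to be tokenized the same way as the text they will be matched against.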
As of Elasticsearch 0.17.9, two synonym formats are supported: Solr and WordNet.
Solr synonyms
The following is a sample of the file format:
# blank lines and lines starting with pound are comments.

#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS. These types of mappings
#ignore the expand parameter in the schema.
#Examples:
i-pod, i pod => ipod,
sea biscuit, sea biscit => seabiscuit

#Equivalent synonyms may be separated with commas and give
#no explicit mapping. In this case the mapping behavior will
#be taken from the expand parameter in the schema. This allows
#the same synonym file to be used in different synonym handling strategies.
#Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos

# If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
# If expand==false, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod

#multiple synonym mapping entries are merged.
foo => foo bar
foo => baz
#is equivalent to
foo => foo bar, baz
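The rules in the sample file can be summarized as a small parser. The Python sketch below is a simplified reading of those comments, not the parser Solr or Elasticsearch actually uses: terms stay raw strings (they are not run through a tokenizer), escaping is not handled, and entries for the same left-hand term are merged in order without duplicates. The name parse_solr_synonyms is hypothetical.

```python
def parse_solr_synonyms(text, expand=True):
    """Parse Solr-format synonym rules into {term: [replacements]}.

    Simplified sketch: terms are kept as raw strings, and entries for the
    same left-hand term are merged, de-duplicated, in order of appearance.
    """
    mapping = {}

    def add(lhs, rhs):
        targets = mapping.setdefault(lhs, [])
        for r in rhs:
            if r not in targets:
                targets.append(r)

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        if "=>" in line:
            # Explicit mapping: the expand setting does not apply.
            left, right = line.split("=>", 1)
            lhs = [t.strip() for t in left.split(",") if t.strip()]
            rhs = [t.strip() for t in right.split(",") if t.strip()]
            for term in lhs:
                add(term, rhs)
        else:
            # Equivalence list: expand=True maps every term to all terms,
            # expand=False maps every term to the first term only.
            terms = [t.strip() for t in line.split(",") if t.strip()]
            targets = terms if expand else terms[:1]
            for term in terms:
                add(term, targets)
    return mapping


SAMPLE = """\
# blank lines and lines starting with pound are comments.
i-pod, i pod => ipod,
sea biscuit, sea biscit => seabiscuit

foozball , foosball
foo => foo bar
foo => baz
"""

rules = parse_solr_synonyms(SAMPLE)
print(rules["foo"])  # ['foo bar', 'baz']
```

The merged result for foo matches the last example in the file ("foo => foo bar, baz").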
You can also define the synonyms for the filter directly in the configuration file (note the use of synonyms instead of synonyms_path):
{
  "filter" : {
    "synonym" : {
      "type" : "synonym",
      "synonyms" : [
        "i-pod, i pod => ipod",
        "universe, cosmos"
      ]
    }
  }
}
However, it is recommended to define large synonym sets in a file using synonyms_path.
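The two forms carry the same rules; only their location differs. As a sketch, the Python helpers below (all names hypothetical) build the inline definition with synonyms, the file-based definition with synonyms_path, and the text of a synonyms file holding the same rules one per line. Writing that file into the node's config directory is not shown.

```python
import json

RULES = ["i-pod, i pod => ipod", "universe, cosmos"]


def inline_synonym_filter(rules):
    # Rules embedded in the settings via "synonyms".
    return {"filter": {"synonym": {"type": "synonym", "synonyms": list(rules)}}}


def file_synonym_filter(path):
    # Rules kept in a file referenced via "synonyms_path"
    # (the path is resolved relative to the config location).
    return {"filter": {"synonym": {"type": "synonym", "synonyms_path": path}}}


def synonyms_file_text(rules):
    # One rule per line, the layout used by the Solr sample file.
    return "\n".join(rules) + "\n"


print(json.dumps(inline_synonym_filter(RULES), indent=2))
print(json.dumps(file_synonym_filter("analysis/synonym.txt"), indent=2))
print(synonyms_file_text(RULES), end="")
```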
WordNet synonyms
Synonyms in the WordNet format can be declared using the format setting:
{
  "filter" : {
    "synonym" : {
      "type" : "synonym",
      "format" : "wordnet",
      "synonyms" : [
        "s(100000001,1,'abstain',v,1,0).",
        "s(100000001,2,'refrain',v,1,0).",
        "s(100000001,3,'desist',v,1,0)."
      ]
    }
  }
}
Defining WordNet synonyms in a file using synonyms_path is supported as well.
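Each WordNet entry is a Prolog fact s(synset_id, w_num, 'word', ss_type, sense_number, tag_count), and words sharing a synset id are synonyms of one another. Under that reading, the three facts above amount to the Solr equivalence line "abstain, refrain, desist". The Python sketch below illustrates the grouping; it is not Elasticsearch's parser, the name wordnet_to_solr is hypothetical, and it assumes apostrophes inside a word are written doubled ('') as in the WordNet Prolog files.

```python
import re

# Fields of a fact such as s(100000001,1,'abstain',v,1,0).
# Assumption: an apostrophe inside the word is escaped as '' (two quotes).
S_FACT = re.compile(r"s\((\d+),(\d+),'((?:[^']|'')*)',(\w),(\d+),(\d+)\)\.")


def wordnet_to_solr(lines):
    """Group WordNet s() facts by synset id into Solr equivalence lines."""
    groups = {}  # dicts keep insertion order, so synsets stay in input order
    for line in lines:
        m = S_FACT.fullmatch(line.strip())
        if m is None:
            raise ValueError(f"not a WordNet s() fact: {line!r}")
        synset_id = m.group(1)
        word = m.group(3).replace("''", "'")
        groups.setdefault(synset_id, []).append(word)
    return [", ".join(words) for words in groups.values()]


FACTS = [
    "s(100000001,1,'abstain',v,1,0).",
    "s(100000001,2,'refrain',v,1,0).",
    "s(100000001,3,'desist',v,1,0).",
]
print(wordnet_to_solr(FACTS))  # ['abstain, refrain, desist']
```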