Synonym Configuration in Solr (Synonyms)

Source: Internet  Editor: 程序博客网  Time: 2024/04/29 01:21

 

1) Configuration

 

==========================schema.xml START=================================================

<fieldType name="textMaxWord" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word"/>
    <!-- Synonyms are applied at query time only; the index analyzer has no synonym filter -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


<fieldType name="text_cn_solr" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PositionFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
    <!-- Synonyms are applied at query time only; the index analyzer has no synonym filter -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PositionFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

 

……

<field name="title" type="textMaxWord" indexed="true" stored="true" termVectors="true"/>

 <field name="title_smart" type="text_cn_solr" indexed="true" stored="true" termVectors="true"/>

……

==========================schema.xml END=================================================

 

==========================conf/synonyms.txt START=================================================

# Some synonym groups specific to this example
GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television, Televisions, TV, TVs 
中国,美国,德国,法国
==========================conf/synonyms.txt END=================================================
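
To make the effect of the comma-separated groups concrete, here is a minimal Python sketch of how one query token could be expanded when expand="true" and ignoreCase="true". It is a simplified model for illustration, not Solr's actual implementation: the function names parse_synonyms and expand are hypothetical, explicit "=>" mappings are not handled, and lowercasing the output terms is an assumption of this sketch.

```python
def parse_synonyms(text, ignore_case=True):
    """Parse comma-separated synonym groups; skip blank lines and # comments."""
    groups = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Whitespace around each term is ignored, e.g. "Television, TV"
        terms = [t.strip() for t in line.split(",") if t.strip()]
        if ignore_case:
            terms = [t.lower() for t in terms]
        groups.append(terms)
    return groups


def expand(token, groups, ignore_case=True):
    """Return every term of the token's group, or the token alone if no group matches."""
    key = token.lower() if ignore_case else token
    for terms in groups:
        if key in terms:
            return terms
    return [token]


SYNONYMS = """\
# Some synonym groups specific to this example
GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television, Televisions, TV, TVs
中国,美国,德国,法国
"""

groups = parse_synonyms(SYNONYMS)
print(expand("中国", groups))  # every term in the same group as 中国
print(expand("TV", groups))    # matched case-insensitively
print(expand("solr", groups))  # no group: the token is returned unchanged
```

In this model a query for 中国 is rewritten to all four terms in its group, which is why documents containing only 美国 can also match.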

2) Usage

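Searching for title:中国 returns documents that match 美国 as well as documents that match 中国. Both terms are in the same synonym group, and with expand="true" the query term is expanded to every term in that group (so 德国 and 法国 match too).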
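
As a rough illustration of issuing that query over HTTP, the sketch below only builds the request URL for Solr's /select handler. The host, port, and core name (collection1) are assumptions for this example and should be replaced with your own; build_select_url is a hypothetical helper, not part of any Solr client library.

```python
from urllib.parse import urlencode


def build_select_url(base_url, field, term):
    """Build a /select URL for a single field:term query, asking for JSON output."""
    params = {"q": f"{field}:{term}", "wt": "json"}
    # urlencode percent-encodes the colon and the UTF-8 bytes of the Chinese term
    return f"{base_url}/select?{urlencode(params)}"


# Assumed local Solr instance and core name; adjust to your deployment
url = build_select_url("http://localhost:8983/solr/collection1", "title", "中国")
print(url)
```

Opening the resulting URL in a browser, or fetching it with any HTTP client, should return the matching documents if the schema and synonyms.txt above are in place and the core has been reloaded.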