N Days and N Nights with Solr (Part 2): Adding a Chinese Tokenizer

Source: Internet | Published by: 黄狮精知乎 | Editor: 程序博客网 | Time: 2024/06/01 16:55

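Solr's built-in analysis handles Chinese text poorly, so a third-party Chinese tokenizer needs to be integrated. Quite a few tokenizers are available for Solr; the two most commonly used are mmseg4j and ik-analyzer. Here I chose mmseg4j.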

1: Import the required jar files

Download the two jars, mmseg4j-solr-2.3.1-SNAPSHOT.jar and mmseg4j-core-1.10.1-SNAPSHOT.jar, and copy them into the lib directory of the Solr project.
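A missing jar usually only shows up as a class-loading error when Solr starts, so it can help to check the lib directory first. The following is a minimal sketch; the directory path is a hypothetical placeholder and must be replaced with the lib directory of your own Solr web application.

```python
from pathlib import Path

# The two jar names come from the step above.
REQUIRED_JARS = [
    "mmseg4j-solr-2.3.1-SNAPSHOT.jar",
    "mmseg4j-core-1.10.1-SNAPSHOT.jar",
]


def missing_jars(lib_dir, required=REQUIRED_JARS):
    """Return the names of required jars that are not present in lib_dir."""
    lib = Path(lib_dir)
    return [name for name in required if not (lib / name).is_file()]


if __name__ == "__main__":
    # Hypothetical path: adjust to where your Solr webapp keeps its libraries.
    lib_dir = "/path/to/tomcat/webapps/solr/WEB-INF/lib"
    missing = missing_jars(lib_dir)
    print("all jars present" if not missing else f"missing: {missing}")
```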

2: Configure schema.xml

<!-- mmseg4j Chinese tokenizer configuration; the fieldType names are defined here -->
<fieldtype name="textComplex" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="dic"/>
  </analyzer>
</fieldtype>
<fieldtype name="textMaxWord" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word"/>
  </analyzer>
</fieldtype>
<fieldtype name="textSimple" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple" dicPath="n:/custom/path/to/my_dic"/>
  </analyzer>
</fieldtype>

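mmseg4j provides three tokenizer modes, matching the mode values used above: simple, complex, and max-word. See the mmseg4j documentation for the details of each.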
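To give a feel for what dictionary-based tokenization does, here is a toy sketch of forward maximum matching, the idea underlying the MMSeg algorithm's simple mode. It is not mmseg4j's implementation: the function name, the tiny dictionary, and the maximum word length are all made up for illustration.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary word, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, not the one shipped with mmseg4j.
TOY_DICT = {"研究", "研究生", "生命", "起源"}

print(forward_max_match("研究生命起源", TOY_DICT))
# ['研究生', '命', '起源']
```

The greedy match picks 研究生 and leaves 命 stranded, although 研究 / 生命 / 起源 is the intended reading. Ambiguities of this kind are what the complex mode's additional rules try to resolve.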

Once the fieldType entries are in place, the Chinese fields to be searched can be declared as field elements in schema.xml:

<!-- fields of the iamge_info table -->
<field name="src" type="string" indexed="true" stored="true"/>
<!--<field name="key_info" type="string" indexed="true" stored="true"/>-->
<field name="key_info" type="textMaxWord" indexed="true" stored="true"/>
<field name="update_date" type="date" indexed="true" stored="true"/>

3: Restart the Tomcat server and test
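One way to test is to ask Solr's field analysis handler how a sample string is tokenized, then run a query against the key_info field. The sketch below only builds the two URLs; the host, port, and core name in BASE are assumptions, and it also assumes the /analysis/field handler is enabled in your solrconfig.xml.

```python
from urllib.parse import urlencode


def analysis_url(base_url, field_type, text):
    """URL that asks Solr how `text` is tokenized by `field_type`."""
    params = {
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": text,
        "wt": "json",
    }
    return f"{base_url}/analysis/field?{urlencode(params)}"


def select_url(base_url, field, term):
    """URL for a simple query against one field."""
    params = {"q": f"{field}:{term}", "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


# Assumption: Tomcat on localhost:8080 and a core named collection1.
BASE = "http://localhost:8080/solr/collection1"

print(analysis_url(BASE, "textMaxWord", "中文分词"))
print(select_url(BASE, "key_info", "分词"))
```

Open the printed URLs in a browser, or fetch them with urllib.request.urlopen. If the tokenizer is wired up correctly, the analysis response should list word-level tokens for the sample text.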

0 0