Extending solr5.5.4 with ansj_lucene5
Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 02:14
- solr5.5.4: http://mirror.bit.edu.cn/apache/lucene/solr/
- ansj: https://github.com/NLPchina/ansj_seg
Download the ansj source code and add the class org.ansj.solr.AnsjTokenizerFactory to the ansj_lucene5_plug module:
```java
package org.ansj.solr;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.ansj.lucene.util.AnsjTokenizer;
import org.ansj.recognition.impl.StopRecognition;
import org.ansj.splitWord.analysis.IndexAnalysis;
import org.ansj.splitWord.analysis.ToAnalysis;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.util.TokenizerFactory;
import org.apache.lucene.util.AttributeFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AnsjTokenizerFactory extends TokenizerFactory {

    private final Logger logger = LoggerFactory.getLogger(getClass());

    boolean pstemming;
    boolean isQuery;
    private String stopwordsDir;
    public List<StopRecognition> filters;

    public AnsjTokenizerFactory(Map<String, String> args) {
        super(args);
        filters = new ArrayList<StopRecognition>();
        getLuceneMatchVersion();
        // Attributes of the <tokenizer> element in the schema
        isQuery = getBoolean(args, "isQuery", true);      // true: ToAnalysis, false: IndexAnalysis
        pstemming = getBoolean(args, "pstemming", false); // read but not used below
        stopwordsDir = get(args, "stopwords");           // path to a stop word file (not a directory)
        addStopwords(stopwordsDir);
    }

    // Load a UTF-8 stop word file (one word per line) into a StopRecognition filter
    private void addStopwords(String path) {
        if (path == null) {
            logger.info("no stopwords file configured");
            return;
        }
        logger.info("stopwords: " + path);
        // try-with-resources closes the file even if reading fails
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            StopRecognition stopFilter = new StopRecognition();
            String word;
            while ((word = br.readLine()) != null) {
                word = word.trim();
                if (!word.isEmpty()) {
                    stopFilter.insertStopWords(word);
                }
            }
            filters.add(stopFilter);
        } catch (FileNotFoundException e) {
            logger.warn("No stopword file found: " + path);
        } catch (IOException e) {
            logger.warn("Error reading stopword file: " + path, e);
        }
    }

    @Override
    public Tokenizer create(AttributeFactory factory) {
        if (isQuery) {
            // query side: ToAnalysis (precise segmentation)
            return new AnsjTokenizer(new ToAnalysis(), filters, null);
        } else {
            // index side: IndexAnalysis (index-oriented segmentation)
            return new AnsjTokenizer(new IndexAnalysis(), filters, null);
        }
    }
}
```
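Before pointing the factory at a stop word file, it can help to check what the file actually yields. The following is a minimal standalone sketch of that loading step, assuming the same format the factory reads (UTF-8, one word per line). `StopwordLoader` and `load` are hypothetical names, and the sketch uses only the JDK, not ansj's StopRecognition.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;

public class StopwordLoader {

    // Hypothetical helper: reads one stop word per line from a UTF-8 file,
    // trims surrounding whitespace, skips blank lines and drops duplicates
    // (insertion order is preserved).
    public static Set<String> load(Path file) throws IOException {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample file with a padded line, a blank line and a duplicate
        Path tmp = Files.createTempFile("stopwords", ".txt");
        Files.write(tmp, "the\n  a \n\nof\nthe\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(load(tmp)); // [the, a, of]
        Files.delete(tmp);
    }
}
```

Passing the real path (for example the stopwords.txt used in the schema) to `load` shows exactly which entries would reach the filter.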
Build and package the module to get ansj_lucene5_plug-5.1.2.0.jar.
Move the following packages into solr-5.5.4\server\solr-webapp\webapp\WEB-INF\lib:
http://pan.baidu.com/s/1qY8Ycn6 (password: xj74)
Put the segmentation configuration file (library.properties) in the /solr/server/resources directory.
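The step above relies on library.properties being visible on the classpath. A small diagnostic like the following can be run with that directory on its classpath (for example `java -cp /solr/server/resources:. LibraryPropertiesCheck`) to confirm the file is found and parses. This is a hypothetical helper, not part of ansj or Solr; it assumes the file is UTF-8 and simply lists whatever keys it contains, without assuming any particular ansj key names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class LibraryPropertiesCheck {

    // Returns where the class loader finds the resource, or null if it is
    // not on the classpath.
    public static URL locate(String name) {
        return LibraryPropertiesCheck.class.getClassLoader().getResource(name);
    }

    // Parses a properties stream as UTF-8 (assumption: the file is UTF-8,
    // which matters if it contains Chinese paths or comments).
    public static Properties parse(InputStream in) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        URL url = locate("library.properties");
        if (url == null) {
            System.out.println("library.properties is NOT on the classpath");
            return;
        }
        System.out.println("Found: " + url);
        try (InputStream in = url.openStream()) {
            Properties props = parse(in);
            // Print every key/value pair so misconfigured paths are easy to spot
            for (String key : props.stringPropertyNames()) {
                System.out.println(key + " = " + props.getProperty(key));
            }
        }
    }
}
```

This only checks the classpath of the JVM that runs it; it does not inspect a running Solr instance.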
Modify the schema:
```xml
<fieldType name="text_ansj" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="org.ansj.solr.AnsjTokenizerFactory" isQuery="false" stopwords="D:/solr-5.5.4/server/library/stopwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.ansj.solr.AnsjTokenizerFactory"/>
  </analyzer>
</fieldType>
```
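The attributes on each `<tokenizer>` element reach the factory constructor as a `Map<String, String>`, so the two analyzers in this field type get different settings. The sketch below mirrors only the constructor's defaults, without calling Lucene or ansj: the index analyzer sets `isQuery="false"` and therefore uses IndexAnalysis, while the query analyzer omits `isQuery` (defaulting to true, i.e. ToAnalysis) and also omits `stopwords`, so its filter list stays empty and no stop words are removed at query time. `AnsjArgsSketch` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class AnsjArgsSketch {

    // Hypothetical helper: mirrors getBoolean(args, "isQuery", true) in the
    // factory constructor; a missing attribute means query mode.
    public static String chooseAnalysis(Map<String, String> tokenizerAttrs) {
        String isQuery = tokenizerAttrs.get("isQuery");
        boolean query = isQuery == null || Boolean.parseBoolean(isQuery);
        return query ? "ToAnalysis" : "IndexAnalysis";
    }

    // Hypothetical helper: mirrors get(args, "stopwords"); null means the
    // factory's filter list stays empty and no stop words are removed.
    public static String stopwordFile(Map<String, String> tokenizerAttrs) {
        return tokenizerAttrs.get("stopwords");
    }

    public static void main(String[] args) {
        // Attributes of the index-side <tokenizer> in the schema above
        Map<String, String> index = new HashMap<>();
        index.put("isQuery", "false");
        index.put("stopwords", "D:/solr-5.5.4/server/library/stopwords.txt");

        // The query-side <tokenizer> sets no attributes besides class
        Map<String, String> query = new HashMap<>();

        System.out.println("index: " + chooseAnalysis(index) + ", stopwords=" + stopwordFile(index));
        System.out.println("query: " + chooseAnalysis(query) + ", stopwords=" + stopwordFile(query));
    }
}
```

If stop words should also be dropped from queries, the same `stopwords` attribute could be added to the query-side `<tokenizer>`.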