Writing a Custom Analysis Plugin for Elasticsearch
Source: Internet · Editor: 程序博客网 · Time: 2024/05/23 19:49
Reference book: 《深入理解elasticsearch》 (Mastering Elasticsearch)
Example source code: https://github.com/felayman/vc-segment
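Creating a custom plugin
Developing an Elasticsearch analysis plugin involves the following pieces. The class names match the Elasticsearch 1.x plugin API (the example code references Lucene 4.10):
- TokenFilter (package org.apache.lucene.analysis): extend it to modify or extend the content of each token
- AbstractTokenFilterFactory (package org.elasticsearch.index.analysis): a factory that creates the actual TokenFilter instances
- AnalyzerProvider (package org.elasticsearch.index.analysis): provides the Analyzer instance
- AnalysisModule.AnalysisBinderProcessor (package org.elasticsearch.index.analysis): registers the names of the analysis components through Guice
- AbstractComponent (package org.elasticsearch.common.component): a node-level component that uses factories to create the custom Analyzer and TokenFilter
- AbstractModule (package org.elasticsearch.common.inject): the injection module that tells Guice how the AbstractComponent should be instantiated
- AbstractPlugin (package org.elasticsearch.plugins): describes the plugin to Elasticsearch (its name, description, and modules)
Add the following dependencies to pom.xml: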
    <dependencies>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>${elasticsearch.version}</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>${junit.version}</version>
        </dependency>
    </dependencies>
The code is as follows:
    package com.vcg.community.analysis;

    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    import java.io.IOException;

    /**
     * Token filter that reverses the text of every token.
     * @author felayman@gmail.com
     * @since 2017/3/15
     */
    public final class CustomFilter extends TokenFilter {

        // Gives access to the text of the token currently being processed
        private final CharTermAttribute charTermAttribute = addAttribute(CharTermAttribute.class);

        /**
         * Construct a token stream filtering the given input.
         * @param input the upstream token stream
         */
        protected CustomFilter(TokenStream input) {
            super(input);
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (!input.incrementToken()) {
                return false;
            }
            // buffer() returns the backing array, which may be longer than the term;
            // only the first length() characters belong to the current token.
            int length = charTermAttribute.length();
            if (length > 0) {
                String reversed = new StringBuilder(length)
                        .append(charTermAttribute.buffer(), 0, length)
                        .reverse()
                        .toString();
                charTermAttribute.setEmpty().append(reversed);
            }
            return true;
        }
    }
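In Lucene, CharTermAttribute.buffer() hands back the attribute's backing array, which is reused between tokens and can be longer than the current term; only the first length() characters are valid. The following is a minimal plain-JDK sketch of that pitfall, not Lucene code: the class TermBufferDemo and both method names are hypothetical, and an ordinary char[] stands in for the term buffer.

```java
// Plain-JDK stand-in for a reused term buffer; no Lucene classes are involved.
class TermBufferDemo {

    // Reverses only the first `length` characters, i.e. the current term.
    static String reverseTerm(char[] buffer, int length) {
        return new StringBuilder(length).append(buffer, 0, length).reverse().toString();
    }

    // Naive variant: treats the whole backing array as the term.
    static String reverseWholeBuffer(char[] buffer) {
        return new StringBuilder(new String(buffer).trim()).reverse().toString();
    }

    public static void main(String[] args) {
        // A previous, longer token is still sitting in the array.
        char[] buffer = "elastic!".toCharArray();
        // The next token "cat" overwrites only the first three slots.
        "cat".getChars(0, 3, buffer, 0);
        int length = 3;

        System.out.println(reverseTerm(buffer, length));   // tac
        System.out.println(reverseWholeBuffer(buffer));    // !citstac
    }
}
```

The naive variant leaks the tail of the earlier token into the result, which is why a filter should bound its reads by the term length.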
    package com.vcg.community.analysis;

    import org.apache.lucene.analysis.TokenStream;
    import org.elasticsearch.common.inject.Inject;
    import org.elasticsearch.common.inject.assistedinject.Assisted;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.index.Index;
    import org.elasticsearch.index.analysis.AbstractTokenFilterFactory;
    import org.elasticsearch.index.settings.IndexSettings;

    /**
     * Factory that returns instances of the token filter.
     * @author felayman@gmail.com
     * @since 2017/3/15
     */
    public class CustomFilteFactory extends AbstractTokenFilterFactory {

        // name and settings are supplied per filter definition, so they are marked @Assisted
        @Inject
        public CustomFilteFactory(Index index, @IndexSettings Settings indexSettings,
                                  @Assisted String name, @Assisted Settings settings) {
            super(index, indexSettings, name, settings);
        }

        @Override
        public TokenStream create(TokenStream tokenStream) {
            return new CustomFilter(tokenStream);
        }
    }
    package com.vcg.community.analysis;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.WhitespaceTokenizer;
    import org.apache.lucene.util.Version;

    import java.io.Reader;

    /**
     * The custom analyzer: whitespace tokenization followed by CustomFilter.
     * @author felayman@gmail.com
     * @since 2017/3/15
     */
    public final class CustomAnalyzer extends Analyzer {

        private final Version version;

        public CustomAnalyzer(Version version) {
            this.version = version;
        }

        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            final Tokenizer tokenizer = new WhitespaceTokenizer(this.version, reader);
            return new TokenStreamComponents(tokenizer, new CustomFilter(tokenizer));
        }
    }
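To make the analyzer's effect concrete, here is a rough plain-JDK approximation of the chain it builds: split the input on whitespace, then reverse each token. It is only a sketch of the expected output, not the Lucene implementation; the class MasteringAnalyzerSketch and the method analyze are hypothetical names, and a regex split stands in for WhitespaceTokenizer.

```java
import java.util.ArrayList;
import java.util.List;

// Approximates "whitespace tokenizer + reversing filter" using only the JDK.
class MasteringAnalyzerSketch {

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on runs of whitespace, standing in for the tokenizer step.
        for (String term : text.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                // Reverse each term, standing in for the filter step.
                tokens.add(new StringBuilder(term).reverse().toString());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [gniretsam, hcraescitsale]
        System.out.println(analyze("mastering elasticsearch"));
    }
}
```

A field indexed this way stores the reversed terms, so queries against it have to go through the same analyzer to match.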
    package com.vcg.community.analysis;

    import org.elasticsearch.common.inject.Inject;
    import org.elasticsearch.common.inject.assistedinject.Assisted;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.index.Index;
    import org.elasticsearch.index.analysis.AbstractIndexAnalyzerProvider;
    import org.elasticsearch.index.settings.IndexSettings;

    /**
     * Provides the CustomAnalyzer instance for an index.
     * @author felayman@gmail.com
     * @since 2017/3/15
     */
    public class CustomanalyzerProvider extends AbstractIndexAnalyzerProvider<CustomAnalyzer> {

        private final CustomAnalyzer customAnalyzer;

        // Guice cannot construct a CustomAnalyzer on its own (it has no injectable
        // constructor), so the provider builds it from the Lucene version that the
        // base class resolves from the index settings.
        @Inject
        public CustomanalyzerProvider(Index index, @IndexSettings Settings indexSettings,
                                      @Assisted String name, @Assisted Settings settings) {
            super(index, indexSettings, name, settings);
            this.customAnalyzer = new CustomAnalyzer(this.version);
        }

        @Override
        public CustomAnalyzer get() {
            return this.customAnalyzer;
        }
    }
    package com.vcg.community.analysis;

    import org.elasticsearch.index.analysis.AnalysisModule;

    /**
     * Registers the names under which the analyzer and token filter are available.
     * @author felayman@gmail.com
     * @since 2017/3/15
     */
    public class CustomAnalysisBinderProcessor extends AnalysisModule.AnalysisBinderProcessor {

        @Override
        public void processCharFilters(CharFiltersBindings charFiltersBindings) {
            super.processCharFilters(charFiltersBindings);
        }

        @Override
        public void processTokenFilters(TokenFiltersBindings tokenFiltersBindings) {
            // The filter gets its own name, matching CustomAnalyzerIndicesComponent
            tokenFiltersBindings.processTokenFilter("mastering_filter", CustomFilteFactory.class);
        }

        @Override
        public void processTokenizers(TokenizersBindings tokenizersBindings) {
            super.processTokenizers(tokenizersBindings);
        }

        @Override
        public void processAnalyzers(AnalyzersBindings analyzersBindings) {
            analyzersBindings.processAnalyzer("mastering_analyzer", CustomanalyzerProvider.class);
        }
    }
    package com.vcg.community.analysis;

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.util.Version;
    import org.elasticsearch.common.component.AbstractComponent;
    import org.elasticsearch.common.inject.Inject;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.index.analysis.AnalyzerScope;
    import org.elasticsearch.index.analysis.PreBuiltAnalyzerProviderFactory;
    import org.elasticsearch.index.analysis.PreBuiltTokenFilterFactoryFactory;
    import org.elasticsearch.index.analysis.TokenFilterFactory;
    import org.elasticsearch.indices.analysis.IndicesAnalysisService;

    /**
     * Node-level component that registers the prebuilt analyzer and token filter.
     * @author felayman@gmail.com
     * @since 2017/3/15
     */
    public class CustomAnalyzerIndicesComponent extends AbstractComponent {

        @Inject
        public CustomAnalyzerIndicesComponent(Settings settings, IndicesAnalysisService indicesAnalysisService) {
            super(settings);

            // Register the analyzer under the name "mastering_analyzer"
            indicesAnalysisService.analyzerProviderFactories().put(
                    "mastering_analyzer",
                    new PreBuiltAnalyzerProviderFactory("mastering_analyzer", AnalyzerScope.INDICES,
                            new CustomAnalyzer(Version.LUCENE_4_10_0)));

            // Register the token filter under the name "mastering_filter"
            indicesAnalysisService.tokenFilterFactories().put(
                    "mastering_filter",
                    new PreBuiltTokenFilterFactoryFactory(new TokenFilterFactory() {
                        @Override
                        public String name() {
                            return "mastering_filter";
                        }

                        @Override
                        public TokenStream create(TokenStream tokenStream) {
                            return new CustomFilter(tokenStream);
                        }
                    }));
        }
    }
    package com.vcg.community.analysis;

    import org.elasticsearch.common.inject.AbstractModule;

    /**
     * @author felayman@gmail.com
     * @since 2017/3/15
     */
    public class CustomAnalyzerModule extends AbstractModule {

        @Override
        protected void configure() {
            bind(CustomAnalyzerIndicesComponent.class).asEagerSingleton();
        }
    }
    package com.vcg.community.analysis;

    import org.elasticsearch.common.collect.ImmutableList;
    import org.elasticsearch.common.inject.Module;
    import org.elasticsearch.index.analysis.AnalysisModule;
    import org.elasticsearch.plugins.AbstractPlugin;

    import java.util.Collection;

    /**
     * @author felayman@gmail.com
     * @since 2017/3/15
     */
    public class CustomAnalyzerPlugin extends AbstractPlugin {

        @Override
        public String name() {
            return "AnalyzerPlugin";
        }

        @Override
        public String description() {
            return "Custom analyzer plugin";
        }

        @Override
        public Collection<Class<? extends Module>> modules() {
            return ImmutableList.<Class<? extends Module>>of(CustomAnalyzerModule.class);
        }

        public void onModule(AnalysisModule module) {
            module.addProcessor(new CustomAnalysisBinderProcessor());
        }
    }
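- com.vcg.community.analysis.CustomFilter: the concrete token filter, which modifies each token. A token is the data structure that carries a term together with its offsets and type.
- com.vcg.community.analysis.CustomFilteFactory: the factory that creates the actual token filter; Guice injects the Elasticsearch settings into it.
- com.vcg.community.analysis.CustomAnalyzer: the analyzer itself, which chains the tokenizer and the token filter.
- com.vcg.community.analysis.CustomanalyzerProvider: the analyzer provider, which tells Elasticsearch which analyzer the plugin uses.
- com.vcg.community.analysis.CustomAnalysisBinderProcessor: tells Elasticsearch the names under which the custom analyzer and token filter are available, and registers the token filter.
- com.vcg.community.analysis.CustomAnalyzerIndicesComponent: a node-level component that lets the analyzer and token filter be reused across indices.
- com.vcg.community.analysis.CustomAnalyzerModule: tells Elasticsearch how CustomAnalyzerIndicesComponent is instantiated, here as an eager singleton.
- com.vcg.community.analysis.CustomAnalyzerPlugin: the plugin entry point, which registers the custom analysis components with Elasticsearch.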
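The registration classes boil down to a name-to-factory lookup: a component is created through the factory stored under the name it was registered with. The sketch below illustrates that idea with the JDK alone, under heavy simplification. AnalysisRegistrySketch and its methods are hypothetical and do not mirror the Elasticsearch API, and a token filter is modelled as a plain String-to-String function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

// Hypothetical name -> factory registry; not an Elasticsearch class.
class AnalysisRegistrySketch {

    // Each entry maps a public name to a factory that creates a fresh filter.
    private final Map<String, Supplier<UnaryOperator<String>>> tokenFilters = new HashMap<>();

    void registerTokenFilter(String name, Supplier<UnaryOperator<String>> factory) {
        tokenFilters.put(name, factory);
    }

    UnaryOperator<String> createTokenFilter(String name) {
        Supplier<UnaryOperator<String>> factory = tokenFilters.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown token filter: " + name);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        AnalysisRegistrySketch registry = new AnalysisRegistrySketch();
        // Register a reversing filter under its own name.
        registry.registerTokenFilter("mastering_filter",
                () -> term -> new StringBuilder(term).reverse().toString());

        System.out.println(registry.createTokenFilter("mastering_filter").apply("abc")); // cba
    }
}
```

Looking up a name that was never registered fails immediately in this sketch, which shows why the filter and the analyzer each need their own, consistently used name.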