Word Segmentation Resources


unexpected analyzer’s result
https://github.com/elastic/elasticsearch/issues/27326

MMSeg

Algorithm paper:
http://technology.chtsai.org/mmseg/

[Chinese Word Segmentation] The Simple and Efficient MMSeg
https://www.cnblogs.com/en-heng/p/5872308.html
Implementing the mmseg segmentation algorithm in Python, with some gripes
http://blog.csdn.net/acceptedxukai/article/details/7390300
medcl mmseg
https://github.com/medcl/elasticsearch-analysis-mmseg
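
As a rough illustration of the idea behind the links above, here is a minimal sketch of greedy forward maximum matching, the "simple" matching step that MMSeg builds on. It is not the full MMSeg algorithm (the ambiguity-resolution rules from the paper are left out), and it is not the code used by the medcl plugin. The function name `simple_max_match` and the toy dictionary `TOY_DICT` are made up for this example.

```python
def simple_max_match(text, dictionary, max_len=None):
    """Greedy forward maximum matching.

    At each position, take the longest dictionary word that starts there;
    fall back to a single character when nothing matches.
    """
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until there is a hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, only for illustration.
TOY_DICT = {"中华", "人民", "共和国", "中华人民共和国",
            "研究", "研究生", "生命", "起源"}

print(simple_max_match("中华人民共和国", TOY_DICT))
print(simple_max_match("研究生命起源", TOY_DICT))
```

With this toy dictionary the second call greedily takes "研究生" and leaves "命" stranded as a single character. This kind of ambiguity is what the additional rules described in the MMSeg paper are meant to resolve.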

Testing in ES:
Indices APIs » Analyze

GET idx/_analyze
{
  "analyzer" : "whitespace",
  "text" : "this is a test"
}

GET idx/_analyze
{
  "field" : "obj1.field1",
  "text" : "this is a test"
}

GET idx/_analyze
{
  "analyzer" : "mmseg_maxword",
  "text" : "中华人民共和国"
}
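
The same kind of `_analyze` request can also be sent from a script. The sketch below assumes a node reachable at `http://localhost:9200` and an index named `idx`; both are placeholders, and the helper names `analyze_body` and `analyze` are made up for this example. It uses POST, which the `_analyze` endpoint accepts as well as GET.

```python
import json
import urllib.request


def analyze_body(text, analyzer=None, field=None):
    """Build the JSON body for an _analyze request."""
    body = {"text": text}
    if analyzer is not None:
        body["analyzer"] = analyzer
    if field is not None:
        body["field"] = field
    return body


def analyze(index, body, base_url="http://localhost:9200"):
    """Send the request and return only the token strings.

    base_url is an assumption; point it at your own node.
    """
    req = urllib.request.Request(
        f"{base_url}/{index}/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return [t["token"] for t in json.load(resp)["tokens"]]


# Example (requires a running node with the mmseg plugin installed):
# analyze("idx", analyze_body("中华人民共和国", analyzer="mmseg_maxword"))
```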

ik

https://github.com/medcl/elasticsearch-analysis-ik

ES 2.1: IK 1.7
https://github.com/medcl/elasticsearch-analysis-ik/tree/v1.7.0
Download: https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v1.7.0

Analyzer: ik_smart, ik_max_word
Tokenizer: ik_smart, ik_max_word

GET idx/_analyze
{
  "analyzer" : "ik_max_word",
  "text" : "美国留给伊拉克的是个烂摊子吗"
}

FAQ

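If something goes wrong with the plugin installation, Chinese text is always segmented one character at a time. Restarting Elasticsearch usually fixes it; the root cause has not been analyzed yet.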
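
One way to spot this symptom automatically is to look at the tokens in an `_analyze` response and check whether every one of them is a single character. The sketch below is only a heuristic; the function name `looks_char_by_char` is made up, and the two sample responses are hand-written illustrations in the shape of an `_analyze` response, not captured output.

```python
def looks_char_by_char(response):
    """Return True if every token in an _analyze response is one character.

    A single-token response is not flagged, since a one-character input
    would legitimately produce it.
    """
    tokens = [t["token"] for t in response.get("tokens", [])]
    return len(tokens) > 1 and all(len(tok) == 1 for tok in tokens)


# Hand-written sample responses (illustrative only).
broken = {"tokens": [{"token": ch} for ch in "中华人民共和国"]}
healthy = {"tokens": [{"token": "中华人民共和国"},
                      {"token": "中华"},
                      {"token": "人民"}]}

print(looks_char_by_char(broken))
print(looks_char_by_char(healthy))
```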
