How to get Tokens from a TokenStream in Lucene 3.5.0
Source: Internet · Editor: 程序博客网 · Time: 2024/04/29 05:20
By reading the Lucene 3.5.0 documentation and comparing the API changes across Lucene releases, I eventually found the relevant changelog entry:
LUCENE-2302: Deprecated TermAttribute and replaced by a new CharTermAttribute. The change is backwards compatible, so mixed new/old TokenStreams all work on the same char[] buffer independent of which interface they use. CharTermAttribute has shorter method names and implements CharSequence and Appendable. This allows usage like Java's StringBuilder in addition to direct char[] access. Also terms can directly be used in places where CharSequence is allowed (e.g. regular expressions). (Uwe Schindler, Robert Muir)
This tells us that the old way of extracting tokens through TermAttribute is deprecated and should no longer be used.
The API documentation confirms that CharTermAttribute is the interface that replaces TermAttribute, so I wrote an example that extracts tokens from a TokenStream the new way.
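To make the changelog note more concrete, here is a minimal sketch of what "implements CharSequence and Appendable" over a reusable char[] buffer buys you. It uses only the Java standard library. `TermBufferSketch` and `CharSequenceTermDemo` are hypothetical names invented for this illustration; they are a simplified stand-in, not Lucene's CharTermAttribute and not its actual implementation.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a term attribute; NOT Lucene's CharTermAttribute. */
final class TermBufferSketch implements CharSequence, Appendable {
    private char[] buffer = new char[16];
    private int length = 0;

    /** Direct access to the backing array; only the first length() chars are valid. */
    char[] buffer() { return buffer; }

    /** Empties the term so the same buffer can be reused for the next token. */
    TermBufferSketch setEmpty() { length = 0; return this; }

    private void ensureCapacity(int needed) {
        if (needed > buffer.length) {
            buffer = Arrays.copyOf(buffer, Math.max(needed, buffer.length * 2));
        }
    }

    // Appendable: StringBuilder-style chaining, no IOException needed here.
    @Override public TermBufferSketch append(char c) {
        ensureCapacity(length + 1);
        buffer[length++] = c;
        return this;
    }

    @Override public TermBufferSketch append(CharSequence csq) {
        CharSequence s = (csq == null) ? "null" : csq;
        return append(s, 0, s.length());
    }

    @Override public TermBufferSketch append(CharSequence csq, int start, int end) {
        CharSequence s = (csq == null) ? "null" : csq;
        ensureCapacity(length + (end - start));
        for (int i = start; i < end; i++) {
            buffer[length++] = s.charAt(i);
        }
        return this;
    }

    // CharSequence: lets regex and other APIs read the term without copying it.
    @Override public int length() { return length; }

    @Override public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return buffer[index];
    }

    @Override public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException("range: " + start + ".." + end);
        }
        return new String(buffer, start, end - start);
    }

    @Override public String toString() { return new String(buffer, 0, length); }
}

class CharSequenceTermDemo {
    private static final Pattern WORD_THEN_DIGITS = Pattern.compile("[a-z]+\\d+");

    public static void main(String[] args) {
        TermBufferSketch term = new TermBufferSketch();
        term.append("lucene").append('3').append("5");

        // The term is passed to the regex engine directly as a CharSequence.
        System.out.println(WORD_THEN_DIGITS.matcher(term).matches()); // true
        System.out.println(term.length());                            // 8

        // Reuse the same buffer for the next "token".
        term.setEmpty().append("token");
        System.out.println(term);                                     // token
    }
}
```

The point of the design is that one mutable buffer is reused for every token, and a String is only created when the consumer explicitly asks for one via toString().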
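    // Old approach: reads the term through the now-deprecated TermAttribute.
    StringReader reader = new StringReader(s);
    // The first argument is a field name; "content" is an arbitrary placeholder.
    TokenStream ts = analyzer.tokenStream("content", reader);
    TermAttribute ta = ts.getAttribute(TermAttribute.class);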
    package com.segment;

    import java.io.StringReader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.wltea.analyzer.lucene.IKAnalyzer;

    public class Segment {

        // Runs the analyzer over s and returns the tokens, each followed by a space.
        public static String show(Analyzer a, String s) throws Exception {
            StringReader reader = new StringReader(s);
            // The first argument is a field name; "content" is an arbitrary placeholder.
            TokenStream ts = a.tokenStream("content", reader);
            // CharTermAttribute replaces the deprecated TermAttribute. Fetch it once:
            // the same instance is updated in place by every incrementToken() call.
            CharTermAttribute ta = ts.addAttribute(CharTermAttribute.class);
            StringBuilder tokens = new StringBuilder();
            ts.reset();
            while (ts.incrementToken()) {
                tokens.append(ta.toString()).append(' ');
            }
            ts.end();
            ts.close();
            return tokens.toString();
        }

        public String segment(String s) throws Exception {
            Analyzer a = new IKAnalyzer();
            return show(a, s);
        }

        public static void main(String[] args) {
            String name = "我是俊杰,我爱编程,我的测试用例";
            Segment s = new Segment();
            try {
                System.out.println(s.segment(name));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }