Lucene in Depth (3): A First Look at the Lucene Index


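Foreword: Lucene's inverted index is the entry point to understanding search as a whole. This section takes a first look at how Lucene creates an index, working through a code example.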

Source document

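Any text file will do for getting familiar with how a Lucene index is created. Here I chose the English text of the classic A Message to Garcia (《致加西亚的信》), which is easy to find on the Internet. I use English material because Lucene's default analysis is geared towards English-like text, so English input avoids misunderstandings caused by inaccurate tokenization. Once tokenization has been covered, the examples will switch to Chinese.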
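To make the tokenization concern concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: NaiveTokenizer is a hypothetical helper that only splits on whitespace, which is an assumption made for illustration and not how Lucene's analyzers work internally. It shows why English is the easy case: words arrive already separated, while a Chinese sentence contains no spaces to split on.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Lucene.
class NaiveTokenizer {

    // Lowercase the text and split it on runs of whitespace.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    public static void main(String[] args) {
        // English words are separated by spaces, so four tokens come back.
        System.out.println(tokenize("A Message to Garcia"));
        // The Chinese title has no spaces, so it comes back as one token.
        System.out.println(tokenize("致加西亚的信"));
    }
}
```

A real analyzer has to decide where the word boundaries inside the Chinese string lie, which is why the Chinese examples are postponed until tokenization has been introduced.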

Code example

Lucene is implemented in Java, so it is easy to extend in Java. Since Lucene was open-sourced, ports to other languages have appeared as well.

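import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// The enclosing class was cut off in the original listing; the name IndexDemo is an assumption.
public class IndexDemo {

    public void index() {
        // Use the standard analyzer; many other analyzers are available.
        Analyzer analyzer = new StandardAnalyzer();
        // Keep the index in memory for now (RAMDirectory is deprecated in later Lucene releases).
        Directory directory = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        // try-with-resources closes the writer even if adding a document fails.
        try (IndexWriter writer = new IndexWriter(directory, config)) {
            // A few selected passages from the text.
            String[] texts = new String[]{
                    "In all this Cuban business there is one man stands out on the horizon of my memory like Mars at perihelion.",
                    "Garcia was somewhere in the mountain vastness of Cuba- no one knew where.",
                    "Some one said to the President, There’s a fellow by the name of Rowan will find Garcia for you, if anybody can.",
                    "Rowan was sent for and given a letter to be delivered to Garcia.",
                    "The point I wish to make is this: McKinley gave Rowan a letter to be delivered to Garcia;",
                    "General Garcia is dead now, but there are other Garcias.",
                    "He will look at you out of a fishy eye and ask one or more of the following questions",
                    "What do you want to know for?",
                    "Can such a one write a letter to Garcia?",
                    "Can such a man be entrusted to carry a message to Garcia?"
            };
            // One document per passage, each with a single stored, tokenized text field.
            for (String text : texts) {
                Document doc = new Document();
                doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
                writer.addDocument(doc);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}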

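This simple example demonstrates how Lucene creates an index. To summarize, a Lucene index is built in the following steps:
1. Choose a tokenization strategy (pick an Analyzer);
2. Create a document (Document) by adding fields (Field) to it;
3. Create an IndexWriter and add the document (Document) through its addDocument() method.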
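To give a feel for what these steps produce, the following is a toy model of an inverted index in plain Java. It is a sketch under stated assumptions, not Lucene's implementation: ToyInvertedIndex, analyze, addDocument and docsContaining are hypothetical names, the "analyzer" merely lowercases and splits on non-letters, and each document is a single string. A real Lucene index stores considerably more than this term-to-document mapping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: class and method names are hypothetical, not Lucene APIs.
class ToyInvertedIndex {

    // term -> sorted ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Step 1: a crude stand-in for an analyzer (lowercase, split on non-letters).
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Steps 2 and 3: treat the string as a one-field document and record its terms.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Look up which documents contain a term; empty if the term was never indexed.
    Set<Integer> docsContaining(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("Rowan was sent for and given a letter to be delivered to Garcia.");
        index.addDocument("General Garcia is dead now, but there are other Garcias.");
        index.addDocument("What do you want to know for?");
        System.out.println(index.docsContaining("garcia")); // prints [0, 1]
        System.out.println(index.docsContaining("letter")); // prints [0]
    }
}
```

The point of the inversion is that a lookup starts from the term and goes straight to the documents containing it, instead of scanning every document for the term.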
