An Introduction to Lucene, with Hands-On Examples

Source: Internet | Published under: Java programmer level classification | Editor: 程序博客网 | Date: 2024/06/06 10:57

Preface

Here is a CD from the past; listen to the thoughts we once had.

About Lucene

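Lucene is a mature, open-source Java search library. It maintains an inverted index over a collection of documents (Document objects), mapping each term to the documents that contain it, and exposes a simple, easy-to-use API on top of it. For more background, see the encyclopedia entry on Lucene.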
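To make the idea of an inverted index concrete, here is a minimal pure-Java sketch that does not use Lucene at all. It maps each term to the sorted set of document IDs that contain it, so a term lookup is a map access instead of a scan over every document, and an AND query is an intersection of two such sets. The class name `ToyInvertedIndex` and the lowercase/whitespace tokenization are assumptions for illustration; Lucene's real index also records term frequencies, positions, and other data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of an inverted index; not Lucene's implementation.
final class ToyInvertedIndex {
    // term -> sorted IDs of the documents containing it (the "postings")
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    // Adds a document and returns its ID (its position in insertion order).
    int addDocument(String text) {
        int id = docs.size();
        docs.add(text);
        // Assumed naive tokenization: lowercase, then split on whitespace
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // IDs of the documents containing the term, found with one map lookup.
    Set<Integer> search(String term) {
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.<Integer>emptySet() : Collections.unmodifiableSet(ids);
    }

    // AND query: the intersection of the two postings sets.
    Set<Integer> searchAnd(String a, String b) {
        Set<Integer> result = new TreeSet<>(search(a));
        result.retainAll(search(b));
        return result;
    }
}
```

With the three documents "Lucene is a search library", "An inverted index maps terms to documents", and "Lucene keeps an inverted index", `search("lucene")` yields {0, 2} and `searchAnd("lucene", "inverted")` yields {2}.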

The figure below shows Lucene's indexing and searching workflow:

The table below describes the role of each package in Lucene.

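Package                          Purpose
org.apache.lucene.analysis       Language analyzers, mainly for splitting text into terms (tokenization); Chinese word segmentation can be added by extending these classes
org.apache.lucene.document       Structure of the documents stored in the index, comparable to a table schema in a relational database
org.apache.lucene.index          Index management, including creating indexes and deleting from them
org.apache.lucene.queryParser    Query parser: parses query expressions that combine keywords with operators such as AND, OR, and NOT
org.apache.lucene.search         Search management: runs a query and returns the matching results
org.apache.lucene.store          Data storage management, mainly low-level I/O operations
org.apache.lucene.util           Shared utility classes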
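As a rough picture of what an analyzer in `org.apache.lucene.analysis` does, the plain-Java sketch below turns text into a list of terms: it splits on every run of characters that are not letters or digits, lowercases each token, and drops a few stop words. This is not the Lucene `Analyzer` API; the class name `ToyAnalyzer`, the splitting rule, and the stop-word list are assumptions. Lucene's `StandardAnalyzer` follows more elaborate Unicode word-break rules, and producing proper Chinese words requires a dedicated segmenter, which is why the table mentions extending this package.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy analyzer: a tokenizer followed by lowercase and stop-word filters.
final class ToyAnalyzer {
    // Assumed stop-word list, for illustration only
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Split on every run of characters that are neither letters nor decimal digits
        for (String token : text.split("[^\\p{L}\\p{Nd}]+")) {
            String term = token.toLowerCase(Locale.ROOT);
            // Skip empty tokens (e.g. from leading punctuation) and stop words
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }
}
```

For example, `ToyAnalyzer.analyze("The Lucene-3.5 API is simple!")` produces `[lucene, 3, 5, api, simple]`; the same analysis must be applied to queries, or query terms will not match indexed terms.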

Getting Started with Lucene

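In the figure above, the parts in red are the steps our own code drives through Lucene's API, and all of them are simple. The general steps for implementing full-text search with Lucene (without integrating any framework) are as follows: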
Creating the index
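package org.xiaom.lucene;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class MyIndexCreater {
    private static IndexWriter indexWriter;
    private static Version version = Version.LUCENE_35;
    /**
     * Indexes every text file (.java, .xml, .txt) in this directory <strong>and its subdirectories</strong>.
     * @param docPath directory containing the documents
     * @param indexPath directory in which the index is stored
     */
    public static void createContainChild(String docPath, String indexPath)
            throws IOException {
        File docDir = new File(docPath);
        File indexDir = new File(indexPath);
        // 1. Open the directory in which the index is stored
        Directory directory = FSDirectory.open(indexDir);
        // 2. Create the IndexWriterConfig
        IndexWriterConfig conf = new IndexWriterConfig(version, new StandardAnalyzer(version));
        // Overwrite any existing index on every run
        conf.setOpenMode(OpenMode.CREATE);
        // Create the IndexWriter from the IndexWriterConfig
        indexWriter = new IndexWriter(directory, conf);
        try {
            int count = indexDir(docDir);
            System.out.println("indexed files: " + count);
            // 7. Commit the changes (required)
            indexWriter.commit();
        } finally {
            // Always close the IndexWriter so that its write lock is released
            indexWriter.close();
        }
    }
    // Indexes this directory and its subdirectories; returns the number of files indexed
    private static int indexDir(File dir) {
        int c = 0;
        File[] files = dir.listFiles();
        if (files == null) {
            return 0; // not a directory, or not readable
        }
        for (File f : files) {
            if (f.isDirectory()) {
                c += indexDir(f); // count the files indexed in subdirectories too
            } else if (f.getName().endsWith(".java")
                    || f.getName().endsWith(".txt")
                    || f.getName().endsWith(".xml")) {
                c += indexFile(f);
            }
        }
        return c;
    }
    // Indexes a single file; returns 1 on success, 0 on failure
    private static int indexFile(File f) {
        StringBuilder contentStr = new StringBuilder();
        // try-with-resources closes the reader even if indexing fails
        try (BufferedReader br = new BufferedReader(new FileReader(f))) {
            // The first line is used as the title; an empty file gets an empty title
            String titleStr = br.readLine();
            if (titleStr == null) {
                titleStr = "";
            }
            String s;
            while ((s = br.readLine()) != null) {
                contentStr.append(s).append("\n");
            }
            // 3. Create a Document
            Document doc = new Document();
            // 4. Create the Field objects
            Field name = new Field("name", f.getName(), Store.YES, Index.ANALYZED);
            Field title = new Field("title", titleStr, Store.YES, Index.ANALYZED);
            Field content = new Field("content", contentStr.toString(), Store.YES, Index.ANALYZED);
            // 5. Add the Fields to the Document
            doc.add(name);
            doc.add(title);
            doc.add(content);
            // 6. Add the Document to the IndexWriter
            indexWriter.addDocument(doc);
            return 1;
        } catch (IOException e) { // also covers CorruptIndexException
            e.printStackTrace();
            return 0;
        }
    }
}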
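The recursive `indexDir` method walks the directory tree by hand. Since Java 8, `java.nio.file.Files.walk` can do the walking instead; the sketch below only collects the paths that would be indexed, using the same .java/.txt/.xml filter, and leaves out the hand-off to `indexFile` so that it runs without Lucene. The class name `DocCollector` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: finds the files the indexer would pick up.
final class DocCollector {
    static List<Path> collect(Path root) throws IOException {
        // Files.walk visits root and all of its subdirectories;
        // try-with-resources closes the directory handles the stream opens
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.endsWith(".java") || name.endsWith(".txt") || name.endsWith(".xml");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

The size of the returned list is the number of candidate files, which is a handy cross-check against the count reported after indexing.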
Searching
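package org.xiaom.lucene;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class MyIndexSearcher {
    private static Version version = Version.LUCENE_35;
    /**
     * @param indexPath directory in which the index is stored
     * @param key default field to search, e.g. "content"
     * @param value query string in QueryParser syntax (keywords, AND, OR, NOT, ...)
     */
    public static void search(String indexPath, String key, String value) {
        IndexReader ireader = null;
        try {
            // 1. Open an IndexReader on the index directory
            ireader = IndexReader.open(FSDirectory.open(new File(indexPath)));
            // 2. Create an IndexSearcher from the IndexReader
            IndexSearcher indexSearcher = new IndexSearcher(ireader);
            // 3. Create a QueryParser, using the same analyzer as at indexing time
            QueryParser queryParser = new QueryParser(version, key, new StandardAnalyzer(version));
            // 4. Parse the query string into a Query
            Query query = queryParser.parse(value);
            // 5. Run the search; the returned TopDocs holds at most the 100 best hits
            TopDocs topDocs = indexSearcher.search(query, 100);
            ScoreDoc[] scoreDocs = topDocs.scoreDocs;
            // 6. Load and print the matching Documents
            System.out.println("total hits: " + topDocs.totalHits);
            System.out.println("returned: " + scoreDocs.length);
            System.out.println("==================================================");
            for (ScoreDoc sd : scoreDocs) {
                Document doc = indexSearcher.doc(sd.doc);
                String content = doc.get("content");
                System.out.println("name: " + doc.get("name"));
                System.out.println("title: " + doc.get("title"));
                System.out.println("score: " + sd.score);
                // Print at most 80 characters; substring(0, 80) alone throws on shorter content
                System.out.println("content: " + content.substring(0, Math.min(80, content.length())));
            }
        } catch (IOException e) { // also covers CorruptIndexException
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        } finally {
            // Release the index files held open by the reader
            if (ireader != null) {
                try {
                    ireader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}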
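`indexSearcher.search(query, 100)` returns at most the 100 best-scoring hits, while `topDocs.totalHits` reports how many documents matched in total. The plain-Java sketch below shows a common way to keep only the top k scores: a bounded min-heap whose root is the weakest hit kept so far. It illustrates the idea and is not Lucene's collector code (Lucene's collectors use a priority queue in a similar spirit); the `Hit` and `TopK` classes are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical hit: a document ID and its relevance score.
final class Hit {
    final int doc;
    final float score;
    Hit(int doc, float score) { this.doc = doc; this.score = score; }
}

final class TopK {
    // Returns the k highest-scoring hits, best first.
    static List<Hit> topK(List<Hit> hits, int k) {
        // Min-heap ordered by score: the root is the weakest hit currently kept
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (Hit h : hits) {
            if (heap.size() < k) {
                heap.add(h);
            } else if (k > 0 && h.score > heap.peek().score) {
                heap.poll();   // evict the weakest kept hit
                heap.add(h);
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return result;
    }
}
```

The heap never holds more than k entries, so memory stays bounded no matter how many documents match; that is why `totalHits` can be much larger than `scoreDocs.length`.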
Testing the search
package org.xiaom.lucene;

import java.io.IOException;

public class LuceneTest {
    public static void main(String[] args) throws IOException {
        String docPath = "D:/test1/docs";
        String indexPath = "D:/test1/index";
        MyIndexCreater.createContainChild(docPath, indexPath);
        MyIndexSearcher.search(indexPath, "content", "adfddd");
    }
}

Here is a downloadable Lucene 3.5 starter example.
Maintaining the index

Index maintenance usually involves the following operations:
Adding to the index (see above)
Deleting from the index
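    // Deletes every Document that contains the given term.
    // The term must match an indexed term exactly (for analyzed fields, the analyzed form, e.g. lowercased);
    // the deletion becomes visible to newly opened readers after indexWriter.commit().
    public boolean delete(Term term) {
        try {
            indexWriter.deleteDocuments(term);
            return true;
        } catch (IOException e) { // CorruptIndexException is a subclass of IOException
            e.printStackTrace();
            return false;
        }
    }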
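Deleting by term does not rewrite the index on the spot: Lucene records which documents are deleted, skips them at search time, and physically removes them later when segments are merged. The plain-Java toy model below mimics that behavior with a `BitSet` of deleted document IDs. All names are hypothetical, and terms are written as "field:text" strings purely for illustration.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index with "mark as deleted" semantics; not Lucene code.
final class ToyDeletableIndex {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private final BitSet deleted = new BitSet(); // bit i set => document i is deleted
    private int nextId = 0;

    // Indexes the given terms for a new document and returns its ID.
    int addDocument(String... terms) {
        int id = nextId++;
        for (String t : terms) {
            postings.computeIfAbsent(t, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Marks every document containing the term as deleted; the postings themselves stay untouched.
    void deleteDocuments(String term) {
        for (int id : postings.getOrDefault(term, new TreeSet<>())) {
            deleted.set(id);
        }
    }

    // Search filters out documents whose deleted bit is set.
    Set<Integer> search(String term) {
        Set<Integer> live = new TreeSet<>();
        for (int id : postings.getOrDefault(term, new TreeSet<>())) {
            if (!deleted.get(id)) {
                live.add(id);
            }
        }
        return live;
    }
}
```

Marking instead of rewriting keeps deletes cheap; the cost is that deleted documents still occupy space until a merge reclaims it.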

Updating the index (delete the old entry, then add the new one)
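    // Replaces a document: deletes every Document containing term, then adds doc.
    // IndexWriter.updateDocument performs both steps atomically; calling addDocument
    // alone would leave the old version in the index and create a duplicate.
    public boolean update(Term term, Document doc) {
        try {
            indexWriter.updateDocument(term, doc);
            return true;
        } catch (IOException e) { // CorruptIndexException is a subclass of IOException
            e.printStackTrace();
            return false;
        }
    }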
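An update in Lucene is a delete followed by an add (Lucene's `IndexWriter.updateDocument(Term, Document)` does both steps atomically), so the term used for the delete should identify exactly one document, like a primary key; otherwise every document containing it is replaced by the single new one. The hypothetical plain-Java toy model below shows that behavior: old versions are marked deleted and the new version is appended with a fresh ID.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy store: update = mark old versions deleted, then append the new version.
final class ToyUpdatableIndex {
    private final List<String> docs = new ArrayList<>();              // doc ID -> content
    private final Map<String, List<Integer>> byKey = new HashMap<>(); // key term -> doc IDs
    private final BitSet deleted = new BitSet();

    void addDocument(String key, String content) {
        int id = docs.size();
        docs.add(content);
        byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(id);
    }

    // Delete every document with this key, then add the new one.
    void updateDocument(String key, String content) {
        List<Integer> ids = byKey.remove(key);
        if (ids != null) {
            for (int id : ids) {
                deleted.set(id);
            }
        }
        addDocument(key, content);
    }

    // Contents of the live documents stored under this key.
    List<String> get(String key) {
        List<String> out = new ArrayList<>();
        for (int id : byKey.getOrDefault(key, new ArrayList<>())) {
            if (!deleted.get(id)) {
                out.add(docs.get(id));
            }
        }
        return out;
    }

    int liveCount() {
        return docs.size() - deleted.cardinality();
    }
}
```

If two documents were added under the same key, one update collapses them into a single live document, which is exactly the "replace all matches" behavior described above.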

Merging indexes
public void addIndexes(Directory... dirs) adds the indexes stored in dirs to this IndexWriter; the result becomes visible after commit.
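Conceptually, merging indexes means concatenating their documents and renumbering document IDs so that they do not collide. The plain-Java sketch below merges two toy postings maps (term to sorted document IDs) by shifting every ID of the second index past the first index's document count. It illustrates that idea under this assumption only and is not how `addIndexes` is implemented; Lucene works on whole segments instead. The class name `ToyIndexMerge` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical merge of two toy inverted indexes (term -> sorted doc IDs).
final class ToyIndexMerge {
    static Map<String, TreeSet<Integer>> merge(Map<String, TreeSet<Integer>> first, int firstDocCount,
                                               Map<String, TreeSet<Integer>> second) {
        Map<String, TreeSet<Integer>> merged = new HashMap<>();
        // Copy the first index's postings unchanged
        first.forEach((term, ids) -> merged.put(term, new TreeSet<>(ids)));
        // Rebase the second index's doc IDs so they follow the first index's documents
        second.forEach((term, ids) -> {
            TreeSet<Integer> target = merged.computeIfAbsent(term, t -> new TreeSet<>());
            for (int id : ids) {
                target.add(id + firstDocCount);
            }
        });
        return merged;
    }
}
```

Terms present in both inputs end up with one combined postings set, and terms present in only one input are carried over with their (possibly shifted) IDs.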
