Full-Text Search with Lucene

Source: Internet | Published by: ubuntu php ide | Editor: 程序博客网 | Time: 2024/06/05 16:28

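Full-text search: the documents to be searched are first tokenized, and an index is built from the resulting terms. A query looks up the index first and then follows it to the matching documents, much like finding a word through the index of a dictionary.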
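To make the "tokenize, index, then look up" flow concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: the class InvertedIndexSketch and its methods are hypothetical names made up for illustration, and the tokenizer is deliberately naive (it only lower-cases and splits on non-letter, non-digit characters).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration class; not part of Lucene.
class InvertedIndexSketch {
    // term -> ascending ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new TreeMap<>();
    // document id -> original text
    private final List<String> documents = new ArrayList<>();

    // Naive tokenizer: lower-case, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Store the document and record each of its terms in the index; returns the new id.
    int add(String text) {
        int id = documents.size();
        documents.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Look the term up in the index first, then fetch the matching documents.
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (int id : postings.getOrDefault(term.toLowerCase(), new TreeSet<>())) {
            hits.add(documents.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("Lucene is a search library");
        index.add("A dictionary has an index");
        index.add("Search the index, then read the document");
        // Only the second and third documents contain the term "index".
        System.out.println(index.search("index"));
    }
}
```

The point of the design is that search never scans the document texts: it touches only the term map and then fetches documents by id, which is why querying an index stays fast as the collection grows.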

Typical use cases for full-text search are:

1: Search engines

2: Site search

3: File system search

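However, Lucene is not the same thing as a search engine. Lucene is a toolkit (a library): it cannot run on its own and cannot provide a service by itself.

One thing to know up front is that Lucene keeps the collected information in an index store. The flow is: collect the data, index it, and write the result into the index store. Queries also run directly against the index store.

Lucene needs the data to be collected first, but the content covered by full-text search is diverse. Data from different sources is first collected locally and then loaded into Lucene. The content has to be converted into a unified format before it can be queried.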

Ways to collect data

1: Web crawler 2: Database 3: File system (IO streams)

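Lucene: the logical structure is divided into an index area and a document area.

Document area: information is stored as Document objects. More precisely, the data is held in the Field objects of each Document.

One database record is stored as one Document object, and each column of the record is stored in one Field of that Document.

The Document objects in the document area are independent of one another.

Index area: the index that Lucene produces after tokenizing the content. A query looks up the index first and then goes to the document area to fetch the documents.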

Example: collecting the data source

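    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    // The statements below belong inside a method that declares "throws Exception".
    // BookDao, BookDaoImpl and Items are this project's own data-access classes.
    BookDao bd = new BookDaoImpl();
    List<Items> queryBooks = bd.queryBooks();
    System.out.println(queryBooks.size());

    // Wrap the collected data in Document objects
    List<Document> docList = new ArrayList<>();
    for (Items items : queryBooks) {
        Document document = new Document();
        // Store.YES means the field value is also stored in the document area
        Field fieldId = new TextField("id", items.getId() + "", Store.YES);
        Field fieldName = new TextField("name", items.getName(), Store.YES);
        Field fieldPrice = new TextField("price", items.getPrice() + "", Store.YES);
        Field fieldDetail = new TextField("detail", items.getDetail(), Store.YES);
        document.add(fieldId);
        document.add(fieldName);
        document.add(fieldPrice);
        document.add(fieldDetail);
        docList.add(document);
    }

    // Standard analyzer
    Analyzer analyzer = new StandardAnalyzer();
    // Create the IndexWriter
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);
    // Location of the index on disk
    File indexFile = new File("E:\\lucene");
    Directory directory = FSDirectory.open(indexFile);
    IndexWriter writer = new IndexWriter(directory, config);
    for (Document doc : docList) {
        writer.addDocument(doc);
    }
    // Close the writer
    writer.close();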

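The code above has loaded the collected data into the Lucene index store.

Next, a few words on analysis. In Lucene, analysis consists of two main steps: tokenization and filtering.

Tokenization: split the content of a Field into individual terms.

Filtering: filter the resulting terms by removing punctuation, converting case, and dropping stop words such as 的 and 啊.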
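The two steps can be sketched in plain Java as follows. This is only an illustration of the idea, not how Lucene's analyzers are implemented: AnalysisSketch, its methods and the stop-word list are all hypothetical, the tokenizer splits on whitespace only, and it does no Chinese word segmentation, so the example input is English.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of Lucene.
class AnalysisSketch {
    // Assumed stop-word list, chosen only for this example.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "an", "the"));

    // Step 1, tokenization: split on whitespace; punctuation stays attached for now.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Step 2, filtering: strip punctuation, lower-case, and drop stop words.
    static List<String> filter(List<String> tokens) {
        List<String> terms = new ArrayList<>();
        for (String token : tokens) {
            String term = token.replaceAll("\\p{P}", "").toLowerCase();
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Full pipeline: tokenize, then filter.
    static List<String> analyze(String text) {
        return filter(tokenize(text));
    }

    public static void main(String[] args) {
        // prints [lucene, library, java, toolkit]
        System.out.println(analyze("The Lucene library, a Java toolkit!"));
    }
}
```

The same pipeline has to be applied to the query text as well as to the indexed text, otherwise a query for "Lucene" would never match the stored term "lucene".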

Next, test a query:

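    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    // The statements below belong inside a method that declares "throws Exception".
    // Create the query parser; "detail" is the default field when a clause names none
    QueryParser parser = new QueryParser("detail", new StandardAnalyzer());
    // Create the query object. Operator keywords such as AND must be upper case
    Query parse = parser.parse("name:22  AND name:机");

    File indexFile = new File("E:\\lucene");
    Directory directory = FSDirectory.open(indexFile);
    // Open a reader on the index directory
    IndexReader reader = DirectoryReader.open(directory);
    IndexSearcher searcher = new IndexSearcher(reader);
    // The second argument is the number of top-ranked hits to return
    TopDocs search = searcher.search(parse, 10);

    // Total number of documents that match the query
    int totalHits = search.totalHits;
    System.out.println("Total hits: " + totalHits);

    ScoreDoc[] scoreDocs = search.scoreDocs;
    for (ScoreDoc scoreDoc : scoreDocs) {
        // Lucene's internal document ID (not the product ID)
        int docId = scoreDoc.doc;
        System.out.println("docId: " + docId);
        // Fetch the stored document by its ID
        Document doc = searcher.doc(docId);
        System.out.println("ID: " + doc.get("id"));
        System.out.println("name: " + doc.get("name"));
        System.out.println("price: " + doc.get("price"));
        System.out.println("detail: " + doc.get("detail"));
    }
    reader.close();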
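The AND in the query above means that a document must match both clauses. Conceptually, each term has a list of the document IDs that contain it, and AND keeps only the IDs present in both lists. Here is a minimal sketch of that intersection in plain Java. BooleanAndSketch and intersect are hypothetical names, the ID lists are made-up sample data, and Lucene's real query execution is considerably more elaborate than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
class BooleanAndSketch {
    // Intersect two ascending lists of document ids with a two-pointer walk.
    static List<Integer> intersect(List<Integer> left, List<Integer> right) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int a = left.get(i);
            int b = right.get(j);
            if (a == b) {
                // Present in both lists: the document satisfies the AND.
                result.add(a);
                i++;
                j++;
            } else if (a < b) {
                i++;
            } else {
                j++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up id lists for two terms of the same field.
        List<Integer> first = Arrays.asList(1, 3, 4, 8);
        List<Integer> second = Arrays.asList(3, 5, 8, 9);
        // prints [3, 8]
        System.out.println(intersect(first, second));
    }
}
```

Because both lists are sorted, the walk visits each ID at most once, so the cost grows with the combined length of the two lists, not with their product.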

0 0