Lucene: PhraseQuery Queries

Source: Internet | Editor: 程序博客网 | Date: 2024/05/16 04:35

package com.firstproject.testsearch;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class PraQuery
{
 public static void main(String args[]) throws IOException
 {
  Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
  String indexDir = "E:\\studyworkspace\\lucene\\hellolucene\\index";
  Directory dir = FSDirectory.open(new File(indexDir));

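  // true: create a new index or overwrite the existing one; false: append to the existing index.
  // MaxFieldLength.LIMITED indexes at most the first 10,000 terms of each field.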
  IndexWriter writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);

  writer.setUseCompoundFile(true);

  Document doc1 = new Document();
  Document doc2 = new Document();
  Document doc3 = new Document();
  Document doc4 = new Document();
  Document doc5 = new Document();
  Document doc6 = new Document();

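  // English glosses of the sample titles are given in the trailing comments.
  Field f1 = new Field("bookname", "钢铁是怎样炼成的", Field.Store.YES, Field.Index.ANALYZED); // "How the Steel Was Tempered"
  Field f2 = new Field("bookname", "钢铁战士", Field.Store.YES, Field.Index.ANALYZED); // "Steel Warrior"
  Field f3 = new Field("bookname", "钢和铁是两种金属元素", Field.Store.YES, Field.Index.ANALYZED); // "Steel and iron are two metallic elements"
  Field f4 = new Field("bookname", "钢要比铁含有更多的碳元素", Field.Store.YES, Field.Index.ANALYZED); // "Steel contains more carbon than iron"
  Field f5 = new Field("bookname", "铁和钢是两种重要的金属", Field.Store.YES, Field.Index.ANALYZED); // "Iron and steel are two important metals"
  Field f6 = new Field("bookname", "铁钢是两种重要的金属", Field.Store.YES, Field.Index.ANALYZED); // same as f5 without "和" ("and"): 铁 directly before 钢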

  doc1.add(f1);
  doc2.add(f2);
  doc3.add(f3);
  doc4.add(f4);
  doc5.add(f5);
  doc6.add(f6);

  writer.addDocument(doc1);
  writer.addDocument(doc2);
  writer.addDocument(doc3);
  writer.addDocument(doc4);
  writer.addDocument(doc5);
  writer.addDocument(doc6);

  writer.close();

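  IndexSearcher searcher = new IndexSearcher(dir);
  // Build the phrase "钢 铁". With the default slop of 0, the two terms must be
  // adjacent and appear in this order within the "bookname" field.
  PhraseQuery query = new PhraseQuery();
  query.add(new Term("bookname", "钢"));
  query.add(new Term("bookname", "铁"));
  // Optional experiments: add a third term, or raise the slop to allow gaps between terms.
//  query.add(new Term("bookname", "元"));
//  query.setSlop(6);
  // Print the query class and its string form (the labels mean "final query" and "syntax").
  System.out.println("最终查询 query is :  " + query.getClass() + "\r\n语法:  " + query);
  TopDocs topdocs = searcher.search(query, 100);
  ScoreDoc[] scoreDocs = topdocs.scoreDocs;
  for (int i = 0; i < scoreDocs.length; i++)
  {
   System.out.println(searcher.doc(scoreDocs[i].doc));
  }
  // Release the underlying index reader.
  searcher.close();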

 }
}

 

 

The program prints the following (the labels "最终查询" and "语法" mean "final query" and "syntax"):

最终查询 query is :  class org.apache.lucene.search.PhraseQuery
语法:  bookname:"钢 铁"
Document<stored,indexed,tokenized<bookname:钢铁战士>>
Document<stored,indexed,tokenized<bookname:钢铁是怎样炼成的>>

 

After adding query.setSlop(1); the output becomes:

最终查询 query is :  class org.apache.lucene.search.PhraseQuery
语法:  bookname:"钢 铁"~1
Document<stored,indexed,tokenized<bookname:钢铁战士>>
Document<stored,indexed,tokenized<bookname:钢铁是怎样炼成的>>
Document<stored,indexed,tokenized<bookname:钢和铁是两种金属元素>>
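To see why a slop of 1 admits exactly these three titles, the following is a minimal sketch that computes the smallest slop each sample title would need for the phrase "钢 铁". It does not call Lucene. It assumes one token per Chinese character (which is how the titles appear to be indexed here, since the single-character terms match) and models the distance as how far the second term sits from the position right after the first term. The class name SlopDistanceSketch and the method minSlop are hypothetical names introduced for illustration.

```java
public class SlopDistanceSketch {

    // Smallest slop at which the two-term phrase "first second" would match the text,
    // or -1 if either term is absent. Assumption: one token per character.
    public static int minSlop(String text, char first, char second) {
        int best = -1;
        for (int a = 0; a < text.length(); a++) {
            if (text.charAt(a) != first) {
                continue;
            }
            for (int b = 0; b < text.length(); b++) {
                if (text.charAt(b) != second) {
                    continue;
                }
                // The phrase expects "second" at position a + 1; the distance is
                // how many positions it is away from that expected slot.
                int distance = Math.abs(b - (a + 1));
                if (best < 0 || distance < best) {
                    best = distance;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] titles = {
            "钢铁是怎样炼成的",
            "钢铁战士",
            "钢和铁是两种金属元素",
            "钢要比铁含有更多的碳元素",
            "铁和钢是两种重要的金属",
            "铁钢是两种重要的金属"
        };
        for (String title : titles) {
            System.out.println(title + " -> minimum slop " + minSlop(title, '钢', '铁'));
        }
    }
}
```

Under this model the six titles need slops of 0, 0, 1, 2, 3 and 2 respectively, which agrees with the two outputs above: slop 0 matches the first two titles and slop 1 adds "钢和铁是两种金属元素". Note that the reversed pair "铁钢" needs a slop of 2, not 1, because swapping two adjacent terms costs two moves.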

 

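Matches in which the terms lie closer together are given a higher weight, so, other factors being equal, an exact phrase match ranks above a match that relies on slop.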
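As a rough illustration of that weighting, the sketch below reproduces the formula that Lucene's DefaultSimilarity uses for sloppy phrase matches, 1 / (distance + 1). It is a standalone approximation rather than a call into Lucene, and the class name SloppyWeightSketch is hypothetical. The final score of a document also depends on other factors, such as field length normalization, which are not modeled here.

```java
public class SloppyWeightSketch {

    // Weight contributed by one phrase match at the given distance,
    // following the 1 / (distance + 1) formula of DefaultSimilarity.sloppyFreq.
    public static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        // Distance 0 is an exact phrase match; larger distances weigh less.
        for (int distance = 0; distance <= 3; distance++) {
            System.out.println("distance " + distance + " -> weight " + sloppyFreq(distance));
        }
    }
}
```

An exact match (distance 0) contributes a weight of 1.0, while a match at distance 1, such as "钢和铁", contributes 0.5.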

 

 

 
