Add Site Search to Your Website: Spring + Hibernate with Compass (built on Lucene)

Source: Internet. Editor: 程序博客网. Time: 2024/06/06 16:42

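Add Site Search to Your Website: A Compass Primer

syxChina (syxchina.cnblogs.com)

Getting Started with Compass (built on Lucene)

1 Preface

2 Introducing Compass

3 Using Compass standalone

4 Integrating Compass with Spring + Hibernate

4-1 JAR files

4-2 Configuration files

4-3 Source code

4-4 Notes

4-5 Testing

5 Summary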

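1 Preface

I have been learning new things lately, hoping to add some substance to my graduation project. After a long time on SSH projects I also wanted to try something new and round out what I already knew, and search is undoubtedly important. As a Java developer you have probably heard of Lucene, a top-level Apache open-source project. I have just finished learning it and know the basics well enough to use it, but I have to say the examples in the official Lucene documentation are not very helpful. Fortunately there is plenty of material on the Internet. While reading about Lucene I came across two Lucene-based projects, Compass and lucene-nutch. Lucene can index and search whatever content you give it, but to search a local database or web pages you first have to pull the data out, index it, and then search. So you start to wonder whether you could search the database directly, using its contents as the index and keeping the index updated as rows are created, updated, and deleted. That is where Compass comes in. Compass is very convenient for site search, and it ships with official Spring and Hibernate support, which makes it even more convenient. Lucene-nutch is a Lucene-based tool for searching web pages. If there is interest, I will share my experience with Lucene and lucene-nutch in a later post as a quick start; the rest can be left to the documentation and Google.

I should mention that Compass appears to have stopped being updated in 2009, and people online say it only supports Lucene versions below 3.0. It is a good project and I don't know why it is no longer updated. I tried analyzers built for Lucene 3.0 and later and they did not work. For Chinese I use JE-Analyzer.jar. My environment:

Spring 3.1.0 + Hibernate 3.6.6 + Compass 2.2.0.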

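2 Introducing Compass

Compass is a powerful, transactional, high-performance object/search engine mapping (OSEM) and Java persistence framework. Compass includes:

* a search engine abstraction layer (using the Lucene search engine),

* OSEM (Object/Search Engine Mapping) support,

* transaction management,

* a simple, Google-like keyword query language,

* an extensible, modular framework,

* a simple API.

Official website: search for it on Google.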

3 Using Compass standalone

Compass can be used without integrating it into Hibernate or Spring. The following code is excerpted from the web:


import org.compass.annotations.Index;
import org.compass.annotations.Searchable;
import org.compass.annotations.SearchableId;
import org.compass.annotations.SearchableProperty;
import org.compass.annotations.Store;

// A searchable entity: Compass maps its annotated getters to index fields.
@Searchable
public class Book {

    private String id;      // identifier
    private String title;   // title
    private String author;  // author
    private float price;    // price

    public Book() {
    }

    public Book(String id, String title, String author, float price) {
        super();
        this.id = id;
        this.title = title;
        this.author = author;
        this.price = price;
    }

    @SearchableId
    public String getId() {
        return id;
    }

    // The title is tokenized, stored, and boosted so title matches rank higher.
    @SearchableProperty(boost = 2.0F, index = Index.TOKENIZED, store = Store.YES)
    public String getTitle() {
        return title;
    }

    @SearchableProperty(index = Index.TOKENIZED, store = Store.YES)
    public String getAuthor() {
        return author;
    }

    // The price is stored but not indexed, so it cannot be searched on.
    @SearchableProperty(index = Index.NO, store = Store.YES)
    public float getPrice() {
        return price;
    }

    public void setId(String id) {
        this.id = id;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public void setAuthor(String author) {
        this.author = author;
    }

    public void setPrice(float price) {
        this.price = price;
    }

    @Override
    public String toString() {
        return "[" + id + "] " + title + " - " + author + " $ " + price;
    }
}

import java.util.ArrayList;
import java.util.List;

import org.compass.annotations.config.CompassAnnotationsConfiguration;
import org.compass.core.Compass;
import org.compass.core.CompassHits;
import org.compass.core.CompassSession;
import org.compass.core.CompassTransaction;

public class Searcher {

    protected Compass compass;

    public Searcher() {
    }

    public Searcher(String path) {
        compass = new CompassAnnotationsConfiguration()
                .setConnection(path)
                .addClass(Book.class)
                .setSetting("compass.engine.highlighter.default.formatter.simple.pre", "<font color='red'>")
                .setSetting("compass.engine.highlighter.default.formatter.simple.post", "</font>")
                .buildCompass();
        // Close Compass when the JVM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread() {
            public void run() {
                compass.close();
            }
        });
    }

    /**
     * Add a book to the index.
     * @param book
     */
    public void index(Book book) {
        CompassSession session = null;
        CompassTransaction tx = null;
        try {
            session = compass.openSession();
            tx = session.beginTransaction();
            session.create(book);
            tx.commit();
        } catch (RuntimeException e) {
            if (tx != null)
                tx.rollback();
            throw e;
        } finally {
            if (session != null) {
                session.close();
            }
        }
    }

    /**
     * Remove a book from the index.
     * @param book
     */
    public void unIndex(Book book) {
        CompassSession session = null;
        CompassTransaction tx = null;
        try {
            session = compass.openSession();
            tx = session.beginTransaction();
            session.delete(book);
            tx.commit();
        } catch (RuntimeException e) {
            // tx is still null if openSession() or beginTransaction() failed.
            if (tx != null)
                tx.rollback();
            throw e;
        } finally {
            if (session != null) {
                session.close();
            }
        }
    }

    /**
     * Re-index a book: remove it, then add it again.
     * @param book
     */
    public void reIndex(Book book) {
        unIndex(book);
        index(book);
    }

    /**
     * Search the index.
     * @param queryString
     * @return the matching books (empty if there are none)
     */
    public List<Book> search(String queryString) {
        CompassSession session = null;
        CompassTransaction tx = null;
        try {
            session = compass.openSession();
            tx = session.beginTransaction();
            CompassHits hits = session.find(queryString);
            int n = hits.length();
            List<Book> books = new ArrayList<Book>();
            for (int i = 0; i < n; i++) {
                books.add((Book) hits.data(i));
            }
            hits.close();
            tx.commit();
            return books;
        } catch (RuntimeException e) {
            if (tx != null)
                tx.rollback();
            throw e;
        } finally {
            if (session != null) {
                session.close();
            }
        }
    }
}

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.UUID;

public class Main {

    static List<Book> db = new ArrayList<Book>();
    static Searcher searcher = new Searcher("index");
    // One shared reader: a new BufferedReader per call can swallow buffered input.
    static BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));

    public static void main(String[] args) {
        add(new Book(UUID.randomUUID().toString(), "Thinking in Java", "Bruce", 109.0f));
        add(new Book(UUID.randomUUID().toString(), "Effective Java", "Joshua", 12.4f));
        add(new Book(UUID.randomUUID().toString(), "Java Thread Programing", "Paul", 25.8f));
        long begin = System.currentTimeMillis();
        int count = 30;
        for (int i = 1; i < count; i++) {
            if (i % 10 == 0) {
                // Report progress and a rough estimate of the time remaining.
                long end = System.currentTimeMillis();
                System.err.println(String.format(
                        "Done [%d], remaining [%d], elapsed [%ds], estimated remaining [%ds].",
                        i, count - i, (end - begin) / 1000,
                        (int) ((count - i) * ((end - begin) / (i * 1000.0)))));
            }
            // Generated test data. Note: Date.toString() has one-second resolution,
            // so ids generated within the same second repeat.
            String uuid = new Date().toString();
            add(new Book(uuid, uuid.substring(0, uuid.length() / 2),
                    uuid.substring(uuid.length() / 2), (float) Math.random() * 100));
        }
        int n;
        do {
            n = displaySelection();
            switch (n) {
            case 1:
                listBooks();
                break;
            case 2:
                addBook();
                break;
            case 3:
                deleteBook();
                break;
            case 4:
                searchBook();
                break;
            case 5:
                return;
            }
        } while (n != 0); // displaySelection() returns 0 for an invalid choice, which ends the loop
    }

    static int displaySelection() {
        System.out.println("\n==select==");
        System.out.println("1. List all books");
        System.out.println("2. Add book");
        System.out.println("3. Delete book");
        System.out.println("4. Search book");
        System.out.println("5. Exit");
        int n = readKey();
        if (n >= 1 && n <= 5)
            return n;
        return 0;
    }

    /**
     * Add a book to both the database and the index.
     *
     * @param book
     */
    private static void add(Book book) {
        db.add(book);
        searcher.index(book);
    }

    /**
     * Print every book in the database.
     */
    public static void listBooks() {
        System.out.println("==Database==");
        int n = 1;
        for (Book book : db) {
            System.out.println(n + ")" + book);
            n++;
        }
    }

    /**
     * Read a book from user input and add it to the database and the index.
     */
    public static void addBook() {
        String title = readLine(" Title: ");
        String author = readLine(" Author: ");
        String price = readLine(" Price: ");
        Book book = new Book(UUID.randomUUID().toString(), title, author, Float.valueOf(price));
        add(book);
    }

    /**
     * Delete a book from both the database and the index.
     */
    public static void deleteBook() {
        listBooks();
        System.out.println("Book index: ");
        int n = readKey();
        Book book = db.remove(n - 1);
        searcher.unIndex(book);
    }

    /**
     * Search for books matching the keyword the user enters.
     */
    public static void searchBook() {
        String queryString = readLine(" Enter keyword: ");
        List<Book> books = searcher.search(queryString);
        System.out.println(" ====search results:" + books.size() + "====");
        for (Book book : books) {
            System.out.println(book);
        }
    }

    public static int readKey() {
        try {
            // Read the whole line so the trailing newline is not left in the buffer.
            return Integer.parseInt(reader.readLine().trim());
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static String readLine(String prompt) {
        System.out.println(prompt);
        try {
            return reader.readLine();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
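The progress line in main estimates the remaining time by assuming the remaining items will take the same average time per item as those already processed. A minimal, self-contained sketch of that arithmetic follows; the class and method names are hypothetical and only illustrate the formula used in the listing.

```java
// Hypothetical helper illustrating the remaining-time formula from Main.main.
class EtaSketch {

    // Estimated seconds remaining, assuming the remaining items take the same
    // average time per item as the ones already processed.
    static int estimateRemainingSeconds(int done, int total, long elapsedMillis) {
        if (done <= 0) {
            throw new IllegalArgumentException("done must be positive");
        }
        double secondsPerItem = elapsedMillis / (done * 1000.0);
        return (int) ((total - done) * secondsPerItem);
    }

    public static void main(String[] args) {
        // 10 of 30 items took 5000 ms, i.e. 0.5 s per item; 20 items remain.
        System.out.println(estimateRemainingSeconds(10, 30, 5000)); // prints 10
    }
}
```

The estimate is only as good as the assumption of a constant per-item cost, which is why the listing reports it as an estimate.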


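Inserting data and indexing it one record at a time like this is slow; the approach in the next section improves on it. Note that no analyzer is configured above, so the default one is used, and with the default analyzer Chinese text is split into individual characters.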
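To make "split into individual characters" concrete, here is a small stand-alone sketch of that kind of tokenization. It is an illustration only, not the analyzer Compass or Lucene actually uses: the class name and the exact splitting rules (Han characters become single tokens, other letters and digits are grouped and lower-cased) are assumptions chosen to mimic the behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: mimics "one token per Chinese character".
// This is NOT the Compass/Lucene default analyzer.
class PerCharTokenizerSketch {

    // Each Han ideograph becomes its own token; runs of other letters and
    // digits are kept together as a single lower-cased token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean han = Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
            if (han || !Character.isLetterOrDigit(cp)) {
                // A Han character or a separator ends the current word.
                if (word.length() > 0) {
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                if (han) {
                    tokens.add(new String(Character.toChars(cp)));
                }
            } else {
                word.appendCodePoint(Character.toLowerCase(cp));
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Compass全文搜索")); // prints [compass, 全, 文, 搜, 索]
    }
}
```

With tokens like these, a query for a multi-character word can only match as a sequence of single characters, which is why a dictionary-based Chinese analyzer is configured in the next section.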

4 Integrating Compass with Spring + Hibernate

4-1 JAR files

[Screenshots of the required JAR files]

4-2 Configuration files

beans.xml

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:context="http://www.springframework.org/schema/context"
    xmlns:aop="http://www.springframework.org/schema/aop"
    xmlns:tx="http://www.springframework.org/schema/tx"
    xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
        http://www.springframework.org/schema/context
        http://www.springframework.org/schema/context/spring-context-3.0.xsd
        http://www.springframework.org/schema/tx
        http://www.springframework.org/schema/tx/spring-tx-3.0.xsd
        http://www.springframework.org/schema/aop
        http://www.springframework.org/schema/aop/spring-aop-3.0.xsd">

    <context:annotation-config/>
    <context:component-scan base-package="com.syx.compass"></context:component-scan>
    <aop:aspectj-autoproxy></aop:aspectj-autoproxy>

    <import resource="hibernate-beans.xml"/>
    <import resource="compass-beans.xml"/>
</beans>

compass-beans.xml

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="...">

    <!-- Main Compass configuration -->
    <bean id="compass" class="org.compass.spring.LocalCompassBean">
        <property name="compassSettings">
            <props>
                <!-- Where the index data is stored -->
                <prop key="compass.engine.connection">file://compass</prop>
                <prop key="compass.transaction.factory">org.compass.spring.transaction.SpringSyncTransactionFactory</prop>
                <!-- The analyzer (word segmenter) to use -->
                <prop key="compass.engine.analyzer.default.type">jeasy.analysis.MMAnalyzer</prop>
                <prop key="compass.engine.highlighter.default.formatter.simple.pre"><![CDATA[<font color="red"><b>]]></prop>
                <prop key="compass.engine.highlighter.default.formatter.simple.post"><![CDATA[</b></font>]]></prop>
            </props>
        </property>
        <property name="transactionManager">
            <ref bean="txManager"/>
        </property>
        <property name="compassConfiguration" ref="annotationConfiguration"/>
        <property name="classMappings">
            <list>
                <value>com.syx.compass.test1.Article</value>
            </list>
        </property>
    </bean>

    <bean id="annotationConfiguration"
        class="org.compass.annotations.config.CompassAnnotationsConfiguration">
    </bean>

    <bean id="compassTemplate" class="org.compass.core.CompassTemplate">
        <property name="compass" ref="compass"/>
    </bean>

    <!-- Keep the index in sync: when data in the database changes, update the index too -->
    <bean id="hibernateGps" class="org.compass.gps.impl.SingleCompassGps"
        init-method="start" destroy-method="stop">
        <property name="compass">
            <ref bean="compass"/>
        </property>
        <property name="gpsDevices">
            <list>
                <ref bean="hibernateGpsDevice"/>
            </list>
        </property>
    </bean>

    <!-- Hibernate device: connects Compass to Hibernate -->
    <bean id="hibernateGpsDevice"
        class="org.compass.spring.device.hibernate.dep.SpringHibernate3GpsDevice">
        <property name="name">
            <value>hibernateDevice</value>
        </property>
        <property name="sessionFactory">
            <ref bean="sessionFactory"/>
        </property>
        <property name="mirrorDataChanges">
            <value>true</value>
        </property>
    </bean>

    <!-- Rebuild the index on a schedule (using Quartz) or when the Spring ApplicationContext starts -->
    <bean id="compassIndexBuilder"
        class="com.syx.compass.test1.CompassIndexBuilder"
        lazy-init="false">
        <property name="compassGps" ref="hibernateGps"/>
        <property name="buildIndex" value="false"/>
        <property name="lazyTime" value="1"/>
    </bean>

    <!-- Search service class -->
    <bean id="searchService" class="com.syx.compass.test1.SearchServiceBean">
        <property name="compassTemplate">
            <ref bean="compassTemplate"/>
        </property>
    </bean>

</beans>

hibernate-beans.xml

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="...">

    <!-- DataSource -->
    <bean id="dataSource" class="com.mchange.v2.c3p0.ComboPooledDataSource">
        <property name="driverClass" value="${jdbc.driverClassName}"/>
        <property name="jdbcUrl" value="${jdbc.url}"/>
        <property name="user" value="${jdbc.username}"/>
        <property name="password" value="${jdbc.password}"/>
        <property name="autoCommitOnClose" value="true"/>
        <property name="checkoutTimeout" value="${cpool.checkoutTimeout}"/>
        <property name="initialPoolSize" value="${cpool.minPoolSize}"/>
        <property name="minPoolSize" value="${cpool.minPoolSize}"/>
        <property name="maxPoolSize" value="${cpool.maxPoolSize}"/>
        <property name="maxIdleTime" value="${cpool.maxIdleTime}"/>
        <property name="acquireIncrement" value="${cpool.acquireIncrement}"/>
        <!-- <property name="maxIdleTimeExcessConnections" value="${cpool.maxIdleTimeExcessConnections}"/> -->
    </bean>

    <bean
        class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
        <property name="locations">
            <value>classpath:jdbc.properties</value>
        </property>
    </bean>

    <!-- SessionFactory -->
    <bean id="sessionFactory"
        class="org.springframework.orm.hibernate3.annotation.AnnotationSessionFactoryBean">
        <property name="dataSource" ref="dataSource"/>
        <property name="annotatedClasses">
            <list>
                <value>com.syx.compass.model.Article</value>
                <value>com.syx.compass.model.Author</value>
                <value>com.syx.compass.test1.Article</value>
            </list>
        </property>
        <property name="hibernateProperties">
            <props>
                <prop key="hibernate.dialect">org.hibernate.dialect.MySQLDialect</prop>
                <prop key="hibernate.current_session_context_class">thread</prop>
                <prop key="javax.persistence.validation.mode">none</prop>
                <prop key="hibernate.show_sql">true</prop>
                <prop key="hibernate.format_sql">false</prop>
                <prop key="hibernate.hbm2ddl.auto">update</prop>
            </props>
        </property>
    </bean>

    <bean id="hibernateTemplate" class="org.springframework.orm.hibernate3.HibernateTemplate">
        <property name="sessionFactory" ref="sessionFactory"></property>
    </bean>

    <bean id="txManager"
        class="org.springframework.orm.hibernate3.HibernateTransactionManager">
        <property name="sessionFactory" ref="sessionFactory"/>
    </bean>

</beans>

jdbc.properties

jdbc.driverClassName=com.mysql.jdbc.Driver

jdbc.hostname=localhost

jdbc.url=jdbc:mysql://localhost:3306/compass

jdbc.username=root

jdbc.password=root

cpool.checkoutTimeout=5000

cpool.minPoolSize=1

cpool.maxPoolSize=4

cpool.maxIdleTime=25200

cpool.maxIdleTimeExcessConnections=1800

cpool.acquireIncrement=5

log4j.properties

log4j.appender.stdout=org.apache.log4j.ConsoleAppender

log4j.appender.stdout.Target=System.out

log4j.appender.stdout.layout=org.apache.log4j.PatternLayout

log4j.rootLogger=error, stdout

4-3 Source code

import java.util.Date;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

import org.compass.annotations.Index;
import org.compass.annotations.Searchable;
import org.compass.annotations.SearchableId;
import org.compass.annotations.SearchableProperty;
import org.compass.annotations.Store;

// Both a Hibernate entity (table _article) and a Compass searchable (alias "article").
@Searchable(alias = "article")
@Entity(name = "_article")
public class Article {

    private Long ID;          // identifier
    private String content;   // body text
    private String title;     // article title
    private Date createTime;  // creation time

    public Article() {
    }

    public Article(Long iD, String content, String title, Date createTime) {
        ID = iD;
        this.content = content;
        this.title = title;
        this.createTime = createTime;
    }

    public String toString() {
        // %s formats a null createTime as "null" instead of throwing.
        return String.format("%d,%s,%s,%s", ID, title, content, createTime);
    }

    @SearchableId
    @Id
    @GeneratedValue
    public Long getID() {
        return ID;
    }

    public void setID(Long id) {
        ID = id;
    }

    @SearchableProperty(index = Index.TOKENIZED, store = Store.YES)
    public String getContent() {
        return content;
    }

    public void setContent(String content) {
        this.content = content;
    }

    @SearchableProperty(index = Index.TOKENIZED, store = Store.YES)
    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    @SearchableProperty(index = Index.TOKENIZED, store = Store.YES)
    public Date getCreateTime() {
        return createTime;
    }

    public void setCreateTime(Date createTime) {
        this.createTime = createTime;
    }
}

import org.compass.gps.CompassGps;
import org.springframework.beans.factory.InitializingBean;

public class CompassIndexBuilder implements InitializingBean {

    // Whether to build the index; set to false to disable this builder.
    private boolean buildIndex = false;

    // Delay before the indexing thread starts work, in seconds.
    private int lazyTime = 10;

    // The Compass GPS wrapper.
    private CompassGps compassGps;

    // The indexing thread.
    private Thread indexThread = new Thread() {
        @Override
        public void run() {
            try {
                Thread.sleep(lazyTime * 1000);
                System.out.println("begin compass index...");
                long beginTime = System.currentTimeMillis();
                // Rebuild the index.
                // If the index files for the Compass entities already exist, a temporary
                // index is built first and then replaces the old one when indexing finishes.
                compassGps.index();
                long costTime = System.currentTimeMillis() - beginTime;
                System.out.println("compass index finished.");
                System.out.println("cost " + costTime + " milliseconds");
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    };

    /**
     * Implements <code>InitializingBean</code>: starts the indexing thread
     * once dependency injection is complete.
     */
    public void afterPropertiesSet() throws Exception {
        if (buildIndex) {
            indexThread.setDaemon(true);
            indexThread.setName("Compass Indexer");
            indexThread.start();
        }
    }

    public void setBuildIndex(boolean buildIndex) {
        this.buildIndex = buildIndex;
    }

    public void setLazyTime(int lazyTime) {
        this.lazyTime = lazyTime;
    }

    public void setCompassGps(CompassGps compassGps) {
        this.compassGps = compassGps;
    }
}

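import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.compass.core.CompassCallback;
import org.compass.core.CompassException;
import org.compass.core.CompassHighlighter;
import org.compass.core.CompassHits;
import org.compass.core.CompassQuery;
import org.compass.core.CompassSession;
import org.compass.core.CompassTemplate;

public class SearchServiceBean {

    private CompassTemplate compassTemplate;

    /**
     * Query the index. The returned map holds "size" (total number of hits)
     * and "result" (the hits in the range [start, end)).
     */
    public Map<String, Object> find(final String keywords, final String type, final int start, final int end) {
        return compassTemplate.execute(new CompassCallback<Map<String, Object>>() {
            public Map<String, Object> doInCompass(CompassSession session) throws CompassException {
                List<Article> result = new ArrayList<Article>();
                Map<String, Object> container = new HashMap<String, Object>();
                CompassQuery query = session.queryBuilder().queryString(keywords).toQuery();
                CompassHits hits = query.setAliases(type).hits();
                int totalSize = hits.length();
                container.put("size", totalSize);
                // Never read past the last hit.
                int max = Math.min(end, hits.length());
                if (type.equals("article")) {
                    for (int i = start; i < max; i++) {
                        Article article = (Article) hits.data(i);
                        // Replace title and content with highlighted fragments when available.
                        String title = hits.highlighter(i).fragment("title");
                        if (title != null) {
                            article.setTitle(title);
                        }
                        String content = hits.highlighter(i)
                                .setTextTokenizer(CompassHighlighter.TextTokenizer.AUTO)
                                .fragment("content");
                        if (content != null) {
                            article.setContent(content);
                        }
                        result.add(article);
                    }
                }
                container.put("result", result);
                return container;
            }
        });
    }

    public CompassTemplate getCompassTemplate() {
        return compassTemplate;
    }

    public void setCompassTemplate(CompassTemplate compassTemplate) {
        this.compassTemplate = compassTemplate;
    }
}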
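The paging in find above returns the hits in the half-open range [start, end), with the upper bound clamped to the number of hits. A stand-alone sketch of that windowing logic on a plain list is shown below. The class name is hypothetical, and clamping start at 0 is an extra guard that the original listing does not have.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone version of the paging window used in find().
class PageWindowSketch {

    // Returns the items in [start, min(end, hits.size())); the result is empty
    // when start lies past the last hit. Clamping start at 0 is an added guard.
    static <T> List<T> window(List<T> hits, int start, int end) {
        int from = Math.max(0, start);
        int to = Math.min(end, hits.size());
        List<T> page = new ArrayList<T>();
        for (int i = from; i < to; i++) {
            page.add(hits.get(i));
        }
        return page;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("a", "b", "c");
        System.out.println(window(hits, 0, 100)); // prints [a, b, c]
        System.out.println(window(hits, 1, 2));   // prints [b]
        System.out.println(window(hits, 5, 10));  // prints []
    }
}
```

Asking for more hits than exist, as the test below does with the range 0 to 100, therefore simply returns everything that matched.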

import java.util.List;
import java.util.Map;

import org.junit.BeforeClass;
import org.junit.Test;
import org.springframework.context.support.ClassPathXmlApplicationContext;
import org.springframework.orm.hibernate3.HibernateTemplate;

public class MainTest {

    public static ClassPathXmlApplicationContext applicationContext;
    private static HibernateTemplate hibernateTemplate;

    @BeforeClass
    public static void init() {
        System.out.println("spring init...");
        applicationContext = new ClassPathXmlApplicationContext("beans.xml");
        hibernateTemplate = applicationContext.getBean(HibernateTemplate.class);
        System.out.println("spring ok");
    }

    @Test
    public void addData() {
        System.out.println("addData");
        // In compass-beans.xml, set buildIndex=true and lazyTime=1 on the
        // bean with id="compassIndexBuilder"; the index is then rebuilt
        // automatically from the data in the database.
        try {
            // Keep the JVM alive long enough for the daemon indexing thread to finish.
            Thread.sleep(10000000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    @Test
    @SuppressWarnings("unchecked")
    public void search() {
        String keyword = "全文搜索引擎";
        SearchServiceBean ssb = applicationContext.getBean(SearchServiceBean.class);
        Map<String, Object> map = ssb.find(keyword, "article", 0, 100); // the first search loads the dictionary
        long begin = System.currentTimeMillis();
        map = ssb.find(keyword, "article", 0, 100); // the second search gives the real search time
        long end = System.currentTimeMillis();
        System.out.println(String.format(
                "Search: [%s], time (ms): %d, records: %d", keyword, end - begin, map.get("size")));
        List<Article> list = (List<Article>) map.get("result");
        for (Article article : list) {
            System.out.println(article);
        }
    }
}

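4-4 Notes

The index directory and the analyzer are both set in compass-beans.xml. For testing, we add data directly in the database, build the index at startup, and measure the speed.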
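For reference, the two settings in question are the compass.engine.connection and compass.engine.analyzer.default.type keys from compass-beans.xml. The sketch below simply collects them in a java.util.Properties object with the values used in this article. The class name is hypothetical, and how the properties are handed to Compass is deliberately left out; in this article that is done by the compassSettings property of LocalCompassBean.

```java
import java.util.Properties;

// Hypothetical holder for the two settings discussed above; the keys and
// values are copied from compass-beans.xml in this article.
class CompassSettingsSketch {

    static Properties settings() {
        Properties p = new Properties();
        // Directory where the index is stored.
        p.setProperty("compass.engine.connection", "file://compass");
        // Analyzer class used for word segmentation (from JE-Analyzer).
        p.setProperty("compass.engine.analyzer.default.type", "jeasy.analysis.MMAnalyzer");
        return p;
    }

    public static void main(String[] args) {
        Properties p = settings();
        for (String key : p.stringPropertyNames()) {
            System.out.println(key + " = " + p.getProperty(key));
        }
    }
}
```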

4-5 Testing

Using MySQL, I wrote a function that inserts test data:

DELIMITER $$

CREATE

    FUNCTION `compass`.`addDateSyx`(num int(8))

    RETURNS varchar(32)

    BEGIN

declare i int(8);

set i = 0;

while ( i < num) DO

insert into _article (title,content, createTime) values (i, num-i, now());

set i = i + 1;

end while;

return "OK";

    END$$

DELIMITER ;

4-5-1 Test with 10,000 duplicate Chinese records

Change the insert statement in the database function as follows:

insert into _article (title,content, createTime) values ('用compass实现站内全文搜索引擎(一)', 'Compass是一个强大的,事务的,高性能的对象/搜索引擎映射(OSEM:object/search engine mapping)与一个Java持久层框架.Compass包括: 

* 搜索引擎抽象层(使用Lucene搜索引荐),

* OSEM (Object/Search Engine Mapping) 支持,

* 事务管理,

* 类似于Google的简单关键字查询语言,

* 可扩展与模块化的框架,

* 简单的API.

如果你需要做站内搜索引擎,而且项目里用到了hibernate,那用compass是你的最佳选择。 ', now());

Insert the data:

select addDateSyx1(10000); -- with hibernate.hbm2ddl.auto=update in the Hibernate configuration

Build the index:

10,000 records in 8045 ms, which is not bad.

Index size: [screenshot]

Search: [screenshot]

The text was indeed segmented into words. With the default analyzer, Chinese is split into one token per character, which is faster; with JE-Analyzer the search took 116 ms, which is still acceptable.
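To put the indexing figure in perspective, 10,000 records in 8045 ms works out to roughly 1,243 records per second, or about 0.8 ms per record. The short sketch below shows the arithmetic; the class and method names are hypothetical.

```java
// Hypothetical helper that converts the measured figures into rates.
class IndexThroughputSketch {

    // Records indexed per second.
    static double recordsPerSecond(long records, long elapsedMillis) {
        return records / (elapsedMillis / 1000.0);
    }

    // Average milliseconds spent per record.
    static double millisPerRecord(long records, long elapsedMillis) {
        return (double) elapsedMillis / records;
    }

    public static void main(String[] args) {
        // Figures reported above: 10,000 records indexed in 8045 ms.
        System.out.println(recordsPerSecond(10000, 8045));
        System.out.println(millisPerRecord(10000, 8045));
    }
}
```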

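4-5-2 Test with 100,000 duplicate Chinese records

Insert the data:

MySQL took roughly 12 s to insert 100,000 rows.

Build the index:

The index size is about what I expected, but the build took much longer than I imagined, and I don't feel like running it again.

Search:

With 100,000 records, 243 ms is still quite good. It seems that once the index is built, searching is easy.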

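5 Summary

Compass is pleasant to use and should cover basic needs. I don't know why such a good project stopped being updated; had it continued, there might have been no need for Hibernate Search.

Because Compass is no longer updated, features from Lucene 3.0 onward cannot be used, which is a pity. Compass can build indexes automatically (manual CRUD is possible too, of course), but wrapping Lucene directly to do what Compass does should give a better implementation. I look forward to someone taking that on.

References:

compass实现站内全文搜索引擎(一) (Implementing a full-text site search engine with Compass, part 1)

再谈compass:集成站内搜索 (Compass revisited: integrating site search)

compass快速给你的网站添加搜索功能 (Quickly adding search to your website with Compass)

There was also a good article on ITEYE, but I accidentally closed the page...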
