Xapian Study Notes

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 06:26

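Continuing from the scoring system in the previous post: to add that small program to the larger system, the larger system has to provide an interface, which really just means supplying the HTTP data and the SQL data it generates. The data is stored separately, with the HTTP information in one folder and the SQL data in another, rather than in a single file as in the small program, so some search technology is needed.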

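The plan is as follows. When the user enters an HTTP request, there is no need to look through every file in the HTTP folder; only the SQL folder has to be searched. Because the time of the HTTP request is known, only the SQL statements produced in a specific time window need to be queried, tentatively set to the ten seconds after the request was submitted. Processing those statements yields a score.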
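The time-window idea can be sketched without any search library at all. The following is only an illustration: the SqlRecord type, its field names, the use of integer seconds as timestamps, and the inclusive window boundaries are all assumptions, not part of the original system.

```cpp
#include <string>
#include <vector>

// Hypothetical record type: one logged SQL statement and the time it was
// produced, in seconds. The field names are assumptions for illustration.
struct SqlRecord {
    long long timestamp;
    std::string statement;
};

// Return the statements logged in [http_time, http_time + window_seconds].
// Treating both ends as inclusive is an assumption of this sketch.
std::vector<SqlRecord> statements_in_window(const std::vector<SqlRecord>& records,
                                            long long http_time,
                                            long long window_seconds = 10) {
    std::vector<SqlRecord> selected;
    for (const SqlRecord& r : records) {
        if (r.timestamp >= http_time && r.timestamp <= http_time + window_seconds) {
            selected.push_back(r);
        }
    }
    return selected;
}
```

A linear scan like this is fine for a handful of records; with a large number of SQL statements an index is the better fit, which is where Xapian comes in.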

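The key question is how to search among so many SQL statements. This calls for a new tool, Xapian, an open source search engine library. Google is said to use broadly similar search techniques. We do not need to know how it is implemented, only how to use it.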

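After downloading the latest version from http://xapian.org/ and unpacking it, you will find a doc folder with detailed usage instructions. If you only need searching, there is not much to read; a single example is enough.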

Here is the code first:

/* quickstartsearch.cc: Simplest possible searcher
 *
 * ----START-LICENCE----
 * Copyright 1999,2000,2001 BrightStation PLC
 * Copyright 2003,2004 Olly Betts
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License as
 * published by the Free Software Foundation; either version 2 of the
 * License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301
 * USA
 */

#include <xapian.h>

#include <cstdlib>   // for exit()
#include <iostream>

using namespace std;

int main(int argc, char **argv)
{
    // Simplest possible options parsing: we just require two or more
    // parameters.
    if (argc < 3) {
        cout << "usage: " << argv[0] <<
                " <path to database> <search terms>" << endl;
        exit(1);
    }

    // Catch any Xapian::Error exceptions thrown
    try {
        // Make the database
        Xapian::Database db(argv[1]);

        // Start an enquire session
        Xapian::Enquire enquire(db);

        // Build the query object
        Xapian::Query query(Xapian::Query::OP_OR, argv + 2, argv + argc);
        cout << "Performing query `" << query.get_description() << "'" << endl;

        // Give the query object to the enquire session
        enquire.set_query(query);

        // Get the top 10 results of the query
        Xapian::MSet matches = enquire.get_mset(0, 10);

        // Display the results
        cout << matches.size() << " results found" << endl;

        for (Xapian::MSetIterator i = matches.begin();
             i != matches.end();
             ++i) {
            Xapian::Document doc = i.get_document();
            cout << "Document ID " << *i << "\t" <<
                    i.get_percent() << "% [" <<
                    doc.get_data() << "]" << endl;
        }
    } catch(const Xapian::Error &error) {
        cout << "Exception: "  << error.get_msg() << endl;
    }
}


 

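The code above includes xapian.h. The user passes in the database path followed by the search terms. Xapian::Database db(argv[1]) opens an existing database read-only; it does not create one, and if the path is not a valid database it throws Xapian::DatabaseOpeningError, which the catch block here reports (creating a database requires Xapian::WritableDatabase). Xapian::Enquire enquire(db) starts an enquire session, Xapian::Query query builds the query by combining the terms with OR, and enquire.set_query hands the query to the session. The search itself runs in enquire.get_mset(0, 10), which returns at most the top 10 matches. The program prints the number of results returned and then, for each match, the document ID, the match percentage, and the document data.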

Xapian can also handle queries over a range. The documentation for NumberValueRangeProcessor says:

This class allows you to implement numeric range searches. The numbers used may be any number which is representable as a double, but requires that the stored values which the range is being applied have been converted to strings at index time using the Xapian::sortable_serialise() method:

Xapian::Document doc;
doc.add_value(0, Xapian::sortable_serialise(price));

This method produces strings which will sort in numeric order, so you can use it if you want to be able to sort based on the value in numeric order, too.

The class allows a prefix or suffix to be specified which must be present on the values, allowing multiple NumberValueRangeProcessors to be active in the same queryparser. For example, this specifies that a prefix of "$" must be present on the first value (and may optionally be present on the second value):

Xapian::QueryParser qp;
Xapian::NumberValueRangeProcessor numrange_proc(0, "$", true);
qp.add_valuerangeprocessor(&numrange_proc);


 

All that is needed is to supply a range. I am still studying the details and will add more later.
