Learning Xapian 1: Building an Inverted-Index Database, and Applying the Factory Pattern
Source: Internet | Published by: Zhejiang Enterprise Data Platform | Editor: 程序博客网 (Programming Blog Network) | Time: 2024/05/22 01:36
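I chose Xapian because its code is more readable and closer to modern C++ style: client code rarely has to deal with raw pointers. A quick look at the Lemur sample code, by contrast, showed pointers everywhere, which I didn't much like (admittedly, that may just be because I don't know Lemur well yet). In any case, I'm starting with Xapian.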
Building an index with Xapian
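Xapian offers several storage formats for an index, such as an in-memory database and the Flint on-disk format, among others (I haven't examined the details closely).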
Xapian provides a base class, Database, that represents an index database; WritableDatabase derives from Database.
To be able to create databases in several different formats, Xapian uses a factory-style design, roughly like this:
Base* data_base = new FlintDatabase();      // either this format...
Base* data_base = new InMemoryDatabase();   // ...or this one
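A base-class pointer points to an object of one subclass or another. In Xapian client code, however, you never handle these pointers directly (otherwise you would also need a smart pointer or a manual delete). Here is a client-side example: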
Xapian::WritableDatabase m_database(Xapian::InMemory::open());
This creates an in-memory database. When you then call m_database.add_document, it is the in-memory database's implementation of adding a document that actually runs.
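For comparison, the classic factory described above, where a base-class pointer is bound to different subclasses, can be sketched as follows. Backend, FlintBackend, InMemoryBackend and make_backend are hypothetical names chosen for illustration, not Xapian API; the sketch returns a std::unique_ptr instead of a raw pointer so the caller needs no manual delete.

```cpp
#include <memory>
#include <string>

// Hypothetical backend hierarchy (illustration only, not Xapian classes).
struct Backend {
    virtual ~Backend() = default;
    virtual std::string name() const = 0;
};

struct FlintBackend : Backend {
    std::string name() const override { return "flint"; }
};

struct InMemoryBackend : Backend {
    std::string name() const override { return "inmemory"; }
};

// Classic factory: the caller names a format and receives a base-class
// pointer to the matching subclass. unique_ptr frees the object automatically.
std::unique_ptr<Backend> make_backend(const std::string& format) {
    if (format == "flint") return std::make_unique<FlintBackend>();
    return std::make_unique<InMemoryBackend>();
}
```

Even with unique_ptr, the client still works through a pointer. Xapian goes one step further: InMemory::open() returns a handle object by value, so the pointer never appears in client code at all.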
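So how does Xapian implement this internally? Its code makes heavy use of the pimpl idiom (discussed in Effective C++) to separate interface from implementation. If you're not familiar with it, see this article: http://www.gamedev.net/reference/articles/article1794.asp (The C++ Pimpl)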
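A minimal sketch of the pimpl idiom on its own, independent of Xapian. DocumentStore and its members are hypothetical. The private struct Impl is only forward-declared inside the class, so its contents can change without touching the public interface. This sketch uses std::unique_ptr, which gives every handle its own Impl; Xapian's handles instead share their implementation through a reference-counted pointer.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Public interface: this is all a client header would need to expose.
class DocumentStore {
public:
    DocumentStore();
    ~DocumentStore();                          // defined below, where Impl is complete
    DocumentStore(DocumentStore&&) noexcept;
    DocumentStore& operator=(DocumentStore&&) noexcept;

    unsigned add(const std::string& data);     // returns a 1-based id
    std::size_t size() const;

private:
    struct Impl;                               // only forward-declared here
    std::unique_ptr<Impl> impl_;               // "pointer to implementation"
};

// Implementation details: normally in the .cpp file, so changing them
// does not force client code to recompile.
struct DocumentStore::Impl {
    std::vector<std::string> docs;
};

DocumentStore::DocumentStore() : impl_(std::make_unique<Impl>()) {}
DocumentStore::~DocumentStore() = default;
DocumentStore::DocumentStore(DocumentStore&&) noexcept = default;
DocumentStore& DocumentStore::operator=(DocumentStore&&) noexcept = default;

unsigned DocumentStore::add(const std::string& data) {
    impl_->docs.push_back(data);
    return static_cast<unsigned>(impl_->docs.size());
}

std::size_t DocumentStore::size() const { return impl_->docs.size(); }
```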
The WritableDatabase source looks roughly like this (note: simplified, for illustration only):
class Database {
  public:
    class Internal;
    // The implementation is hidden and provided by internal.
    // Xapian rolls its own reference-counted pointer class; boost::shared_ptr
    // or the C++0x std::shared_ptr would do the same job.
    std::vector<Xapian::Internal::RefCntPtr<Internal> > internal;

    Xapian::docid add_document(const Xapian::Document & document);
    // …
};
// Note, matching the InMemory::open code below: this constructor takes an Internal* argument.
Database::Database(Database::Internal *internal_)
{
    Xapian::Internal::RefCntPtr<Database::Internal> newi(internal_);
    internal.push_back(newi);
}
// The copy constructor only copies the pointers: the resource is shared, not duplicated.
Database::Database(const Database &other)
{
    internal = other.internal;
}
// Deriving from RefCntBase is the classic way to implement reference counting,
// exactly as described in More Effective C++.
// With boost::shared_ptr this inheritance would no longer be needed. The best
// introduction to smart pointers I know of: http://www.codeproject.com/KB/stl/boostsmartptr.aspx
// On how smart pointers are implemented: http://www.mpi-inf.mpg.de/~kettner/courses/lib_design_03/notes/advanced.html
// Xapian's home-grown version is intrusive: somewhat cheaper, but less reusable.
class Database::Internal : public Xapian::Internal::RefCntBase {
  private:
    /// Copies are not allowed.
    Internal(const Internal &);
    /// Assignment is not allowed.
    void operator=(const Internal &);
  protected:
    // Declaring the copy constructor suppresses the implicit default one,
    // so subclasses such as InMemoryDatabase need this to be constructible.
    Internal() { }
  public:
    // Virtual so that deleting through an Internal* destroys the backend correctly.
    virtual ~Internal() { }
    virtual Xapian::docid add_document(const Xapian::Document & document) = 0;
};
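The intrusive reference counting mentioned in these comments can be sketched as follows. RefCounted and IntrusivePtr are hypothetical names that only mimic the idea behind RefCntBase/RefCntPtr; they are not Xapian's source. The count lives inside the object itself, so copying the pointer merely increments it, and the last owner deletes the object.

```cpp
#include <cstddef>
#include <utility>

// Base class that stores the reference count inside the object (intrusive).
class RefCounted {
public:
    std::size_t ref_count = 0;
protected:
    RefCounted() = default;
    virtual ~RefCounted() = default;   // derived objects are destroyed correctly
};

// Smart pointer that manipulates the embedded count of T (T must derive from RefCounted).
template <class T>
class IntrusivePtr {
public:
    explicit IntrusivePtr(T* p = nullptr) : p_(p) { if (p_) ++p_->ref_count; }
    IntrusivePtr(const IntrusivePtr& other) : p_(other.p_) { if (p_) ++p_->ref_count; }
    IntrusivePtr& operator=(IntrusivePtr other) {   // copy-and-swap
        std::swap(p_, other.p_);
        return *this;
    }
    ~IntrusivePtr() {
        if (p_ && --p_->ref_count == 0) delete p_;   // last owner frees the object
    }
    T* operator->() const { return p_; }
    T* get() const { return p_; }
private:
    T* p_;
};

// Example body class, standing in for something like Database::Internal.
struct Body : RefCounted {
    static int live;                   // number of Body objects still alive
    Body() { ++live; }
    ~Body() override { --live; }
};
int Body::live = 0;
```

std::shared_ptr, by contrast, keeps its count in a separate control block, so it works with any type without inheritance; that is the reusability trade-off mentioned in the comment above.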
Xapian::docid
WritableDatabase::add_document(const Document & document)
{
    // All the real work is done by internal. A WritableDatabase must hold
    // exactly one sub-database; the vector exists because a read-only
    // Database can combine several sub-databases (Database::add_database()).
    if (internal.size() != 1) only_one_subdatabase_allowed();
    RETURN(internal[0]->add_document(document));
}
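Why is internal a vector at all? A read-only Xapian::Database can combine several databases with Database::add_database(), so that one search covers all of them. The sketch below mimics that idea with hypothetical classes SubDb, Db and WritableDb (not Xapian API): reads aggregate over every sub-database, while the write path refuses to run unless there is exactly one.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Database::Internal: one physical index.
struct SubDb {
    std::vector<std::string> docs;
};

// Read-only handle: may combine several sub-databases, hence the vector.
class Db {
public:
    explicit Db(std::shared_ptr<SubDb> sub) { internal.push_back(std::move(sub)); }
    void add_database(const Db& other) {
        internal.insert(internal.end(), other.internal.begin(), other.internal.end());
    }
    std::size_t doccount() const {
        std::size_t n = 0;
        for (const auto& sub : internal) n += sub->docs.size();  // sum over all parts
        return n;
    }
protected:
    std::vector<std::shared_ptr<SubDb>> internal;
};

// Writable handle: a write needs one unambiguous target.
class WritableDb : public Db {
public:
    using Db::Db;
    unsigned add_document(const std::string& data) {
        if (internal.size() != 1)
            throw std::logic_error("only one sub-database allowed");
        internal[0]->docs.push_back(data);
        return static_cast<unsigned>(internal[0]->docs.size());
    }
};
```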
class WritableDatabase : public Database { /* ... */ };

WritableDatabase::WritableDatabase(Database::Internal *internal_)
    : Database(internal_) { }

WritableDatabase::WritableDatabase(const WritableDatabase &other)
    : Database(other) { }

dbfactory.h declares the InMemory::open method; note that it returns a WritableDatabase:
namespace InMemory {
    WritableDatabase open();
}
Its implementation:
WritableDatabase
InMemory::open()
{
    // Goes through the Database::Database(Database::Internal *internal_) constructor above.
    return WritableDatabase(new InMemoryDatabase);
}
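OK, that is all the creation code: it really just adds one layer of wrapping. Every operation is carried out by internal, and each backend (in-memory or any other type) supplies its own internal by deriving from Xapian::Database::Internal: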
class InMemoryDatabase : public Xapian::Database::Internal {
  public:
    // The in-memory database's concrete implementation.
    Xapian::docid add_document(const Xapian::Document & document);
    // .....

  private:
    // Stop copy / assignment being allowed
    InMemoryDatabase& operator=(const InMemoryDatabase &);
    InMemoryDatabase(const InMemoryDatabase &);
};
Xapian::docid
InMemoryDatabase::add_document(const Xapian::Document & document)
{
    if (closed) InMemoryDatabase::throw_database_closed();
    Xapian::docid did = make_doc(document.get_data());
    finish_add_doc(did, document);
    RETURN(did);
}
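To tie the pieces together, here is a self-contained miniature of the whole design, following the suggestion in the comments to use std::shared_ptr instead of Xapian's own reference-counted pointer. Everything lives in a hypothetical namespace mini; the class names echo Xapian's, but this is an illustrative sketch, not Xapian's implementation. Documents are plain strings, and IDs start at 1.

```cpp
#include <memory>
#include <string>
#include <vector>

namespace mini {  // hypothetical namespace, not the real Xapian API

using docid = unsigned;

class Database {
public:
    // Abstract "body": every backend (in-memory, on-disk, ...) derives from it.
    class Internal {
    public:
        virtual ~Internal() = default;
        virtual docid add_document(const std::string& data) = 0;
        virtual docid get_doccount() const = 0;
    protected:
        Internal() = default;
        Internal(const Internal&) = delete;             // copies are not allowed
        Internal& operator=(const Internal&) = delete;  // assignment is not allowed
    };

    // Takes ownership of a freshly created backend object.
    explicit Database(Internal* internal_) : internal(internal_) {}

    docid get_doccount() const { return internal->get_doccount(); }

protected:
    // Copying a handle shares the same backend instead of duplicating it.
    std::shared_ptr<Internal> internal;
};

class WritableDatabase : public Database {
public:
    using Database::Database;
    // The handle only forwards; the backend does the actual work.
    docid add_document(const std::string& data) { return internal->add_document(data); }
};

// Concrete backend: keeps every document in a vector in memory.
class InMemoryDatabase : public Database::Internal {
public:
    InMemoryDatabase() = default;
    docid add_document(const std::string& data) override {
        docs.push_back(data);
        return static_cast<docid>(docs.size());  // ids start at 1
    }
    docid get_doccount() const override { return static_cast<docid>(docs.size()); }
private:
    std::vector<std::string> docs;
};

namespace InMemory {
// Factory function: the only place that names the concrete backend class.
inline WritableDatabase open() { return WritableDatabase(new InMemoryDatabase); }
}

}  // namespace mini
```

Only InMemory::open() mentions InMemoryDatabase; client code sees nothing but a WritableDatabase value, and copying that value shares one backend rather than duplicating the data.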