Implementing Full Text Indexing with PostgreSQL

来源:互联网 发布:html5动画软件 编辑:程序博客网 时间:2024/04/29 20:08
<script type="text/javascript"><!--google_ad_client = "pub-2947489232296736";/* 728x15, 创建于 08-4-23MSDN */google_ad_slot = "3624277373";google_ad_width = 728;google_ad_height = 15;//--></script><script type="text/javascript"src="http://pagead2.googlesyndication.com/pagead/show_ads.js"></script>
<script type="text/javascript"><!--google_ad_client = "pub-2947489232296736";/* 160x600, 创建于 08-4-23MSDN */google_ad_slot = "4367022601";google_ad_width = 160;google_ad_height = 600;//--></script><script type="text/javascript"src="http://pagead2.googlesyndication.com/pagead/show_ads.js"></script>

  ike many companies, over the years my company developed a home-grown billing/project system that allows our programmers to bill to a Web application. This system is quite capable; however one of the drawbacks to the system is that there's no way to search the billable entries or customer notes.

  Of course, we could use simple LIKE or regex queries via SQL but that would be resource-intensive and would not provide the detailed results we would like. For example, to search for all billing entries that had a description with the words 'apache' (to protect the customer name) and 'php' we could write a query like this:

  SELECT description

  FROM cmd_contact_task_entries

  WHERE description ~ 'php'

  AND description ~ 'apache';

 

  author's note: the tilde () in the query represents the regular expression operator in PostgreSQL.

 

 

  In our system the preceding query would return seven rows, but it would not rank the matching documents. Moreover, if we were looking for more detailed information we would have to create some fairly onerous queries. What this system really needs is Full Text Indexing.

  Although PostgreSQL is, in my opinion, the most capable open source database available, it does have its drawbacks. One such drawback is that PostgreSQL doesn't include Full Text Indexing as part of the core distribution. However, as with any good open source project, the PostgreSQL community has contributed many useful modules—and one such module is TSearch2.

  TSearch2 is a maintained contrib module for both PostgreSQL 7.3 and 7.4. In short, TSearch2 is a new indexable data type that stores the position of words within a particular document—thus making the retrieval of those documents based on specified parameters both efficient and reliable.

<script type="text/javascript"><!--google_ad_client = "pub-2947489232296736";/* 728x15, 创建于 08-4-23MSDN */google_ad_slot = "3624277373";google_ad_width = 728;google_ad_height = 15;//--></script><script type="text/javascript"src="http://pagead2.googlesyndication.com/pagead/show_ads.js"></script>
<script type="text/javascript"><!--google_ad_client = "pub-2947489232296736";/* 160x600, 创建于 08-4-23MSDN */google_ad_slot = "4367022601";google_ad_width = 160;google_ad_height = 600;//--></script><script type="text/javascript"src="http://pagead2.googlesyndication.com/pagead/show_ads.js"></script>
原创粉丝点击