Approaches to Full-Text Search

Source: Internet | Published by: php网站开发实例 | Editor: 程序博客网 | Time: 2024/04/29 04:09

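Industry websites currently implement full-text search in two main ways:

Approach 1: use the full-text index built into the database

Approach 2: build your own full-text index system in application code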

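Taking SQL Server 2005 as an example

SQL Server 2005 ships with a full-text indexing feature. You first create a full-text index on the database table (search online for the exact steps). Once the index exists, you can search it with plain SQL, for example: select * from ytbxw where contains(column_name, '中国'); Multiple search terms can be combined with AND or OR. Building a full-text index on a single table, or on a view over a single table, is no problem for SQL Server 2005, but it cannot yet build one on a view that spans multiple tables. Take www.ytbxw.com as an example: the content of each section is stored separately, so this approach cannot solve the search problem for that site.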
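As a rough illustration of combining several search terms with AND or OR, the sketch below builds the search condition string that would be handed to contains(). The class name FullTextQuery, the method BuildCondition, the parameter name @condition, and the second term "保险" are all made up for this example and are not part of the original site's code.

```csharp
using System;
using System.Collections.Generic;

static class FullTextQuery
{
    // Joins search terms into a condition such as: "中国" AND "保险"
    // op is expected to be "AND" or "OR" (hypothetical helper, not a SQL Server API).
    public static string BuildCondition(IEnumerable<string> terms, string op)
    {
        List<string> quoted = new List<string>();
        foreach (string term in terms)
        {
            // Drop embedded double quotes so each term stays a single quoted phrase.
            string t = term.Replace("\"", "").Trim();
            if (t.Length == 0) continue;
            quoted.Add("\"" + t + "\"");
        }
        return string.Join(" " + op + " ", quoted.ToArray());
    }

    static void Main()
    {
        string condition = BuildCondition(new string[] { "中国", "保险" }, "AND");
        Console.WriteLine(condition);
        // The condition would then be passed as a parameter, for example:
        //   select * from ytbxw where contains(column_name, @condition)
    }
}
```

Passing the condition as a parameter rather than concatenating it into the SQL text avoids injection problems when the terms come from user input.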

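The rest of this post focuses on implementing search in application code.

Developing a full-text search system from scratch is a fairly complex undertaking and not easy to get right, so I recommend an open-source library instead: DotLucene. Most of you have probably heard of it. Below I explain how to use it for full-text search when the data is spread across multiple tables.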

1. Create a new WinForms project and add Lucene.Net.dll to it.

2. Create a class; the name is up to you.

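using System.Text.RegularExpressions;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;

public class Indexer
{
    private IndexWriter writer;

    // Create the index files under the given path
    public Indexer(string directory)
    {
        // true = create a new index, overwriting any existing one
        writer = new IndexWriter(directory, new StandardAnalyzer(), true);
        writer.SetUseCompoundFile(true);
    }

    /*
    Add a record to the index.

    Field.Text: indexed and stored (searchable and retrievable)

    Field.UnIndexed: stored only, not indexed

    */
    public void AddHtmlDocument(string path, string title, string content)
    {
        Document doc = new Document();

        doc.Add(Field.Text("text", content));
        doc.Add(Field.UnIndexed("path", path));
        doc.Add(Field.Text("title", title));
        writer.AddDocument(doc);
    }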

    // Parse the HTML and strip out the markup
    private string parseHtml(string html)
    {
        string temp = Regex.Replace(html, "<[^>]*>", "");
        // Replace the non-breaking-space entity with a plain space
        return temp.Replace("&nbsp;", " ");
    }

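    // Extract the article title from the page
    private string getTitle(string html)
    {
        Match m = Regex.Match(html, "<title>(.*?)</title>", RegexOptions.IgnoreCase | RegexOptions.Singleline);
        // Check Success: the group count is fixed by the pattern, not by whether it matched
        if (m.Success)
            return m.Groups[1].Value;
        return "(unknown)";
    }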

    // Add news articles to the index
    public void AddNews()
    {
        // Fetch the records from the database (omitted here);
        // dr, PageSize and RootId come from that omitted code

        for (int i = 1; i <= PageSize; i++)
        {
            RootId = int.Parse(dr["ClassId"].ToString().Substring(0, 2));

            // Write to the index
            AddHtmlDocument("http://www.ytbxw.com/info/" + dr["Id"].ToString() + ".html", dr["Title"].ToString(), parseHtml(dr["Content"].ToString()));
        }
    }
    // Optimize and close the index
    public void Close()
    {
        writer.Optimize();
        writer.Close();
    }

}

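That is the whole indexing process. When reading the data, it is best to read in batches, for example 300 records at a time, which is more efficient.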
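The batching idea can be sketched roughly as follows. This is only an illustration of splitting the work into chunks of 300. The class BatchPlanner, the method Plan, and the total of 1000 rows are invented for the example, and the actual database read is left as a comment because the original omits it.

```csharp
using System;
using System.Collections.Generic;

static class BatchPlanner
{
    // Yields the (offset, count) of each batch needed to cover totalRows.
    public static IEnumerable<KeyValuePair<int, int>> Plan(int totalRows, int batchSize)
    {
        for (int offset = 0; offset < totalRows; offset += batchSize)
        {
            // The last batch may be smaller than batchSize.
            int count = Math.Min(batchSize, totalRows - offset);
            yield return new KeyValuePair<int, int>(offset, count);
        }
    }

    static void Main()
    {
        // 1000 rows in batches of 300 gives counts of 300, 300, 300, 100.
        foreach (KeyValuePair<int, int> batch in Plan(1000, 300))
        {
            // Here you would load rows [offset, offset + count) from the database
            // and call AddHtmlDocument for each of them.
            Console.WriteLine("offset=" + batch.Key + " count=" + batch.Value);
        }
    }
}
```

Reading a bounded batch at a time keeps memory use flat no matter how many rows the tables hold.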