Using Index Server to Search Your Web Site - Part I


Most Web Sites Need A Site Search

I'm not going to spend much time trying to convince you that your site needs a search, because if your site needs one then you probably already know it does. If you don't think you need a search then your site is probably either very small, is just a presence site, or deals only with very structured data and you may very well not need one. For everyone else, here's a quick rundown of one approach to adding search functionality to your site.

Search Options

There are a couple of different ways to add search functionality to your site. Each has its benefits and drawbacks, and the option you choose really depends on your site and your searching needs.

  • External Search Engine

    A number of the larger search engines let site owners add a search box to their site so that visitors can search that engine for results that reside on the site.

    This is perhaps the easiest way to get started and is a great option for users who host their site at an ISP or don't have access to their server in order to implement a more advanced solution. There are two main downsides to this approach. The first is that many engines can run into trouble indexing your site's content (especially if your site is database-driven) and therefore may not be able to return appropriate results. The second is that you often have very little control over the format of the results returned.

    Some search engines that offer this option (or can relatively easily be used this way) include: AlltheWeb, AltaVista, Google, HotBot, Lycos, MSN Search, and Yahoo. I couldn't find an option to do this for AOL Search, Ask Jeeves, Teoma, or WebCrawler but they may have them... I'm not that familiar with any of them.

    There are also a fair number of companies offering this as a paid solution. I'm not familiar with many of them, but if anyone is using one they really like let me know and I'll list it here for others to check out.

  • Server-Based Search Software

    If you can't get what you want via an external search, it might be time to look into installing your own. The downsides here are that the implementation is usually much more complex and most of the products that fall into this category are not cheap. The main benefits include flexibility and customizability of your search, the ability to search private sites (a.k.a. intranets), and the ability to search more than just HTML documents.

    This category contains all sorts of products from different vendors. Most of the major search engines have some sort of commercial software offering that you can install, but if you're looking to do it cheaply you can use Microsoft's Index Server. Index Server is included with Windows and is the focus of the rest of this article.

Index Server

Instead of taking the time to describe and explain Index Server and how it works, I'll let Microsoft's Index Server: Frequently Asked Questions do it for me so I can jump right into walking you through the setup.

Note: You'll probably notice that most of the Index Server documentation and content tends to be either out of date, incomplete, or both. I'm not sure why, but Index Server never gets explained very well by the folks in Redmond. I just mention it as a heads up... for those times like when it says you need to download and install IIS 3! Don't actually do it... If you're running Win2K or better you've probably already got it installed.

Creating an Index

Index Server is able to provide you with acceptable performance by indexing the content to be searched ahead of time and creating an index of keywords and document attributes. If it didn't, every time a query was issued the server would have to go looking through every document in order to find the ones that matched the query criteria, which would result in very slow searches and would probably cripple your server in the process.
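
To make the idea concrete, here is a minimal Python sketch of a keyword index (often called an inverted index). It only illustrates the general technique and says nothing about how Index Server is implemented internally; the document names and text are invented for the example, and the tokenizer is deliberately naive.

```python
import re
from collections import defaultdict

# Hypothetical sample documents (file name -> text); not the article's real files.
DOCUMENTS = {
    "default.htm": "Samples covering database access, forms and email",
    "database.htm": "Reading records from a database with ADO",
    "cookie.htm": "Setting and reading a cookie",
}


def tokenize(text):
    """Split text into lowercase words (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Scan every document once, ahead of time, mapping each word to its files."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, word):
    """Answer a one-word query with a dictionary lookup; no documents are re-read."""
    return sorted(index.get(word.lower(), set()))


INDEX = build_index(DOCUMENTS)
print(search(INDEX, "database"))
```

The expensive pass over the documents happens once in `build_index`; each query afterwards is a single lookup, which is the trade-off described above.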

Now since Index Server is not strictly a web-based search engine, it works by looking at the file system and not the links in your documents. This is important for a number of reasons, but primarily because it can have trouble giving you URLs for your documents and it will find and index documents that do not have links to them.

In general I find it best to create a new catalog when setting up a search for a given set of documents. For this article I've created some standard HTML documents (available in a zip file at the bottom of this page) and placed them in a directory named C:/Inetpub/wwwroot/indexserver/. This puts them right off my web server's root at http://localhost/indexserver/.
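
Because the index holds file system paths, turning a hit into a link means mapping the path back onto the web site. The Python sketch below shows one way that mapping can work, assuming every indexed file lives under a single web root. The function name `path_to_url` and the two constants are made up for illustration, using the example folder and address from this article; this is not how Index Server itself builds URLs.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote

# Assumed values for illustration: a web root folder and the address it is served from.
WEB_ROOT = PureWindowsPath("C:/Inetpub/wwwroot")
BASE_URL = "http://localhost/"


def path_to_url(file_path, web_root=WEB_ROOT, base_url=BASE_URL):
    """Map a physical file path under web_root to the URL it is served from.

    Raises ValueError if the file is outside web_root, i.e. it has no URL.
    """
    relative = PureWindowsPath(file_path).relative_to(web_root)
    # as_posix() switches to forward slashes; quote() escapes spaces and similar.
    return base_url + quote(relative.as_posix())


print(path_to_url("C:/Inetpub/wwwroot/indexserver/database.htm"))
```

A file outside the web root raises `ValueError`, which mirrors the situation where a document can be indexed but has no address to link to.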

I'll be doing this in Windows 2000, but the process should be similar in other flavors/versions of NT-based OSs. To create a catalog, you need to go to the Administrative Tools folder in your web server's Control Panel. Under Administrative Tools you should find an item that says Computer Management. When you click on this it should open up MMC with the root node of Computer Management. Under that should be an item called Services and Applications which contains an entry for Indexing Service.

Computer Management

If you right-click on Indexing Service and select New >, you should get the option to add a new catalog. It'll ask you for two things in order to create the catalog: the catalog name and a location to put its files. For my example I'm using IS-Sample and C:/Inetpub/index/ but you can use whatever works for you. Note that this is not the location of the files to be indexed, but is where the files that Index Server creates and uses to do the indexing will be stored. Don't select a directory that is in use or one that is web-accessible. At this point, you'll probably get a message saying that the catalog will remain offline until the service is restarted. Don't worry... this is normal.

Add Catalog

You should now right-click on your new catalog (in the right hand pane) and select Properties.... You can't edit anything on the General tab, so click over to the Tracking tab. I find it best to leave everything unchecked and select (None) for WWW Server. Selecting a server is helpful because then Index Server can generate URLs for you, but it also causes it to automatically add and index all virtual directories under the selected web site. If you're on a server flavor of Windows and the web in question doesn't have any virtual directories then you can go ahead and select the appropriate web, but otherwise I'd recommend against it or you'll probably end up indexing a whole bunch of stuff you don't need or want indexed. Under the Generation tab I usually uncheck everything except Generate Abstracts. This creates little blurbs based on the documents and can be quite helpful. Pick something reasonable for the max size (I'm using 100 chars for the demo) and click OK.

The next step is to tell the catalog what to index. Right-click on the Directories folder underneath your catalog and select New > Directory. This is where you add the directory you want this catalog to index. In my example it's C:/Inetpub/wwwroot/indexserver/.

Add Directory

Next you click on Indexing Service in the left hand pane and click the start arrow in the toolbar to start the service if it's not already running. If it was running, stop and restart it. This should start all the catalogs indexing. You can stop any catalogs that are not in use by clicking them in the right hand pane and clicking the stop square in the toolbar. Now you've created a catalog and have told it what to index. Depending on the size of the content involved, the initial indexing process can take a while, so don't be worried if you don't get results right away or only get partial results. You can test your catalog by clicking on Query the Catalog underneath your catalog's name and running a test query. Using the word "database" with our sample documents I get three results: default.htm, database.htm, and a FrontPage file (which we'll discuss more in part II).

So now our catalog is set up, but we still don't have any way to query it. Now we need to build a web form to interface with the Indexing Service.

Note that I also added caching of DocKeywords and DocAuthor fields under Properties under my catalog in order to get those fields working. Just check caching and accept the default values.

Search Page

Part of the power of Index Server is the fact that you can customize it to your heart's content. The flip side to that is that it can also be quite complex and is quite daunting to the new user. I'm going to start with a relatively simple search form and comment it heavily to get you started.

<%@ Language="VBScript" %>
<% Option Explicit %>
<html>
<head>
<title>ASP 101's Index Server Article - Search Page</title>
<meta name="description" content="Search Page">
<meta name="keywords" content="Search Page">
<meta name="author" content="John Peterson">
</head>
<body>

<p>
This is the search page of the sample web content for ASP 101's
Index Server article.
</p>

<form action="default.asp" method="get">
    <input type="text" name="query" />
    <input type="submit" value="Search" />
</form>

<p>
Queries that should return results include:
<a href="?query=component">component</a>,
<a href="?query=cookie">cookie</a>,
<a href="?query=database">database</a>,
<a href="?query=date">date</a>,
<a href="?query=time">time</a>,
<a href="?query=email">email</a>,
<a href="?query=form">form</a>,
<a href="?query=search">search</a>,
etc.
</p>

<%
Dim strQuery   ' The text of our query
Dim objQuery   ' The index server query object
Dim rstResults ' A recordset of results returned from I.S.
Dim objField   ' Field object for loop

' Retrieve the query from the querystring
strQuery = Request.QueryString("query")

' If the query isn't blank then proceed
If strQuery <> "" Then
    ' Create our index server object
    Set objQuery = Server.CreateObject("ixsso.Query")

    ' Set its properties
    With objQuery
        .Catalog    = "IS-Sample"  ' Catalog to query
        .Query      = strQuery     ' Query text
        .MaxRecords = 10           ' Max # of records to return

        ' What to sort records by.  I'm sorting by rank [d]
        ' which is [d]escending by how pertinent Index Server
        ' thinks the result is.  This way the most applicable
        ' result should be first.
        .SortBy = "rank [d]"

        ' Which columns to return.  Column names must
        ' be the same as the catalog's properties.  Some
        ' of them are: contents, filename, size, path,
        ' vpath, hitcount, rank, create, write, DocTitle
        ' DocSubject, DocAuthor, DocKeywords...
        .Columns = "filename, path, vpath, size, write, " _
            & "characterization, DocTitle, DocAuthor, " _
            & "DocKeywords, rank, hitcount"
    End With

    ' Get a recordset of our results back from Index Server
    Set rstResults = objQuery.CreateRecordset("nonsequential")

    ' Get rid of our Query object
    Set objQuery = Nothing

    ' Check for no records
    If rstResults.EOF Then
        Response.Write "Sorry. No results found."
    Else
        ' Print out # of results
        Response.Write "<p><strong>"
        Response.Write rstResults.RecordCount
        Response.Write "</strong> results found:</p>"

        ' Loop through results
        Do While Not rstResults.EOF
            ' Loop through Fields
            ' Formatting leaves something to be desired,
            ' but it'll work for now.  We'll pretty things
            ' up and link to the content in part II.
            For Each objField in rstResults.Fields
                Response.Write "<strong>"
                Response.Write objField.Name
                Response.Write ":</strong> "
                Response.Write rstResults.Fields(objField.Name)
                Response.Write "<br />"
            Next

            ' Spacing between results
            Response.Write "<br />"

            ' Move to next result
            rstResults.MoveNext
        Loop
    End If

    ' Kill our recordset object
    Set rstResults = Nothing
End If
%>
</body>
</html>
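
The SortBy = "rank [d]" and MaxRecords = 10 settings in the page above amount, in effect, to "order the hits by relevance, best first, and return at most ten". The short Python sketch below mimics that behaviour on a few invented hits. The file names and rank values are made up, and Index Server performs this step itself, so treat it purely as an illustration of the ordering rather than something you need to write.

```python
# Invented hits for illustration; real rank values come from Index Server.
hits = [
    {"filename": "default.htm", "rank": 412},
    {"filename": "database.htm", "rank": 890},
    {"filename": "form.htm", "rank": 127},
]


def top_hits(results, max_records=10):
    """Return results ordered by rank, highest first, capped at max_records."""
    ordered = sorted(results, key=lambda hit: hit["rank"], reverse=True)
    return ordered[:max_records]


for hit in top_hits(hits, max_records=2):
    print(hit["filename"], hit["rank"])
```

Sorting before applying the cap is what makes a small MaxRecords value safe: the hits that get dropped are the least relevant ones.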

In the next part we'll cover how to execute more advanced queries and get the results looking a little prettier and linking to the content pages, but this should get you started for now.

Downloads

You can download the sample content files and the search page listed above from here: indexserver.zip (6.8KB).
