Using Index Server to Search Your Web Site - Part II


Introduction

In the follow-up to our extremely popular Index Server article we cover linking to results, limiting the number of results, filtering results based on path and filename, and using GetRows and GetString. We also include an improved search script.

In this article I'm going to continue to discuss using Microsoft's Index Server from ASP. I'm assuming you've already read Part I which explains how to set up an index and the basic details of a "plain vanilla" search page. If you haven't, you might want to take a moment and give it a quick read before proceeding.

Linking to the Result Documents

For the most part... just finding out that a document exists is not enough. Once you perform a search and find a document you're interested in reading, the next step is usually to open that document and take a look at it. In a web-based setting, the key to doing this is creating a link to the document in question.

We've all created links... a simple <a href=""></a> should be all it takes, but filling in the URL part is what's important. With Index Server there are two different properties you can use to determine the path to the result document: path and vpath. Ideally (for web uses) you want to use vpath (virtual path to the document in question), but there's a catch. If you didn't associate the catalog in question with a WWW Server instance in the catalog configuration screen (which, as I already mentioned in Part I, has the downside of automatically including all the web's virtual folders and mucking up your catalog), the vpath property always comes back blank.

So in order to use vpath you'll need to go associate your index with the appropriate web server instance. If you choose this method I recommend that the very next thing you do is go to the Directories folder underneath your catalog in the Indexing Service configuration screen and set the "Include In Catalog" attribute of the ones you won't need to "No" and then restart the Indexing Service.

I personally prefer to have a "clean" index that only includes the documents I want indexed, so I decided to use the path property and create a function to translate the physical path into the appropriate form.

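Function PathToVpath(strPath)
	' Set this to the physical root of your web site.
	Const strWebRoot = "c:\inetpub\wwwroot\"
	Dim strTemp
	strTemp = strPath
	' Swap the physical root for a single backslash...
	strTemp = Replace(strTemp, strWebRoot, "\")
	' ...then turn every backslash into a forward slash.
	strTemp = Replace(strTemp, "\", "/")
	PathToVpath = strTemp
End Function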

It's pretty basic... just swaps out the physical root path of the web site with a backslash and then swaps all backslashes for normal slashes. You'll need to set the Const to reflect the root of your site, but aside from that it's pretty simple and prevents me having to deal with Index Server's auto inclusion of virtual dirs.
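To tie the two approaches together, here's a rough sketch of how a result link might be built: it uses vpath when Index Server actually returns one and falls back to translating the physical path when it comes back blank. The names WEB_ROOT, PhysicalToUrl, and BuildResultLink are made up for illustration, and the web root is an assumption you'd change to match your site. It's meant to run inside an ASP page since it relies on Server.HTMLEncode.

```vbscript
' Sketch only: assumes the site root is c:\inetpub\wwwroot\.
Const WEB_ROOT = "c:\inetpub\wwwroot\"

' Translate a physical path into a site-relative URL.
' vbTextCompare makes the root match case-insensitive.
Function PhysicalToUrl(strPath)
    Dim strTemp
    strTemp = Replace(strPath & "", WEB_ROOT, "\", 1, -1, vbTextCompare)
    PhysicalToUrl = Replace(strTemp, "\", "/")
End Function

' BuildResultLink is a hypothetical helper, not part of Index Server.
' It prefers vpath and falls back to the translated physical path.
Function BuildResultLink(strVpath, strPath, strTitle)
    Dim strUrl
    If Len(strVpath & "") > 0 Then
        strUrl = strVpath & ""
    Else
        strUrl = PhysicalToUrl(strPath)
    End If
    ' Encode both values so odd characters can't break the markup.
    BuildResultLink = "<a href=""" & Server.HTMLEncode(strUrl) & """>" _
        & Server.HTMLEncode(strTitle & "") & "</a>"
End Function
```

Inside your result loop you could then write something like Response.Write BuildResultLink(objRS("vpath").Value, objRS("path").Value, objRS("filename").Value), assuming those three columns were requested in the query.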

Limiting the Number of Results Returned

Have you ever taken a look at the result of a query and seen something like: "Results 1 - 10 of about 61,300,000"? I don't care who you are or what you're looking for... no human being wants to look through a million or more results looking for something of interest. The fact that a search engine would even present this many entries is simply insane. Generally if you don't find what you're looking for in the first 10 or 20 results you're going to perform a new search to try and narrow down the results. So what's the point of your search page returning all the results?

It's fairly simple to tell Index Server to limit its results to a reasonable number of hits. To do so you simply need to set the value of the MaxRecords property of the Query object to the maximum number of results you want your search to return. For example:

objQuery.MaxRecords = 50
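To show where that line fits, here's a minimal sketch of a complete query setup with the limit applied. The catalog name "Web", the search text, and the column list are placeholders I've assumed for illustration; swap in whatever you set up in Part I. Sorting by rank in descending order means the 50 records you keep should be the best matches rather than an arbitrary 50.

```vbscript
Dim objQuery, objRS

Set objQuery = Server.CreateObject("ixsso.Query")
objQuery.Catalog = "Web"                  ' assumed catalog name
objQuery.Query = "index server"           ' placeholder search text
objQuery.Columns = "filename, path, rank" ' columns to return
objQuery.SortBy = "rank[d]"               ' highest rank first
objQuery.MaxRecords = 50                  ' cap the result set

Set objRS = objQuery.CreateRecordset("nonsequential")

' Walk the (at most 50) results.
Do While Not objRS.EOF
    Response.Write Server.HTMLEncode(objRS("filename").Value & "") & "<br>" & vbCrLf
    objRS.MoveNext
Loop

' Clean up as soon as we're done.
objRS.Close
Set objRS = Nothing
Set objQuery = Nothing
```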

Filtering Results Based on Filename

There are a number of reasons you might want to block results based on filename. Since Index Server is based on the file system, it will pick up files in the web that aren't linked to from any other pages. The two types of file that spring immediately to mind as ones that you might not want returned as search results are files generated by FrontPage Server Extensions and secure administration pages. Both play their role, but neither is something you want a random visitor stumbling across.

By adding a conditional to our query we can exclude files that we don't want returned as results. I'm not going to go into great depth as to the syntax since the documentation has gotten much better than it used to be, but the command below makes sure we only return files that don't contain "_vti_" in their path (FPSE files).

objQuery.Query = strQuery & " AND NOT #path *\_vti_*"

As another example, if you wanted to exclude files whose name contained the word "admin," you could do something like this:

objQuery.Query = strQuery & " AND NOT #filename *admin*"

Finally there's no reason you can't do both:

objQuery.Query = strQuery _
	& " AND NOT #filename *admin*" _
	& " AND NOT #path *\_vti_*"
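If the list of things to hide keeps growing, it might be easier to build these clauses in a loop. The sketch below does that with a hypothetical helper called AddExclusions (my name, not anything built into Index Server). It also wraps the incoming query in parentheses, on the assumption that you don't want an OR typed by the user to slip past the exclusions.

```vbscript
' AddExclusions is a hypothetical helper for illustration.
' strProperty is the property to test ("filename" or "path") and
' arrPatterns is an array of wildcard patterns to exclude.
Function AddExclusions(strQuery, strProperty, arrPatterns)
    Dim strResult, i
    ' Group the original query so the NOT clauses apply to all of it.
    strResult = "(" & strQuery & ")"
    For i = LBound(arrPatterns) To UBound(arrPatterns)
        strResult = strResult & " AND NOT #" & strProperty _
            & " " & arrPatterns(i)
    Next
    AddExclusions = strResult
End Function

Dim strQuery, strFiltered
strQuery = "index server"   ' placeholder for the user's search text

' Hide admin pages by filename, then FPSE folders by physical path.
strFiltered = AddExclusions(strQuery, "filename", Array("*admin*"))
strFiltered = AddExclusions(strFiltered, "path", Array("*\_vti_*"))
```

The resulting string is what you'd assign to objQuery.Query. Calling the helper twice nests the parentheses, which is a bit ugly but keeps each call independent.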

Using GetRows and GetString

I never really thought about it before, but after the last article I got an email from a user who was trying to use the GetString() method of the recordset returned from his query in order to get the results quickly and release the objects as quickly as possible (always a good idea). Unfortunately he was having some problems. I'm not sure what they were, but I recently tested these methods and they seem to work well. You'll need to be sure you're on an active record, but aside from that GetString() and GetRows() should work like they do for any other recordset. For more info you might want to take a look at our Database GetRows and Database GetString samples.
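As a rough sketch of the pattern, the code below copies the results into an array with GetRows(), releases the objects right away, and only then writes the output. The catalog name, search text, and columns are assumptions for illustration. Checking EOF first is the "be sure you're on an active record" part, since calling GetRows() on an empty recordset raises an error.

```vbscript
Dim objQuery, objRS, arrResults, i

Set objQuery = Server.CreateObject("ixsso.Query")
objQuery.Catalog = "Web"                  ' assumed catalog name
objQuery.Query = "index server"           ' placeholder search text
objQuery.Columns = "filename, path, rank"
objQuery.MaxRecords = 50

Set objRS = objQuery.CreateRecordset("nonsequential")

' Only call GetRows when there is a current record.
If Not objRS.EOF Then
    arrResults = objRS.GetRows()   ' 2-D array indexed (column, row)
End If

' Release the objects as soon as the data is copied out.
objRS.Close
Set objRS = Nothing
Set objQuery = Nothing

If IsArray(arrResults) Then
    For i = 0 To UBound(arrResults, 2)
        ' Column 0 is filename, per the Columns list above.
        Response.Write Server.HTMLEncode(arrResults(0, i) & "") & "<br>" & vbCrLf
    Next
Else
    Response.Write "No results found."
End If
```

GetString() follows the same shape: replace the GetRows() call with something like objRS.GetString(2, -1, vbTab, vbCrLf, "") to get one delimited string instead of an array (2 is the value of ADO's adClipString constant).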

Searching Database Driven Web Sites

This is one section where Index Server really falls short of the mark. The only answer I've ever gotten for this is that you should build static pages for every possible page and then have Index Server index that. It's a half-@$$ solution and everyone knows it, but I've yet to find any other way to get Index Server to actually index data stored in a database. If you've had any luck in this area let me know and I'll be happy to share, but AFAIK, in order to get database query results into an Index Server query you'll need to query the DB yourself and find some (hopefully meaningful) way to integrate the results.

Have Patience

I didn't know where else to put this one, but I thought I should mention it. Oftentimes, time alone can fix a problem. I know it's frustrating, but there are times when Index Server just decides to take its sweet time re-indexing. I've had experiences where I've spent half an hour trying to figure out what I was doing wrong because I kept getting no results back. I finally went to lunch to try and clear my head and when I got back everything was working fine. I'm not sure if the server was just too busy to re-index at that moment or if it was just trying to annoy me, but remember to give it some time... especially on large or complex sets of documents. (I haven't had this happen in a while so maybe faster computers are helping or maybe it was a bug in Index Server 2, but just FYI.)

Additional Information

While writing this article I ran across some relatively useful pages over at Microsoft. I wish I had found them when I was trying to build my first search page, but I guess it's better late than never:

Microsoft Platform SDK: Indexing Service 3.0 Start Page

Indexing Service 2.0 Technical Articles:

Anatomy of a Search Solution
Indexing with Microsoft Index Server
Introduction to Microsoft Index Server
Microsoft Index Server Tips and Tricks
Microsoft Index Server: Frequently Asked Questions
Using HTML Meta Properties with Microsoft Index Server

Notice that they're all pretty old and none of them fare very well based on user feedback, but at least they're something.

Downloads

I've added an advanced query page (integrating examples of the topics included above) to the download from Part I. If you just downloaded it then you've already got everything. If not you can get a copy from here: indexserver.zip (6.8KB).

The only things left to do (that I can think of) would be to make a more complex front end form to allow more complex queries (date and time, boolean, exact phrase, author, etc.) and to see if I can pull off getting this thing to work from ASP.NET. If I write a Part III, it'll be to cover one or both of those, but don't hold your breath.