Using SharePoint search in an ASP.NET application with noise filtering

Most content site owners want visitors to be able to search the content in their sites. If you have access to Microsoft SharePoint Portal Server 2003, you can use its advanced search features to build an index of your site. That index is exposed through the SharePoint Query Service web service, which you query by submitting a specially crafted XML query (the query syntax was the part that gave me the most trouble).

In this article, I’ll detail the steps and include some of the code I used to get this working. I used this method to add search to a site built on Microsoft Content Management Server 2002 (MCMS). The process involves quite a few steps:

  1. Create a content index in SharePoint
  2. Create a content source in SharePoint
  3. Build a search submission page
  4. Implement pre-search logic (such as removing noise words from keyword searches)
  5. Compose logic to craft the search query 
  6. Submit the query
  7. Render the dataset returned to present the results

Create a content index in SharePoint

Go into the SharePoint Portal’s Site Settings and click Configure Search and Indexing, then click Add Content Index. The name you give the index is the name you’ll use in the query.

Create a content source in SharePoint

Now that an index exists, you’ll need a content source that tells SharePoint which website to crawl into it. Go back to the Configure Search and Indexing page and click Add content source. Select the content index you created above, specify that the source is a website, and click Next. Then enter the URL of the site, a short description, and the crawl configuration you want, and select the source group you specified when you created the content index.

Once that’s finished, start a full update of the website so SharePoint begins crawling and has its index populated by the time we’re ready to fire off a search.

Build a search submission page

Nothing fancy here: just an ASP.NET page with an input box for keywords and a submit button. I also added some advanced search options that I use to further filter the results.

For example, in an MCMS site your URLs can follow the site’s channel hierarchy. I used that to restrict results to products, services, or news releases, or to hits from specific divisions. What you filter on depends on your needs.
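A channel filter like this can be sketched in Python as a helper that turns an advanced-search option into an extra WHERE predicate on the page URL. The option names, the channel paths, and the channel_filter helper are all hypothetical. The sketch also assumes that the SharePoint full-text SQL dialect accepts a LIKE predicate on the "DAV:href" property, so check that against the SharePoint Portal Server SDK before relying on it.

```python
# Hypothetical mapping from advanced-search options to MCMS channel paths.
CHANNEL_PATHS = {
    "products": "/products/",
    "services": "/services/",
    "news": "/news-releases/",
}


def channel_filter(site_root: str, option: str) -> str:
    """Return an extra WHERE predicate restricting hits to one channel.

    Assumes LIKE on "DAV:href" is supported by the query dialect.
    Returns an empty string when the option has no channel mapping.
    """
    path = CHANNEL_PATHS.get(option)
    if path is None:
        return ""
    prefix = site_root.rstrip("/") + path
    # Double any single quotes so the value stays a valid SQL string literal.
    prefix = prefix.replace("'", "''")
    return f"\"DAV:href\" LIKE '{prefix}%'"


print(channel_filter("http://www.example.com", "news"))
```

A non-empty result would be ANDed onto the keyword clause. An empty string means no channel restriction applies.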

Add a web reference to your ASP.NET project pointing to the SharePoint Query Service, which lives at /_vti_bin/search.asmx off the root of the portal’s website.

Implement Pre-Search Logic

Wire up the submit button’s click event to run the search logic. Before crafting the XML query, you might want (I did) to strip any noise words from the submitted keywords. This is a lot easier than you might think. I took the English noise file from SharePoint (C:\Program Files\SharePoint Portal Server\DATA\Config\noiseeng.txt) and modified it so every word or letter that counts as a word is listed on a separate line. Then I loaded the noise words into an ADO.NET DataTable (one column, one row per word) and added it to the ASP.NET cache with a dependency on the noise file.
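The cached noise list can be sketched in Python as follows. This is not the ADO.NET code described above. A dictionary keyed by file path stands in for the ASP.NET cache, and comparing the file’s modification time stands in for the cache’s file dependency. The load_noise_words name is hypothetical, and the sketch assumes the noise file is plain ASCII or UTF-8 text.

```python
import os
from pathlib import Path

# path -> (modification time, noise words): a rough stand-in for an
# ASP.NET cache entry with a dependency on the noise file.
_noise_cache: dict[str, tuple[float, frozenset[str]]] = {}


def load_noise_words(path: str) -> frozenset[str]:
    """Return the lower-cased noise words in `path`, reloading only if the file changed."""
    mtime = os.path.getmtime(path)
    cached = _noise_cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]  # file unchanged since the last load
    # Assumes plain ASCII/UTF-8 text. split() accepts one word per line
    # as well as several whitespace-separated words per line.
    text = Path(path).read_text(encoding="utf-8")
    words = frozenset(word.lower() for word in text.split())
    _noise_cache[path] = (mtime, words)
    return words
```

For example, load_noise_words(r"C:\Program Files\SharePoint Portal Server\DATA\Config\noiseeng.txt") would return the same cached set on every call until the file is edited. Splitting on any whitespace means the loader also copes with an unmodified noise file that lists several words per line. Using a frozenset turns each noise check into a set membership test rather than a search through table rows.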

Now that I have a reference to the noise list, I need the list of keywords that were submitted. In my case, I only cared about alphanumeric characters and the spaces that separate the words. After cleaning out the non-alphanumeric characters, I split the result into an array and checked each word against the noise list, removing any word that was a noise word. At that point, I have an array of keywords without noise.
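The keyword cleanup can be sketched in Python as shown below. The clean_keywords helper is hypothetical. It replaces each run of non-alphanumeric characters with a space rather than deleting it, so input such as "SharePoint,MCMS" still splits into two words. It treats only ASCII letters and digits as alphanumeric.

```python
import re


def clean_keywords(raw: str, noise_words: frozenset[str]) -> list[str]:
    """Keep alphanumeric words only and drop any that are noise words."""
    # Replace every run of characters that are not letters, digits, or spaces.
    cleaned = re.sub(r"[^A-Za-z0-9 ]+", " ", raw)
    # split() with no argument also collapses repeated spaces.
    return [word for word in cleaned.split() if word.lower() not in noise_words]


noise = frozenset({"the", "a", "of", "and"})
print(clean_keywords("The history of SharePoint, and MCMS!", noise))
# ['history', 'SharePoint', 'MCMS']
```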

Compose logic to craft the search query

With a clean list of keywords, I’m ready to create the XML search query that is sent to the SharePoint Query Service. The first step is to string the keywords together into a clean SQL-style WHERE clause, joining them with ANDs. Then it’s time to build the XML query string; I used a StringBuilder. The first code block below contains the framework for the query, and the second is where I built the query:

You’ll notice in the FROM part of the query I concatenated the scope. This is where you need to put your content index name. So if your index name was Marketing_Internet_Site, your FROM clause should be:
FROM Marketing_Internet_Site..SCOPE()
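One way to assemble the keyword clause and the surrounding XML is sketched below in Python. It is an approximation, not the StringBuilder code described above. The QueryPacket element names, the MSSQLFT query type, the CONTAINS predicate form, and the "DAV:href" and rank property names reflect my understanding of the SharePoint Portal Server 2003 query schema and should be checked against its SDK. The build_where_clause and build_query_packet names and the Range paging values are hypothetical.

```python
from xml.sax.saxutils import escape

RANK = '"urn:schemas.microsoft.com:fulltextqueryinfo:rank"'  # assumed rank property


def build_where_clause(keywords: list[str]) -> str:
    """AND together one CONTAINS predicate per (already cleaned) keyword."""
    if not keywords:
        raise ValueError("no keywords left after noise filtering")
    return " AND ".join(f"CONTAINS('\"{kw}\"')" for kw in keywords)


def build_query_packet(keywords: list[str], scope: str,
                       start_at: int = 1, count: int = 20) -> str:
    """Wrap the full-text SQL in a QueryPacket; `scope` is the content index name."""
    sql = (
        f'SELECT "DAV:href", {RANK} '
        f"FROM {scope}..SCOPE() "
        f"WHERE {build_where_clause(keywords)} "
        f"ORDER BY {RANK} DESC"
    )
    return (
        '<QueryPacket xmlns="urn:Microsoft.Search.Query" Revision="1000">'
        '<Query domain="QDomain">'
        "<SupportedFormats>"
        "<Format>urn:Microsoft.Search.Response.Document.Document</Format>"
        "</SupportedFormats>"
        "<Context>"
        # escape() protects &, < and > inside the XML element text.
        f'<QueryText language="en-US" type="MSSQLFT">{escape(sql)}</QueryText>'
        "</Context>"
        f"<Range><StartAt>{start_at}</StartAt><Count>{count}</Count></Range>"
        "</Query>"
        "</QueryPacket>"
    )
```

The keywords are assumed to have passed through the noise filter first, so they contain only letters and digits and can be quoted without further escaping. An empty keyword list raises an error instead of producing a query with an empty WHERE clause.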

Here’s what the resulting XML should look like:

You may also notice that my SELECT is pretty slim. That’s partly because all I need is the path to the page in MCMS, so I didn’t want to retrieve more than necessary. You can see all the properties available to select by going to “Manage Properties From Crawled Documents” in the SharePoint Portal Site Settings.

Submit the query

Now that the query is built, we just need to submit it (yes, the project switches to VB.NET here; all the search logic lives in a business component I wrote in C#):

You’ll notice on line 5 that I pass three parameters: the first is the keywords, the second is the search scope, and the third is the object holding all the advanced search options the user specified. Line 6 shows how easy it is to submit the query.
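In the .NET project, the proxy class generated from the web reference builds the SOAP request for you. To show what goes over the wire, here is a Python sketch that posts a query packet to search.asmx by hand. The urn:Microsoft.Search namespace and the QueryEx method name (the web method that returns results as a DataSet) are assumptions to verify against the service’s WSDL. The sketch also ignores authentication: a real portal will usually require Windows credentials, which urllib does not supply on its own.

```python
import urllib.request
from xml.sax.saxutils import escape

SOAP_NS = "urn:Microsoft.Search"  # assumed Query Service namespace


def build_soap_envelope(query_packet: str) -> bytes:
    """Wrap a QueryPacket string in a SOAP 1.1 envelope for QueryEx."""
    return (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        "<soap:Body>"
        f'<QueryEx xmlns="{SOAP_NS}">'
        # The packet travels as an escaped string parameter, not as nested XML.
        f"<queryXml>{escape(query_packet)}</queryXml>"
        "</QueryEx>"
        "</soap:Body>"
        "</soap:Envelope>"
    ).encode("utf-8")


def submit_query(portal_root: str, query_packet: str) -> str:
    """POST the envelope to /_vti_bin/search.asmx and return the raw response."""
    request = urllib.request.Request(
        portal_root.rstrip("/") + "/_vti_bin/search.asmx",
        data=build_soap_envelope(query_packet),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": f'"{SOAP_NS}/QueryEx"',
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

The generated .NET proxy hides all of this behind a single method call. The raw form is mainly useful when you debug the query XML or call the service from a non-.NET client.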

Render the dataset returned to present the results

I’ll let you figure this out… it’s just a dataset after all.
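If it helps, here is a minimal Python sketch of the rendering step. It assumes each result row has already been pulled out of the returned DataSet into a dict. The "href" key, which holds the page’s DAV:href value, is hypothetical. Every URL is HTML-escaped before it is written into a link.

```python
from html import escape


def render_results(rows: list[dict[str, str]]) -> str:
    """Render result rows as an HTML list of links."""
    if not rows:
        return "<p>No results found.</p>"
    items = []
    for row in rows:
        # quote=True also escapes double quotes, which matters inside href="...".
        href = escape(row["href"], quote=True)
        items.append(f'<li><a href="{href}">{href}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```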