June 18, 2012

SharePoint Search: Partial Page Exclusion

Out of the box, SharePoint provides a mechanism for excluding publishing pages from the search index by any number of criteria, but what if you want to exclude only parts of a page? This becomes useful when you have content on numerous pages that contains common keywords.
For example, in our production site the master page supplies the header, left navigation and footer for every page.
So when that shared content contains a word such as "Careers", someone searching for "Careers" gets back every page in the site instead of just the Careers page and its related pages. To prevent this, we needed a way to keep SharePoint from indexing that content when it performs a crawl.
Web Control
We decided that a System.Web.UI.WebControls.Panel control would be a good model to build our control on. It can be dropped into the page layout easily using SharePoint Designer, and other HTML and controls can be nested inside it. We didn't want to inherit from the Panel control, though, because it adds unwanted 'div' tags to the rendered output. The key to the Panel control's behavior is the following pair of attributes on the class:
[ParseChildren(false), PersistChildren(true)].
These attributes make the content nested within the control persist as child controls rather than as properties of the control.
User Agent
The second part of the equation is knowing when to show or hide the contents of the web control.
SharePoint gives us a way to tell that it is performing a crawl: the crawler includes "MS Search" in the user agent of each HTTP request, which we can read through the request's UserAgent property.
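To make the check concrete, here is a minimal sketch of the user agent test on its own, outside of any web control. The class and method names (CrawlerDetection, IsCrawlerRequest) are hypothetical, and the two user agent strings are only example values; the crawler's actual string depends on your farm's configuration.

```csharp
using System;

public static class CrawlerDetection
{
    // Hypothetical helper, not part of SharePoint: returns true when the
    // user agent contains the given token, ignoring case.
    public static bool IsCrawlerRequest(string userAgent, string token)
    {
        if (string.IsNullOrEmpty(userAgent) || string.IsNullOrEmpty(token))
        {
            return false; // no user agent sent: treat it as a normal visitor
        }
        return userAgent.IndexOf(token, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    public static void Main()
    {
        // Example values only (assumptions), not read from a real farm.
        string crawler = "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)";
        string browser = "Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0";

        Console.WriteLine(IsCrawlerRequest(crawler, "ms search")); // True
        Console.WriteLine(IsCrawlerRequest(browser, "ms search")); // False
        Console.WriteLine(IsCrawlerRequest(null, "ms search"));    // False
    }
}
```

Matching on the short "ms search" token rather than the full string means the check should not depend on the crawler's version number.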
Putting this all together we come up with the following class:

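using System.Web.UI;
using System.Web.UI.WebControls;

namespace ABC.SharePoint.WebControls
{
    // Hides its child content whenever the request comes from the search crawler.
    [ParseChildren(false), PersistChildren(true)]
    public class SearchCrawlExclusionControl : WebControl
    {
        private string userAgentToExclude;

        // User agent token to look for; defaults to SharePoint's "ms search".
        public string UserAgentToExclude
        {
            get { return string.IsNullOrEmpty(userAgentToExclude) ? "ms search" : userAgentToExclude; }
            set { userAgentToExclude = value; }
        }

        protected override void CreateChildControls()
        {
            string userAgent = this.Context.Request.UserAgent;
            // Compare in lower case; stay visible when no user agent is sent.
            this.Visible = string.IsNullOrEmpty(userAgent) || !userAgent.ToLower().Contains(UserAgentToExclude.ToLower());
        }
    }
}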

Using It
Register the web control in the page layout:
<%@ Register Tagprefix="SearchUtil" Namespace="ABC.SharePoint.WebControls" Assembly="ABC.SharePoint, Version=, Culture=neutral, PublicKeyToken=xxxxxx" %>
After adding the register tag to the page layout, we can wrap all the content we want to exclude with our control:
<SearchUtil:SearchCrawlExclusionControl ID="SearchCrawlExclusionControl1" runat="server">
    <div>Some Content To Exclude</div>
</SearchUtil:SearchCrawlExclusionControl>
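Test this:
After wrapping every div and other HTML you want to exclude in this control, run an incremental or full crawl of your web site.
You can also preview what the crawler sees by changing your browser's user agent, as described in the sections that follow.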
How to edit the User Agent string in Mozilla Firefox
To change the User Agent string, enter about:config in the Firefox address bar, where you normally enter a URL. I recommend noting the original value first, which you can see by entering just about: in the address bar.
Now right-click to open the context menu and select "String" under the "New" menu entry. Enter the preference name "general.useragent.override", without the quotes. Next, enter the new User Agent value you want Mozilla Firefox to use.
I added my name and a link to my web site to the original value. You can also pick one from the list of User Agent strings. Check the new value by entering about: in the address bar.
How to edit the User Agent string in Google Chrome
Here's how to change the user agent:
  • open the Developer Tools (Ctrl+Shift+I on Windows/Linux, Command+Option+I on Mac OS X)
  • click the "settings" icon at the bottom of the window
  • check "override user agent" and select one of the options (Internet Explorer 7/8/9, Firefox 4/7 for Windows/Mac, iPhone, iPad and Nexus S running Android 2.3)

You can also select "other" and enter a custom user agent.
Note: To test the SharePoint search exclusion, select "other" and enter MS Search as the custom user agent.
How to edit the User Agent string in Internet Explorer
To change it, open the registry and find the key
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Internet Settings\User Agent\Post Platform].
Each value name listed in this key is sent to the remote web server as an additional entry in the user agent string. To remove any additional information, delete the values within the [Post Platform] key. To add an entry, create a string value and give it the name you want to be sent.
Restart Internet Explorer for the changes to take effect.
Note: To test the SharePoint search exclusion, create a string value named MS Search under this key.
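If you would rather not change browser settings, a small console program can request the same page twice, once with an ordinary user agent and once with a crawler-style one, and compare the results. This is only a sketch: the class name ExclusionCheck, the marker-text argument, both user agent strings and the use of Windows credentials are assumptions for illustration, not part of the control itself.

```csharp
using System;
using System.Net;

public static class ExclusionCheck
{
    // Downloads a page while sending the given User-Agent header.
    static string Fetch(string url, string userAgent)
    {
        using (var client = new WebClient())
        {
            client.UseDefaultCredentials = true; // assumption: Windows-authenticated site
            client.Headers.Add("user-agent", userAgent);
            return client.DownloadString(url);
        }
    }

    // True when the marker text is served to browsers but hidden from the crawler.
    public static bool IsExcluded(string browserHtml, string crawlerHtml, string marker)
    {
        return browserHtml.Contains(marker) && !crawlerHtml.Contains(marker);
    }

    public static void Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.WriteLine("usage: ExclusionCheck <page-url> <marker-text>");
            return;
        }

        // Example user agent values (assumptions); only "MS Search" matters to the control.
        string browserHtml = Fetch(args[0], "Mozilla/5.0 (Windows NT 6.1)");
        string crawlerHtml = Fetch(args[0], "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)");

        Console.WriteLine(IsExcluded(browserHtml, crawlerHtml, args[1])
            ? "Marker is hidden from the crawler."
            : "Marker is NOT excluded; check the control and the user agent token.");
    }
}
```

Pass the page URL and a phrase that appears only inside the wrapped content, for example some text from the footer.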
