Differences between Crawler impact rules and crawl rules in SharePoint 2010 search

A crawler impact rule defines the rate at which the Windows SharePoint Services Help Search service requests documents from a Web site during crawling. The rate can be defined either as the number of documents requested simultaneously or as the delay between requests. In the absence of a crawler impact rule, the number of documents requested simultaneously ranges from 5 through 16, depending on available hardware resources.

You can use crawler impact rules to modify the load that crawling places on the sites you crawl.

Crawl rules let you control the behavior of the Enterprise Search index engine when it crawls content from a particular path. By using these rules, you can:

  • Prevent content within a particular path from being crawled.

For example, if a content source points to a URL path such as http://www.test.com/, but you want to prevent content from the "downloads" subdirectory (http://www.test.com/downloads/) from being crawled, you would set up a crawl rule for that subdirectory's URL with the behavior set to exclude content.

  • Indicate that a particular path that would otherwise be excluded from the crawl should be crawled.

Using the previous scenario, if the downloads directory contained a subdirectory called "content" that should be included in the crawl, you would create a crawl rule for the URL http://www.test.com/downloads/content, with the behavior set to include content.
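The include-inside-exclude scenario above comes down to rule ordering: rules are checked against a URL and the first (more specific) match decides the outcome. Here is a toy Python model of that evaluation logic, not the SharePoint API; the class and function names are mine, and the first-match semantics are an assumption for illustration:

```python
# Toy model of crawl-rule evaluation -- NOT SharePoint code.
# Assumption: rules are checked in order and the first matching rule
# decides whether a URL is crawled, so more specific rules come first.

from dataclasses import dataclass

@dataclass
class CrawlRule:
    path_prefix: str   # URL prefix the rule applies to
    include: bool      # True = include content, False = exclude

def should_crawl(url: str, rules: list, default: bool = True) -> bool:
    """Return True if the URL should be crawled under the given rules."""
    for rule in rules:
        if url.startswith(rule.path_prefix):
            return rule.include
    return default  # no rule matched: fall back to the content source default

# The scenario above: exclude /downloads/ but include /downloads/content/.
# The inclusion rule is listed first so it matches before the exclusion.
rules = [
    CrawlRule("http://www.test.com/downloads/content/", include=True),
    CrawlRule("http://www.test.com/downloads/", include=False),
]

print(should_crawl("http://www.test.com/index.html", rules))
print(should_crawl("http://www.test.com/downloads/setup.exe", rules))
print(should_crawl("http://www.test.com/downloads/content/a.docx", rules))
```

If the exclusion rule were listed first, it would match /downloads/content/ URLs too and the inclusion rule would never be reached.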


Crawl rules define the "what": which URLs, pages, documents, images, and so on to include in or exclude from the crawl.

Crawler impact rules define the "how" and "when": how often the crawl service requests content. For example, if you have only one server, you could define impact rules to reduce the load on it; if you have dedicated crawl/index servers, you can request content more frequently instead.
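The two forms a crawler impact rule can take, a concurrency cap or a delay between requests, can be sketched as a toy scheduler. This is an illustration of the trade-off only, not SharePoint code; the names and the default concurrency of 8 are my assumptions:

```python
# Toy sketch of how a crawler impact rule shapes request rate -- NOT
# SharePoint code. A rule sets EITHER a number of simultaneous requests
# OR a delay between requests, mirroring the two rule types above.

from dataclasses import dataclass
from typing import Optional

@dataclass
class CrawlerImpactRule:
    site: str
    simultaneous_requests: Optional[int] = None  # e.g. fetch 8 docs at a time
    delay_seconds: Optional[float] = None        # e.g. wait 1 s between requests

def schedule(urls, rule):
    """Yield (start_offset_seconds, batch_of_urls) pairs for the rule."""
    if rule.delay_seconds is not None:
        # One document at a time, pausing between requests (gentle on the site).
        for i, url in enumerate(urls):
            yield (i * rule.delay_seconds, [url])
    else:
        # Fetch documents in batches up to the allowed concurrency.
        n = rule.simultaneous_requests or 8  # assumed default when unset
        for i in range(0, len(urls), n):
            yield (0.0, urls[i:i + n])

urls = ["http://www.test.com/doc%d.html" % i for i in range(10)]

# Low-impact rule for a small single-server site: one request per second.
gentle = CrawlerImpactRule("www.test.com", delay_seconds=1.0)
print(list(schedule(urls, gentle))[:2])

# Aggressive rule when dedicated crawl/index servers are available.
fast = CrawlerImpactRule("www.test.com", simultaneous_requests=5)
print([batch for _, batch in schedule(urls, fast)])
```

The gentle rule spreads ten requests over ten seconds, while the aggressive rule fetches the same ten documents in two bursts of five, which is the essence of the single-server versus dedicated-crawl-server choice described above.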
