A crawler impact rule defines the rate at which the Windows SharePoint Services Help Search service requests documents from a Web site during crawling. The rate can be defined either as the number of simultaneous documents requested or as the delay between requests. In the absence of a crawler impact rule, the number of documents requested ranges from 5 through 16, depending on hardware resources.
You can use crawler impact rules to modify loads placed on sites when you crawl them.
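To make the two rate modes concrete, here is a minimal sketch in Python. It is not SharePoint's implementation; the class, field names, and the assumed default of 8 simultaneous requests are all hypothetical, chosen only to illustrate that a rule specifies either a simultaneous-request count or a delay between requests.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CrawlerImpactRule:
    """Hypothetical model of a crawler impact rule: either limit the
    number of simultaneous document requests, or enforce a delay
    (in seconds) between successive requests."""
    site: str
    simultaneous_requests: Optional[int] = None  # e.g. request up to 8 docs at once
    delay_seconds: Optional[float] = None        # e.g. wait 2.0 s between requests

def effective_concurrency(rule: Optional[CrawlerImpactRule]) -> int:
    """How many documents the crawler requests at once under a rule."""
    if rule is None:
        # Without a rule, the service picks 5-16 based on hardware;
        # assume 8 here purely for illustration.
        return 8
    if rule.delay_seconds is not None:
        # Delay mode: one request at a time, spaced by the delay.
        return 1
    return rule.simultaneous_requests

# A rule that throttles crawling of a small single-server site:
gentle = CrawlerImpactRule(site="www.test.com", delay_seconds=2.0)
print(effective_concurrency(gentle))  # 1
print(effective_concurrency(None))    # 8
```

The point of the sketch is that the two modes are mutually exclusive ways of expressing the same thing: how hard the crawler is allowed to hit a site.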
Crawl rules provide you with the ability to set the behavior of the Enterprise Search index engine when you want to crawl content from a particular path. By using these rules, you can:
- Prevent content within a particular path from being crawled.
For example, suppose a content source points to a URL path such as http://www.test.com/, but you want to prevent content in the "downloads" subdirectory, http://www.test.com/downloads/, from being crawled. You would set up a rule for that subdirectory's URL, with the behavior set to exclude its content.
- Indicate that a particular path that would otherwise be excluded from the crawl should be crawled.
Continuing the previous scenario, if the downloads directory contained a subdirectory called "content" that should be included in the crawl, you would create a crawl rule for http://www.test.com/downloads/content, with the behavior set to include content from that subdirectory.
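The include-inside-an-exclude scenario above implies that the most specific matching rule wins. The following Python sketch models that precedence with longest-prefix matching; the rule list and the `should_crawl` helper are assumptions for illustration, not SharePoint's actual rule-evaluation logic.

```python
# Hypothetical crawl rules mirroring the downloads/content example:
# exclude the downloads subdirectory, but include downloads/content.
RULES = [
    ("http://www.test.com/downloads/", "exclude"),
    ("http://www.test.com/downloads/content/", "include"),
]

def should_crawl(url: str, default: str = "include") -> bool:
    """Return True if the URL should be crawled.

    The most specific (longest) matching rule wins, which is how an
    include rule can carve an exception out of a broader exclude rule.
    """
    matches = [(prefix, action) for prefix, action in RULES
               if url.startswith(prefix)]
    if not matches:
        return default == "include"
    _, action = max(matches, key=lambda m: len(m[0]))
    return action == "include"

print(should_crawl("http://www.test.com/index.html"))              # True
print(should_crawl("http://www.test.com/downloads/setup.exe"))     # False
print(should_crawl("http://www.test.com/downloads/content/a.doc")) # True
```

A URL under downloads/content matches both rules, but the longer (more specific) include rule takes precedence, so that content is crawled while the rest of downloads is not.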
Crawl rules define the "what": which content — URLs, pages, documents, images, and so on — to include in or exclude from the crawl.
Crawler impact rules define the "how" and "when": how often the crawl service will request content. For example, if you have only one server, you can define impact rules to reduce the load on it; if you have dedicated crawl/index servers, you can afford to request content more frequently.