Google Custom Crawl Rate

I’d like to share my experience with the crawl rate setting found under Settings in Google Webmaster Tools (GWT).
The consensus across the interwebs seems to be that this feature is only useful for slowing down Google’s crawl rate on your servers. Let me show you my logs, which tell a different story.

Around May 14th, 2014, I released a new site of mine with several hundred thousand unique pages of decent content (sitemaps included).
Within 48 hours, Google was crawling all the URLs in my sitemap at a rate of about 1 per second.
On the 19th, unsatisfied with a crawl rate of just 1 request per second and looking to improve it, I tweaked the settings in GWT to “limit” the crawl to 3 requests per second. I received the following confirmation message:

We’ve received a request from a site owner to change the rate at which Googlebot crawls this site: http:// – .co/
New crawl rate: Custom rate
– 3.062 requests per second
– 0.327 seconds per request
Your request will take a day or two to come into effect.
This new crawl rate will stay in effect for 90 days.
We recommend that you set a custom crawl rate only if you’re experiencing traffic problems with your server. To use Google’s recommended crawl rate, follow these steps.
1. On the Dashboard, select the site you want.
2. Click Settings.
3. In the Crawl rate section, select “Let Google determine my crawl rate”.

On the evening of May 20th, Google bumped my crawl rate up to 3 requests per second.

Here is a snapshot of the logs over the past couple of days to show the change. You’re welcome to draw your own conclusions, and I’d be happy to hear alternative explanations for why Google tripled my crawl rate.

[Screenshot: web request log]
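
For anyone who wants to run the same check against their own server, here’s a minimal sketch of the kind of log crunching involved. It assumes an Apache combined-format access log at a hypothetical path (access.log) and simply counts lines whose User-Agent contains “Googlebot”, then averages over each day:

#!/usr/bin/env python3
# Rough sketch: estimate Googlebot's average request rate per day
# from an Apache combined-format access log.
import re
from collections import Counter
from datetime import datetime

LOG_PATH = "access.log"  # hypothetical path; point this at your own log

# Grab the date portion (e.g. "20/May/2014") out of the timestamp field.
date_re = re.compile(r"\[(\d{2}/\w{3}/\d{4}):")

hits_per_day = Counter()
with open(LOG_PATH) as log:
    for line in log:
        if "Googlebot" not in line:   # only count Googlebot requests
            continue
        match = date_re.search(line)
        if match:
            hits_per_day[match.group(1)] += 1

for day in sorted(hits_per_day, key=lambda d: datetime.strptime(d, "%d/%b/%Y")):
    hits = hits_per_day[day]
    # 86,400 seconds in a day; a sustained 3 req/s works out to ~259k hits/day.
    print(f"{day}: {hits} requests (~{hits / 86400:.2f} req/s average)")

Keep in mind this is a whole-day average, so it will understate the instantaneous rate you see when Googlebot is actively working through the sitemap.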

This evening (May 20th), I made another change to increase the crawl rate yet again. We shall see whether, within 48 hours, my crawl rate is bumped to 5 requests per second.

UPDATE, evening of May 21st: Just about 24 hours later, Google has once again bumped its crawl rate, this time to about 5 requests per second. I’m convinced that GWT’s crawl rate setting can be used to increase the crawl rate on your site. If you have content that Google is interested in AND your server can handle the load, max out your crawl settings!

UPDATE 2: My experience with a couple of other domains shows that the change may take more than 24 hours to kick in (36-48 hours in some cases).
Cheers,
Luke

CrawlDaddy Crawler Bot?

Shortly after I created a new website and domain, the following requests from CrawlDaddy popped up in the logs:

64.202.161.41 - - [13/May/2014:10:54:37 -0400] "GET /index.php HTTP/1.1" 200 5789 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; CrawlDaddy v0.3.0 abot v1.2.0.0 http://code.google.com/p/abot)"

64.202.161.46 - - [13/May/2014:10:54:37 -0400] "GET /FAQ.php HTTP/1.1" 200 8391 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; CrawlDaddy v0.3.0 abot v1.2.0.0 http://code.google.com/p/abot)"

The bot requested 7 pages from the IPs 64.202.161.41 and 64.202.161.46, then finished with a HEAD request to the homepage before exiting:

64.202.161.41 - - [13/May/2014:10:54:40 -0400] "HEAD / HTTP/1.1" 200 - "-" "-"
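
If you want to pull every request from a bot like this out of your own logs, a rough sketch along these lines works (same assumptions as before: an Apache combined log format and a hypothetical access.log path). Note that the final HEAD request above carries a blank User-Agent, so filtering on the UA string alone will miss it:

# Rough sketch: list every CrawlDaddy request in an access log, grouped by source IP.
import re
from collections import defaultdict

LOG_PATH = "access.log"  # hypothetical path

# Capture the source IP, the request line, and the User-Agent field.
line_re = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "([^"]*)" \S+ \S+ "[^"]*" "([^"]*)"')

requests_by_ip = defaultdict(list)
with open(LOG_PATH) as log:
    for line in log:
        match = line_re.match(line)
        if match and "CrawlDaddy" in match.group(3):
            requests_by_ip[match.group(1)].append(match.group(2))

for ip, requests in sorted(requests_by_ip.items()):
    print(ip)
    for request in requests:
        print("   ", request)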

All of the requested pages existed, so it seems the bot was crawling content rather than probing for URL-related vulnerabilities.
Based on the URL in the User-Agent string, the crawler appears to be built on Abot, an open-source website crawler project.
It’s curious that the crawler is coming from GoDaddy’s IP block (64.202.160.0/19) and that it never requested a robots.txt file.
A Google search for CrawlDaddy didn’t reveal much information about this bot; I’d love to hear about your experiences with it in the comments.
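
For what it’s worth, here’s a quick sketch of that IP check, using Python’s standard ipaddress module (3.3+) to confirm that the two addresses from the log above really do sit inside GoDaddy’s 64.202.160.0/19 allocation:

# Confirm the CrawlDaddy source IPs fall inside GoDaddy's 64.202.160.0/19 block.
import ipaddress

godaddy_block = ipaddress.ip_network("64.202.160.0/19")
crawldaddy_ips = ["64.202.161.41", "64.202.161.46"]  # from the log entries above

for ip in crawldaddy_ips:
    inside = ipaddress.ip_address(ip) in godaddy_block
    print(f"{ip} in {godaddy_block}: {inside}")  # prints True for both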

Cheers,
Luke