Google Apps Script Bot “GoogleApps script”

The GoogleApps script bot is the user agent that Google Apps Script (Google’s JavaScript-based scripting service) uses to fetch pages. An example and the JS code are below.
Here’s the standard Apache log entry when this bot accesses your site:
- - [02/Jun/2014:15:12:59 -0400] "GET / HTTP/1.1" 200 749 "-" "Mozilla/5.0 (compatible; GoogleApps script; +"
This bot can be used by any Google Docs user in order to scrape or otherwise access content on a website.
The particular IP accessing my site resolved to
I’ve set up a test doc for anyone who would like to check out the script in action. The code powering it can be found under the script editor.
You’re welcome to punch in your own URL to have the crawler fetch your site.
See it in action:

Unfortunately, it looks like this script doesn’t follow the rules set forth in your robots.txt file. If your website is being abused by someone using the Google Docs script bot, I would block the IP or set up your site to serve a 404 error to any user agent matching “GoogleApps script”.
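As a rough illustration of the user-agent matching idea, here is a minimal Python 3 sketch that pulls the user agent out of an Apache combined-format log line and checks it for the “GoogleApps script” marker. The function names are made up for this example, and the regular expression assumes the log line ends with the quoted referer and user-agent fields. In practice the actual blocking would happen in the web server or application config; this only shows the matching step.

```python
import re

# Marker taken from the user agent string in the log entry above (compared case-insensitively).
BLOCKED_MARKER = "googleapps script"

# Assumption: combined log format, where the line ends with "referer" "user-agent".
_UA_PATTERN = re.compile(r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$')


def user_agent_from_log_line(line):
    """Return the user agent field of a combined-format log line, or None."""
    match = _UA_PATTERN.search(line)
    return match.group("agent") if match else None


def should_block(user_agent):
    """True when the user agent contains the GoogleApps script marker."""
    return user_agent is not None and BLOCKED_MARKER in user_agent.lower()
```

A request whose user agent passes `should_block` could then be answered with a 404 instead of the page.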
I’d love to hear of your experience with this bot. Are people abusing it to scrape your content?


Functioning Example:
(feel free to make a copy or test with your own URL)
Code behind it:
function readRows() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var rows = sheet.getDataRange();
  var numRows = rows.getNumRows();
  var values = rows.getValues();

  // Log every row of the active sheet.
  for (var i = 0; i <= numRows - 1; i++) {
    var row = values[i];
    Logger.log(row);
  }
}

// Fetch a URL and return the response body as text.
function GetPage(url) {
  var response = UrlFetchApp.fetch(url);
  return response.getContentText();
}

// URI-encode a single value, or every cell of a 2D range.
function encodeURIC(r) {
  if (r.constructor == Array) {
    var out = r.slice();
    for (var i = 0; i < r.length; i++) {
      for (var j = 0; j < r[i].length; j++) {
        out[i][j] = encodeURIComponent(r[i][j].toString());
      }
    }
    return out;
  } else {
    return encodeURIComponent(r.toString());
  }
}

Google Custom Crawl Rate

I’d like to share my experience with Google’s crawl rate change feature under settings in Google Webmaster Tools.
It seems there is a consensus across the interwebs that this feature is only for slowing down Google’s crawl rate on your servers. Let me show you my logs, which say otherwise.

Around May 14th, 2014, I released a new site of mine with several hundred thousand unique pages with decent content (sitemaps too).
Within 48 hours, Google was crawling all the URLs in my sitemap at a rate of about 1 per second.
On the 19th, unsatisfied with the crawl rate of just 1 per second and looking to improve it, I tweaked the settings in my GWT to “limit” the crawl to 3 requests per second. I received the following confirmation message:

We’ve received a request from a site owner to change the rate at which Googlebot crawls this site: http:// – .co/
New crawl rate: Custom rate
– 3.062 requests per second
– 0.327 seconds per request
Your request will take a day or two to come into effect.
This new crawl rate will stay in effect for 90 days.
We recommend that you set a custom crawl rate only if you’re experiencing traffic problems with your server. To use Google’s recommended crawl rate, follow these steps.
1. On the Dashboard, select the site you want.
2. Click Settings.
3. In the Crawl rate section, select “Let Google determine my crawl rate”.

On the evening of May 20th, Google bumped my crawl rate up to 3 requests per second.

Here is a snapshot of the logs over the past couple of days to show the change. You’re welcome to draw conclusions yourself, and I’d be happy to hear of alternative reasons that Google tripled my crawl rate.

web request log
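
A rough way to check a change like this against your own logs is to count Googlebot requests per hour and convert that to an average requests-per-second figure. The Python 3 sketch below assumes Apache’s common/combined log format and simply looks for the string "Googlebot" in each line; the function names and the hourly bucketing are choices made for this example, not something taken from GWT. It also includes the conversion GWT shows in its message (3.062 requests per second is about 0.327 seconds per request).

```python
import re
from collections import Counter
from datetime import datetime

# Timestamp as written in Apache logs, e.g. [20/May/2014:21:15:03 -0400]
_TIMESTAMP = re.compile(r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]')


def requests_per_second_by_hour(lines, agent_marker="Googlebot"):
    """Return {hour: average requests per second} for lines containing agent_marker."""
    hits = Counter()
    for line in lines:
        if agent_marker not in line:
            continue
        match = _TIMESTAMP.search(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S")
        hits[stamp.strftime("%Y-%m-%d %H:00")] += 1
    # Average over the full hour (3600 seconds).
    return {hour: count / 3600.0 for hour, count in sorted(hits.items())}


def seconds_per_request(requests_per_second):
    """Convert a crawl rate into the delay between requests."""
    return 1.0 / requests_per_second
```

Typical use would be something like `requests_per_second_by_hour(open("access.log"))`, where the log path is whatever your server uses.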

This evening (May 20th), I made another change to increase the crawl rate yet again. We shall see if in 48 hours my crawl rate is bumped to 5 requests per second.

UPDATE Evening of May 21st: Just about 24 hours later, Google has once again bumped their crawl rate up to about 5 requests per second. I’m convinced that GWT’s crawl rate setting can be used to increase the crawl rate on your site. If you have content that Google is interested in AND your server can handle the load, max out your crawl settings!

UPDATE 2: My experience with a couple of other domains shows that it may take more than 24 hours (36-48 in some cases).

CrawlDaddy Crawler Bot?

Shortly after creating a new website & domain, the following requests from CrawlDaddy popped up in the logs:
- - [13/May/2014:10:54:37 -0400] "GET /index.php HTTP/1.1" 200 5789 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; CrawlDaddy v0.3.0 abot v1.2.0.0"
- - [13/May/2014:10:54:37 -0400] "GET /FAQ.php HTTP/1.1" 200 8391 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; CrawlDaddy v0.3.0 abot v1.2.0.0"

The bot requested 7 pages via the IPs of and and then finally made a HEAD request for the homepage before exiting:
- - [13/May/2014:10:54:40 -0400] "HEAD / HTTP/1.1" 200 - "-" "-"

The pages all existed, so it seemed the bot was crawling rather than probing for URL-related vulnerabilities.
Based on the URL provided in the user agent, the crawler seems to be built on an open-source web crawler project.
It’s a curious thing that the crawler is coming from GoDaddy’s IP block ( and that the bot did not request a robots.txt file.
A Google search for CrawlDaddy didn’t reveal much information on this bot. I’d love to hear about your experiences with it in the comments.
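
To spot crawlers like this one that never ask for robots.txt, one option is to group log lines by client IP and list the clients that made requests without ever fetching /robots.txt. This is only a sketch in Python 3: it assumes the common/combined log format with the client IP as the first field, and the function name is made up for this example.

```python
import re
from collections import defaultdict

# Assumption: common/combined log format, client IP first, request line in quotes.
_REQUEST = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>\S+)')


def clients_skipping_robots(lines):
    """Return a sorted list of client IPs that never requested /robots.txt."""
    paths_by_ip = defaultdict(list)
    for line in lines:
        match = _REQUEST.match(line)
        if match:
            paths_by_ip[match.group("ip")].append(match.group("path"))
    return sorted(
        ip
        for ip, paths in paths_by_ip.items()
        # Ignore any query string when comparing the path.
        if not any(path.split("?")[0] == "/robots.txt" for path in paths)
    )
```

A client showing up in this list is not necessarily abusive, since many one-off fetchers skip robots.txt, but it is a quick first filter.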


CloudFlare Dynamic DNS

Here’s a simple Python script that uses your CloudFlare credentials to update a DNS record with the IP address of the connecting machine.

Be sure to input your own information in the email, API key, base domain, record type, and record name fields. If you have any comments, I’d love to hear them.

I suggest running this script as a cron job set to run every 30 minutes or so, depending on your usage. Also, it doesn’t hurt to run it with the nice command!
[Download | GitHub Repository]

#Change the following

cloudflareEmail="[email protected]" #CF Login Email Address
cloudflareAPIkey="yourlonglowercaseandnumberapikey" #API Key as shown on

#the following credentials will update the A record for with the IP address of the connecting machine
baseDomain='' #Domain Name as shown on
recordType='A' #See "Type" Column on
recordName='example' #See "Name" Column on

import os,re
import urllib
import sys

#Find the CloudFlare ID of your (sub)domain based on the recordName and recordType
data={'a': 'rec_load_all','tkn': cloudflareAPIkey,'email': cloudflareEmail,'z': baseDomain}
data = urllib.urlencode(data)
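f = urllib.urlopen("", data)
recloadall = f.read()
#print recloadall
#Extract the ID of the matching record from recloadall into recordID before continuing
#print recordID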
if recordID.find(':"error"')>-1:
print "CF Record:",recordID

#Get your current device IP Address
f = urllib.urlopen("")
ipe = f.read()
ip=ipe[ipe.find("<![CDATA[")+16:ipe.rfind("]]>",ipe.find("<![CDATA["))].strip()
print "IP Address:",ip

#Update with Cloudflare
data={'a': 'rec_edit','tkn': cloudflareAPIkey,'id': recordID,'email': cloudflareEmail,'z': baseDomain,'type': recordType,'name': recordName,'content': ip,'service_mode': '0','ttl': '1'}
data = urllib.urlencode(data)
f = urllib.urlopen("", data)
response = f.read()
#print response
print "Update:",response[response.find('result":')+9:response.find(',',response.find('result":')+3)-1]
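
The last line of the script slices the result out of the response by string position, which is fragile if the field order changes. A possible alternative, sketched here in Python 3, is to parse the response as JSON and read the field by name. This assumes the API response is a JSON object with a top-level "result" key, which is only inferred from the string search in the script; the function name is made up for this example.

```python
import json


def update_result(response_text):
    """Return the top-level "result" value of a JSON response, or None."""
    try:
        payload = json.loads(response_text)
    except ValueError:
        # Not valid JSON (json.JSONDecodeError is a ValueError subclass).
        return None
    if not isinstance(payload, dict):
        return None
    return payload.get("result")
```

With this, the final print could report `update_result(response)` and treat `None` as a failed update.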


Google Crawl Trends

Hello World!
I’m working on creating a tool that allows for better management and viewing of the data provided in Google’s Webmaster Tools. Specifically, I’m designing it to prevent the loss of all this data after 90 Days / 1 Year (depending on the statistic).
I was wondering if anyone has any features they would want in such a tool (e.g. crawl trend or traffic trend prediction). Also, if you wouldn’t mind sharing your Google Webmaster Tools info with me via the add user feature, please contact me.

Python Code to Reset Linksys / OBI / Motorola Modem

The following code can be run as a cron job on your computer or, ideally, your Linux-based network-attached storage.

Upon loss of internet connection, it resets most Motorola cable modems (common with Charter Communications) and also many Linksys / Cisco routers. Optionally, it can be configured to reset an OBI VoIP device. If you have any questions or additions, feel free to comment or email me!

[download file, GitHub Repository]

import os,re
import urllib
import urllib2
import sys
import time, base64

#This try/except block resets the Motorola modem if the internet is down.
#Be sure your modem page can be accessed from and reset from 
#Only certain modems have this feature, so delete or comment out this block if it doesn't work for your model
try:
	urllib.urlopen("")  #This attempts to access Google by their direct IP address
	print "Fetch Success, Internet Works! :)"
except IOError:
	print "Fetch Error, Internet is currently down :("
	print "\tResetting Modem"
	#Request your modem's reset URL here
	print "\tModem Reset, waiting 120 seconds to check again..."
	time.sleep(120)

#This block resets OBI devices. Only uncomment it if you are using an OBI VoIP device. Be sure you can access your device at (if not, change it to the proper address)
#Enter your correct password in OBIPASSWORD
OBIPASSWORD="Your password here"
try:
	urllib.urlopen("")  #Check the connection again
	print "Fetch Success, Internet Works!"
except IOError:
	print "\tInternet Still Down, Resetting OBI"
	handler = urllib2.HTTPDigestAuthHandler()
	handler.add_password("[email protected]","","admin", str(OBIPASSWORD))
	opener = urllib2.build_opener(handler)
	#Open your OBI's reboot URL with opener.open() here
	print "\tOBI Reset, waiting 120 seconds to restart router..."
	time.sleep(120)

#This block resets many Cisco/Linksys routers via the Reboot button on the homepage
#Be sure your router can be accessed at and input your credentials below.
USERNAME="Your username here"
PASSWORD="Your password here"
try:
	urllib.urlopen("")  #Check the connection again
	print "Fetch Success, Internet Works!"
except IOError:
	print "\tInternet Still Down, Resetting Router"
	req = urllib2.Request("")
	base64string = base64.encodestring('%s:%s' % (str(USERNAME), str(PASSWORD)))[:-1]
	authheader =  "Basic %s" % base64string
	req.add_header("Authorization", authheader)
	handle = urllib2.urlopen(req)
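
For anyone running Python 3, where `urllib2` and `base64.encodestring` no longer exist, the two building blocks of the script (the connectivity check and the Basic auth header) might look roughly like the sketch below. The function names, the `timeout` default, and the injectable `opener` parameter are choices made for this example; the URLs and credentials are still yours to fill in.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode("{}:{}".format(username, password).encode("utf-8"))
    # b64encode adds no trailing newline, so nothing needs to be stripped.
    return "Basic " + token.decode("ascii")


def internet_is_up(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if url can be fetched, False on any network error."""
    try:
        opener(url, timeout=timeout).close()
        return True
    except OSError:
        # urllib.error.URLError and socket timeouts are OSError subclasses.
        return False
```

The header value would be attached with `urllib.request.Request(url, headers={"Authorization": basic_auth_header(user, password)})`, mirroring the `req.add_header` call in the script.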