Nighthawk Router Antenna Upgrade Experience

Upgraded my DD-WRT Nighthawk router with these antennas: http://www.amazon.com/gp/product/B00HMRJ8WK/ref=cm_cr_ryp_prd_ttl_sol_0

Here are some stats comparing the stock antennas to these new ones.
Wireless strength stock:
[screenshot: wireless signal strength with the stock antennas]
Wireless strength with these antennas:
[screenshot: wireless signal strength with the upgraded antennas]

The device ending in A6 is running on the 5 GHz band about 20 feet from the router. The device ending in B0 is on the 2.4 GHz band, across the street in another building 150+ feet away. The results are not amazing, but remember that these percentages sit on a logarithmic scale (the underlying signal readings are in dBm, a logarithmic unit), so a 4 or 5% gain is actually a decent increase. Additionally, although I don't have exact statistics for it, throughput saw a noticeable increase with these antennas (I use it as a 5 GHz router with a client on the 2.4 GHz band bridging the network connections).

Additionally, I could likely increase the TX power of this router (currently running at 251 mW) for some further gain.

8/10 do recommend the upgrade on the Nighthawk running DD-WRT.

Recommended Crawl Rate for Bots

You can set your desired bot crawl delay in your robots.txt file by adding this directive after the User-agent field: Crawl-delay: 10

That asks any robot that honors the directive to wait 10 seconds between requests as it crawls your site for links.
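For example, a minimal robots.txt that applies the delay to every bot looks like this:

User-agent: *
Crawl-delay: 10

Note that Googlebot ignores the Crawl-delay directive entirely; its crawl rate is controlled through Google Webmaster Tools instead (more on that below).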
My recommendation, however, is not to set a crawl delay at all. You want bots like Googlebot and Bingbot to crawl your website as often as possible so your freshest content shows up in the search results. The only time to add a crawl delay is when you have an underpowered server, perhaps with poorly written code, and you don't want bots to overwhelm it with traffic and cause it to crash. Googlebot is pretty smart about this anyway: if it notices response times climbing because of the volume of requests it is sending, it will back off and make requests more slowly. I'm unsure how Bingbot handles an accidental DoS, but you can set your preferred crawl settings in Bing Webmaster Tools so Microsoft focuses its crawling on off-peak times and keeps from overwhelming your server.

In terms of SEO, faster crawling is better, and quality new content is key.
Questions and experiences in the comments!
Cheers,
Luke

Google Apps Script Bot: “GoogleApps script”

The GoogleApps script bot is the user agent / bot that Google Apps Script (Google's JavaScript scripting service) uses to fetch pages. An example and the JS code are below.
Here's a standard Apache log entry from this bot accessing your site:
64.233.172.162 - - [02/Jun/2014:15:12:59 -0400] "GET / HTTP/1.1" 200 749 "-" "Mozilla/5.0 (compatible; GoogleApps script; +http://script.google.com/bot.html)"
Any Google Docs user can use this bot to scrape or otherwise access content on a website.
The particular IP accessing my site resolved to google-proxy-64-233-172-162.google.com.
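If you want to verify an IP like this yourself, a reverse DNS lookup is enough. A quick sketch in Python (the IP is just the one from my log above):

import socket
print socket.gethostbyaddr("64.233.172.162")[0]  # PTR hostname, e.g. google-proxy-...google.com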
I've set up a test doc for anyone who would like to check out the script in action. The code powering it can be found under the script editor.
You're welcome to punch in your own URL to have the crawler fetch your site.
See it in action: https://docs.google.com/a/rehmann.co/spreadsheets/d/1junWawm5kNziFJAZHdUP9wMpW9o-HHu3DGgQ0bLTyY4/edit#gid=0

Unfortunately, it looks like this script doesn't follow the rules set forth in your robots.txt file, so if your website is being abused by a user running the Google Docs script bot, I would block the IP or set up your site to serve a 404 error to any user-agent matching “GoogleApps script” (for example, with the Apache rule sketched below).
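If you go the blocking route, here's a minimal sketch assuming Apache with mod_rewrite enabled (dropped into the vhost config or an .htaccess at the site root). It returns a 403 Forbidden, which is the simplest refusal mod_rewrite offers:

# Refuse any request whose User-Agent contains "GoogleApps script"
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "GoogleApps script" [NC]
RewriteRule ^ - [F,L]

On Apache 2.4 you can swap [F,L] for [R=404,L] if you specifically want to answer with a 404.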
I’d love to hear of your experience with this bot. Are people abusing it to scrape your content?

Cheers,
Luke

Functioning Example:
https://docs.google.com/a/rehmann.co/spreadsheets/d/1junWawm5kNziFJAZHdUP9wMpW9o-HHu3DGgQ0bLTyY4/edit#gid=0
(feel free to make a copy or test with your own URL)
Code behind it:
function readRows() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var rows = sheet.getDataRange();
  var numRows = rows.getNumRows();
  var values = rows.getValues();

  for (var i = 0; i <= numRows - 1; i++) {
    var row = values[i];
    Logger.log(row);
  }
}

// Fetches a URL and returns the page body as text
function GetPage(url) {
  var response = UrlFetchApp.fetch(url);
  return response.getContentText();
}

// URL-encodes a string, or every cell of a 2D array of values
function encodeURIC(r) {
  if (r.constructor == Array) {
    var out = r.slice();
    for (var i = 0; i < r.length; i++) {
      for (var j = 0; j < r[i].length; j++) {
        out[i][j] = encodeURIComponent(r[i][j].toString());
      }
    }
    return out;
  } else {
    return encodeURIComponent(r.toString());
  }
}
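In the spreadsheet itself, GetPage is presumably exposed as a custom function, so in your own copy you can fetch a page by typing a formula like this into a cell (the URL is just a placeholder):

=GetPage("http://example.com/")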

Google Custom Crawl Rate

I’d like to share my experience with Google’s crawl rate change feature under settings in Google Webmaster Tools.
It seems there is a consensus across the interwebs that this feature is only for slowing down Google's crawl rate on your server. Let me show you my logs, which tell a different story.

Around May 14th, 2014, I released a new site of mine with several hundred thousand unique pages with decent content (sitemaps too).
Within 48 hours, Google was crawling all the URLs in my sitemap at a rate of about 1 per second.
On the 19th, unsatisfied with the crawl rate of just 1 per second and looking to improve it, I tweaked the settings in my GWT to “limit” the crawl to 3 requests per second. I received the following confirmation message:

We’ve received a request from a site owner to change the rate at which Googlebot crawls this site: http:// – .co/
New crawl rate: Custom rate
– 3.062 requests per second
– 0.327 seconds per request
Your request will take a day or two to come into effect.
This new crawl rate will stay in effect for 90 days.
We recommend that you set a custom crawl rate only if you’re experiencing traffic problems with your server. To use Google’s recommended crawl rate, follow these steps.
1. On the Dashboard, select the site you want.
2. Click Settings.
3. In the Crawl rate section, select “Let Google determine my crawl rate”.

On the evening of May 20th, Google bumped my crawl rate up to 3 requests per second.

Here is a snapshot of the logs over the past couple of days to show the change. You're welcome to draw your own conclusions, and I'd be happy to hear alternative explanations for why Google tripled my crawl rate.

[image: web request log showing the jump in Googlebot requests]
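If you want to run the same check on your own logs, here's a quick sketch (assuming a combined-format Apache access log; the path is a placeholder, and matching on the "Googlebot" substring will also count impostors that spoof the user agent):

# Tally Googlebot requests per hour from an Apache access log
import re
from collections import Counter

LOGFILE = "/var/log/apache2/access.log"

hits = Counter()
for line in open(LOGFILE):
    if "Googlebot" not in line:
        continue
    # timestamps look like [20/May/2014:18:04:12 -0400]; keep day/month/year:hour
    m = re.search(r"\[([^:]+:\d{2})", line)
    if m:
        hits[m.group(1)] += 1

for hour in sorted(hits):
    print hour, hits[hour]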

This evening (May 20th), I made another change to increase the crawl rate yet again. We shall see if in 48 hours my crawl rate is bumped to 5 requests per second.

UPDATE, evening of May 21st: Just about 24 hours later, Google has once again bumped its crawl rate up, to about 5 requests per second. I'm convinced that the GWT crawl rate setting can be used to increase the crawl rate on your site. If you have content that Google is interested in AND your server can handle the load, max out your crawl settings!

UPDATE 2: My experience with a couple of other domains shows that it may take more than 24 hours (36-48 hours in some cases).
Cheers,
Luke

CrawlDaddy Crawler Bot?

Shortly after creating a new website & domain, the following requests from CrawlDaddy popped up in the logs:

64.202.161.41 - - [13/May/2014:10:54:37 -0400] "GET /index.php HTTP/1.1" 200 5789 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; CrawlDaddy v0.3.0 abot v1.2.0.0 http://code.google.com/p/abot)"

64.202.161.46 - - [13/May/2014:10:54:37 -0400] "GET /FAQ.php HTTP/1.1" 200 8391 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; CrawlDaddy v0.3.0 abot v1.2.0.0 http://code.google.com/p/abot)"

The bot requested 7 pages from the IPs 64.202.161.41 and 64.202.161.46, then finally made a HEAD request for the homepage before exiting:

64.202.161.41 - - [13/May/2014:10:54:40 -0400] "HEAD / HTTP/1.1" 200 - "-" "-"

The pages all existed; it seemed the bot was crawling rather than probing for URL-related vulnerabilities.
Based on the URL in the user-agent, the crawler appears to be built on Abot, an open-source website crawler project.
It's curious that the crawler comes from GoDaddy's IP block (64.202.160.0/19) and that it never requested a robots.txt file.
A Google search for CrawlDaddy didn't reveal much information about this bot, so I'd love to hear about your experiences with it in the comments.

Cheers,
Luke

CloudFlare Dynamic DNS

Here's a simple Python script that can be called with your CloudFlare credentials to update a DNS record with the IP address of the connecting machine.

Be sure to fill in your own email address, API key, base domain, record type, and record name. If you have any comments, I'd love to hear them.

I suggest running this script as a cron job set to run every 30 minutes or so, depending on your usage. Also, it doesn’t hurt to run it with the nice command!
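For example, a crontab entry like this (the script path and log file are placeholders) runs it every 30 minutes under nice:

*/30 * * * * nice python /home/user/cloudflare_ddns.py >> /tmp/cloudflare_ddns.log 2>&1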
[Download | GitHub Repository]

#Change the following

cloudflareEmail="[email protected]" #CF Login Email Address
cloudflareAPIkey="yourlonglowercaseandnumberapikey" #API Key as shown on https://www.cloudflare.com/my-account

#the following credentials will update the A record for example.thebasedomainname.com with the IP address of the connecting machine
baseDomain='thebasedomainname.com' #Domain Name as shown on https://www.cloudflare.com/my-websites.html
recordType='A' #See "Type" Column on https://www.cloudflare.com/dns-settings?z=example.com
recordName='example' #See "Name" Column on https://www.cloudflare.com/dns-settings?z=example.com

import os,re
import urllib
import sys

#Find the CloudFlare ID of your (sub)domain based on the recordName and recordType
data={'a': 'rec_load_all','tkn': cloudflareAPIkey,'email': cloudflareEmail,'z': baseDomain}
data = urllib.urlencode(data)
f = urllib.urlopen("https://www.cloudflare.com/api_json.html", data)
recloadall=f.read()
#print recloadall
recloadall=recloadall[0:recloadall.find('"display_name":"'+str(recordName)+'","type":"'+str(recordType)+'"')]
recordID=recloadall[recloadall.rfind("rec_id")+9:recloadall.find('","',recloadall.rfind("rec_id")+9)]
#print recordID
if recordID.find(':"error"')>-1:
recordID=recordID[recordID.find('"msg":"')+6:recordID.find(",",recordID.find('"msg":"')+3)]
print "CF Record:",recordID

#Get your current device IP Address
f = urllib.urlopen("http://ip-api.com/xml")
ipe=f.read()
ip=ipe[ipe.find("<query><![CDATA[")+16:ipe.rfind("]]>",ipe.find("<query><![CDATA["))].strip()
print "IP Address:",ip

#Update with CloudFlare
data={'a': 'rec_edit','tkn': cloudflareAPIkey,'id': recordID,'email': cloudflareEmail,'z': baseDomain,'type': recordType,'name': recordName,'content': ip,'service_mode': '0','ttl': '1'}
data = urllib.urlencode(data)
f = urllib.urlopen("https://www.cloudflare.com/api_json.html", data)
response=f.read()
#print response
print "Update:",response[response.find('result":')+9:response.find(',',response.find('result":')+3)-1]

Cheers!

Google Crawl Trends

Hello World!
I'm working on creating a tool that allows for better management and viewing of the data provided in Google's Webmaster Tools. Specifically, I'm designing it to prevent the loss of all this data after 90 days / 1 year (depending on the statistic).
If anyone has features they'd like to see in such a tool (e.g., crawl trend or traffic trend prediction), or if you wouldn't mind sharing your Google Webmaster Tools data with me via the Add User feature, please contact me.
Cheers!

Python Code to Reset Linksys / OBI / Motorola Modem

The following code can be run as a cron job on your computer or, ideally, your Linux-based network attached storage.

Upon loss of internet connection, it resets most Motorola cable modems (common with Charter Communications) and also many Linksys / Cisco routers. Optionally, it can be configured to reset an OBI VoIP device. If you have any questions or additions, feel free to comment or email me!
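If you do run it from cron, an entry like this (the path is a placeholder) checks the connection every 5 minutes:

*/5 * * * * python /volume1/scripts/reset_internet.py >> /tmp/reset_internet.log 2>&1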

[download file, GitHub Repository]

import os,re
import urllib
import urllib2
import sys
import time, base64

#This try/except block will first reset the Motorola Modem if the internet works.
#Be sure your modem page can be accessed from http://192.168.100.1 and reset from http://192.168.100.1/reset.htm 
#Only certain modems have this feature, so delete or comment out this block if it doesn't work for your model
try:
	urllib.urlopen("http://74.125.224.72")  #This attempts to access Google by their direct IP address
	print "Fetch Success, Internet Works! :)"
except:
	print "Fetch Error, Internet is currently down :("
	print "\tResetting Modem"
	urllib.urlopen("http://192.168.100.1/reset.htm")
	print "\tModem Reset, waiting 120 seconds to check again..."
	time.sleep(120)

#This block resets OBI Devices. Only uncomment if you are using a OBI VoIP device. Be sure you can access your device at http://192.168.10.1/ (if not, change it to the proper address)
#Enter your correct password in OBIPASSWORD
''' 
OBIPASSWORD="Your password here"
try:
	urllib.urlopen("http://74.125.224.72")
	print "Fetch Success, Internet Works!"
except:
	print "\tInternet Still Down, Resetting OBI"
	handler = urllib2.HTTPDigestAuthHandler()
	handler.add_password("[email protected]","http://192.168.10.1/rebootgetconfig.htm","admin", str(OBIPASSWORD))
	opener = urllib2.build_opener(handler)
	urllib2.install_opener(opener)
	opened=urllib2.urlopen("http://192.168.10.1/rebootgetconfig.htm")
	print "\tOBI Reset, waiting 120 seconds to restart router..."
	time.sleep(60)
'''

#This block resets many Cisco/Linksys routers via the Reboot button on the homepage
#be sure your router can be accessed at http://192.168.1.1 and input your credentials below. 
USERNAME="admin"
PASSWORD="admin" 
try:
	urllib.urlopen("http://74.125.224.72")
	print "Fetch Success, Internet Works!"
except:
	print "\tInternet Still Down, Resetting Router"
	req = urllib2.Request("http://192.168.1.1/apply.cgi")
	base64string = base64.encodestring('%s:%s' % (str(USERNAME), str(PASSWORD)))[:-1]
	authheader =  "Basic %s" % base64string
	req.add_header("Authorization", authheader)
	req.add_data("submit_button=index&change_action=gozila_cgi&submit_type=reboot&timer_interval=30")
	handle = urllib2.urlopen(req)

Cheers!