/h/8913147.html in Google Analytics Spam

The page “/h/8913147.html” is part of a Google Analytics spam campaign run by get-seo-help.com.
The uniqueness of the URL is likely intended to help the spam avoid being filtered by Google.

I’ve also seen the same HTML page used for referrer spam from free-seo-consultation.com.
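Note that much Google Analytics referrer spam is “ghost” traffic sent directly to Google’s servers and never touches your site, so it can only be filtered inside Analytics. When the spammer does hit your server, though, you can reject the request by referrer. A minimal PHP sketch; the domain list holds just the two domains from this post:

<?php
// Sketch: reject requests whose referrer matches a known spam domain.
// The list holds just the two domains mentioned in this post.
$spamDomains = array('get-seo-help.com', 'free-seo-consultation.com');

$referrer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
$host = (string) parse_url($referrer, PHP_URL_HOST);

foreach ($spamDomains as $domain) {
    // Match the bare domain or any subdomain of it.
    if ($host === $domain || substr($host, -strlen('.' . $domain)) === '.' . $domain) {
        header('HTTP/1.1 403 Forbidden');
        exit;
    }
}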


Push your pages to Baidu by adding this code

Baidu has a little snippet of code that sends a request to their server with the current page URL in order to promote indexing of new pages.

(function(){
    // Create a script element pointing at Baidu's push script.
    var bp = document.createElement('script');
    bp.src = '//push.zhanzhang.baidu.com/push.js';
    // Insert it before the first existing script tag on the page.
    var s = document.getElementsByTagName('script')[0];
    s.parentNode.insertBefore(bp, s);
})();

Adding the above code to your site will cause a request to “http://api.share.baidu.com/s.gif?l=http://example.com” to be made every time someone loads the page.

SEO for PDF Meta Tags [Case Study]

It’s great to optimize your PDF content for SEO, but what about PDF meta information? In this case study, we evaluate which meta information needs to be maintained for the best PDF SEO performance, using the attached document.

For this case study, we uploaded this PDF with unique tags in the Filename, Author, Description, Subject, Creator, About, and Producer meta fields.

PDF File Used: Rehmann-SEO-for-PDFs-BASIC9890

After the PDF file was indexed by Google, we searched for these unique tags. The only unique tag found was the filename tag.

From this small study, we conclude that the only PDF tags Google uses are the Title tag and the filename.
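For reference, here’s a rough sketch of how such meta tags can be written into a PDF. It assumes the TCPDF library; the filenames and tag values below are placeholders rather than the exact ones used in this study.

<?php
// Sketch: setting PDF meta tags with TCPDF (assumed installed at this path).
require_once 'tcpdf/tcpdf.php';

$pdf = new TCPDF();
$pdf->SetTitle('Unique-Title-Tag-0001');     // the tag Google shows in results
$pdf->SetAuthor('Unique-Author-Tag-0002');
$pdf->SetSubject('Unique-Subject-Tag-0003');
$pdf->SetKeywords('unique, test, tags');
$pdf->SetCreator('Unique-Creator-Tag-0004');

$pdf->AddPage();
$pdf->Write(0, 'Body text for the PDF SEO test.');

// The filename itself was the one tag Google matched in this study.
$pdf->Output('Unique-Filename-Tag-0005.pdf', 'F');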


Google Search Console Crawl Rate Settings

Your site is being crawled at a rate calculated as optimal by Google. You can change the crawl rate only by filing a special request using the form mentioned in the “learn more” documentation.

For some sites, Google allows users to adjust the crawl rate and thus manually limit the load Googlebot places on their servers. This customization option is only available for some sites, and I’ve been unable to determine what metric decides whether a site gets it: I’ve seen the option both available and unavailable (replaced by the message above) on high- and low-traffic sites alike.

Google tries to crawl as many of your site’s pages as we can without overwhelming your server’s bandwidth. If Google’s crawlers are slowing your site, you can change the crawl rate (the speed of Google’s requests). This feature is only available for sites at the root or subdomain level.

If you think Googlebot is crawling your site too quickly and you want to slow it down but cannot (because the WMT option is disabled), you can file a request to report Googlebot crawl issues. You’ll need to know the following information before you submit a crawl-issue request; a sketch for verifying that the traffic really is Googlebot follows below.

  1. From what IP addresses are you seeing Googlebot activity?
  2. For which user-agent are you seeing Googlebot activity?
  3. How many times a day does Googlebot access your site?
  4. Additional details (Please also include a portion of the weblog that shows Google accesses so we can track down the problem quickly):

Report a problem with how Googlebot crawls your site.
You can report problems only for domain-level properties (for example, “www.example.com/”)

The rate at which Google crawls your page depends on many factors:

  • The URLs we already know about
  • Links from other web pages (within your site and on other sites)
  • URLs listed in your Sitemap

For most sites, Googlebot shouldn’t access your site more than once every few seconds on average. However, due to network delays, it’s possible that the rate will appear to be slightly higher over short periods. If you are seeing a particular issue with Googlebot, please share it in the comments!
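Before filing a report, it helps to confirm that the hits in your logs really come from Googlebot (questions 1 and 2 above). Google’s documented verification is a reverse DNS lookup followed by a matching forward lookup; a minimal PHP sketch (the sample IP is just an illustration):

<?php
// Sketch: verify an IP from your access logs actually belongs to Googlebot.
function isGooglebot($ip) {
    $host = gethostbyaddr($ip);               // reverse DNS lookup
    if ($host === false || $host === $ip) {
        return false;                         // lookup failed / no PTR record
    }
    // Genuine Googlebot hosts resolve under googlebot.com or google.com.
    if (!preg_match('/\.(googlebot|google)\.com$/', $host)) {
        return false;
    }
    return gethostbyname($host) === $ip;      // forward lookup must match
}

var_dump(isGooglebot('66.249.66.1'));         // a commonly seen Googlebot IP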

10 GB in a 27 KB Gzip File [My Present To HTTP Scanners]

Here’s a gzip bomb I serve to HTTP scanners and web scrapers:

10G.gz
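For reference, here’s a sketch of how such a file can be generated with PHP’s zlib functions, streaming 10 GB of zeros through a gzip writer. Note that a single gzip pass over zeros tops out around a 1000:1 ratio (roughly a 10 MB file), so a 27 KB bomb like this one was presumably compressed more than once.

<?php
// Sketch: write 10 GB of zeros through a gzip stream at max compression.
// One pass yields a file on the order of 10 MB; recompressing the output
// is presumably how the 27 KB version was produced.
$gz = gzopen('10G.gz', 'wb9');           // 'wb9' = write binary, level 9
$chunk = str_repeat("\0", 1024 * 1024);  // 1 MB of zeros per write
for ($i = 0; $i < 10 * 1024; $i++) {     // 10,240 MB = 10 GB uncompressed
    gzwrite($gz, $chunk);
}
gzclose($gz);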

Create a PHP file with the following:

<?php header('Content-Encoding: gzip'); echo file_get_contents('10G.gz');

Example: http://rehmann.co/gz-bomb.php

How it works:

  1. A web crawler or browser requests the page and sends the "accept-encoding: gzip, deflate, br" header.
    As long as gzip is accepted, the gzip bomb will do its job.
  2. The web server and PHP script respond to the request with the 27 KB gzip bomb; only 27 KB is delivered to the client.
  3. The client browser or crawler begins to unzip the data before it is processed by the script or shown to the user.
  4. The client machine runs out of memory or crashes before the bomb is fully unzipped.

A Base64 Encoded Image for Google Image Search Results

The following image is base64 encoded; I’m testing to see whether it appears in Google Image search results.

lukerehmannbase64

If you can reverse-search the image by saving it and uploading it to Google Image search, base64 image search is available in Google. Otherwise, don’t expect your base64 images to appear in the search results.
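For reference, here’s a minimal PHP sketch of how an image gets embedded as a base64 data URI instead of a normal src URL (the filename is a placeholder):

<?php
// Sketch: embed an image as a base64 data URI ('photo.png' is a placeholder).
$data = base64_encode(file_get_contents('photo.png'));
echo '<img src="data:image/png;base64,' . $data . '" alt="base64 test image">';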

You have reached your submission limit using this tool. You can add more URLs using a sitemap. Monitor your site’s search traffic in Search Console.

If you hit Google’s webpage submission limit, you may receive the errors below. The only way around these limits is to:

  1. Wait 24 hours and submit again, or
  2. Use a different Google account.

You have reached your submission limit using this tool. You can add more URLs using a sitemap. Monitor your site’s search traffic in Search Console.


An error has occurred. Please try again later.
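Since sitemap submissions don’t run into this per-URL limit, one workaround is to ping Google with your sitemap URL. A minimal sketch using the ping endpoint Google documented at the time (the sitemap URL is a placeholder):

<?php
// Sketch: notify Google of a sitemap via its documented ping endpoint.
$sitemap = 'http://example.com/sitemap.xml';  // placeholder sitemap URL
file_get_contents('http://www.google.com/ping?sitemap=' . urlencode($sitemap));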

Google chose different canonical than user [Google Search Console / Webmaster Tools]

This URL is marked as canonical for a set of pages, but Google thinks another URL makes a better canonical. Because we consider this page a duplicate, we did not index it; only the canonical page is indexed. We recommend that you explicitly mark this page as a duplicate of the canonical URL. To learn which page is the canonical, click the table row to run an info: query for this URL, which should list its canonical page.

What to do: It’s best to follow Google’s advice on this one. Please the Google gods and set their version as the canonical URL.
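The usual way is a link rel="canonical" tag in the page head; Google also accepts the equivalent HTTP Link header, which is handy for PDFs and other non-HTML files. A minimal PHP sketch with a placeholder URL:

<?php
// Sketch: declare Google's preferred URL as canonical via an HTTP header.
// Equivalent to <link rel="canonical" href="..."> in the HTML head.
header('Link: <https://example.com/preferred-page/>; rel="canonical"');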

Submitted URL not selected as canonical [Google Search Console / Webmaster Tools]

The URL is one of a set of duplicate URLs without an explicitly marked canonical page. You explicitly asked this URL to be indexed, but because it is a duplicate, and Google thinks that another URL is a better candidate for canonical, Google did not index this URL. Instead, we indexed the canonical that we selected. The difference between this status and “Google chose different canonical than user” is that, in this case, you explicitly requested indexing.

What to do: If you are using canonical tags properly, there is nothing more to do; the tags already tell Google which page to prefer. It may not be necessary to include non-canonical URLs in your sitemap, but I have found it helpful, as Google will index some of these non-canonical URLs and display them in the search results.

My CloudFlare Argo Response Time Improvement

CloudFlare has added this nifty chart to give you an idea of the performance boost their Argo Smart Routing feature adds (or takes away).

Unfortunately, the statistics are limited to the previous 4 hours, so I’ll try to post a couple separate charts.

Here’s one to start.

CloudFlare Argo shows a 21% improvement in routing time vs. standard routing; only about 20% of the traffic was smart-routed.
The data for this graph covers about 700 MB / 100,000 requests with roughly a 10% CloudFlare cache rate.

Above is a histogram of Time To First Byte (TTFB). The blue and orange series represent the before and after TTFB in locations where Argo found a Smart Route.
TTFB measures the delay between Cloudflare sending a request to your server and receiving the first byte in response. TTFB includes network transit time (which Smart Routing optimizes) and processing time on your server (which Argo has no effect on).

The geography of this first sample of Argo users was entirely limited to Moscow, Russia, suggesting that over the past 48 hours CloudFlare’s link to that side of the planet has performed faster. All the data originated from Google’s Northern California Data Center.

Site 2: This site is being served out of the AWS East Data Center.

Sample Size: 150,000 Requests / 2 GB

Note a modest improvement in both China and Ireland.

Site 3: This site only had about 300 MB / 25,000 requests of traffic over the past 48 hours, so CloudFlare is unable to display performance data.
Argo Smart Routing is optimizing 12.0% of requests to your origin. There have not been enough requests to your origin in the last 48 hours to display detailed performance data.