Google chose different canonical than user [Google Search Console / Webmaster Tools]

This URL is marked as canonical for a set of pages, but Google thinks another URL makes a better canonical. Because we consider this page a duplicate, we did not index it; only the canonical page is indexed. We recommend that you explicitly mark this page as a duplicate of the canonical URL. To learn which page is the canonical, click the table row to run an info: query for this URL, which should list its canonical page.

What to do: It’s best to follow Google’s advice on this one. Please the Google gods and set their version as the canonical URL.
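For example, if Google has selected https://example.com/widgets (a hypothetical URL) as the canonical, the duplicate page would point to it with a canonical link element in its <head>:

    <link rel="canonical" href="https://example.com/widgets" />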

Submitted URL not selected as canonical [Google Search Console / Webmaster Tools]

The URL is one of a set of duplicate URLs without an explicitly marked canonical page. You explicitly asked this URL to be indexed, but because it is a duplicate, and Google thinks that another URL is a better candidate for canonical, Google did not index this URL. Instead, we indexed the canonical that we selected. The difference between this status and “Google chose different canonical than user” is that, in this case, you explicitly requested indexing.

What to do: If you are already using canonical tags to tell Google which page to prefer, there is nothing more to do. It may not be strictly necessary to include non-canonical URLs in your sitemap, but I have found it helpful, as Google will sometimes index some of these non-canonical URLs and display them in the search results.
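For what it’s worth, listing both versions in a sitemap is just a matter of adding <url> entries; a minimal sketch with hypothetical URLs:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://example.com/widgets</loc></url>          <!-- the canonical -->
      <url><loc>https://example.com/widgets?ref=nav</loc></url>  <!-- a non-canonical duplicate -->
    </urlset>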

Google Search Console / Webmaster Tools Definitions

Errors:

Server error (5xx): Your server returned a 500-level error when the page was requested.

Redirect error: The URL returned a redirect error. It could be one of the following types: a redirect chain that was too long; a redirect loop; a redirect URL that eventually exceeded the max URL length; or a bad or empty URL in the redirect chain.

Submitted URL blocked by robots.txt: You submitted this page for indexing, but the page is blocked by robots.txt. Try testing your page using the robots.txt tester.
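For reference, a rule like the following in robots.txt is the sort of thing that triggers this error (the path is hypothetical); either remove the Disallow rule or stop submitting the URL:

    User-agent: *
    Disallow: /private/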

Submitted URL marked ‘noindex’: You submitted this page for indexing, but the page has a ‘noindex’ directive either in a meta tag or in an HTTP response header. If you want this page to be indexed, you must remove the tag or the header.
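The directive can take either form; if you want the page indexed, remove whichever of these is present:

    <meta name="robots" content="noindex">

or, as an HTTP response header:

    X-Robots-Tag: noindex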

Submitted URL seems to be a Soft 404: You submitted this page for indexing, but the server returned what seems to be a soft 404.

Submitted URL returns unauthorized request (401): You submitted this page for indexing, but Google got a 401 (not authorized) response. Either remove authorization requirements for this page, or allow Googlebot to access your pages by verifying its identity.

Submitted URL not found (404): You submitted a non-existent URL for indexing.

Submitted URL has crawl issue: You submitted this page for indexing, and Google encountered an unspecified crawling error that doesn’t fall into any of the other reasons. Try debugging your page using Fetch as Google.

Warnings:

Indexed, though blocked by robots.txt: The page was indexed, despite being blocked by robots.txt (Google always respects robots.txt, but this doesn’t help if someone else links to the page). This is marked as a warning because we’re not sure if you intended to block the page from search results. If you do want to block this page, robots.txt is not the correct mechanism to keep it out of the index; instead, use ‘noindex’ or prohibit anonymous access to the page with authorization. You can use the robots.txt tester to determine which rule is blocking this page. Because of the robots.txt, any snippet shown for the page will probably be sub-optimal. If you do not want to block this page, update your robots.txt file to unblock it.

Valid:

Submitted and indexed: You submitted the URL for indexing, and it was indexed.

Indexed, not submitted in sitemap: The URL was discovered by Google and indexed. We recommend submitting all important URLs using a sitemap.

Indexed; consider marking as canonical: The URL was indexed. Because it has duplicate URLs, we recommend explicitly marking this URL as canonical.

Excluded:

Blocked by ‘noindex’ tag: When Google tried to index the page it encountered a ‘noindex’ directive, and therefore did not index it. If you do not want the page indexed, you have done so correctly. If you do want this page to be indexed, you should remove that ‘noindex’ directive.

Blocked by page removal tool: The page is currently blocked by a URL removal request. Removal requests are only good for a specified period of time (see the linked documentation). After that period, Googlebot may go back and index the page, even if you do not submit another index request. If you do not want the page to be indexed, use ‘noindex’, require authorization for the page, or remove the page. If you are a verified site owner, you can use the URL removals tool to see who submitted a URL removal request.

Blocked by robots.txt: This page was blocked to Googlebot with a robots.txt file. You can verify this using the robots.txt tester. Note that this does not mean that the page won’t be indexed through some other means. If Google can find other information about this page without loading it, the page could still be indexed (though this is less common). To ensure that a page is not indexed by Google, remove the robots.txt block and use a ‘noindex’ directive.

Blocked due to unauthorized request (401): The page was blocked to Googlebot by a request for authorization (401 response). If you do want Googlebot to be able to crawl this page, either remove authorization requirements, or allow Googlebot to access your pages by verifying its identity.
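If you go the second route, “verifying its identity” is commonly done with a reverse-then-forward DNS check on the requesting IP, since the Googlebot user-agent string is easily spoofed. A minimal Python sketch (the function name is mine, not an official API):

    import socket

    def is_googlebot(ip):
        """Verify a claimed Googlebot IP via reverse, then forward, DNS."""
        try:
            host = socket.gethostbyaddr(ip)[0]  # reverse DNS lookup
            if not host.endswith((".googlebot.com", ".google.com")):
                return False
            # Forward-confirm: the hostname must resolve back to the same IP.
            return ip in socket.gethostbyname_ex(host)[2]
        except (socket.herror, socket.gaierror):
            return False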

Crawl anomaly: An unspecified anomaly occurred when fetching this URL. This could mean a 4xx- or 5xx-level response code; try fetching the page using Fetch as Google to see if it encounters any fetch issues. The page was not indexed.

Crawled – currently not indexed: The page was crawled by Google, but not indexed. It may or may not be indexed in the future; no need to resubmit this URL for crawling.

Discovered – currently not indexed: The page was found by Google, but not crawled yet.

Alternate page with proper canonical tag: This page is a duplicate of a page that Google recognizes as canonical, and it correctly points to that canonical page, so nothing for you to do here!

Duplicate page without canonical tag: This page has duplicates, none of which is marked canonical. We think this page is not the canonical one. You should explicitly mark the canonical for this page. To learn which page is the canonical, click the table row to run an info: query for this URL, which should list its canonical page.

Duplicate non-HTML page: A non-HTML page (for example, a PDF file) is a duplicate of another page that Google has marked as canonical. Typically only the canonical URL will be shown in Google Search. If you like, you can specify a canonical page using the Link HTTP header in a response.
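For example, a PDF could declare its HTML counterpart as canonical with a response header like this (URLs hypothetical):

    Link: <https://example.com/whitepaper.html>; rel="canonical"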

Google chose different canonical than user: This URL is marked as canonical for a set of pages, but Google thinks another URL makes a better canonical. Because we consider this page a duplicate, we did not index it; only the canonical page is indexed. We recommend that you explicitly mark this page as a duplicate of the canonical URL. To learn which page is the canonical, click the table row to run an info: query for this URL, which should list its canonical page.

Not found (404): This page returned a 404 error when requested. The URL was discovered by Google without any explicit request that it be crawled. Google could have learned of the URL in different ways: for example, another page links to it, or it existed previously and was deleted. Googlebot will probably continue to try this URL for some period of time; there is no way to tell Googlebot to permanently forget a URL, although it will crawl it less and less often. 404 responses are not a problem if they are intentional. If your page has moved, use a 301 redirect to the new location. See Google’s documentation to learn more about how to think about 404 errors on your site.
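As a sketch of the redirect fix, on Apache this is a one-liner in .htaccess (paths are hypothetical; other servers have equivalents):

    Redirect 301 /old-page https://example.com/new-page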

Page removed because of legal complaint: The page was removed from the index because of a legal complaint.

Page with redirect: The URL is a redirect, and therefore was not added to the index.

Queued for crawling: The page is in the crawling queue; check back in a few days to see if it has been crawled.

Soft 404: The page request returns what we think is a soft 404 response. This means that it returns a user-friendly “not found” message without a corresponding 404 response code. We recommend returning a 404 response code for “not found” pages to prevent indexing of the page.
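The friendly “not found” page itself is fine; the status code is what matters. A minimal sketch using Flask (a hypothetical app, not anything prescribed by Google) that serves the same message with a real 404:

    from flask import Flask

    app = Flask(__name__)

    @app.errorhandler(404)
    def not_found(error):
        # Same user-friendly message, but sent with a 404 status code
        # instead of a 200, so Google won't treat it as a soft 404.
        return "Sorry, we couldn't find that page.", 404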

Submitted URL dropped: You submitted this page for indexing, but it was dropped from the index for an unspecified reason.

Submitted URL not selected as canonical: The URL is one of a set of duplicate URLs without an explicitly marked canonical page. You explicitly asked this URL to be indexed, but because it is a duplicate, and Google thinks that another URL is a better candidate for canonical, Google did not index this URL. Instead, we indexed the canonical that we selected. The difference between this status and “Google chose different canonical than user” is that, in this case, you explicitly requested indexing.

Jan-22 2017: Search Console performed an infrastructure update that may cause a change in your data.

Google Webmaster Tools performed an update on January 22nd which may cause anomalies in search data.

For my web properties, this resulted in a 50% drop in reported “Structured Data” elements in webmaster tools.

According to Google’s Data Anomalies reporting page, an event for January 22nd has not yet been reported.

 

Google Custom Crawl Rate

I’d like to share my experience with Google’s crawl rate change feature, found under Settings in Google Webmaster Tools.
It seems there is a consensus across the interwebs that this feature is only for slowing down Google’s crawl rate on your servers. Let me show you my logs, which tell a different story.

Around May 14th, 2014, I released a new site of mine with several hundred thousand unique pages of decent content (sitemaps included).
Within 48 hours, Google was crawling all the URLs in my sitemap at a rate of about 1 per second.
On the 19th, unsatisfied with the crawl rate of just 1 per second and looking to improve it, I tweaked the settings in my GWT to “limit” the crawl to 3 requests per second. I received the following confirmation message:

We’ve received a request from a site owner to change the rate at which Googlebot crawls this site: http:// – .co/
New crawl rate: Custom rate
– 3.062 requests per second
– 0.327 seconds per request
Your request will take a day or two to come into effect.
This new crawl rate will stay in effect for 90 days.
We recommend that you set a custom crawl rate only if you’re experiencing traffic problems with your server. To use Google’s recommended crawl rate, follow these steps.
1. On the Dashboard, select the site you want.
2. Click Settings.
3. In the Crawl rate section, select “Let Google determine my crawl rate”.

On the evening of May 20th, Google bumped my crawl rate up to 3 requests per second.

Here is a snapshot of the logs over the past couple of days to show the change. You’re welcome to draw your own conclusions, and I’d be happy to hear alternative explanations for why Google tripled my crawl rate.

web request log
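If you’d like to run the same check on your own logs, here’s a rough Python sketch that tallies Googlebot hits per second from a combined-format access log (the file path is hypothetical, and it trusts the user-agent string rather than verifying the IP):

    from collections import Counter

    hits = Counter()
    with open("access.log") as log:  # hypothetical path, Apache/nginx combined format
        for line in log:
            if "Googlebot" not in line:
                continue
            # The timestamp sits between the first pair of brackets, with
            # one-second resolution, e.g. [20/May/2014:18:04:12 -0700].
            timestamp = line.split("[", 1)[1].split("]", 1)[0]
            hits[timestamp] += 1

    if hits:
        print("peak: %d req/s, mean: %.2f req/s"
              % (max(hits.values()), sum(hits.values()) / len(hits)))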

This evening (May 20th), I made another change to increase the crawl rate yet again. We shall see if in 48 hours my crawl rate is bumped to 5 requests per second.

UPDATE Evening of May 21st: Just about 24 hours later, Google has once again bumped their crawl rate, up to about 5 requests per second. I’m convinced that GWT’s crawl rate setting can be used to increase the crawl rate on your site. If you have content that Google is interested in AND your server can handle the load, max out your crawl settings!

UPDATE 2: My experience with a couple of other domains shows that it may take more than 24 hours (36-48 in some cases).
Cheers,
Luke