Created time: 2015-05-10  Last updated time:

Google's crawler types and how they crawl a website

Google uses several types of crawlers. Each has its own role and its own timing for visiting a site. By watching the crawler log, you can infer what each crawler does and what it is for.

Watching crawler activity

By using a log analyser for Google's crawlers, we can see how the crawlers behave on our site. From that, we can check the status of our website in detail.

Google crawler types

Google uses the following crawler types. Each type can be identified by its user agent.

  • Default Crawler
  • Image Crawler
  • Smartphone Crawler
  • Mobile Crawler
  • Feed fetcher
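As a rough sketch (in Python, assuming you have the raw user agent string from each access log entry), the types above can be told apart by a few substrings. The matching rules and the function name are my own heuristic based on the user agents quoted in this article, not an official Google list. Keep in mind that anyone can fake a user agent string.

```python
from typing import Optional

def classify_googlebot(user_agent: str) -> Optional[str]:
    """Map a user agent to a crawler type, or None if it is not a Google crawler.

    Heuristic only: based on the user agent strings quoted in this article.
    """
    ua = user_agent.lower()
    if "googlebot-image" in ua:
        return "image"
    if "googlebot-mobile" in ua:
        return "mobile"  # feature-phone crawlers (DoCoMo, SAMSUNG, ...)
    if "feedfetcher-google" in ua:
        return "feed"
    if "googlebot/" in ua:
        # The smartphone crawler reuses the Googlebot/2.1 token inside an iPhone UA.
        return "smartphone" if "iphone" in ua else "default"
    return None
```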

Google Webmaster Tools shows how many times pages were crawled on its Crawl Stats page. The following chart shows this website's statistics.

Crawl Stats

The chart is the sum of all crawler types. After this website started on March 7th, 2015, the Image crawler visited many times, so the first large spike in the data comes from image crawling, not page crawling.

You can confirm this with the crawler log analyser bundled with the Content Management System.

Crawler log analyser
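If you only have a raw access log instead of a bundled analyser, a rough equivalent of the Crawl Stats chart is to count Googlebot requests per day. This Python sketch assumes the Apache/nginx "combined" log format and simply matches "googlebot" in the user agent, which covers the default, image, smartphone and mobile crawlers quoted below. The function name is hypothetical, and the totals will not match Webmaster Tools exactly.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed "combined" log format:
# host ident user [day:time zone] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:\]]+):[^\]]*\] '
    r'"[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits_per_day(lines: Iterable[str]) -> Counter:
    """Count requests whose user agent contains 'googlebot', keyed by day."""
    hits = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and "googlebot" in m.group("agent").lower():
            hits[m.group("day")] += 1  # e.g. hits["10/May/2015"]
    return hits
```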

Default Crawler

The default crawler is the standard Googlebot crawler. Its user agent is the following string:

"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

This crawler crawls all pages, including those for desktop PCs, smartphones, and other mobile phones.

This crawler collects the content of the website's pages; that information is indexed and analysed later.

Image Crawler

The Image crawler fetches the images embedded in page content. It visits after the default crawler has crawled the page.

This crawler sends the "If-Modified-Since" HTTP header when it already has a copy of the image.

The user agent of this crawler is the following string:

"Googlebot-Image/1.0"
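On the server side, answering the "If-Modified-Since" header mentioned above comes down to a date comparison. The Python function below is a minimal sketch, assuming you know each image's last-modified time as a timezone-aware datetime; the function name is hypothetical. Web servers such as Apache and nginx already do this for static files, so this only matters if your CMS serves images itself.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

def status_for_conditional_get(if_modified_since: Optional[str], last_modified: datetime) -> int:
    """Return 304 if the client's copy is still current, otherwise 200.

    last_modified must be a timezone-aware datetime (e.g. in UTC).
    """
    if not if_modified_since:
        return 200  # unconditional request: send the full image
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparsable date: ignore the header and send the full image
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # "-0000" dates parse as naive; treat as UTC
    # HTTP dates have one-second resolution, so drop sub-second precision first.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # Not Modified: the crawler can reuse its stored copy
    return 200
```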

Smartphone Crawler

The smartphone crawler checks whether a page is mobile-friendly. It seems to exist only for that purpose.

It visits after the default crawler and the Image crawler.

The user agent of this crawler is the following string:

"Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

This crawler's user agent is based on an iPhone user agent string.

If your website does not use responsive web design, you can redirect this crawler to the mobile page when it visits.
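A minimal Python sketch of that redirect decision. The `MOBILE_PAGES` table and the `/m/` URL layout are hypothetical, and the substring check is a rough heuristic that treats the smartphone and feature-phone crawlers the same way as ordinary phone browsers, so that Googlebot sees what your mobile visitors see.

```python
from typing import Dict, Optional

# Hypothetical mapping from desktop pages to their separate mobile versions.
MOBILE_PAGES: Dict[str, str] = {
    "/index.html": "/m/index.html",
    "/about.html": "/m/about.html",
}

# Rough heuristic tokens; "mobile" also matches "Googlebot-Mobile" and iOS "Mobile/...".
MOBILE_TOKENS = ("iphone", "mobile", "midp", "docomo")

def is_mobile_user_agent(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(token in ua for token in MOBILE_TOKENS)

def mobile_redirect(path: str, user_agent: str) -> Optional[str]:
    """Return the mobile URL to redirect to, or None to serve the requested page."""
    if is_mobile_user_agent(user_agent):
        return MOBILE_PAGES.get(path)  # None if this page has no mobile version
    return None
```

In a web application you would answer with an HTTP redirect (for example, 302) whose `Location` header is the returned path.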

Mobile Crawler

In addition to the Smartphone crawler, Google has crawlers for other (non-smartphone) mobile phones. Their user agents are the following strings:

"DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"

"SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"

When these crawlers visit, you can redirect them to the mobile page, if one exists.

These crawlers do not visit often; they seem to come to important pages only.

Feed fetcher

This crawler visits when you post a URL to Google Plus, or when your web server sends a PubSubHubbub request. Its user agent is:

"FeedFetcher-Google; (+http://www.google.com/feedfetcher.html)"

RSS feeds and XML sitemap files themselves are generally crawled by the default crawler.
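For reference, the PubSubHubbub request mentioned above is a form-encoded POST with `hub.mode=publish` and `hub.url` set to the feed URL. The Python sketch below only builds the request with the standard library. The hub address is an assumption (here, the Google-run hub at pubsubhubbub.appspot.com), so use whichever hub your feed declares; the function name is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed hub endpoint; replace with the hub your feed's <link rel="hub"> points to.
HUB_URL = "https://pubsubhubbub.appspot.com/"

def build_publish_ping(feed_url: str, hub_url: str = HUB_URL) -> Request:
    """Build (but do not send) a PubSubHubbub 'publish' notification for feed_url."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(build_publish_ping("https://example.com/feed.xml"))`; a 2xx response means the hub accepted the ping.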

How to check the health of a website

By checking the crawler log, you can check the health of your website. Once you get used to it, you will be able to anticipate upcoming changes in the Webmaster Tools data.

The default crawler is the most important

The most important crawler is the default crawler, because the data it collects is what gets analysed.

If your content is poor or your internal links are unbalanced, the share of visits from the mobile crawlers becomes too large. On a healthy website, the crawl frequency of each type ranks as follows, from most to least frequent:

  1. Default crawler
  2. Image Crawler
  3. Smartphone Crawler
  4. Mobile Crawler

If a site has many images, the Image crawler sometimes visits more often than the default crawler.
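Given per-type crawl counts taken from your crawler log, the ranking above can be checked mechanically. This Python sketch is a simple heuristic of my own, not a Google rule: it computes each type's share and reports any adjacent pair whose order is reversed, with an `image_heavy` flag for the exception just described. The article does not define how large is "too large", so no threshold is assumed; the dictionary keys and function names are hypothetical.

```python
from typing import Dict, List

# Healthy ordering from the list above, most frequent first.
HEALTHY_ORDER = ["default", "image", "smartphone", "mobile"]

def crawl_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of crawls per crawler type (0.0 to 1.0)."""
    total = sum(counts.get(t, 0) for t in HEALTHY_ORDER)
    return {t: (counts.get(t, 0) / total if total else 0.0) for t in HEALTHY_ORDER}

def order_warnings(counts: Dict[str, int], image_heavy: bool = False) -> List[str]:
    """List every adjacent pair in HEALTHY_ORDER whose ranking is reversed."""
    warnings = []
    for higher, lower in zip(HEALTHY_ORDER, HEALTHY_ORDER[1:]):
        if counts.get(lower, 0) > counts.get(higher, 0):
            # On image-heavy sites the Image crawler may legitimately outnumber the default one.
            if image_heavy and (higher, lower) == ("default", "image"):
                continue
            warnings.append(f"{lower} crawler visits more often than {higher} crawler")
    return warnings
```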

Check last crawled time

Checking the last crawled time of each page is useful. You can see it in the summary of the sitemap pages.

Last crawled times

By sorting the records by last crawled time, you can find the pages that have not been crawled recently.

If most pages have been crawled within the last 2 weeks, there is no problem. But if some pages with important content have not been crawled for a long time, you should change the link structure around those pages.
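The two-week rule of thumb above is easy to apply to exported data. A minimal Python sketch, assuming you have a mapping from page URL to its last crawled time (for example, copied from the sitemap summary); the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List

def stale_pages(last_crawled: Dict[str, datetime], now: datetime,
                max_age: timedelta = timedelta(weeks=2)) -> List[str]:
    """Return URLs not crawled within max_age, oldest first."""
    old = [(crawled, url) for url, crawled in last_crawled.items() if now - crawled > max_age]
    return [url for crawled, url in sorted(old)]
```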

Smartphone and Mobile crawlers remember mobile pages

If you serve separate mobile pages, you can see how Google evaluates your pages by watching the smartphone crawler.

On the first crawl, the crawler requests the regular page; the server detects the mobile user agent and redirects it to the mobile page, which the crawler then crawls.

Usually the crawler reaches the mobile page in these 2 steps. But once Google values the page, it remembers the mobile URL and crawls it directly.
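To tell the two cases apart in the log, you can look at what the smartphone crawler requested just before each visit to the mobile page. This Python sketch works under simplifying assumptions: the input is only the smartphone crawler's own (path, status) requests, in time order, and the paths are hypothetical. In a real log other requests can interleave, so treat the result as a hint.

```python
from typing import List, Optional, Tuple

REDIRECT_CODES = {301, 302, 303, 307, 308}

def mobile_visit_patterns(requests: List[Tuple[str, int]],
                          desktop_path: str, mobile_path: str) -> List[str]:
    """Label each crawler visit to mobile_path as 'two-step' or 'direct'."""
    patterns = []
    previous: Optional[Tuple[str, int]] = None
    for path, status in requests:
        if path == mobile_path:
            # Two-step: the request right before was a redirect from the desktop page.
            redirected = (
                previous is not None
                and previous[0] == desktop_path
                and previous[1] in REDIRECT_CODES
            )
            patterns.append("two-step" if redirected else "direct")
        previous = (path, status)
    return patterns
```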

Check Googlebot-Image's HTTP status codes

If the crawler remembers what it downloaded, it sends the "If-Modified-Since" HTTP header, and the server returns a 304 (Not Modified) status code if nothing has changed.

That means Google keeps the content in its index and considers the website's content important enough to store. The image data seem to be managed together with the HTML text content.

Returns 304 status for If-modified

However, in my experience, the crawler does not send the "If-Modified-Since" header when it re-indexes pages to update the index.

Therefore, my guess is that when the image crawler receives 304 responses, the page has already been analysed in depth.
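If you want to track this over time, the share of 304 responses given to Googlebot-Image is a single number to watch. A minimal Python sketch, assuming you have already extracted (user agent, status code) pairs from your access log; the function name is hypothetical, and reading the ratio as a sign of deep analysis is only the guess made above, not something Google documents.

```python
from typing import Iterable, Tuple

def image_304_ratio(records: Iterable[Tuple[str, int]]) -> float:
    """Fraction of Googlebot-Image responses that were 304 Not Modified."""
    total = not_modified = 0
    for user_agent, status in records:
        if "googlebot-image" in user_agent.lower():
            total += 1
            if status == 304:
                not_modified += 1
    return not_modified / total if total else 0.0
```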
