Google Spiders Ignore Meta Tags & Robots.txt During Crawl

There are two ways to prevent Googlebot from crawling or indexing your webpages.

You can either add a "disallow" entry in the robots.txt file of your website or simply add the following <META> tag inside webpages that you don't want search engine spiders to crawl or index.

<META NAME="ROBOTS" CONTENT="NOINDEX, NOARCHIVE">

Sounds simple, but we recently came across at least two different cases where Googlebot appears to be ignoring the META tag or the robots.txt instructions. Let's look at them briefly here:

Case A: del.icio.us

Google has indexed (and cached) ~1.4 million pages from the del.icio.us website. Now pay close attention to the META tag on each of the del.icio.us webpages. You'll see the following text inside the HTML code of del.icio.us webpages:
<meta name="robots" content="noarchive,nofollow,noindex"/>
The tag clearly means that search engines are neither supposed to cache del.icio.us pages nor index them. Google is probably ignoring the META tags here.
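
To check which directives a page actually declares, the robots META tag can be read programmatically. The snippet below is a minimal sketch using only Python's standard library; the names `RobotsMetaParser` and `robots_directives` are made up for illustration, and the code only reports what the page asks for, not how any search engine treats it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots directives declared in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# The tag quoted above from del.icio.us pages:
page = '<meta name="robots" content="noarchive,nofollow,noindex"/>'
found = robots_directives(page)
print("noindex" in found, "noarchive" in found)
```

If both values print as `True`, the page is asking search engines to neither index nor cache it.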

Case B: Google Finance

The robots.txt file residing on www.google.com has the following instruction:
User-agent: *
Disallow: /finance
In simple English, these instructions mean that Googlebot is not supposed to index or crawl any webpage that's residing under the google.com/finance path.
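
The same rule can be checked with Python's standard `urllib.robotparser` module. This is a rough sketch that feeds the two lines quoted above straight to the parser instead of downloading the live file; the example URLs are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines quoted above, supplied directly
# rather than fetched from www.google.com.
rules = [
    "User-agent: *",
    "Disallow: /finance",
]

parser = RobotFileParser()
parser.parse(rules)

# Illustrative URLs (assumed for this example, not taken from the article).
finance_url = "http://www.google.com/finance/some-page"
other_url = "http://www.google.com/search"

finance_allowed = parser.can_fetch("Googlebot", finance_url)
other_allowed = parser.can_fetch("Googlebot", other_url)

print(finance_allowed)  # False: the path falls under Disallow: /finance
print(other_allowed)    # True: no rule matches this path
```

Because the rule is declared for `User-agent: *`, it applies to Googlebot just as it does to every other crawler that honors robots.txt.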

It's again very surprising to see that at least 44K pages from www.google.com/finance have been indexed and cached on Google's servers. These pages also appear in organic search results.


Update: Jim Kloss shares a similar problem with Googlebot ignoring their robots.txt file though other searchbots do obey the request. "We tell googlebot not to load these URL constructs but it ignores robots.txt. Nor were we able to get it to play nice via the webmaster control panel provided by Google...Our written email requests [to Google] to look into the situation were met with autoresponders."
