Robots.txt is a root file that tells search engine crawlers which pages they should not crawl, so that non-relevant pages do not use up valuable crawl budget. Disallowing access to non-essential pages lets search engines focus on pages with important content. Robots.txt is often applied to a website’s Privacy Policy, Terms of Use, member login pages and non-essential dynamic pages associated with a store. Note, however, that the entries in this file (file and directory names) can be seen by anyone who types www.domain.com/robots.txt into a browser. Thus, page names that reveal content a company wants to keep private, such as a new product launch, should not appear in robots.txt.
· The file must be located in the root of the site.
· Paths listed in robots.txt are case sensitive.
· A noindex directive should be added to each page that should not be indexed.
· Private pages should not be listed in robots.txt; instead, give them a noindex directive and add rel="nofollow" to links that point to them.
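The rules above can be checked with a short script. The following is a minimal sketch using Python's standard `urllib.robotparser` module; the robots.txt content, the paths and the `ExampleBot` user agent are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /privacy-policy/
Disallow: /terms-of-use/
Disallow: /login/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, path: str,
                 user_agent: str = "ExampleBot") -> bool:
    """Return True if the given user agent may crawl the path."""
    return parser.can_fetch(user_agent, path)


parser = build_parser(ROBOTS_TXT)
print(is_crawlable(parser, "/login/"))           # disallowed path
print(is_crawlable(parser, "/Login/"))           # different case, so not matched
print(is_crawlable(parser, "/products/widget"))  # not listed, so crawlable
```

In this sketch `/Login/` is treated as a different path from `/login/`, which illustrates why the case of each entry matters.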
The robots Meta tag tells search engine spiders whether to index or archive a page and whether to follow the links found on it. Some search engine robots do not recognize the robots Meta tag.
The content of the robots Meta tag contains directives separated by commas. The "index" directive specifies whether a robot should index a page. The "follow" directive specifies whether a robot should follow the links on a page.
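To show how the comma-separated directives can be read programmatically, here is a minimal sketch built on Python's standard `html.parser` module. The class name `RobotsMetaParser` and the helper `robots_directives` are hypothetical names introduced for this example, and the sketch only looks at the `robots` and `googlebot` tag names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in robots-style Meta tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {key: (value or "") for key, value in attrs}
        name = attr_map.get("name", "").strip().lower()
        if name not in ("robots", "googlebot"):
            return
        # Split the content on commas and normalize each directive.
        tokens = attr_map.get("content", "").split(",")
        cleaned = {token.strip().lower() for token in tokens if token.strip()}
        self.directives.setdefault(name, set()).update(cleaned)


def robots_directives(html: str) -> dict:
    """Return a mapping such as {'robots': {'noindex', 'follow'}}."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><META NAME="robots" CONTENT="noindex, follow"></head></html>'
print(robots_directives(page))
```

Normalizing whitespace and case means `"index, follow"` and `"INDEX,FOLLOW"` are read as the same pair of directives.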
To add a robots Meta tag to a page, place the tag inside the HEAD section of the page, between the opening and closing HEAD tags.
Here are some code samples and what they do:
Instructs spiders to index the page and follow all links:
<META NAME="robots" CONTENT="index, follow">
Instructs spiders not to index the page, but follow all links:
<META NAME="robots" CONTENT="noindex,follow">
Instructs spiders to index the page, but not follow any links:
<META NAME="robots" CONTENT="index,nofollow">
Instructs spiders to neither index the page nor follow any links:
<META NAME="robots" CONTENT="noindex,nofollow">
Instructs spiders not to archive (cache) a page:
<META NAME="robots" CONTENT="noarchive">
Instructs Google’s spider (Googlebot) not to archive a page:
<META NAME="googlebot" CONTENT="noarchive">
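When many pages need different settings, the tags above could be generated rather than typed by hand. The sketch below is one possible approach, not an established tool; the function `robots_meta_tag` and its parameters are hypothetical names introduced for this example.

```python
def robots_meta_tag(index: bool = True, follow: bool = True,
                    archive: bool = True, agent: str = "robots") -> str:
    """Build a robots-style Meta tag from simple yes/no choices."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    # "noarchive" is only added when archiving should be blocked.
    if not archive:
        directives.append("noarchive")
    content = ",".join(directives)
    return f'<META NAME="{agent}" CONTENT="{content}">'


print(robots_meta_tag(index=False))
print(robots_meta_tag(index=False, follow=False))
print(robots_meta_tag(archive=False, agent="googlebot"))
```

This sketch always writes the index and follow directives explicitly, so its output for the Googlebot case is longer than the hand-written sample above, which lists only "noarchive".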