Meta Robots


What Meta Robots are and what they are used for

Meta robots directives, or robots "meta tags", are pieces of HTML code placed in the <head> of a page that give search engines precise instructions on how to crawl and index the content of that URL.
In other words, they tell Google and other search engines whether or not to index the content of the page in question and whether or not to crawl the links on that page.

Where and how is the meta robots tag implemented?

The meta robots tag must be placed in the <head> section of the web page in question and must include two attributes, name and content, to work properly.
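Purely as an illustrative sketch (the page title and directive values below are placeholders, not taken from the original example), the tag sits inside the <head> along with the rest of the page metadata:

<!DOCTYPE html>
<html>
  <head>
    <title>Example page</title>
    <!-- meta robots directive: do not index this page, but do follow its links -->
    <meta name="robots" content="noindex,follow" />
  </head>
  <body>
    ...
  </body>
</html>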

Attribute "name"

The name attribute indicates the crawlers (user agents) to which the directive applies.

Attribute "content"

The content attribute indicates how those crawlers should behave, based on the parameters used.

Meta Tag Example

Below we show an example of meta robots tags, each aimed at a different crawler:

<meta name="robots" content="noindex" />
<meta name="googlebot" content="noindex" />
<meta name="googlebot-news" content="noindex" />
<meta name="slurp" content="noindex" />
<meta name="msnbot" content="noindex" />

Indexing control parameters

Guidelines for different search engines

Not all search engines support all of these directives:

Value | Google | Bing | Yandex
index | ✔️ | ✔️ | ✔️
noindex | ✔️ | ✔️ | ✔️
none | ✔️ | – | ✔️
noimageindex | ✔️ | – | –
follow | ✔️ | ✔️ | ✔️
nofollow | ✔️ | ✔️ | ✔️
noarchive/nocache | ✔️ | ✔️ | ✔️
nosnippet | ✔️ | – | –
notranslate | ✔️ | – | –
unavailable_after | ✔️ | – | –

Example of use and interpretation of directives


The most common parameters that we will come across, and the most useful for SEO, are the index, noindex, follow and nofollow directives:

1. <meta name="robots" content="index,follow">
This meta tag suggests to Googlebot and other crawlers that they index the content, include it in their listings and follow all the links they find in this URL.
By default, when no meta robots tag is specified, this is the behaviour all bots apply when crawling any URL, so it is not necessary to include it to make your content easier to index.
2. <meta name="robots" content="noindex,follow">
In this case, we tell the search engine not to index the URL in question, so it will not appear in the search results, but to follow the links it finds when crawling the page.
We usually use it when a page has little value for the user but contains links that we want the search engine to follow so it can reach other sections of our website.
3. <meta name="robots" content="noindex,nofollow">
The “noindex, nofollow” meta tag suggests to search engines that they should not index the page and that they should not follow the links found on the page.
When to use this directive?
Generally when the content of the page does not provide any value and when we do not want to follow the links found in the content.
4. <meta name="robots" content="index,nofollow">
Here, we suggest that search engines include the content in their index but not follow the links they find in the content of the URL in question.

Types of Robots Tags

There are two types of meta robots:
  1. Meta robots tags: part of the HTML page, located in the <head> section
  2. The X-Robots-Tag: sent by the web server as an HTTP response header
In both cases the same parameters and crawling or indexing instructions are applicable.
The “noindex” and “nofollow” parameters can be used with both meta robots and the x-robots tag.
What differentiates them is how these parameters are communicated to the crawlers.
Specifically, the x-robots tag allows the use of regular expressions, the execution of crawl directives in non-HTML files, and the application of parameters at the global level, not just at the URL level.
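As an illustrative sketch of that last point (not part of the original article), this is what an HTTP response carrying the directive for a PDF could look like:

HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex, nofollow

And, assuming an Apache server with mod_headers enabled, one common way to apply it to every PDF on the site is:

<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

This is exactly the kind of case a meta robots tag cannot cover, since a PDF has no <head> in which to place it.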

Remark: It is not necessary to use meta robots and the x-robots tag on the same page, as it is redundant.

Why they are important

These types of directives, together with the robots.txt file, canonical tags, and the X-Robots-Tag, allow you to control the indexability and crawlability of your website.

Hence their importance:
They allow you to tell search engines, at the level of individual URLs, what to include in their index and what to leave out, as well as which content is relevant or a priority.

Thus, we can optimize the crawl budget and avoid duplicate content errors.

The parameters that search engine crawlers understand and follow when they are used in robots meta directives are the ones listed and exemplified in the sections above.

Differences between Meta robots vs. Robots.txt

The main difference between them is that meta robots give indications about the indexing of pages, while robots.txt gives indications about crawling.

That is, robots.txt gives crawling instructions that apply to entire sections of a domain, such as categories, subfolders, and archives.
For example:
If we indicate at the robots.txt level that a category should not be crawled, but the URLs belonging to that category carry an index tag because we want them indexed, Googlebot will not be able to access them because they are blocked, will never see the tag, and will find it very difficult to index them.
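A minimal sketch of that conflict, using a hypothetical /category/ path:

# robots.txt — the whole section is blocked from crawling
User-agent: *
Disallow: /category/

<!-- on any page under /category/, this tag will never be read, -->
<!-- because Googlebot cannot fetch the HTML in the first place -->
<meta name="robots" content="index,follow" />

If the goal is to keep those pages out of the index, the safer combination is to allow crawling and place a noindex tag on each URL.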

SEO recommendations for Meta Robots application

All meta directives (robots or otherwise) are only discovered when a URL is crawled.
This means that if a robots.txt file does not allow a URL to be crawled, any meta directives on that page (either in the HTML or in the HTTP header) will never be seen and will effectively be ignored.
In most cases, if you want to keep a page out of the index, a meta robots tag with the parameters "noindex, follow" should be used rather than blocking the URL in robots.txt.

In conclusion:

Meta tags or meta robots are a great tool to control the indexing and crawling of our website.
Using them correctly, together with the robots.txt file, canonical tags, web architecture, etc., is essential to optimize the crawl budget, make the search engines' work easier and ensure a correct transfer of authority through internal linking.
Remember that meta robots act at the URL level.

Important: We must remark that, as with robots.txt files, crawlers are not obliged to follow the instructions indicated by the meta tags of your page. They act only as a suggestion.

Frequently Asked Questions

How can I block Googlebot?
Depending on the case, these are the three different ways to block Googlebot:

  1. Robots.txt: use it if the crawling of your content is causing problems on the server, or to block crawling of sections of the site that you do not want Google to reach, such as the login page used to access the administration of your website. However, do not use robots.txt to block private content (use server-side authentication instead) or to manage canonicalization.
  2. Meta robots tag: use it if you need to control how an individual HTML page is displayed in search results.
  3. HTTP X-Robots-Tag header: use it if you need to control how content is displayed in search results but cannot use meta robots tags, or when you want to manage the directives from the server (see the quick check after this list).
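As a quick, generic way to check which of these methods a given URL is actually serving (the URLs below are placeholders), you can inspect both the response headers and the HTML, for example with curl:

# show only the response headers and look for an X-Robots-Tag line
curl -sI https://example.com/document.pdf

# fetch the HTML and look for meta robots tags in the <head>
curl -s https://example.com/some-page/ | grep -i "robots"

Whichever method is used, the directive should show up in one of these two places.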

Meta robots tags should be used whenever you want to control indexing at the individual page level. In other words, to ensure that a URL is not indexed, always use the robots meta tag or the X-Robots-Tag HTTP header.

