Meta Robots

What is it and what is it for
Meta robots directives, or “meta tags”, are pieces of HTML code placed in the <head> of a page that give search engines precise instructions on how to crawl and index the content of that URL.
In other words, they tell Google and other search engines whether or not to index the content of the page in question and whether or not to follow the links on that page.

Meta Tag Example

A meta robots tag looks like this: <meta name="robots" content="index,follow">

The most common meta robots values, and the most useful ones for SEO, are:

1. <meta name="robots" content="index,follow">

This meta tag suggests to Googlebot and other crawlers that they index the content, include it in their listings, and follow all the links they find on this URL.
By default, when a page has no meta robots tag, this is the instruction that all bots follow when crawling it. Therefore you do not need to include it for your content to be indexed.

2. <meta name="robots" content="noindex,follow">

In this case, we tell the search engine not to index the URL in question, so it will not appear in the search results, but to follow the links it finds when crawling the page.
We usually use it when a page has little value for the user but contains links that we want the search engine to follow so it can reach other sections of our website.

3. <meta name="robots" content="noindex,nofollow">

The “noindex, nofollow” meta tag suggests to search engines that they should not index the page and that they should not follow the links found on the page.
When to use this directive?
Generally, when the content of the page does not provide any value and we do not want search engines to follow the links found in it.

4. <meta name="robots" content="index,nofollow">

Here, we are suggesting to the search engines to include the content in their database but to not follow the links they find in the content of the URL in question.
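
The four combinations above can be read mechanically. The following is a minimal sketch in Python, using only the standard library, of how a crawler might turn a page's meta robots tag into an "index" and a "follow" decision. The names RobotsMetaParser and robots_policy are hypothetical, chosen for illustration, and the sketch only handles the index/noindex and follow/nofollow values discussed here.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # Split "noindex, follow" into ["noindex", "follow"].
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.append(part.strip().lower())


def robots_policy(html):
    """Return (index, follow); both default to True when no tag is present."""
    parser = RobotsMetaParser()
    parser.feed(html)
    index = "noindex" not in parser.directives
    follow = "nofollow" not in parser.directives
    return index, follow
```

For instance, robots_policy('<meta name="robots" content="noindex,follow">') returns (False, True), while a page without any meta robots tag falls back to the default (True, True), which matches the behavior described for case 1.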

Why they are important

These types of directives, together with the robots.txt file, canonical tags, and the X-Robots-Tag, allow you to control the indexability and crawlability of your website.

Hence their importance:
They let you tell search engines, page by page, what to include in their index and what to ignore, as well as which content is relevant or a priority.

Thus, we can optimize the crawl budget and avoid duplicate content errors.

Below, we explain the parameters that search engine crawlers understand and follow when they are used in the robots’ meta-directives:

Indexing control parameters

Meta robots vs. Robots.txt

The main difference between them is that meta robots give indications about the indexing of individual pages, while robots.txt gives indications about crawling.
That is, robots.txt gives crawling instructions for entire sections of a domain, such as categories, subfolders, and archives.
For example:
If robots.txt says that a category should not be crawled, but the URLs in that category carry the index tag because we want them indexed, Googlebot will not be able to access them because they are blocked. It never reads the tag, so those URLs are very unlikely to be indexed properly.
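
The conflict in this example can be checked before publishing. Below is a rough sketch using Python's standard urllib.robotparser module. The robots.txt rules, the example.com URLs, and the can_crawl helper are assumptions made up for illustration; they do not come from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a whole category from crawling.
ROBOTS_TXT = """\
User-agent: *
Disallow: /category/
"""


def can_crawl(url, user_agent="Googlebot", robots_txt=ROBOTS_TXT):
    """Return True if robots.txt lets this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A blocked URL is never fetched, so any meta robots tag on it is never read.
for url in ("https://example.com/category/shoes", "https://example.com/about"):
    status = "crawlable" if can_crawl(url) else "blocked: meta robots will not be seen"
    print(url, "->", status)
```

If a URL that is meant to be indexed shows up as blocked, the robots.txt rule is the thing to change; the tag on the page cannot override it.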

SEO recommendations for Meta Robots application

All meta directives (robots or otherwise) are discovered when a URL is crawled.
This means that if a robots.txt file does not allow a URL to be crawled, any meta directives on that page (either in the HTML or in the HTTP header) will not be seen and will effectively be ignored.
In most cases, a meta robots tag with the parameters “noindex, follow” should be used to keep a page out of the index, rather than a robots.txt disallow rule.

Types of Meta Robots Directives

There are two types of meta robots:
  1. Meta robots tags: part of the HTML page, located in the <head> section
  2. The X-Robots-Tag: sent by the web server as an HTTP header
In both cases the same parameters and crawling or indexing instructions are applicable; for example, “noindex” and “nofollow” can be used with both.
What differentiates them is how these parameters are communicated to the crawlers.
Specifically, the X-Robots-Tag can be assigned through server rules that use regular expressions, can apply directives to non-HTML files, and can apply parameters at the global level, not just at the URL level.
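
To illustrate the regular-expression and non-HTML points, here is a small Python sketch of the logic a server or application layer might use to decide which X-Robots-Tag header to send. The rules, paths, and function names are hypothetical; in practice this logic usually lives in the web server configuration rather than in application code.

```python
import re

# Hypothetical rules: (pattern matched against the URL path, header value).
X_ROBOTS_RULES = [
    # Keep documents such as PDFs and Word files out of the index.
    (re.compile(r"\.(pdf|docx?)$", re.IGNORECASE), "noindex, nofollow"),
    # Do not index internal search results, but let links be followed.
    (re.compile(r"^/internal-search/"), "noindex, follow"),
]


def x_robots_tag_for(path):
    """Return the X-Robots-Tag value for a path, or None if no rule matches."""
    for pattern, value in X_ROBOTS_RULES:
        if pattern.search(path):
            return value
    return None


def response_headers(path):
    """Build the extra response headers for a path (empty when no rule applies)."""
    value = x_robots_tag_for(path)
    return {"X-Robots-Tag": value} if value is not None else {}
```

With these assumed rules, a request for /files/report.pdf would be answered with the header X-Robots-Tag: noindex, nofollow, something a meta tag cannot do because a PDF has no <head> section.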

Remark: It is not necessary to use meta robots and the x-robots tag on the same page, as it is redundant.

In conclusion:

Meta tags or meta robots are a great tool to control the indexing and crawling of our website.
Using them correctly, together with the robots.txt file, canonicals, web architecture, etc., is essential to optimize the crawl budget, make the work of search engines easier, and ensure a correct transfer of authority through internal linking.
Remember that meta robots act at the URL level.

Important: We must remark that, as with robots.txt files, crawlers are not obliged to follow the instructions indicated by the metatags of your page. They act only as a suggestion.

Frequently Asked Questions

Depending on the case, there are 3 different ways to block Googlebot:

  1. Robots.txt: Use it if the crawling of your content is causing problems on the server, or to block sections of the site that we do not want to appear in the Google index, such as the login page for the administration area of your website. However, do not use robots.txt to block private content (use server-side authentication instead) or to manage canonicalization.
  2. Meta robots tag: Use this if you need to control how an individual HTML page is displayed in search results.
  3. HTTP X-Robots-Tag header: Use if you need to control how content is displayed in search results only when you cannot use meta robots tags or want to control them through the server.

Meta robots tags should be used whenever you want to control indexing at the individual page level. In other words, to ensure that a URL is not indexed, always use the robots meta tag or the X-Robots-Tag HTTP header.