Noindex is a value used in the robots meta tag of a page’s HTML code to prevent that page from being indexed by search engines such as Google, Bing or Yahoo.
Google understands the noindex tag as a directive. Therefore, if it finds it, it will not show that page to users in its results pages.
The counterpart of noindex is “index”, which explicitly allows indexing, although its use is not necessary: search engines interpret the absence of the tag as a green light to index the content.
The noindex tag allows you to decide whether a particular URL should be included in the search engine index or not.
It is therefore a great resource that allows us to control the indexing of each individual page with very little effort.
For this very reason, this directive is one of the favorite optimization tools of all SEOs.
There are two ways to implement the noindex tag: via a meta tag in the page’s HTML or via an HTTP response header.
Both options have the same result, so choose the one that best suits your website and the type of content you have.
To prevent most search engines from indexing a page on your site, you can include the following meta tag in the <head> section of the page:
Here is an example of the noindex tag syntax:
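<meta name="robots" content="noindex">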
In addition, we can also prevent a page from being indexed for a specific bot.
Here are several examples:
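To target only Google’s crawler:
<meta name="googlebot" content="noindex">
To target only Bing’s crawler:
<meta name="bingbot" content="noindex">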
Instead of using a meta tag, you can also include an X-Robots-Tag header in your page’s HTTP response with the values noindex or none. This response header is useful for non-HTML resources such as PDFs, images, and videos.
Here is an example of what an HTTP response would look like with an X-Robots-Tag header:
HTTP/1.1 200 OK
Content-Type: text/html
X-Robots-Tag: noindex

<html>
  <head><title>Non-indexable page</title></head>
  <body>This page should not be indexed by search engines.</body>
</html>
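As a sketch, on an Apache server with mod_headers enabled (an assumption, since the exact setup depends on your server), you could apply noindex to all PDF files with a rule like this in the configuration or .htaccess file:
<Files ~ "\.pdf$">
  Header set X-Robots-Tag "noindex"
</Files>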
The general recommendation when applying this directive is very simple: apply noindex to any page that does not provide value to the user or that you do not want to appear in search results.
This can be very subjective, so here are some examples of content or pages that you should not index: duplicate or near-duplicate pages, very similar pages, and thin content that offers little value to the user.
Adding the noindex tag to these pages will tell search engines not to index them. You can also use the canonical tag to tell search engines which is the main version of a page with duplicate content.
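For example, a canonical tag placed in the <head> of the duplicate page would look something like this (the URL is just a placeholder):
<link rel="canonical" href="https://www.example.com/main-page/">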
Depending on the type of website or page you manage, you will apply one criterion or another, but to be sure, always ask yourself whether the page in question has value for the user.
It is very important to emphasize that the noindex tag of a page does not prevent search engine crawlers from fully crawling that URL.
It only prevents them from displaying it to users in their search results.
Therefore, if we want to prevent a page from being crawled by a search engine, we must resort to robots.txt, specifically the “Disallow” directive.
In this way, we prevent the crawling of the page and, as a consequence, usually its subsequent indexing (although this is not always achieved: a blocked URL can still be indexed if other pages link to it).
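For example, to block crawling of a specific URL for all crawlers, the robots.txt file would contain something like this (the path is just a placeholder):
User-agent: *
Disallow: /example-page-1/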
In any case, if you want to try to ensure that both directives are met, you can combine a disallow with a noindex by adding both directives to the robots.txt file:
Disallow: /example-page-1/
Noindex: /example-page-1/
WARNING: a Disallow in robots.txt cannot be combined with a noindex tag on the page itself, because the page is blocked and search engines will never crawl it, so they will never see the tag telling them to leave the page out of the index. Also keep in mind that the Noindex directive in robots.txt was never an official rule and Google stopped supporting it in 2019, so it cannot be relied on.
The “noindex” meta tag is a super useful resource to control duplicate, similar or thin content; that is to say, all the content that has little value for the user and can therefore cause us ranking problems.
The correct use of this directive, together with the other robots meta tag values such as nofollow and follow, and with the robots.txt file, is vital to optimize the indexing and crawlability of our website. Knowing how and when to use the noindex tag is essential to make the job of search engines easier.
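For example, if you want search engines neither to index a page nor to follow its links, you can combine both values in a single robots meta tag:
<meta name="robots" content="noindex, nofollow">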
Important: Google always adheres to a noindex directive, while the index tag is only considered a recommendation.
Links and recommended reading:
The meta robots noindex of a web page is one of the essential attributes to control its appearance in search engine results. If you want to learn how to use it on your website, avoid mistakes and make Google’s job easier, you need to fully master this concept.
A ‘noindex’ tag tells search engines not to include the page in search results. The most common method of keeping a page out of the index is to add a meta robots noindex tag in the <head> section of the HTML, or in the HTTP response headers. In order for search engines to see this information, the page must not already be disallowed in a robots.txt file.