For example, a robots.txt file for a WordPress site might look like this:

User-agent: *
Disallow: /wp-admin/
Disallow: /?s=
Allow: /wp-admin/admin-ajax.php

Sitemap: https://www.isocialweb.agency/post-sitemap.xml
Sitemap: https://www.isocialweb.agency/page-sitemap.xml
Whenever a bot or web crawler visits a site, be it Googlebot, the Facebook web crawler (Facebot) or any other, the first thing it looks for is the robots.txt file.
And it always looks for it in the same place: the root directory of the domain.
That is:
www.ejemplo.com/robots.txt
Agents and bots check this address by default. If they do not find a robots file there, they assume that the site does not have one and proceed to crawl everything on the site.
Even if a robots.txt file existed in another location (for example, in a subfolder), no crawler would look for it there, so the site would be treated as if it had no robots file.
To ensure that the robots.txt file is found, always place it in the root directory of your domain.
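As a minimal sketch using only Python's standard library, the snippet below derives the single URL a crawler checks from any page URL on the same host. It also shows that a parser with no rules allows everything. Here an empty rule set stands in for a missing file. `robots_url` is a hypothetical helper name, and the domain is the article's placeholder.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

def robots_url(page_url: str) -> str:
    """Return the only location crawlers check: /robots.txt at the host root."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# An empty robots.txt (standing in here for a missing one) places no restrictions.
parser = RobotFileParser()
parser.parse([])

print(robots_url("https://www.ejemplo.com/blog/post?id=7"))  # https://www.ejemplo.com/robots.txt
print(parser.can_fetch("Googlebot", "https://www.ejemplo.com/any/page"))  # True
```

When fetching over the network, `RobotFileParser.read()` behaves the same way. It treats a 4xx response other than 401/403 (such as a 404) as permission to crawl everything.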
To crawl sites, search engines follow links to get from one site to another and ultimately crawl billions of links and websites. This crawling behavior is sometimes referred to as “spidering”.
After arriving at a website, but before crawling it, the search crawler looks for a robots.txt file. If it finds one, the crawler reads that file before it continues crawling the site.
Since the robots.txt file contains instructions about how search engines should crawl, the directives in it determine what the crawler does on that particular site.
If the robots.txt file does not contain any directives that restrict the user agent (or if the site does not have a robots.txt file), the crawler proceeds to crawl the rest of the site.
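The "read robots.txt first, then crawl" order can be sketched with Python's standard `urllib.robotparser` module. This is a minimal illustration, not how any particular search engine is implemented. `crawlable` is a hypothetical helper, and the URLs use the article's placeholder domain.

```python
from urllib.robotparser import RobotFileParser

def crawlable(robots_txt: str, agent: str, urls: list[str]) -> list[str]:
    """Read the robots.txt rules first, then keep only URLs the agent may crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]

rules = "User-agent: *\nDisallow: /wp-admin/\n"
found = [
    "https://www.ejemplo.com/",
    "https://www.ejemplo.com/blog/",
    "https://www.ejemplo.com/wp-admin/options.php",
]
print(crawlable(rules, "Googlebot", found))
# ['https://www.ejemplo.com/', 'https://www.ejemplo.com/blog/']
```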
Robots.txt syntax can be considered as the “language” of robots.txt files.
There are five common terms that you are likely to find in a robots file.
They are as follows:
User-agent: the name of the specific web crawler you are giving crawl instructions to. Common user agents include Googlebot, Googlebot-Image, Bingbot, Slurp (Yahoo), Baiduspider and DuckDuckBot.
Disallow: tells a user agent not to crawl a certain URL or path. Each "Disallow:" line holds a single path, so blocking several URLs takes several Disallow lines.
Allow: tells a crawler that it can access a page or subfolder even if its parent page or subfolder is disallowed. It started as a Google extension, but Bing and other major crawlers also support it, and it is now part of the Robots Exclusion Protocol standard (RFC 9309).
Crawl-delay: the number of seconds a crawler should wait between requests. Googlebot ignores this directive. Google's crawl rate was historically managed in Google Search Console instead.
Sitemap: indicates the location of any XML sitemap associated with the site. This directive is supported by Google, Bing, Yahoo and Ask.
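To see the five directives working together, the sketch below feeds a hypothetical robots.txt to Python's standard `urllib.robotparser`. One caveat applies. This parser applies the first rule that matches, in file order. Google and RFC 9309 instead use the most specific (longest) matching rule. For that reason, Allow is listed before Disallow here so that both interpretations agree.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt using User-agent, Disallow, Allow, Crawl-delay and Sitemap.
ROBOTS_TXT = """\
User-agent: Bingbot
Crawl-delay: 10
Disallow: /private/

User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/

Sitemap: https://www.ejemplo.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

site = "https://www.ejemplo.com"
print(parser.can_fetch("Googlebot", site + "/wp-admin/admin-ajax.php"))  # True: Allow matches first
print(parser.can_fetch("Googlebot", site + "/wp-admin/options.php"))     # False: Disallow
print(parser.can_fetch("Bingbot", site + "/private/report.html"))        # False: Bingbot group
print(parser.can_fetch("Bingbot", site + "/wp-admin/options.php"))       # True: Bingbot obeys only its own group
print(parser.crawl_delay("Bingbot"))                                     # 10
print(parser.site_maps())  # ['https://www.ejemplo.com/sitemap.xml']
```

Note that a crawler follows only the most specific group that names it. Bingbot therefore ignores the `*` rules entirely in this example.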
Creating a robots.txt file is easy; you just need to know a few specific commands. You can write it in your computer's Notepad or any other plain-text editor you prefer.
You also need access to the root folder of your domain, because that is where the file must be saved: create the file, name it robots.txt and upload it to the root of your domain.
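If you prefer to generate the file from a script rather than a text editor, the following is a minimal sketch. `build_robots_txt` and `save_robots_txt` are hypothetical helpers. A temporary folder stands in for your server's real document root (often something like public_html, depending on your host).

```python
import tempfile
from pathlib import Path

def build_robots_txt(groups: dict[str, list[str]], sitemaps: list[str]) -> str:
    """Render {user_agent: [directive lines]} and sitemap URLs as robots.txt text."""
    lines: list[str] = []
    for agent, rules in groups.items():
        lines.append(f"User-agent: {agent}")
        lines.extend(rules)
        lines.append("")  # a blank line closes each group
    lines.extend(f"Sitemap: {url}" for url in sitemaps)
    return "\n".join(lines) + "\n"

def save_robots_txt(web_root: Path, text: str) -> Path:
    """Write <web_root>/robots.txt as UTF-8 and return its path."""
    target = web_root / "robots.txt"  # lowercase name, directly in the root folder
    target.write_text(text, encoding="utf-8")
    return target

text = build_robots_txt(
    {"*": ["Disallow: /wp-admin/", "Allow: /wp-admin/admin-ajax.php"]},
    ["https://www.ejemplo.com/sitemap.xml"],
)
# A temporary folder stands in for the document root of your web server.
web_root = Path(tempfile.mkdtemp())
print(save_robots_txt(web_root, text).read_text(encoding="utf-8"))
```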
The robots.txt file is part of the Robots Exclusion Protocol (REP), a set of web rules that govern how robots crawl the web, access and index content, and deliver that content to users.
Robots.txt files are an aid for search engines, and keeping the file up to date helps them understand how to treat the different sections of your website.
This is also how we control the crawl budget: by keeping crawlers away from low-value URLs, we let them spend more of their crawling on the pages that matter.
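Important: To ensure that your robots.txt file is found, always place it in the root directory of your domain. Also keep in mind the following. The file name is case-sensitive (it must be robots.txt, all lowercase), and so are the paths in its rules. Malicious bots can simply ignore the file. And because the file is publicly readable, listing private areas in it even advertises where they are. So never rely on robots.txt to block the crawling of private parts of your website. In these cases, restrict access with passwords or server permissions instead.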
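Path case-sensitivity is easy to overlook. The following minimal sketch uses Python's `urllib.robotparser` with a hypothetical rule to show that a Disallow path only blocks the exact capitalisation it names.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /Private/"])

# Path matching is case-sensitive: only the exact capitalisation is blocked.
print(parser.can_fetch("*", "https://www.ejemplo.com/Private/plan.html"))  # False
print(parser.can_fetch("*", "https://www.ejemplo.com/private/plan.html"))  # True
```

Even the blocked path stays reachable to anyone who ignores robots.txt, which is why passwords or server permissions are the right tool for private content.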
Checking whether your website already has a robots.txt file is very easy. Just add /robots.txt to the end of your root domain, for example, yourdomainname.com/robots.txt.
If no text file appears (typically you get a 404 error page instead), you do not currently have a robots.txt file.
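To run the same check from a script, a minimal sketch with Python's standard library might look like this. `has_robots_txt` is a hypothetical helper. Re-raising every error other than 404 (for example 403 or 5xx) is a design choice, so that those cases get a manual look instead of a guess.

```python
import urllib.error
import urllib.request

def has_robots_txt(site: str, timeout: float = 10.0) -> bool:
    """Return True if <site>/robots.txt answers with HTTP 200, False on a 404."""
    url = site.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # nothing at the root, so crawlers assume no restrictions
        raise  # 403, 5xx, etc. deserve a manual look rather than a guess
```

For example, `has_robots_txt("https://www.ejemplo.com")` requests https://www.ejemplo.com/robots.txt. Note that `urlopen` follows redirects, so a 200 may come from the redirect target.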
If you have discovered that you don't have a robots.txt file, or you want to modify it, creating one is a simple process. This wiki article explains how to create a robots.txt file and how to check that it is set up correctly.
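The robots.txt is an actual text file, while meta robots and X-Robots-Tag are directives attached to individual pages or files: the first as an HTML meta tag, the second as an HTTP response header. The three also fulfill different functions. Robots.txt controls which URLs crawlers may crawl across a site or section. Meta robots and X-Robots-Tag control how an individual page or file is indexed (for example, with noindex).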
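To make the difference concrete, the sketch below collects the page-level indexing directives from both places they can live. It reads the `X-Robots-Tag` response header and any `<meta name="robots">` tag, using Python's standard `html.parser`. `MetaRobotsParser` and `indexing_directives` are hypothetical names. The sketch simplifies on purpose. It uses a plain dict for headers, whereas real HTTP headers are case-insensitive. It also ignores crawler-specific forms such as `<meta name="googlebot">` or a `googlebot:` prefix in the header.

```python
from html.parser import HTMLParser

class MetaRobotsParser(HTMLParser):
    """Collect the comma-separated values of <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]

def indexing_directives(headers: dict[str, str], html: str) -> list[str]:
    """Combine X-Robots-Tag header values with meta robots directives for one page."""
    header_value = headers.get("X-Robots-Tag", "")  # simplified: exact-case key lookup
    directives = [d.strip().lower() for d in header_value.split(",") if d.strip()]
    parser = MetaRobotsParser()
    parser.feed(html)
    return directives + parser.directives

page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(indexing_directives({"X-Robots-Tag": "noarchive"}, page))
# ['noarchive', 'noindex', 'follow']
```

None of this is visible in robots.txt. A crawler only sees these directives after it is allowed to fetch the page.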