What is robots.txt?
Robots.txt is a text file that webmasters create to instruct web robots (typically search engine robots) how to crawl pages on their website.
Why is it important?
Robots.txt files indicate whether certain user agents (web-crawling software) can or cannot crawl parts of a website. These crawl instructions are specified by “disallowing” or “allowing” the behavior of certain (or all) user agents.
The robots.txt file is part of the robots exclusion protocol (REP), a group of web standards that regulate how robots crawl the web, access and index content, and serve that content up to users.
How does it work?
To crawl websites, search engines like Google follow links to get from one site to another, crawling across many links and websites. This crawling behavior is called "spidering".
After arriving at a website, the search crawler looks for a robots.txt file. If it finds one, the crawler reads that file before continuing through the site.
Because the robots.txt file contains information about how the search engine should crawl, the information found there guides the crawler's further actions on that particular site. If the robots.txt file does not contain any directives that disallow a user-agent’s activity (or if the site doesn’t have a robots.txt file), the crawler will proceed to crawl the entire website.
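As a rough illustration of this decision, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name can_crawl and the example.com URLs are made up for this example, and real search engine crawlers may interpret rules somewhat differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url.

    Hypothetical helper for illustration; it parses the text directly
    instead of downloading a live robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A robots.txt with no disallow directives (here, an empty file) blocks nothing.
print(can_crawl("", "Googlebot", "https://example.com/any/page.html"))  # True

# A disallow directive for all user agents blocks the matching paths only.
rules = "User-agent: *\nDisallow: /private/"
print(can_crawl(rules, "Googlebot", "https://example.com/private/x.html"))  # False
print(can_crawl(rules, "Googlebot", "https://example.com/public/x.html"))  # True
```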
Robots.txt has two basic parts: the User-agent and the directives.
User-agent is the specific web crawler to which you’re giving crawl instructions (usually a search engine). A list of most user agents can be found here.
A basic robots.txt that tells Googlebot to stay away from the entire server consists of a "User-agent: Googlebot" line followed by a "Disallow: /" line.
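To make the Googlebot rule concrete, the sketch below embeds such a file as a string and checks it with Python's standard urllib.robotparser module. The example.com URL is a placeholder, and Python's parser is only an approximation of how Googlebot itself reads the file.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content: one user-agent line, one disallow directive.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot is blocked from every path on the site.
googlebot_allowed = parser.can_fetch("Googlebot", "https://example.com/page.html")
# Crawlers that are not named in the file are unaffected.
other_allowed = parser.can_fetch("Bingbot", "https://example.com/page.html")

print(googlebot_allowed)  # False
print(other_allowed)  # True
```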
To give instructions to multiple robots, create a set of user-agent and disallow directives for each one. For example, a "User-agent: Googlebot" group and a "User-agent: Bingbot" group, each followed by "Disallow: /", tell both Google's and Bing's crawlers to stay away from the whole site.
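A file with one group per robot might look like the string in this sketch, which again uses Python's standard urllib.robotparser to check the result. The example.com URL is a placeholder, and DuckDuckBot appears only as an example of a crawler the file does not mention.

```python
from urllib.robotparser import RobotFileParser

# Two groups separated by a blank line, one per crawler.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: Bingbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/page.html"
results = {
    agent: parser.can_fetch(agent, url)
    for agent in ("Googlebot", "Bingbot", "DuckDuckBot")
}

# Both named crawlers are blocked; the unnamed one is not.
print(results)  # {'Googlebot': False, 'Bingbot': False, 'DuckDuckBot': True}
```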
Disallow is the command used to tell a user-agent not to crawl a particular URL. Each URL needs its own "Disallow:" line.
Allow is the command that tells a crawler it can access a page or subfolder even though its parent page or subfolder is disallowed. It was not part of the original robots.txt standard, but Googlebot and most other major crawlers support it.
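The sketch below shows an Allow line opening up one page inside a disallowed folder. The paths and the example.com domain are invented for illustration. One caveat: Python's urllib.robotparser applies the first rule that matches, so the Allow line is listed before the Disallow line here; crawlers that pick the most specific rule instead do not depend on this order.

```python
from urllib.robotparser import RobotFileParser

# Allow one page inside an otherwise disallowed subfolder.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /private/public-page.html
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

allowed_page = parser.can_fetch(
    "Googlebot", "https://example.com/private/public-page.html"
)
blocked_page = parser.can_fetch(
    "Googlebot", "https://example.com/private/secret.html"
)

print(allowed_page)  # True
print(blocked_page)  # False
```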
Do I have a robots.txt file?
To check, type in your root domain, then add /robots.txt to the end of the URL. For instance, Serpbook's robots file is located at serpbook.com/robots.txt.
If no .txt page appears, you do not currently have a (live) robots.txt page.
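The same check can be scripted. The sketch below is one possible approach using only Python's standard library; the function names robots_txt_url and has_robots_txt are made up for this example, and it assumes HTTPS when no scheme is given.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_txt_url(site: str) -> str:
    """Return the robots.txt URL at the root of the given site."""
    if "://" not in site:
        site = "https://" + site  # assumption: default to HTTPS
    parts = urlsplit(site)
    # Keep only the scheme and host; robots.txt always lives at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def has_robots_txt(site: str) -> bool:
    """Return True if the site serves a robots.txt file (needs network access)."""
    try:
        with urlopen(robots_txt_url(site), timeout=10) as response:
            return response.status == 200
    except OSError:  # covers HTTP errors, connection failures, and timeouts
        return False


print(robots_txt_url("serpbook.com"))  # https://serpbook.com/robots.txt
```

Calling has_robots_txt("serpbook.com") would then perform the live request; it is left out of the sketch so that the code runs without a network connection.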
How can I create a robots.txt file?
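A robots.txt file is plain text, so any text editor can create it. As one possible approach, the sketch below generates the text from a mapping of user agents to disallowed paths. The helper build_robots_txt and the /private/ path are hypothetical and only illustrate the user-agent and disallow structure described earlier.

```python
def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Build robots.txt text from a {user_agent: [disallowed paths]} mapping."""
    groups = []
    for user_agent, disallowed_paths in rules.items():
        lines = [f"User-agent: {user_agent}"]
        # One Disallow line per path; an empty Disallow value blocks nothing.
        lines += [f"Disallow: {path}" for path in disallowed_paths] or ["Disallow:"]
        groups.append("\n".join(lines))
    # Groups are separated by a blank line.
    return "\n\n".join(groups) + "\n"


content = build_robots_txt({"Googlebot": ["/private/"], "*": []})
print(content)
```

The resulting text would be saved as robots.txt and uploaded to the root of the domain, so that it is reachable at the /robots.txt URL.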
We hope you find this helpful! If you need any further assistance, please contact us and we'll be more than glad to help you out.