User-agent: * disallow: /

What is a robots.txt file?

Robots.txt is a text file webmasters create to instruct web robots (typically search engine robots) how to crawl pages on their website. The robots.txt file is part of the robots exclusion protocol (REP), a group of web standards that regulate how robots crawl the web, access and index content, and serve that content up to users. The REP also includes directives like meta robots, as well as page-, subdirectory-, or site-wide instructions for how search engines should treat links (such as “follow” or “nofollow”).


In practice, robots.txt files indicate whether certain user agents (web-crawling software) can or cannot crawl parts of a website. These crawl instructions are specified by “disallowing” or “allowing” the behavior of certain (or all) user agents.

Basic format:

User-agent: [user-agent name]
Disallow: [URL string not to be crawled]

Together, these two lines are considered a complete robots.txt file — though one robots file can contain multiple lines of user agents and directives (i.e., disallows, allows, crawl-delays, etc.).

Within a robots.txt file, each set of user-agent directives appears as a discrete set, separated by a line break:


Msnbot, discobot, and Slurp are all called out specifically, so those user-agents will only pay attention to the directives in their sections of the robots.txt file. All other user-agents will follow the directives in the user-agent: * group.
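For illustration, a file structured that way might look like the following sketch (the disallowed paths here are hypothetical, invented for the example):

User-agent: msnbot
Disallow: /private/

User-agent: discobot
Disallow: /archive/

User-agent: Slurp
Disallow: /temp/

User-agent: *
Disallow: /admin/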

Example robots.txt:

Here are a few examples of robots.txt in action for a www.example.com site:

Robots.txt file URL: www.example.com/robots.txt

Blocking all web crawlers from all content

User-agent: *
Disallow: /

Using this syntax in a robots.txt file would tell all web crawlers not to crawl any pages on www.example.com, including the homepage.

Allowing all web crawlers access to all content

User-agent: *
Disallow:

Using this syntax in a robots.txt file tells web crawlers to crawl all pages on www.example.com, including the homepage.

Blocking a specific web crawler from a specific folder

User-agent: Googlebot
Disallow: /example-subfolder/

This syntax tells only Google’s crawler (user-agent name Googlebot) not to crawl any pages that contain the URL string www.example.com/example-subfolder/.

Blocking a specific web crawler from a specific web page

User-agent: Bingbot
Disallow: /example-subfolder/blocked-page.html

This syntax tells only Bing’s crawler (user-agent name Bingbot) to avoid crawling the specific page at www.example.com/example-subfolder/blocked-page.html.

How does robots.txt work?

Search engines have two main jobs:

Crawling the web to discover content;
Indexing that content so that it can be served up to searchers who are looking for information.

To crawl sites, search engines follow links to get from one site to another — ultimately, crawling across many billions of links and websites. This crawling behavior is sometimes known as “spidering.”

After arriving at a website but before spidering it, the search crawler will look for a robots.txt file. If it finds one, the crawler will read that file first before continuing through the page. Because the robots.txt file contains information about how the search engine should crawl, the information found there will instruct further crawler action on this particular site. If the robots.txt file does not contain any directives that disallow a user-agent’s activity (or if the site doesn’t have a robots.txt file), it will proceed to crawl other information on the site.

Other quick robots.txt must-knows:

(discussed in more detail below)

In order to be found, a robots.txt file must be placed in a website’s top-level directory.

Robots.txt is case sensitive: the file must be named “robots.txt” (not Robots.txt, robots.TXT, or otherwise).

Some user agents (robots) may choose to ignore your robots.txt file. This is especially common with more nefarious crawlers like malware robots or email address scrapers.

The /robots.txt file is publicly available: just add /robots.txt to the end of any root domain to see that website’s directives (if that site has a robots.txt file!). This means that anyone can see what pages you do or don’t want to be crawled, so don’t use them to hide private user information.

Each subdomain on a root domain uses separate robots.txt files. This means that both blog.example.com and example.com should have their own robots.txt files (at blog.example.com/robots.txt and example.com/robots.txt).



Technical robots.txt syntax

Robots.txt syntax can be thought of as the “language” of robots.txt files. There are five common terms you’re likely to come across in a robots file. They include:

User-agent: The specific web crawler to which you’re giving crawl instructions (usually a search engine).

Disallow: The command used to tell a user-agent not to crawl a particular URL. Only one "Disallow:" line is allowed for each URL.

Allow (only applicable for Googlebot): The command to tell Googlebot it can access a page or subfolder even though its parent page or subfolder may be disallowed.

Crawl-delay: How many seconds a crawler should wait before loading and crawling page content. Note that Googlebot does not acknowledge this command, but crawl rate can be set in Google Search Console.

Sitemap: Used to call out the location of any XML sitemap(s) associated with this URL.

Pattern-matching

When it comes to the actual URLs to block or allow, robots.txt files can get fairly complex as they allow the use of pattern-matching to cover a range of possible URL options. Google and Bing both honor two regular expressions that can be used to identify pages or subfolders that an SEO wants excluded. These two characters are the asterisk (*) and the dollar sign ($).

* is a wildcard that represents any sequence of characters
$ matches the end of the URL
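For example, a rule like the following sketch (the .pdf extension is just an illustration) would tell all crawlers to avoid any URL ending in .pdf:

User-agent: *
Disallow: /*.pdf$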

Google offers a great list of possible pattern-matching syntax and examples here.

Where does robots.txt go on a site?

Whenever they come to a site, search engines and other web-crawling robots (like Facebook’s crawler, Facebot) know to look for a robots.txt file. But, they’ll only look for that file in one specific place: the main directory (typically your root domain or homepage). If a user agent visits www.example.com/robots.txt and does not find a robots file there, it will assume the site does not have one and proceed with crawling everything on the page (and maybe even on the entire site). Even if the robots.txt page did exist at, say, example.com/index/robots.txt or www.example.com/homepage/robots.txt, it would not be discovered by user agents and thus the site would be treated as if it had no robots file at all.

In order to ensure your robots.txt file is found, always include it in your main directory or root domain.

Why do you need robots.txt?

Robots.txt files control crawler access to certain areas of your site. While this can be very dangerous if you accidentally disallow Googlebot from crawling your entire site (!!), there are some situations in which a robots.txt file can be very handy.

Some common use cases include:

Preventing duplicate content from appearing in SERPs (note that meta robots is often a better choice for this)
Keeping entire sections of a website private (for instance, your engineering team’s staging site)
Keeping internal search results pages from showing up on a public SERP
Specifying the location of sitemap(s)
Preventing search engines from indexing certain files on your website (images, PDFs, etc.)
Specifying a crawl delay in order to prevent your servers from being overloaded when crawlers load multiple pieces of content at once (a sketch combining several of these appears after this list)
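As a rough sketch, a file covering a few of these use cases might look like this (all paths and the sitemap URL are hypothetical placeholders, not recommendations):

User-agent: *
Disallow: /internal-search/
Disallow: /staging/
Crawl-delay: 10

Sitemap: https://www.example.com/sitemap.xml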

If there are no areas on your site to which you want to control user-agent access, you may not need a robots.txt file at all.

Checking if you have a robots.txt file

Not sure if you have a robots.txt file? Simply type in your root domain, then add /robots.txt to the end of the URL. For instance, starskyrecords.com’s robots file is located at starskyrecords.com/robots.txt.

If no .txt page appears, you do not currently have a (live) robots.txt page.

How to create a robots.txt file

If you found you didn’t have a robots.txt file or want to alter yours, creating one is a simple process. This article from Google walks through the robots.txt file creation process, and this tool allows you to test whether your file is set up correctly.

Looking for some practice creating robots files? This blog post walks through some interactive examples.

SEO best practices

Make sure you’re not blocking any content or sections of your website you want crawled.

Some search engines have multiple user-agents. For instance, Google uses Googlebot for organic search and Googlebot-Image for image search. Most user agents from the same search engine follow the same rules so there’s no need to specify directives for each of a search engine’s multiple crawlers, but having the ability to do so does allow you to fine-tune how your site content is crawled.
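For instance, a hypothetical file that lets Googlebot crawl everything while keeping Googlebot-Image out of an /images/ folder (the path is invented for this example) might look like:

User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow: /images/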


Robots.txt vs meta robots vs x-robots

So many robots! What’s the difference between these three types of robot instructions? First off, robots.txt is an actual text file, whereas meta and x-robots are meta directives. Beyond what they actually are, the three all serve different functions. Robots.txt dictates site- or directory-wide crawl behavior, whereas meta and x-robots can dictate indexation behavior at the individual page (or page element) level.
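For reference, a meta robots directive is an HTML tag placed in a page’s head, while x-robots is sent as an HTTP response header; these are the standard forms of each (the noindex, nofollow values are just one common combination):

<meta name="robots" content="noindex, nofollow">
X-Robots-Tag: noindex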

Keep learning

Put your skills to work

starskyrecords.com Pro can identify whether your robots.txt file is blocking our access to your website. Try it >>

