How to create a perfect Robots.txt for Magento


If you run a Magento-based ecommerce store, a so-called robots.txt file may be added to your site by default to prevent search engines from indexing it. This typically happens at the development stage, when you don’t even have real prices, products, or services yet, and it means your site won’t show up in top search engine results. Fortunately, this file can easily be set up and fixed through the Magento Admin Panel options.

Advantages of Using robots.txt in Magento

Another question is why such robots are needed at all. Search engines send small spiders (crawlers) to scan your site and gather information, which is how your pages get indexed in search results. Robots.txt is the standard way to tell search engines which areas they should not index. Beyond search engines, these robots can also perform specific automated tasks such as HTML and link validation. The file’s goal is to hide the site’s JavaScript and SID parameters and to prevent content duplication. This improves your Magento SEO and reduces the load on your server resources. Finally, robots.txt helps reduce the footprint other web robots leave on your bandwidth allocation by specifying a Crawl-delay. So there are enough reasons to use a Magento robots.txt, but it is crucial to do it the right way.

Things You Should Know Beforehand

Before you create a robots.txt file, you should know that its rules cover one host at a time. If you have any subdomains (e.g. shop.example.com), each of them requires a separate robots.txt. When you run multiple online stores, it makes sense to use a separate file for each of them. On the whole, implementing robots.txt is very simple: it is nothing but a text file, so anyone can create one quickly in a preferred text editor, such as Dreamweaver, Notepad, or Vim. A range of different robots exists; for instance, Googlebot and Bingbot are the crawlers used by Google and Bing.
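For example (these hostnames are hypothetical), a main store and a subdomain shop each need their own file, served from their own host:

```
# http://www.example.com/robots.txt  (main store)
User-agent: *
Disallow: /checkout/

# http://shop.example.com/robots.txt  (subdomain: a separate file)
User-agent: *
Disallow: /customer/
```

Rules in the first file have no effect on shop.example.com, and vice versa.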

What really matters is that once you have created the robots.txt file, it must reside at the root of your domain: if your store domain is, for example, www.e-store.com, you should place robots.txt under the domain root, where the app directory is also present. It will then be accessible at www.e-store.com/robots.txt. Saving this file in any other directory or subdirectory is useless, because crawlers will not look for it there.
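As a quick illustration of where a crawler will look, the sketch below (using only Python’s standard library; the store URL is just an example) derives the robots.txt location from any page URL:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would fetch for this page.

    Crawlers always request /robots.txt at the host root, no matter
    how deep the page they started from sits in the site tree.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://www.e-store.com/catalog/product/view/id/42"))
# -> http://www.e-store.com/robots.txt
```

Note that the path of the starting page is discarded entirely, which is exactly why a robots.txt saved in a subdirectory is never consulted.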

Two more necessary considerations when using robots.txt for Magento site are:

  • This file is publicly available, so anyone can see which sections of your server you want to keep out of the index
  • Robots may ignore the file, especially malware bots that scan the web for security vulnerabilities

Installation Process and Tips

There are several ways to install a Magento robots.txt. First, let us talk about how to do it manually. Sample files have been freely available on the web for years; it is enough to copy the content we provide below and paste it into a newly created file named robots.txt. The Sitemap.xml location has to be changed before uploading the file to the site’s root (even if the Magento installation is in a subdirectory).

This version of robots.txt is offered by byte.nl as an optimal one.

# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these “robots” where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html
#
# Prevent blocking URL parameters with robots.txt
# Use Google Webmaster Tools > Crawl > Url parameters instead

# Website Sitemap
Sitemap: http://www.example.com/sitemap.xml

# Crawlers Setup
User-agent: *
Crawl-delay: 10

# Allowable Index
# Mind that Allow is not an official standard
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
# Allow: /catalogsearch/result/
Allow: /media/catalog/

# Directories
Disallow: /404/
Disallow: /app/

Disallow: /cgi-bin/
Disallow: /downloader/

Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
# Disallow: /media/
Disallow: /media/captcha/
# Disallow: /media/catalog/
#Disallow: /media/css/
#Disallow: /media/css_secure/
Disallow: /media/customer/
Disallow: /media/dhl/
Disallow: /media/downloadable/
Disallow: /media/import/
#Disallow: /media/js/
Disallow: /media/pdf/
Disallow: /media/sales/
Disallow: /media/tmp/
Disallow: /media/wysiwyg/
Disallow: /media/xmlconnect/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
#Disallow: /skin/
Disallow: /stats/
Disallow: /var/

# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalog/product/gallery/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/

# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
Disallow: /get.php # Magento 1.5+

# Paths (no clean URLs)
#Disallow: /*.js$
#Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?SID=
Disallow: /rss*
Disallow: /*PHPSESSID

Another way to install robots.txt for Magento is to follow these simple guidelines:

  1. Download a robots.txt file first (there are a lot of sources available).
  2. If your Magento is installed within a subdirectory, you will have to modify the robots.txt file accordingly. This means, for instance, changing ‘Disallow: /404/’ to ‘Disallow: /your-sub-directory/404/’ and ‘Disallow: /app/’ to ‘Disallow: /your-sub-directory/app/’.
  3. Check whether your domain has a sitemap.xml and add its URL to the Sitemap: line of robots.txt.
  4. Finally, upload the robots.txt file to your root folder, e.g. the ‘httpdocs/’ directory. This can be done in two ways: by logging in to your Control Panel with your credentials, or via the FTP client of your choice.
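To illustrate step 2, assuming a hypothetical subdirectory install at /shop/ (the file itself still lives at the domain root), the adjusted rules would look like this:

```
# Store installed at http://www.example.com/shop/
# File still served from http://www.example.com/robots.txt
Sitemap: http://www.example.com/shop/sitemap.xml

User-agent: *
Disallow: /shop/404/
Disallow: /shop/app/
Disallow: /shop/checkout/
```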

For Magento Backend

This method involves using an extension to manage the robots.txt file. Instead of doing the whole job by hand, you can download special tools that generate robots.txt for Magento. Via the settings you can alter the main options, and the good thing is that you can add your own rules on top of the standard settings.

Reindexing robots.txt

Search engines often take a long time to re-read a changed Magento robots.txt. Tools such as Google Webmaster Tools (GWT) can show when your site was last crawled. If you want Google or other search engines to fetch the updated version sooner than in 24 hours or a hundred visits, you can set a Cache-Control header in your .htaccess file. Add this statement to your .htaccess file:

<IfModule mod_headers.c>
<FilesMatch "\.(txt)$">
Header set Cache-Control "max-age=60, public, must-revalidate"
</FilesMatch>
</IfModule>

On the whole, the majority of Magento agencies take a very similar approach to robots.txt. It is better to get appropriate consultation before copying and pasting any of the suggested rules, to avoid harming your Magento or Magento 2 store. You can always test your robots.txt file with tools such as Yandex Webmaster or Frobee.

If you have any questions about anything I’ve mentioned in this blog post or anything else related to Magento, please feel free to drop me a message via this form. I am on the team performing Magento SEO audits for large Magento projects.



Oleg Yemchuk


Oleg Yemchuk is an SEO Manager at MavenEcommerce, sharing office space with Magento business experts in NYC and software developers worldwide. Oleg is an SEO expert by day and a geek by night. Favorite pastime: traveling in the TARDIS.
