Things You Should Know

Before you decide to install a robots.txt file, you should know that its settings cover one domain at a time. If you have any subdomains (e.g. shop.example.com), each of them requires a separate robots.txt. When you run multiple online stores, it makes sense to maintain a separate file for each of them.

On the whole, implementing robots.txt is very simple: it is nothing but a text file, so anyone can create it quickly in a preferred text editor such as Dreamweaver, Notepad, or vim. A range of different robots exists; Googlebot and Bingbot, for instance, are the crawlers used by Google and Bing. What really matters is that the robots.txt file must reside at the root: if your store domain is, for example, www.e-store.com, place robots.txt in the domain root where the app directory is also located, so that it is accessed as www.e-store.com/robots.txt. Saving this file in any other directory or subdirectory is useless.

Two more necessary considerations when using robots.txt for a Magento site:
- This file is publicly available, so anyone can see which sections of your server you would rather keep hidden
- The file may be ignored by robots, especially by malware bots that scan the web for security vulnerabilities
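The one-file-per-host rule above can be illustrated with a short Python sketch. The helper name and example domains are ours, for illustration only: it derives the only location where crawlers will look for robots.txt for a given page URL.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Return the single robots.txt URL crawlers will check for this page."""
    parts = urlsplit(page_url)
    # robots.txt is looked up per scheme+host, always at the root --
    # the path of the original page is irrelevant.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("https://www.e-store.com/category/shoes"))
# -> https://www.e-store.com/robots.txt
print(robots_txt_url("https://shop.example.com/cart"))
# -> https://shop.example.com/robots.txt  (the subdomain needs its own file)
```

Note how the subdomain resolves to its own robots.txt URL, which is why each subdomain and each store needs a separate file.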
Installation Process and Tips

There are several ways to install robots.txt for Magento. First, let us talk about how to do it manually. A suitable file has been circulating on the web since 2010: it is enough to copy the content we provide below and paste it into a newly created file named robots.txt. The sitemap.xml location has to be changed before uploading the file into the site's root (even if the Magento installation is in a subdirectory). This version of robots.txt is offered by byte.nl as an optimal one.
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these “robots” where not to go on your site,
# you save bandwidth and server resources.
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
# For more information about the robots.txt standard, see:
# For syntax checking, see:
# Prevent blocking URL parameters with robots.txt
# Use Google Webmaster Tools > Crawl > URL Parameters instead
# Website Sitemap
# Crawlers Setup
User-agent: *
# Paths (clean URLs)
Disallow: /get.php # Magento 1.5+
# Do not index the general technical directories and files on a server
Disallow: /cgi-bin/
# Paths (no clean URLs)
Another way to install robots.txt for Magento is to follow these simple guidelines:
- Download the robots.txt file first (there are a lot of sources available).
- If your Magento is installed in a subdirectory, you will have to modify the robots.txt file accordingly. That means, for instance, changing ‘Disallow: /404/’ to ‘Disallow: /your-sub-directory/404/’ and ‘Disallow: /app/’ to ‘Disallow: /your-sub-directory/app/’.
- Check whether your domain has a sitemap.xml and, if it does, add the sitemap’s URL to your robots.txt afterwards.
- It’s time to upload the robots.txt file to your root folder. Just place the file within the ‘httpdocs/’ directory. This can be done in two ways: by logging in to your control panel with your credentials, or via the FTP client of your preference.
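The path rewrite in the second step above is mechanical, so it can also be scripted. A minimal Python sketch, where the subdirectory name and the helper name are placeholders of ours:

```python
def prefix_disallow(robots_text: str, subdir: str) -> str:
    """Prefix every Disallow path with the store's subdirectory."""
    out = []
    for line in robots_text.splitlines():
        if line.startswith("Disallow: /"):
            # Only the first "/" is rewritten, keeping the rest of the path.
            line = line.replace("Disallow: /", f"Disallow: /{subdir}/", 1)
        out.append(line)
    return "\n".join(out)

sample = "Disallow: /404/\nDisallow: /app/"
print(prefix_disallow(sample, "your-sub-directory"))
# Disallow: /your-sub-directory/404/
# Disallow: /your-sub-directory/app/
```

Run it over your downloaded robots.txt once, review the result, and upload that version instead of editing each line by hand.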
Useful Tools to Check Your Robots.txt

To make sure that you set up your robots.txt right, try one of the tools listed below. They will help you analyze the code and fix mistakes if there are any. The easiest and most reliable way is to use Google Webmaster Tools, which lets you check your robots.txt for free right from your admin panel. To do so, you should:
- Go to Google Webmasters
- Click on «Search Console»
- Type your website’s name
- Click on «Crawl» in the Dashboard panel
- Choose «Robots.txt Tester» in the drop-down menu
- Type «robots.txt» in the field after the slash
- Hit the «Test» button
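Besides the Webmaster Tools tester, you can sanity-check your rules locally with Python's standard `urllib.robotparser`, feeding it the file's lines directly. The sample rules and URLs below are illustrative:

```python
from urllib.robotparser import RobotFileParser

# A small excerpt of Magento-style rules, for demonstration.
rules = """\
User-agent: *
Disallow: /app/
Disallow: /get.php
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Verify the rules behave as intended before deploying the file.
print(parser.can_fetch("*", "https://www.e-store.com/app/etc/"))   # False: blocked
print(parser.can_fetch("*", "https://www.e-store.com/category/"))  # True: crawlable
```

This catches typos such as a missing `User-agent:` line or a malformed path before the file ever reaches your server.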
For Magento Backend

This approach involves applying an extension that manages the robots.txt file. Instead of doing the whole job by hand, you can download special tools that generate robots.txt for Magento. Through the settings you can alter the main options, and the good thing is that you can add your own rules on top of the standard ones.
Reindexing robots.txt

Search engines often take a long time to re-read a changed Magento robots.txt. Tools such as GWT can show you when your site was last crawled. If you want Google or other search engines to pick up the updated version sooner than in 24 hours or a hundred visits, you can use a Cache-Control header in your .htaccess file. Apply this statement to your .htaccess file:
Header set Cache-Control "max-age=60, public, must-revalidate"
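Note that the line above applies the header to every response your server sends. If you want to limit the short cache lifetime to robots.txt alone, you can wrap the directive in a `<Files>` block. This is a sketch, assuming Apache with mod_headers enabled:

```apache
# Serve only robots.txt with a short cache lifetime,
# so crawlers re-fetch it within about a minute.
<Files "robots.txt">
    Header set Cache-Control "max-age=60, public, must-revalidate"
</Files>
```

The rest of the site keeps whatever caching policy it already has.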