If you run a Magento-based e-commerce store, a robots.txt file may be added to your site by default to prevent search engines from indexing it. This typically happens at the development stage, when you don’t yet have real prices, products, or services. As long as that file stays in place, your site won’t show up in search engine results. Fortunately, the file can be easily set up and fixed through the Magento Admin Panel options.
Advantages of Using robots.txt in Magento
Things You Should Know Before You Start
Before you decide to add a robots.txt file, you should know that its settings cover one host at a time. If you have any sub-domains (e.g. shop.example.com), each of them needs a separate robots.txt. When you run multiple online stores, it makes sense to maintain a separate file for each of them. On the whole, creating robots.txt is very simple: it’s nothing but a text file, so anyone can create it quickly in their preferred text editor, such as Dreamweaver, Notepad, vim, or another code editor. The file is read by robots (crawlers), and a range of them exists. Googlebot and Bingbot, for instance, are the crawlers of Google and Bing.
What really matters is that the robots.txt file must reside at the root of the host: if your store domain is, for example, www.e-store.com, you should place robots.txt in the document root, where the app directory is also present. It will then be accessible at www.e-store.com/robots.txt. Crawlers only look at that location, so saving the file in any other directory or subdirectory is useless.
Two more things to consider when using robots.txt for a Magento site:
- This file is publicly available, so anyone can see which sections of your site you would rather keep away from crawlers
- The file may be ignored by robots, especially malicious ones that scan the web for security vulnerabilities
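The "root only, one host at a time" rule can be illustrated with a short Python sketch. The helper name robots_url is made up for this example. It only shows how the robots.txt location follows from a store URL, using the standard urllib.parse module.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(store_url: str) -> str:
    """Return the robots.txt URL that applies to the host of store_url.

    Hypothetical helper for illustration: the scheme and host (including
    any sub-domain) are kept, everything after the host is replaced
    by /robots.txt.
    """
    parts = urlsplit(store_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A page deep inside the store still maps to the file at the root.
print(robots_url("http://www.e-store.com/shop/index.php?p=2"))
# A sub-domain is a different host, so it maps to its own file.
print(robots_url("http://shop.example.com/"))
```

The two calls print http://www.e-store.com/robots.txt and http://shop.example.com/robots.txt, which is why a file saved under a subdirectory is never consulted and why each sub-domain needs its own copy.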
Installation Process and Tips
There are several ways to install a Magento robots.txt. First, let us talk about how to do it manually. A ready-made version of this file has been available on the web since 2010. It is enough to copy the content we provide below and paste it into a newly created file named robots.txt. Change the sitemap.xml location before uploading the file to the site’s root (the file goes there even if the Magento installation is in a subdirectory).
This version of robots.txt is offered by byte.nl as an optimal one.
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these “robots” where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html
#
# Prevent blocking URL parameters with robots.txt
# Use Google Webmaster Tools > Crawl > Url parameters instead
# Website Sitemap
Sitemap: http://www.example.com/sitemap.xml
# Crawlers Setup
User-agent: *
Crawl-delay: 10
# Allowable Index
# Mind that Allow is not an official standard
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
# Allow: /catalogsearch/result/
Allow: /media/catalog/
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
# Disallow: /media/
Disallow: /media/captcha/
# Disallow: /media/catalog/
#Disallow: /media/css/
#Disallow: /media/css_secure/
Disallow: /media/customer/
Disallow: /media/dhl/
Disallow: /media/downloadable/
Disallow: /media/import/
#Disallow: /media/js/
Disallow: /media/pdf/
Disallow: /media/sales/
Disallow: /media/tmp/
Disallow: /media/wysiwyg/
Disallow: /media/xmlconnect/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
#Disallow: /skin/
Disallow: /stats/
Disallow: /var/
# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalog/product/gallery/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
Disallow: /get.php # Magento 1.5+
# Paths (no clean URLs)
#Disallow: /*.js$
#Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?SID=
Disallow: /rss*
Disallow: /*PHPSESSID
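Before uploading a file like this, it can help to check how a well-behaved crawler would read it. The sketch below uses Python's standard urllib.robotparser on a shortened excerpt of the rules. The SAMPLE text and the is_allowed helper are made up for the example. Note that this parser does plain prefix matching and does not understand wildcard patterns such as /*?SID=, so the excerpt sticks to simple paths; real search engines may interpret the full file differently.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt for illustration only; not the complete file.
SAMPLE = """\
User-agent: *
Crawl-delay: 10
Allow: /media/catalog/
Disallow: /checkout/
Disallow: /customer/
Disallow: /media/tmp/
"""


def is_allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Return True if a compliant crawler named agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


for path in ("/checkout/cart/", "/media/catalog/product/a.jpg", "/some-product.html"):
    print(path, "allowed" if is_allowed(SAMPLE, path) else "blocked")
```

With this excerpt, /checkout/cart/ is reported as blocked while the product image and the product page are allowed. A path that matches no rule is allowed by default.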
Another way to install robots.txt for Magento is to follow these simple guidelines:
- Download a robots.txt file first (there are a lot of sources available).
- If your Magento is installed within a subdirectory, you will have to modify the robots.txt file accordingly. This means, for instance, changing ‘Disallow: /404/’ to ‘Disallow: /your-sub-directory/404/’ and ‘Disallow: /app/’ to ‘Disallow: /your-sub-directory/app/’.
- Check whether your domain has a sitemap.xml and, if it does, add its URL to the Sitemap line of robots.txt.
- Upload the robots.txt file to your root folder. Just place the file within the ‘httpdocs/’ directory. This can be done in two ways: by logging in to your Control Panel with your credentials, or via the FTP client of your preference.
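The subdirectory step in the list above is tedious to do by hand on a long file, so here is a minimal Python sketch of it. The function name prefix_rules is hypothetical, and the approach is an assumption about how one might automate the edit: it prefixes every active Allow and Disallow path and leaves comments, the Sitemap line, and other directives alone.

```python
def prefix_rules(robots_txt: str, subdir: str) -> str:
    """Prepend /subdir to the path of every active Allow/Disallow rule.

    Hypothetical helper for illustration. Commented-out rules and all
    other lines (Sitemap, User-agent, Crawl-delay) are left unchanged.
    """
    prefix = "/" + subdir.strip("/")
    out = []
    for line in robots_txt.splitlines():
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() in ("allow", "disallow"):
            path = value.strip()
            if path.startswith("/"):
                line = f"{field.strip()}: {prefix}{path}"
        out.append(line)
    return "\n".join(out)


rules = "User-agent: *\nDisallow: /404/\nDisallow: /app/\n# Disallow: /media/"
print(prefix_rules(rules, "your-sub-directory"))
```

Wildcard rules such as ‘Disallow: /*.php$’ get the prefix too, which narrows them to the subdirectory, so it is worth reviewing those lines by hand afterwards.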
For Magento Backend
This approach relies on an extension for the robots.txt file. Instead of doing the whole job by hand, you can download special tools that generate robots.txt for Magento. You can alter the main options through the settings. The good thing is that you can add your own rules in addition to the standard settings.
Search engines can take a long time to pick up a changed Magento robots.txt. Tools such as Google Webmaster Tools (GWT) can show when the file was last fetched. If you want Google or other search engines to get the updated version sooner than after 24 hours or a hundred visits, you can set a Cache-Control header in your .htaccess file. Add this statement to your .htaccess file:
<FilesMatch "\.(txt)$">
    <IfModule mod_headers.c>
        Header set Cache-Control "max-age=60, public, must-revalidate"
    </IfModule>
</FilesMatch>
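To see what that header asks of a client, the value can be read the way a cache would read it. The following Python sketch uses a hypothetical helper, max_age_seconds, that pulls the max-age directive out of a Cache-Control value. It is a simplified reading of the header for illustration, not a full HTTP parser.

```python
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the max-age value (in seconds) from a Cache-Control header.

    Hypothetical helper for illustration; returns None when the header
    carries no usable max-age directive.
    """
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


# The value set by the .htaccess statement above.
print(max_age_seconds("max-age=60, public, must-revalidate"))
```

For the value used in the statement above, the result is 60: the file is declared fresh for one minute, after which a cache that honors the header has to revalidate it.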
On the whole, most Magento agencies take a very similar approach to robots.txt. It is better to get appropriate advice before copying and pasting any of the suggested code, to avoid harming your Magento or Magento 2 store. You can always test your robots.txt file with the help of Yandex Webmaster or Frobee.
If you have any questions about anything I’ve mentioned in this blog post or anything else related to Magento, please feel free to drop me a message via this form. I am on the team performing Magento SEO audits for large Magento projects.