How to Configure Robots.txt in Magento 2

Vinh Jacker | 06-22-2016

Configuring robots.txt is important for any website that cares about SEO. In particular, when you configure a sitemap so that search engines can index your store, you also need to give web crawlers instructions in the robots.txt file so that they skip the pages you have disallowed. The robots.txt file, which resides in the root of your Magento installation, is a set of directives that search engines such as Google, Yahoo, and Bing recognize and follow. In this post, I will walk through how to configure the robots.txt file so that it works well with your site.

What is Robots.txt in Magento 2?

The robots.txt file tells web crawlers which parts of your website to crawl and which to skip. Defining this relationship between your website and its crawlers helps you optimize your website’s ranking. Sometimes you need to keep crawlers away from particular parts of the site, which you can do through configuration. You can either use the default settings or set custom instructions for each search engine.

Steps to Configure Magento 2 robots.txt file

  1. Log in to your Magento 2 Admin Panel.

  2. Click Content. In the Design section, select Configuration.

  3. Click Edit in the Global row to change the global design configuration.

  4. Open the Search Engine Robots section, and continue with the following:

  • In Default Robots, select one of the following:

    • INDEX, FOLLOW: Instructs search engine crawlers to index the store and follow its links, so they check back for changes.

    • NOINDEX, FOLLOW: Prevents indexing but still lets crawlers follow links and check back for updates.

    • INDEX, NOFOLLOW: Indexes the store but tells crawlers not to follow its links or check back for changes.

    • NOINDEX, NOFOLLOW: Blocks indexing and tells crawlers not to follow links or check back.

  • In the Edit Custom instruction of robots.txt File field, enter custom instructions if needed.

  • In the Reset to Defaults field, click the Reset to Default button if you need to restore the default instructions.

  5. When complete, click Save Config to apply your changes.
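The Default Robots value is normally rendered as a robots meta tag in the head of each storefront page, so one way to confirm the saved setting is to read that tag from the page source. Below is a minimal Python sketch of that check. The `SAMPLE_HEAD` string and the helper names are hypothetical, and the markup on your storefront may differ.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from a <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Split "NOINDEX,FOLLOW" into ["NOINDEX", "FOLLOW"].
            self.directives = [
                part.strip().upper() for part in content.split(",") if part.strip()
            ]


def robots_directives(html):
    """Return the robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page head, used only for illustration.
SAMPLE_HEAD = '<head><title>Store</title><meta name="robots" content="NOINDEX,FOLLOW"/></head>'

print(robots_directives(SAMPLE_HEAD))
```

In practice you would pass the HTML of a real storefront page to `robots_directives` after saving the configuration and flushing the cache.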

Magento 2 Robots.txt Examples

You are also able to hide your pages from the website crawlers by setting custom instructions as follows:

  • Allows Full Access

User-agent: *
Disallow:

  • Disallows Access to All Folders

User-agent: *
Disallow: /
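To see how a crawler would interpret these two rule sets, you can feed them to Python's standard `urllib.robotparser` module. This is only a sketch: `can_crawl` is a hypothetical helper and `example.com` is a placeholder domain.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, url, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


FULL_ACCESS = "User-agent: *\nDisallow:"
NO_ACCESS = "User-agent: *\nDisallow: /"

# example.com is a placeholder domain.
url = "https://example.com/women/tops.html"
print(can_crawl(FULL_ACCESS, url))
print(can_crawl(NO_ACCESS, url))
```

An empty `Disallow:` value blocks nothing, while `Disallow: /` covers every path on the site.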

Magento 2 Default Robots.txt

User-agent: *
Disallow: /lib/
Disallow: /*.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /var/
Disallow: /catalog/
Disallow: /customer/
Disallow: /sendfriend/
Disallow: /review/
Disallow: /*SID=
Disallow: /*?

# Disable checkout & customer account
Disallow: /checkout/
Disallow: /onestepcheckout/
Disallow: /customer/
Disallow: /customer/account/
Disallow: /customer/account/login/

# Disable Search pages
Disallow: /catalogsearch/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/

# Disable common folders
Disallow: /app/
Disallow: /bin/
Disallow: /dev/
Disallow: /lib/
Disallow: /phpserver/
Disallow: /pub/

# Disable Tag & Review (Avoid duplicate content)

Disallow: /tag/
Disallow: /review/

# Common files
Disallow: /composer.json
Disallow: /composer.lock
Disallow: /CONTRIBUTING.md
Disallow: /CONTRIBUTOR_LICENSE_AGREEMENT.html
Disallow: /COPYING.txt
Disallow: /Gruntfile.js
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /nginx.conf.sample
Disallow: /package.json
Disallow: /php.ini.sample
Disallow: /RELEASE_NOTES.txt

# Disable sorting (Avoid duplicate content)
Disallow: /*?*product_list_mode=
Disallow: /*?*product_list_order=
Disallow: /*?*product_list_limit=
Disallow: /*?*product_list_dir=

# Disable version control folders and others
# (paths are case-sensitive, so file extensions are written in lowercase)
Disallow: /*.git
Disallow: /*.CVS
Disallow: /*.zip$
Disallow: /*.svn$
Disallow: /*.idea$
Disallow: /*.sql$
Disallow: /*.tgz$
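Several of these rules rely on the `*` wildcard and the `$` end-of-URL anchor, as major crawlers such as Googlebot interpret them. Python's `urllib.robotparser` only does plain prefix matching, so the sketch below uses a small hand-written matcher instead to show what such patterns cover. `rule_to_regex` and `rule_matches` are hypothetical helpers, the paths are made up, and real crawlers may differ in details.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule into a compiled regular expression."""
    anchored = rule.endswith("$")  # a trailing '$' pins the match to the end of the URL
    if anchored:
        rule = rule[:-1]
    # Escape literal text, then let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def rule_matches(rule, path):
    """Return True if `path` (path plus query string) is covered by `rule`."""
    return rule_to_regex(rule).match(path) is not None


# The paths below are made-up examples.
print(rule_matches("/*.php$", "/index.php"))
print(rule_matches("/*.php$", "/index.php?id=1"))
print(rule_matches("/*?*product_list_mode=", "/women/tops.html?product_list_mode=list"))
print(rule_matches("/checkout/", "/blog/checkout/"))
```

Rules without wildcards match from the start of the path, which is why `/checkout/` does not cover `/blog/checkout/`.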

More Robots.txt examples

Block Googlebot from a folder

User-agent: Googlebot
Disallow: /subfolder/

Block Googlebot from a page

User-agent: Googlebot
Disallow: /subfolder/page-url.html
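Rules placed under a named user agent apply only to that crawler. The following sketch uses Python's standard `urllib.robotparser` to check the folder rule above. `example.com` is a placeholder domain, and Bingbot stands in for any crawler that has no group of its own.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: Googlebot
Disallow: /subfolder/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# example.com is a placeholder domain.
blocked = parser.can_fetch("Googlebot", "https://example.com/subfolder/page-url.html")
other_path = parser.can_fetch("Googlebot", "https://example.com/other/")
other_bot = parser.can_fetch("Bingbot", "https://example.com/subfolder/page-url.html")
print(blocked, other_path, other_bot)
```

Because the file contains no `User-agent: *` group, crawlers other than Googlebot are not restricted at all.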

Common Web crawlers (Bots)

Here are some common bots on the internet.

User-agent: Googlebot
User-agent: Googlebot-Image
User-agent: Googlebot-Video
User-agent: Bingbot
User-agent: Slurp            # Yahoo
User-agent: DuckDuckBot
User-agent: Baiduspider
User-agent: YandexBot
User-agent: facebot          # Facebook
User-agent: ia_archiver      # Alexa

How to add a sitemap to the robots.txt file in Magento 2?

Much like the robots.txt file, the Magento sitemap plays a crucial role in optimizing your website for search engines. It facilitates a more thorough analysis of your website links by search engines. As robots.txt provides instructions on what to analyze, it is advisable to include information about the sitemap in this file.

To integrate a sitemap into Magento’s robots.txt file, follow these steps:

Go to Stores > Configuration > Catalog > XML Sitemap and locate the Search Engine Submission Settings section.

Set the Enable Submission to Robots.txt option to Yes.

If you wish to add a custom XML sitemap to robots.txt, go to Content > Design > Configuration, select a website, and open Search Engine Robots. Then append a Sitemap line with the sitemap URL to the Edit custom instruction of robots.txt File field.
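A sitemap is declared in robots.txt with a `Sitemap:` line that carries the full URL. As a rough check that the line is picked up, the sketch below parses a hypothetical set of custom instructions with Python's standard `urllib.robotparser` (the `site_maps()` method needs Python 3.8 or later). The sitemap URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical custom instructions; replace the URL with your own sitemap location.
CUSTOM_INSTRUCTIONS = """\
User-agent: *
Disallow: /checkout/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(CUSTOM_INSTRUCTIONS.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
print(parser.site_maps())
```

The `Sitemap:` line is independent of the user-agent groups, so it can sit anywhere in the file.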

Deploying Best Practices for Robots.txt

For optimal search engine performance, it is imperative to follow these best practices.

Disallow Irrelevant Pages

To enhance search engine optimization (SEO), it’s crucial to specify which pages or directories search engines should not crawl. In your robots.txt file, use the “Disallow” directive to keep crawlers away from irrelevant content. For instance, if you have pages with duplicate or thin content, disallow them to avoid diluting your site’s overall quality.

Include a Sitemap

By adding your XML sitemap URL to the robots.txt file, you help search engines find and index all relevant pages on your site. This ensures that they can efficiently navigate through your content, improving overall visibility.

Restrict Access to Sensitive Areas

Certain directories, such as admin panels or CMS directories, should not appear in search results. Use the “Disallow” directive in your robots.txt file to keep search engines from crawling these areas and inadvertently indexing them. Because robots.txt is publicly readable and only advisory, protect genuinely sensitive content with authentication as well.

Optimize Crawl Budget

Crawling resources are finite, and search engine bots allocate a crawl budget to each site. To make the most of this budget, specify which areas crawlers should skip. For instance, review pages or product comparison pages rarely need to be crawled. Use the robots.txt file to keep crawlers away from non-essential content, so that important pages receive adequate attention.

Regularly Check for Errors

Even the best-configured robots.txt files can develop issues. Regularly check your file for syntax errors or incorrect directives. Tools like Google Search Console can help identify any issues. By fixing errors promptly, you maintain an effective robots.txt setup and enhance your site’s overall SEO performance.
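As a complement to such tools, a few common mistakes can be caught with a simple script. The sketch below is a hypothetical, deliberately small linter rather than a full validator. It flags lines without a colon, unknown directives, rules that appear before any User-agent line, and paths that do not start with `/` or `*`.

```python
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots(text):
    """Return a list of (line_number, message) tuples for suspicious lines."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        directive, value = (part.strip() for part in line.split(":", 1))
        directive = directive.lower()
        if directive not in KNOWN_DIRECTIVES:
            problems.append((number, f"unknown directive '{directive}'"))
        elif directive == "user-agent":
            seen_user_agent = True
        elif directive in {"disallow", "allow", "crawl-delay"} and not seen_user_agent:
            problems.append((number, "rule appears before any User-agent line"))
        elif directive in {"disallow", "allow"} and value and not value.startswith(("/", "*")):
            problems.append((number, "path should start with '/' or '*'"))
    return problems


# Made-up sample with three deliberate mistakes.
SAMPLE = """\
Disallow: /var/
User-agent: *
Dissalow: /checkout/
Disallow: checkout
"""

for number, message in lint_robots(SAMPLE):
    print(number, message)
```

A script like this could run as part of a deployment check, with the file's contents read from your store's robots.txt URL.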

FAQs for Robots.txt in Magento 2

1. What role does the Googlebot user agent play in configuring robots.txt within Magento 2?

The Googlebot user agent plays a crucial role in how your Magento website interacts with bots, particularly the Google bot.

By configuring your robots.txt file, you gain control over what content the bot is allowed or disallowed from indexing on your site. However, it’s essential to recognize that there are two distinct user agents involved:

  • Googlebot User Agent: Responsible for crawling and indexing your website.

  • Page User Agent: Represents the user agent or web crawler responsible for accessing and rendering your web pages.

When configuring your robots.txt file, consider both the Googlebot user agent and the page user agent to make sure that your website’s content is appropriately indexed and displayed.

2. What does “Access User Agent” mean in robots.txt?

An “Access User Agent” in robots.txt refers to a specific user agent or web crawler that you intentionally allow to access your website. Essentially, it’s a way to grant permission to certain bots while restricting others. By using directives, you can specify which user agents are permitted to access specific parts of your site. For example, if you want to allow Googlebot but disallow other crawlers, you can define rules accordingly.

3. Should I disallow checkout pages in robots.txt?

Disallowing the checkout pages in your robots.txt file is a recommended practice. Doing so helps keep web crawlers away from pages that handle user data during the checkout process. To achieve this, add the following line to your robots.txt file:

Disallow: /checkout/

4. How can I manage access for specific user agents in robots.txt?

To manage access for specific user agents in your robots.txt file, use the User-agent directive. For instance, if you want to stop a user agent called BadBot from crawling your site, add the lines below:

User-agent: BadBot
Disallow: /

5. What’s the significance of allowing catalog search pages in robots.txt?

It would be best to allow search engines to index catalog search pages. Fortunately, Magento’s default settings typically permit this, eliminating the necessity for adding specific directives related to catalog search pages in the robots.txt file. Ensure that your robots.txt file does not contain any ‘disallow’ rules about catalog search URLs.

6. What is the “folders user agent” in Magento 2 robots.txt?

In Magento 2, “folders user agent” refers to pairing a User-agent line with Disallow or Allow rules for folders in the robots.txt file. Together, these rules specify which user agents (such as search engine bots) are allowed or disallowed from accessing specific folders or directories on your Magento website.

It’s a helpful way to control bot access and ensure that certain sensitive or internal areas remain off-limits to crawlers. By customizing this directive, you can fine-tune the behavior of web robots in relation to different parts of your site.

7. What are the default directives in Magento’s robots.txt file?

By default, Magento’s robots.txt file allows most search engine robots to access the whole website. However, these default instructions are only a starting point. As a website owner, you should review and customize these settings depending on your specific needs.

Customization allows you to optimize visibility for search engines while keeping sensitive or irrelevant content from being indexed. Remember that the robots.txt file is publicly accessible, so it’s not suitable for hiding confidential information. Review the instructions regularly and adjust them to fit your site’s requirements.

8. Does Robots.txt affect website performance in Magento 2?

While Robots.txt itself doesn’t directly impact website performance, improper configuration leading to excessive crawling restrictions can indirectly affect crawl budget and site performance.

9. What happens if I encounter errors in my Robots.txt file in Magento 2?

If your Robots.txt file in Magento 2 contains errors, such as syntax errors or incorrect directives, they may adversely affect search engine crawling and indexing of your website. It’s important to regularly monitor and validate your Robots.txt file using tools such as Google Search Console to identify and fix any errors promptly.

10. Is it possible to set crawl delays for search engine bots using Robots.txt in Magento 2?

Yes, you can set crawl delays for search engine bots in the Robots.txt file in Magento 2 by using the “crawl-delay” directive followed by the number of seconds you wish to delay crawling. This lets you control the rate at which crawlers that honor the directive access your website, helping to manage server load and bandwidth usage. Support varies by crawler: Googlebot, for example, ignores crawl-delay.
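As an illustration, the sketch below declares a 10-second delay for all crawlers and reads it back with Python's standard `urllib.robotparser`. The value 10 and the Bingbot agent name are example assumptions, not recommendations.

```python
from urllib.robotparser import RobotFileParser

# Example rules; the 10-second delay is an arbitrary illustration.
RULES = """\
User-agent: *
Crawl-delay: 10
Disallow: /checkout/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# crawl_delay() returns the delay in seconds for the given agent, or None if unset.
print(parser.crawl_delay("Bingbot"))
```

In Magento 2, lines like these would go into the Edit custom instruction of robots.txt File field.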

The bottom line

Configuring Robots.txt is the first step to optimize your search engine rankings, as it enables the search engines to identify which pages to index or not. After that, you can take a look at this guide on how to configure Magento 2 sitemap. If you want a hassle-free solution that works right out of the box for your store with easy installation, check our SEO extension out. In case you need more help with this, contact us and we will handle the rest.
