Configuring a robots.txt file for Amazon Q Business Web Crawler
Amazon Q Business Web Crawler respects standard robots.txt directives like Allow and Disallow. You can modify the robots.txt file of your website to control how Amazon Q Web Crawler crawls your website.
Configuring how Amazon Q Web Crawler accesses your website
You can control how Amazon Q Web Crawler indexes your website using Allow and Disallow directives. You can also control which web pages are indexed and which web pages are not crawled.
To allow Amazon Q Web Crawler to crawl all web pages except disallowed web pages, use the following directive:
User-agent: amazon-QBusiness  # Amazon Q Web Crawler
Disallow: /credential-pages/  # disallow access to specific pages
To allow Amazon Q Web Crawler to crawl only specific web pages, use the following directive:
User-agent: amazon-QBusiness  # Amazon Q Web Crawler
Allow: /pages/  # allow access to specific pages
To allow Amazon Q Web Crawler to crawl all website content and disallow crawling for any other robots, use the following directive:
User-agent: amazon-QBusiness  # Amazon Q Web Crawler
Allow: /  # allow access to all pages

User-agent: *  # any (other) robot
Disallow: /  # disallow access to any pages
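To sanity-check rules like the ones above before deploying them, you can evaluate a robots.txt file against a user agent string. The following sketch uses Python's standard urllib.robotparser; it is an illustration of how these directives resolve, not a reproduction of Amazon Q Web Crawler's own matching logic.

```python
# Evaluate robots.txt rules for a given user agent with the standard library.
# The rules mirror the example above: allow amazon-QBusiness everywhere,
# block all other robots.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: amazon-QBusiness
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("amazon-QBusiness", "/pages/index.html"))  # True
print(parser.can_fetch("SomeOtherBot", "/pages/index.html"))      # False
```

Note that robotparser matches the User-agent token case-insensitively, so `amazon-QBusiness` matches the first group while any other agent falls through to the catch-all `*` group.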
Stopping Amazon Q Web Crawler from crawling your website
You can stop Amazon Q Web Crawler from indexing your website using the Disallow directive. You can also control which web pages are crawled and which aren't.
To stop Amazon Q Web Crawler from crawling the website, use the following directive:
User-agent: amazon-QBusiness  # Amazon Q Web Crawler
Disallow: /  # disallow access to any pages
Amazon Q Web Crawler also supports the robots noindex and nofollow directives in meta tags in HTML pages. These directives stop the web crawler from indexing a web page and stop it from following any links on the web page. You put the meta tags in the head section of the document to specify robots rules.
For example, the following web page includes the robots noindex and nofollow directives:
<html>
  <head>
    <meta name="robots" content="noindex, nofollow"/>
    ...
  </head>
  <body>...</body>
</html>
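A crawler finds these directives by reading the meta tags in the page's head. The sketch below shows one way to extract them with Python's standard html.parser; it is illustrative only and does not reproduce Amazon Q Web Crawler's actual parsing behavior.

```python
# Collect robots meta directives (e.g. noindex, nofollow) from an HTML page.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the content tokens of <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            for token in attrs.get("content", "").split(","):
                self.directives.add(token.strip().lower())

page = """<html>
  <head>
    <meta name="robots" content="noindex, nofollow"/>
  </head>
  <body>...</body>
</html>"""

parser = RobotsMetaParser()
parser.feed(page)
print(sorted(parser.directives))  # ['nofollow', 'noindex']
```

A page whose directives include noindex would be skipped at indexing time, and one with nofollow would contribute no outgoing links to the crawl.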
If you have any questions or concerns about Amazon Q Web Crawler, you can reach out to the AWS Support team.