Configuring a robots.txt file for Amazon Q Business Web Crawler
Amazon Q Business Web Crawler respects standard robots.txt directives like Allow and Disallow. You can modify the robots.txt file of your website to control how Amazon Q Web Crawler crawls your website.
Configuring how Amazon Q Web Crawler accesses your website
You can control how Amazon Q Web Crawler indexes your website using Allow and Disallow directives. You can also control which web pages are indexed and which web pages are not crawled.
To allow Amazon Q Web Crawler to crawl all web pages except disallowed web pages, use the following directive:
User-agent: amazon-QBusiness  # Amazon Q Web Crawler
Disallow: /credential-pages/  # disallow access to specific pages
To allow Amazon Q Web Crawler to crawl only specific web pages, use the following directive:
User-agent: amazon-QBusiness  # Amazon Q Web Crawler
Allow: /pages/  # allow access to specific pages
To allow Amazon Q Web Crawler to crawl all website content and disallow crawling for any other robots, use the following directive:
User-agent: amazon-QBusiness  # Amazon Q Web Crawler
Allow: /  # allow access to all pages

User-agent: *  # any (other) robot
Disallow: /  # disallow access to any pages
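To sanity-check rules like the ones above before deploying them, you can evaluate a robots.txt file against a user agent string. The following sketch uses Python's standard urllib.robotparser; it is an illustration of how these directives resolve, not a reproduction of Amazon Q Web Crawler's own matching logic.

```python
# Evaluate robots.txt rules for a given user agent with the standard library.
# The rules mirror the example above: allow amazon-QBusiness everywhere,
# block all other robots.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: amazon-QBusiness
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("amazon-QBusiness", "/pages/index.html"))  # True
print(parser.can_fetch("SomeOtherBot", "/pages/index.html"))      # False
```

Note that robotparser matches the User-agent token case-insensitively, so `amazon-QBusiness` matches the first group while any other agent falls through to the catch-all `*` group.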
Stopping Amazon Q Web Crawler from crawling your website
You can stop Amazon Q Web Crawler from indexing your website using the Disallow directive. You can also control which web pages are crawled and which aren't.
To stop Amazon Q Web Crawler from crawling the website, use the following directive:
User-agent: amazon-QBusiness  # Amazon Q Web Crawler
Disallow: /  # disallow access to any pages
Amazon Q Web Crawler also supports the robots noindex and nofollow directives in meta tags in HTML pages. These directives stop the web crawler from indexing a web page and stop it from following any links on the web page. You put the meta tags in the head section of the document to specify robots rules.
For example, the following web page includes the robots noindex and nofollow directives:
<html>
  <head>
    <meta name="robots" content="noindex, nofollow"/>
    ...
  </head>
  <body>...</body>
</html>
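A crawler finds these directives by reading the meta tags in the page's head. The sketch below shows one way to extract them with Python's standard html.parser; it is illustrative only and does not reproduce Amazon Q Web Crawler's actual parsing behavior.

```python
# Collect robots meta directives (e.g. noindex, nofollow) from an HTML page.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the content tokens of <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            for token in attrs.get("content", "").split(","):
                self.directives.add(token.strip().lower())

page = """<html>
  <head>
    <meta name="robots" content="noindex, nofollow"/>
  </head>
  <body>...</body>
</html>"""

parser = RobotsMetaParser()
parser.feed(page)
print(sorted(parser.directives))  # ['nofollow', 'noindex']
```

A page whose directives include noindex would be skipped at indexing time, and one with nofollow would contribute no outgoing links to the crawl.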
If you have any questions or concerns about Amazon Q Web Crawler, you can reach out to the AWS Support team.