Configuring the `robots.txt` file for Amazon Kendra Web Crawler

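Amazon Kendra is an intelligent search service that AWS customers use to index and search documents of their choice. To index documents on the web, customers can use Amazon Kendra Web Crawler, specifying which URLs should be indexed along with other operational parameters. Amazon Kendra customers must obtain authorization before indexing any particular website.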

Amazon Kendra Web Crawler respects standard `robots.txt` directives such as `Allow` and `Disallow`. You can modify your website's `robots.txt` file to control how Amazon Kendra Web Crawler crawls your website.

Configuring how Amazon Kendra Web Crawler accesses your website

You can use the `Allow` and `Disallow` directives to control how Amazon Kendra Web Crawler indexes your website, including which web pages are indexed and which web pages are not crawled.

To allow Amazon Kendra Web Crawler to crawl all web pages except the disallowed ones, use the following directives:


User-agent: amazon-kendra    # Amazon Kendra Web Crawler
Disallow: /credential-pages/ # disallow access to specific pages
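
Before deploying a rule like the one above, it can help to check how a standards-following parser reads it. The following is a minimal sketch using Python's standard-library `urllib.robotparser`. The host `example.com`, the two sample paths, and the helper name `is_allowed` are assumptions made for illustration. The result shows how Python's parser interprets the rules, which may differ from Amazon Kendra Web Crawler's own parsing in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt example above.
ROBOTS_TXT = """\
User-agent: amazon-kendra    # Amazon Kendra Web Crawler
Disallow: /credential-pages/ # disallow access to specific pages
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com and both paths are placeholders, not taken from the documentation.
credential_page_ok = is_allowed(
    ROBOTS_TXT, "amazon-kendra", "https://example.com/credential-pages/login"
)
docs_page_ok = is_allowed(
    ROBOTS_TXT, "amazon-kendra", "https://example.com/docs/index.html"
)

print(credential_page_ok, docs_page_ok)  # prints: False True
```

Paths under `/credential-pages/` are reported as blocked for the `amazon-kendra` user agent, while every other path stays crawlable because no rule matches it.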

To allow Amazon Kendra Web Crawler to crawl only specific web pages, use the following directives:


User-agent: amazon-kendra    # Amazon Kendra Web Crawler
Allow: /pages/ # allow access to specific pages

To allow Amazon Kendra Web Crawler to crawl all website content while disallowing crawling by any other robot, use the following directives:


User-agent: amazon-kendra # Amazon Kendra Web Crawler
Allow: / # allow access to all pages
User-agent: * # any (other) robot
Disallow: / # disallow access to any pages
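
The per-agent behavior of the two groups above can be sketched with Python's standard-library `urllib.robotparser`. The second user agent name (`some-other-bot`), the URL on `example.com`, and the helper `access_by_agent` are hypothetical and exist only for illustration. The output reflects Python's parser, not a guarantee about any particular crawler.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt example above.
ROBOTS_TXT = """\
User-agent: amazon-kendra # Amazon Kendra Web Crawler
Allow: / # allow access to all pages
User-agent: * # any (other) robot
Disallow: / # disallow access to any pages
"""


def access_by_agent(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Map each user agent to whether the rules let it fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# "some-other-bot" and the URL are placeholders standing in for any other crawler and page.
result = access_by_agent(
    ROBOTS_TXT,
    ["amazon-kendra", "some-other-bot"],
    "https://example.com/pages/a.html",
)
print(result)  # prints: {'amazon-kendra': True, 'some-other-bot': False}
```

The named `amazon-kendra` group takes precedence for that agent, and the `*` group acts as the fallback for every agent without a group of its own.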

Stopping Amazon Kendra Web Crawler from crawling your website

You can use the `Disallow` directive to stop Amazon Kendra Web Crawler from indexing your website. You can also control which web pages are crawled and which are not.

To stop Amazon Kendra Web Crawler from crawling a website, use the following directives:


User-agent: amazon-kendra # Amazon Kendra Web Crawler
Disallow: / # disallow access to any pages
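
Because the rule above names only the `amazon-kendra` user agent, other crawlers are left unaffected. A short sketch with Python's standard-library `urllib.robotparser` illustrates this. The agent name `some-other-bot` and the host `example.com` are assumptions for illustration, and the result describes Python's parser only.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt example above.
ROBOTS_TXT = """\
User-agent: amazon-kendra # Amazon Kendra Web Crawler
Disallow: / # disallow access to any pages
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com and "some-other-bot" are placeholders, not taken from the documentation.
kendra_ok = parser.can_fetch("amazon-kendra", "https://example.com/")
other_ok = parser.can_fetch("some-other-bot", "https://example.com/")

print(kendra_ok, other_ok)  # prints: False True
```

To block other crawlers as well, a separate group for them (for example a `User-agent: *` group) would be needed.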

If you have any questions or concerns about Amazon Kendra Web Crawler, you can contact the AWS support team.
