Web Crawler connector overview

The following table gives an overview of the Amazon Q Business Web Crawler connector and its supported features.

Category	Feature	Support
Security	Authentication type	Basic; NTLM/Kerberos; Form; SAML. Note: You don't need authentication to crawl public websites you have permission to crawl.
	Authentication credentials	Basic authentication: website username, website password. NTLM/Kerberos authentication: NTLM/Kerberos username, NTLM/Kerberos password. Form authentication: login page URL, website username, website password, username field XPath, password field XPath, password button XPath, (optional) username button XPath. SAML authentication: login page URL, website username, website password, username field XPath, password field XPath, password button XPath, (optional) username button XPath.
	Access Control List (ACL) crawling	No
	Identity crawling	No
Crawl features	Custom metadata	Yes
	Visual content processing	Yes. Amazon Q Business can extract and index content from images embedded in webpages and in the following supported document types: PDF, PowerPoint, Microsoft Word (DOCX), Google Slides, Google Docs.
	Entities	Yes. The following entities are supported: web page, attachment. See "What is a document?" for more details on what each connector crawls as a document.
	Field mappings	Yes. For more information, see Field mappings.
	Filters	Yes. The following filters are supported: sync specific domains and subdomains; include files linked on web pages; regex patterns to crawl and index specific URLs; regex patterns to crawl and index specific files; include web pages by crawl depth; specify maximum file size and links per page for Amazon Q to crawl.
	Sync mode	Supports full sync and sync of new, modified, or deleted content.
	File types	Supports all files supported by Amazon Q.
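
To make the domain, regex, and crawl-depth filters listed in the table more concrete, the following Python sketch shows one way such filters could combine when deciding whether to crawl a URL. It is an illustration only, not how Amazon Q Business implements filtering. The function name should_crawl, its parameters, the convention that the seed URL has depth 0, and the rule that an empty pattern list accepts every URL are all assumptions made for this example.

```python
import re
from urllib.parse import urlparse


def should_crawl(url, depth, allowed_domains, include_patterns, max_depth):
    """Hypothetical helper: True if a URL passes all three filters."""
    host = urlparse(url).hostname or ""

    # Domain filter: accept a listed domain or any of its subdomains.
    in_domain = any(host == d or host.endswith("." + d) for d in allowed_domains)

    # Regex filter: assumed to accept everything when no patterns are given.
    matches = not include_patterns or any(
        re.search(p, url) for p in include_patterns
    )

    # Depth filter: the seed URL is assumed to be depth 0.
    return in_domain and matches and depth <= max_depth
```

For example, with allowed_domains=["example.com"], include_patterns=[r"/guide/"], and max_depth=2, the URL https://docs.example.com/guide/intro.html at depth 1 passes all three checks, while the same URL at depth 3, or any URL on a different domain, does not.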
