Web Crawler connector overview - Amazon Q Business

Web Crawler connector overview

The following table gives an overview of the Amazon Q Business Web Crawler connector and its supported features.

Security

Authentication type
  • Basic

  • NTLM/Kerberos

  • Form

  • SAML

Note

You don't need authentication to crawl public websites you have permission to crawl.
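For reference, HTTP Basic authentication (RFC 7617) sends the website username and password as a base64-encoded `Authorization` header. The sketch below only illustrates that standard encoding. The function name and credentials are hypothetical, and it does not describe how Amazon Q stores or transmits the credentials you configure.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Build an HTTP Basic Authorization header as defined in RFC 7617.

    base64 is an encoding, not encryption, so Basic authentication
    should only be used over HTTPS.
    """
    # Join the credentials with a colon, then base64-encode the UTF-8 bytes.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print(basic_auth_header("crawler-user", "example-password"))
```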

Authentication credentials

Basic authentication

  • Website username

  • Website password

NTLM/Kerberos authentication

  • NTLM/Kerberos username

  • NTLM/Kerberos password

Form authentication

  • Login page URL

  • Website username

  • Website password

  • Username field XPath

  • Password field XPath

  • Password button XPath

  • (Optional) Username button XPath

SAML authentication

  • Login page URL

  • Website username

  • Website password

  • Username field XPath

  • Password field XPath

  • Password button XPath

  • (Optional) Username button XPath
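Form and SAML authentication ask for XPath expressions that identify the username field, password field, and buttons on the login page. The sketch below shows what such expressions select on a sample form. It uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath, so each path starts with `.//` (in full XPath you would typically write `//input[@id='username']`). The page markup, element ids, and function name are hypothetical, and the sketch does not show how Amazon Q evaluates the XPaths you provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed login form. Real login pages are often not valid
# XML; a real crawler would use an HTML parser and a complete XPath engine.
LOGIN_PAGE = """
<html>
  <body>
    <form action="/login" method="post">
      <input type="text" id="username" name="user"/>
      <input type="password" id="password" name="pass"/>
      <button type="submit" id="login-button">Sign in</button>
    </form>
  </body>
</html>
"""

# Hypothetical XPath expressions, written in ElementTree's XPath subset.
USERNAME_FIELD_XPATH = ".//input[@id='username']"
PASSWORD_FIELD_XPATH = ".//input[@id='password']"
PASSWORD_BUTTON_XPATH = ".//button[@id='login-button']"


def locate_form_elements(page: str) -> dict[str, ET.Element]:
    """Return the login form element that each XPath expression selects."""
    root = ET.fromstring(page.strip())
    paths = {
        "username_field": USERNAME_FIELD_XPATH,
        "password_field": PASSWORD_FIELD_XPATH,
        "password_button": PASSWORD_BUTTON_XPATH,
    }
    found = {}
    for label, xpath in paths.items():
        element = root.find(xpath)  # first matching element, or None
        if element is None:
            raise ValueError(f"No element matches {label} XPath: {xpath}")
        found[label] = element
    return found


if __name__ == "__main__":
    for label, element in locate_form_elements(LOGIN_PAGE).items():
        print(label, element.tag, element.get("id"))
```

If an XPath matches nothing, the sketch fails loudly rather than guessing, which mirrors why the connector needs these expressions to be accurate for the target login page.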

Access Control List (ACL) crawling: No

Identity crawling: No
Crawl features

Custom metadata: Yes

Entities: Yes. The following entities are supported:
  • Web page

  • Attachment

Field mappings: Yes. For more information, see Field mappings.

Filters: Yes. The following filters are supported:
  • Filter comments in files

  • Sync specific domains and subdomains

  • Include files linked on web pages

  • Regex patterns to crawl and index specific URLs

  • Regex patterns to crawl and index specific files

  • Include web pages by crawl depth

  • Specify the maximum file size and the maximum number of links per page for Amazon Q to crawl
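To make the interaction between several of these filters concrete, the sketch below runs a breadth-first crawl over a small in-memory link graph and applies domain/subdomain scope, regex include and exclude patterns, a crawl-depth limit, and a per-page link limit. Everything here is an assumption for illustration: the link graph, function names, and parameter names are hypothetical, file-size limits and linked-file handling are not modeled, and the exact semantics Amazon Q applies may differ.

```python
import re
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph standing in for fetched pages: each URL maps to
# the links found on that page, in page order.
LINKS = {
    "https://example.com/": [
        "https://example.com/docs/",
        "https://blog.example.com/post",
        "https://other.org/",
        "https://example.com/private/admin",
    ],
    "https://example.com/docs/": [
        "https://example.com/docs/guide.pdf",
        "https://example.com/docs/deep/",
    ],
    "https://example.com/docs/deep/": ["https://example.com/docs/deep/deeper/"],
}


def in_scope(url: str, domain: str) -> bool:
    """Return True if the URL's host is `domain` or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def crawl(
    seed: str,
    domain: str,
    include: list[str],
    exclude: list[str],
    max_depth: int,
    max_links_per_page: int,
) -> list[str]:
    """Breadth-first crawl applying scope, regex filters, a depth limit,
    and a per-page link limit. Returns the URLs to index, in visit order."""
    include_res = [re.compile(p) for p in include]
    exclude_res = [re.compile(p) for p in exclude]

    def allowed(url: str) -> bool:
        if not in_scope(url, domain):
            return False
        if any(r.search(url) for r in exclude_res):
            return False
        # With no include patterns, every in-scope, non-excluded URL passes.
        return not include_res or any(r.search(url) for r in include_res)

    # The seed is always crawled; it is indexed only if it passes the filters.
    visited = {seed}
    queue = deque([(seed, 0)])  # (url, depth); the seed is at depth 0
    indexed = []
    while queue:
        url, depth = queue.popleft()
        if allowed(url):
            indexed.append(url)
        if depth >= max_depth:
            continue  # don't follow links past the depth limit
        # Only the first `max_links_per_page` links on each page are considered.
        for link in LINKS.get(url, [])[:max_links_per_page]:
            if link not in visited and allowed(link):
                visited.add(link)
                queue.append((link, depth + 1))
    return indexed


if __name__ == "__main__":
    for url in crawl("https://example.com/", "example.com", [], [r"/private/"], 2, 4):
        print(url)
```

Checking the host with `endswith("." + domain)` keeps subdomains such as `blog.example.com` in scope while rejecting look-alike hosts such as `notexample.com`.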

Sync mode: Supports full sync and new, modified, or deleted content sync

File types: Supports all file types supported by Amazon Q.
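The difference between a full sync and a new, modified, or deleted content sync can be sketched as a change-detection step. This page does not say how Amazon Q detects changes, so the sketch assumes content hashes recorded at the previous sync; the function names and data shapes are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Content hash used to detect modified pages (an assumption of this sketch)."""
    return hashlib.sha256(content).hexdigest()


def plan_sync(
    previous: dict[str, str], current: dict[str, bytes], full: bool
) -> dict[str, list[str]]:
    """Decide which documents to add or update, and which to delete.

    previous: URL -> fingerprint recorded at the last sync.
    current:  URL -> content fetched in this crawl.
    full:     if True, re-index every current document.
    """
    current_fps = {url: fingerprint(body) for url, body in current.items()}
    # Documents that were indexed before but are no longer found are deleted.
    deleted = sorted(set(previous) - set(current_fps))
    if full:
        return {"add_or_update": sorted(current_fps), "delete": deleted}
    new = [u for u in current_fps if u not in previous]
    modified = [u for u in current_fps if u in previous and previous[u] != current_fps[u]]
    return {"add_or_update": sorted(new + modified), "delete": deleted}
```

An incremental sync touches only new and changed documents, while a full sync re-indexes everything; in both cases a caller would store the new fingerprints for the next run.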