Supported document formats in Amazon Q Business

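You can add documents to an Amazon Q Business application in three ways: through direct document upload, through data source connectors, or by uploading files directly into chat using the Upload files and chat feature.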

When you add documents to an Amazon Q application (directly or through data source connectors) using the console or the API, Amazon Q extracts the document content and parses it internally to optimize chat responses. A single document must be 50 MB or less, and at most 5 MB of text can be extracted from a single document.
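The two ingestion limits above can be checked before a document is submitted. The following is a minimal sketch, not part of any Amazon Q API: the function and constant names are hypothetical, and it assumes 1 MB means 1024 * 1024 bytes and that extracted text is measured as UTF-8 bytes, neither of which the text specifies.

```python
# Hypothetical pre-upload check; names and byte conventions are assumptions.
MAX_FILE_BYTES = 50 * 1024 * 1024       # 50 MB per document (assumed binary MB)
MAX_EXTRACTED_BYTES = 5 * 1024 * 1024   # 5 MB of extracted text (assumed binary MB)


def check_ingestion_limits(file_size_bytes: int, extracted_text: str) -> list[str]:
    """Return a list of limit violations; an empty list means the document fits."""
    problems = []
    if file_size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds 50 MB")
    # Assumption: the extracted text size is its UTF-8 encoded length.
    if len(extracted_text.encode("utf-8")) > MAX_EXTRACTED_BYTES:
        problems.append("extracted text exceeds 5 MB")
    return problems
```

A caller could run this on each file and skip or split any document that returns a non-empty list.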

When you upload documents directly into chat using the Upload files and chat feature, each file must be 10 MB or less. The total parsed content for all files combined must be under 30,000 tokens, or about 20,000 words (one word corresponds to roughly 1.5 tokens).
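The relationship between the word and token limits (20,000 words × 1.5 ≈ 30,000 tokens) can be sketched as a rough budget check. This is only an estimate: it assumes a whitespace word count approximates what Amazon Q counts as a word, and the function names are hypothetical. The tokenizer that Amazon Q actually uses isn't described in the text.

```python
# Hypothetical estimate based on the stated ratio; not an Amazon Q tokenizer.
MAX_CHAT_TOKENS = 30_000   # combined limit across all uploaded files
TOKENS_PER_WORD = 1.5      # rough ratio stated in the text


def estimate_tokens(text: str) -> int:
    """Estimate tokens from a whitespace word count (an approximation)."""
    return round(len(text.split()) * TOKENS_PER_WORD)


def fits_chat_budget(parsed_texts: list[str]) -> bool:
    """True if the combined estimate for all files is under the token limit."""
    # "Under 30,000 tokens" is read here as a strict inequality.
    return sum(estimate_tokens(t) for t in parsed_texts) < MAX_CHAT_TOKENS
```

Because the ratio is approximate, leaving some headroom below the limit is likely safer than relying on the estimate near the boundary.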

Additionally, if you're uploading Comma Separated Values (CSV) or Microsoft Excel (XLS and XLSX) documents directly into chat, Amazon Q performs best for tables with approximately 4 columns and 10 rows. Files indexed by an Amazon Q data source connector or added through direct document upload have no such restrictions.
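To see whether a CSV file is close to the table size mentioned above, its shape can be measured with the standard library. This is a minimal sketch with hypothetical names. It treats "approximately 4 columns and 10 rows" as a soft upper bound and counts the header line as a row, both of which are assumptions.

```python
import csv
import io

# Assumed soft upper bounds taken from the guidance in the text.
RECOMMENDED_COLUMNS = 4
RECOMMENDED_ROWS = 10


def csv_shape(csv_text: str) -> tuple[int, int]:
    """Return (rows, columns), counting every non-empty line as a row."""
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    columns = max((len(r) for r in rows), default=0)
    return len(rows), columns


def is_near_recommended_shape(csv_text: str) -> bool:
    """True if the table is no larger than the recommended size."""
    rows, columns = csv_shape(csv_text)
    return rows <= RECOMMENDED_ROWS and columns <= RECOMMENDED_COLUMNS
```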

In addition to specific file formats such as PDF and Word, each enterprise data source has its own entities that it treats as documents. To learn about the supported entity types for each data source, see the "What is a document?" section.

Supported document types

The following table shows the document formats that Amazon Q Business supports.

Document format | How document is treated
Portable Document Format (PDF) | Converted to HTML, then plain text is extracted. Scanned PDFs aren't supported because they are images.
HyperText Markup Language (HTML) | HTML tags are filtered out to extract plain text. Content must be between the main HTML start and closing tags (<HTML>content</HTML>).
Extensible Markup Language (XML) | XML tags are filtered out and plain text is extracted.
Extensible Stylesheet Language Transformations (XSLT) | Tags are filtered out to extract plain text.
Markdown (MD) | Content is extracted as plain text with Markdown syntax retained.
Comma Separated Values (CSV) | Content is extracted as plain text from each cell, with a single file treated as a single document result. Amazon Q doesn't support analytics questions for CSVs; it supports only qualitative questions.
Microsoft Excel (XLS and XLSX) | Content is extracted as plain text from each cell, with a single row treated as a single document result. Amazon Q doesn't support analytics questions for Excel files; it supports only qualitative questions.
JavaScript Object Notation (JSON) | Content is extracted as plain text with JSON syntax retained.
Rich Text Format (RTF) | RTF syntax is filtered out to extract plain text content.
Microsoft PowerPoint (PPT, PPTX) | Only plain text content is extracted from PowerPoint slides for ingestion. Images and other content aren't extracted.
Microsoft Word (DOCX) | Only plain text content is extracted from Word pages for ingestion. Images and other content aren't extracted.
Plain text (TXT) | All text in the text document is extracted.
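The formats in the table above can be turned into a simple pre-filter for files. This is a minimal sketch with a hypothetical function name. Matching on the file extension is an assumption of this example, not a documented description of how Amazon Q detects formats.

```python
from pathlib import Path

# Extensions taken from the table above. Matching by extension is an
# assumption of this sketch, not a documented detection method.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".html", ".xml", ".xslt", ".md", ".csv", ".xls", ".xlsx",
    ".json", ".rtf", ".ppt", ".pptx", ".docx", ".txt",
}


def is_supported_format(filename: str) -> bool:
    """True if the file extension appears in the supported-format table."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

Note that the table lists only DOCX for Microsoft Word, so a file such as notes.doc is rejected by this check.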

What is a document?

When you add files directly to Amazon Q Business using direct document upload or the Upload files and chat feature, Amazon Q considers each file you add a document. When you connect Amazon Q to a data source, what Amazon Q considers (and crawls) as a document varies by connector.

The following table outlines what each connector crawls as a document.

Data source connector Supports crawling Document definition
Adobe Experience Manager (Cloud and Server)
  • Assets

  • Pages

  • Each Asset is considered a single document.

  • Each Page is considered a single document.

Alfresco (Cloud and Server)
  • Files

  • Comments

  • Each File is considered a single document.

  • Each Comment is considered a single document.

Amazon FSx (Windows) Files

Each File is considered a single document.

Amazon S3 Objects

Each Object is considered a single document. Any object-name.metadata.json file and access control list (ACL) file is considered metadata for the object it is associated with and not treated as a separate document.

Amazon Q Web Crawler
  • Web pages

  • Attachments

  • Each Web page is considered a single document.

  • Each Attachment is considered a single document.

Amazon WorkDocs
  • Files

  • Comments

  • Each File is considered a single document.

  • Each Comment is considered a single document.

Box
  • Files

  • Tasks

  • Comments

  • Weblinks

  • Each File is considered a single document.

  • Each Task is considered a single document.

  • Each Comment is considered a single document.

  • Each Weblink is considered a single document.

Confluence (Cloud and Server)
  • Spaces

  • Pages

  • Blogs

  • Comments

  • Attachments

  • Each Space is considered a single document.

  • Each Page is considered a single document.

  • Each Blog is considered a single document.

  • Each Comment is considered a single document.

  • Each Attachment is considered a single document.

Database data sources
  • Aurora (MySQL)

  • Aurora (PostgreSQL)

  • Amazon RDS (Microsoft SQL Server)

  • Amazon RDS (MySQL)

  • Amazon RDS (Oracle)

  • Amazon RDS (PostgreSQL)

  • IBM DB2

  • PostgreSQL

  • Microsoft SQL Server

  • MySQL

  • Oracle Database

  • Table data in a single database

  • View data in a single database

Each row in a table or view is considered a single document.

Dropbox
  • Files

  • Papers

  • Paper templates

  • Shortcuts

  • Each File is considered a single document.

  • Each Paper is considered a single document.

  • Each Paper template is considered a single document.

  • Each Shortcut is considered a single document.

Drupal
  • Articles

  • Basic pages

  • Basic blocks

  • Custom content

  • Custom blocks

  • Comments on articles, basic pages, basic blocks, custom content, and custom blocks

  • Attachments in articles, basic pages, basic blocks, custom content, and custom blocks

  • Each Article is considered a single document.

  • Each Basic page is considered a single document.

  • Each Basic block is considered a single document.

  • Each Custom content is considered a single document.

  • Each Custom block is considered a single document.

  • Each Comment on an article, a basic page, a basic block, any custom content, or a custom block is considered a single document.

  • Each Attachment in an article, a basic page, a basic block, any custom content, or a custom block is considered a single document.

GitHub (Cloud and Server)
  • Repositories

  • Repository commits

  • Issues

  • Issue attachments

  • Issue comments

  • Pull request documents

  • Pull request comments

  • Pull request attachments

  • Each Repository is considered a single document.

  • Each Repository commit is considered a single document.

  • Each Issue is considered a single document.

  • Each Issue attachment is considered a single document.

  • Each Issue comment is considered a single document.

  • Each Pull request is considered a single document.

  • Each Pull request comment is considered a single document.

  • Each Pull request attachment is considered a single document.

Gmail
  • Emails

  • Email attachments

  • Each Email is considered a single document.

  • Each Email attachment is considered a single document.

Google Drive
  • Files

  • Comments

  • Each File is considered a single document.

  • Each Comment is considered a single document.

Jira
  • Projects

  • Issues

  • Comments

  • Attachments

  • Worklog

  • Each Project is considered a single document.

  • Each Issue is considered a single document.

  • Each Comment is considered a single document.

  • Each Attachment is considered a single document.

  • Each Worklog is considered a single document.

Microsoft Exchange
  • Emails

  • Attachments

  • Calendar

  • Contacts

  • Notes

  • OneNotes

  • Each Email is considered a single document.

  • Each Attachment is considered a single document.

  • Each Calendar is considered a single document.

  • Each Contact is considered a single document.

  • Each Note is considered a single document.

  • Each page in OneNotes is considered a single document.

Microsoft OneDrive
  • Files

  • OneNotes

  • Each File is considered a single document.

  • Each page in OneNotes is considered a single document.

Microsoft SharePoint (Online and Server)
  • Events

  • Pages

  • Files

  • Links

  • File attachments

  • Comments

  • OneNotes

  • Each Event is considered a single document.

  • Each Page is considered a single document.

  • Each File is considered a single document.

  • Each Link is considered a single document.

  • Each File attachment is considered a single document.

  • Each Comment is considered a single document.

  • Each page in OneNotes is considered a single document.

Microsoft Teams
  • Chat messages

  • Chat attachments

  • Channel posts

  • Channel wikis

  • Channel attachments

  • Meeting chats

  • Meeting files

  • Meeting notes

  • Calendar meetings

  • OneNotes

  • Each Chat message is considered a single document.

  • Each Chat attachment is considered a single document.

  • Each Channel post is considered a single document.

  • Each Channel wiki is considered a single document.

  • Each Channel attachment is considered a single document.

  • Each Meeting chat is considered a single document.

  • Each Meeting file is considered a single document.

  • Each Meeting note is considered a single document.

  • Each Calendar meeting is considered a single document.

  • Each page in OneNotes is considered a single document.

Microsoft Yammer
  • Communities

  • Attachments

  • Messages

  • Users

  • Each Community is considered a single document.

  • Each Attachment is considered a single document.

  • Each Message and community post is considered a single document.

  • Each User is considered a single document.

Quip
  • Files

  • Messages

  • Threads

  • Each File is considered a single document.

  • Each Message is considered a single document.

  • Each file and message posted in a Thread is considered a single document.

Salesforce
  • Accounts

  • Contacts

  • Campaigns

  • Contracts

  • Cases

  • Partners

  • Opportunities

  • Groups

  • Leads

  • Users

  • Tasks

  • Ideas

  • Profiles

  • Solutions

  • Chatters

  • Documents

  • Custom entities

  • Knowledge articles

  • Each Account is considered a single document.

  • Each Contact is considered a single document.

  • Each Campaign is considered a single document.

  • Each Contract is considered a single document.

  • Each Case is considered a single document.

  • Each Partner is considered a single document.

  • Each Opportunity is considered a single document.

  • Each Group is considered a single document.

  • Each Lead is considered a single document.

  • Each User is considered a single document.

  • Each Task is considered a single document.

  • Each Idea is considered a single document.

  • Each Profile is considered a single document.

  • Each Solution is considered a single document.

  • Each Chatter is considered a single document.

  • Each Document (file) is considered a single document.

  • Each Custom entity (record) is considered a single document.

  • Each Knowledge article is considered a single document.

ServiceNow
  • Incidents

  • Knowledge articles

  • Service catalog

  • Attachments

  • Each Incident is considered a single document.

  • Each Knowledge article is considered a single document.

  • Each Service catalog is considered a single document.

  • Each Attachment is considered a single document.

Slack
  • Messages

  • Message attachments

  • Channel posts

  • Each Message is considered a single document.

  • Each Message attachment is considered a single document.

  • Each Channel post is considered a single document.

Zendesk
  • Tickets

  • Ticket comments

  • Ticket comment attachments

  • Articles

  • Article attachments

  • Article comments

  • Community topics

  • Community posts

  • Community post comments

  • Each Ticket is considered a single document.

  • Each Ticket comment is considered a single document.

  • Each Ticket comment attachment is considered a single document.

  • Each Article is considered a single document.

  • Each Article attachment is considered a single document.

  • Each Article comment is considered a single document.

  • Each Community topic is considered a single document.

  • Each Community post is considered a single document.

  • Each Community post comment is considered a single document.
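As an illustration of the Amazon S3 row in the table above, the following sketch separates a list of object keys into documents and their metadata files. The function name is hypothetical, and it assumes each metadata file sits next to its object and is named by appending .metadata.json to the object key. ACL files aren't handled because the text doesn't give their naming.

```python
METADATA_SUFFIX = ".metadata.json"


def split_documents_and_metadata(keys: list[str]) -> tuple[list[str], dict[str, str]]:
    """Return (document keys, map of document key -> metadata key)."""
    key_set = set(keys)
    documents = []
    metadata = {}
    for key in keys:
        if key.endswith(METADATA_SUFFIX):
            # Metadata files are never counted as documents themselves.
            target = key[: -len(METADATA_SUFFIX)]
            if target in key_set:
                metadata[target] = key
            continue
        documents.append(key)
    return documents, metadata
```

For example, the keys a.pdf, a.pdf.metadata.json, and b.txt yield two documents, with a.pdf.metadata.json recorded as the metadata for a.pdf.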