Adding documents in languages other than English - Amazon Kendra

You can index documents in multiple languages. If you don't specify a language, Amazon Kendra indexes documents in English by default. To set the language of an individual document, include its language code in the document metadata as the _language_code field. For more information on the _language_code field, see Field mappings and Custom attributes.
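
As a minimal sketch of per-document language metadata, the snippet below builds a document entry in the shape accepted by the BatchPutDocument API, with the language carried in the _language_code attribute. The helper name build_document, the document ID, and the sample text are hypothetical and used only for illustration. The upload call appears as a comment because it assumes an existing index and configured credentials.

```python
def build_document(doc_id, title, text, language_code="en"):
    """Return a document dict shaped for Amazon Kendra's BatchPutDocument API.

    build_document is a hypothetical helper, not part of any AWS SDK.
    """
    return {
        "Id": doc_id,
        "Title": title,
        "Blob": text.encode("utf-8"),
        "ContentType": "PLAIN_TEXT",
        "Attributes": [
            # _language_code is the metadata field that carries the document language.
            {"Key": "_language_code", "Value": {"StringValue": language_code}},
        ],
    }


# Hypothetical Spanish document used only for illustration.
doc = build_document("doc-es-001", "Guía de inicio", "Bienvenido a la guía.", "es")
print(doc["Attributes"][0])

# To upload (assumes boto3 is installed, credentials are configured, and
# INDEX_ID holds the ID of an existing index):
#   import boto3
#   boto3.client("kendra").batch_put_document(IndexId=INDEX_ID, Documents=[doc])
```

Calling build_document without a language code falls back to "en", which mirrors the default described above.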

You can also specify one language code for all the documents in a data source when you call CreateDataSource. If a document doesn't have a language code in its metadata, it is indexed using the language code set at the data source level. In the console, you can set the language only at the data source level: go to Data sources, open the Specify data source details page, and choose a language from the Language dropdown.
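
The following sketch illustrates the data source level setting and the fallback order described above. It assumes an Amazon S3 data source. The index ID, role ARN, bucket name, and both helper functions are hypothetical placeholders, and effective_language is only one reading of the precedence rule, not Amazon Kendra's internal logic.

```python
def data_source_request(index_id, name, role_arn, bucket, language_code="en"):
    """Build keyword arguments for Amazon Kendra's CreateDataSource API.

    Hypothetical helper; it assumes an S3 data source.
    """
    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "S3",
        "RoleArn": role_arn,
        "Configuration": {"S3Configuration": {"BucketName": bucket}},
        # Applies to every document that has no language code of its own.
        "LanguageCode": language_code,
    }


def effective_language(document_code, data_source_code):
    """Document metadata wins, then the data source setting, then English."""
    return document_code or data_source_code or "en"


request = data_source_request(
    index_id="index-id-placeholder",                         # hypothetical
    name="french-manuals",                                   # hypothetical
    role_arn="arn:aws:iam::123456789012:role/ExampleRole",   # hypothetical
    bucket="example-bucket",                                 # hypothetical
    language_code="fr",
)
print(request["LanguageCode"])
print(effective_language(None, request["LanguageCode"]))
print(effective_language("de", request["LanguageCode"]))

# To create the data source (assumes boto3 and valid credentials):
#   import boto3
#   boto3.client("kendra").create_data_source(**request)
```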

The following languages and their codes are supported. English (en) is used by default if you don't specify a language.

Language name         Language code
Spanish               es
French                fr
German                de
Portuguese            pt
Japanese              ja
Korean                ko
Chinese               zh
Italian               it
Hindi                 hi
Arabic                ar
Armenian              hy
Basque                eu
Bengali               bn
Brazilian Portuguese  pt-BR
Bulgarian             bg
Catalan               ca
Czech                 cs
Danish                da
Dutch                 nl
Finnish               fi
Galician              gl
Greek                 el
Hungarian             hu
Indonesian            id
Irish                 ga
Latvian               lv
Lithuanian            lt
Norwegian             no
Persian               fa
Romanian              ro
Russian               ru
Sorani                ckb
Swedish               sv
Turkish               tr
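
A client-side check against the list above can catch a mistyped code before a document or data source is submitted. The sketch below is a hypothetical helper, not an Amazon Kendra API. The set copies the codes listed above plus en, and the exact, case-sensitive match is an assumption.

```python
# Codes copied from the list above, plus the default "en".
SUPPORTED_LANGUAGE_CODES = frozenset({
    "en", "es", "fr", "de", "pt", "ja", "ko", "zh", "it", "hi", "ar", "hy",
    "eu", "bn", "pt-BR", "bg", "ca", "cs", "da", "nl", "fi", "gl", "el", "hu",
    "id", "ga", "lv", "lt", "no", "fa", "ro", "ru", "ckb", "sv", "tr",
})


def validate_language_code(code=None):
    """Return the code to use, defaulting to English when none is given.

    Raises ValueError for a code that is not in the list above.
    Matching is exact and case-sensitive (an assumption of this sketch).
    """
    if code is None:
        return "en"
    if code not in SUPPORTED_LANGUAGE_CODES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return code


print(validate_language_code("pt-BR"))
print(validate_language_code())
```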

Not all Amazon Kendra features are currently available for languages other than English. The following features aren't available for non-English indexes:

  • Semantic search of FAQs and extracted answers from documents. Keyword search is used for retrieving relevant FAQs and for document ranking.

  • Custom synonyms for domain-specific, business-specific, or specialized terms.

  • Query suggestions of popular queries relevant to a search.

  • Confidence scores of the search results.