Adding documents in languages other than English
You can index documents in multiple languages. If you don't specify a language, Amazon Kendra indexes documents in English by default. To set a document's language, include its language code in the _language_code field of the document metadata. For more information about the _language_code field, see Field mappings and Custom attributes.
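As a rough illustration, the following sketch builds a document entry for the BatchPutDocument API with the reserved _language_code attribute set. It assumes the AWS SDK for Python (boto3) and plain-text content; the document ID, the text, and the index ID are hypothetical placeholders, and boto3 is imported only inside the function that actually calls AWS.

```python
def build_document(doc_id: str, text: str, language_code: str) -> dict:
    """Build one entry for the Documents list of a BatchPutDocument request."""
    return {
        "Id": doc_id,
        "Blob": text.encode("utf-8"),  # raw document bytes
        "ContentType": "PLAIN_TEXT",
        "Attributes": [
            # Reserved field that tells Amazon Kendra which language to index with.
            {"Key": "_language_code", "Value": {"StringValue": language_code}},
        ],
    }


def index_documents(index_id: str, documents: list) -> dict:
    # Imported here so build_document can be used without boto3 installed.
    import boto3

    client = boto3.client("kendra")
    return client.batch_put_document(IndexId=index_id, Documents=documents)


if __name__ == "__main__":
    doc = build_document("faq-es-001", "¿Cómo restablezco mi contraseña?", "es")
    print(doc["Attributes"])
    # index_documents("your-index-id", [doc])  # hypothetical index ID; needs AWS credentials
```

For documents in an S3 data source, the same field would instead go in the Attributes object of the document's metadata file.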
You can specify a language code for all documents in a data source when you call CreateDataSource. If a document doesn't specify a language code in a metadata field, Amazon Kendra indexes it using the language code set at the data source level. In the console, you can set the language only at the data source level: go to Data sources, and on the Specify data source details page, choose a language from the Language dropdown.
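The sketch below shows the data-source-level setting as a CreateDataSource request using its LanguageCode parameter, together with a small helper that models the fallback order described above (document field, then data source setting, then English). The helper is an illustrative model of the documented rule, not Amazon Kendra's implementation; the bucket name, role ARN, and index ID are hypothetical placeholders, and an S3 data source is assumed.

```python
from typing import Optional


def effective_language_code(document_attributes: dict, data_source_language: Optional[str]) -> str:
    """Model of the documented precedence: document field, then data source, then English."""
    return document_attributes.get("_language_code") or data_source_language or "en"


def create_data_source_request(index_id: str, name: str, bucket: str,
                               role_arn: str, language_code: str) -> dict:
    """Keyword arguments for a CreateDataSource call on an S3 data source."""
    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "S3",
        "RoleArn": role_arn,
        "Configuration": {"S3Configuration": {"BucketName": bucket}},
        # Used for every document that has no _language_code metadata field.
        "LanguageCode": language_code,
    }


if __name__ == "__main__":
    request = create_data_source_request(
        "your-index-id", "french-docs", "example-bucket",
        "arn:aws:iam::111122223333:role/ExampleKendraRole", "fr",
    )
    print(effective_language_code({}, request["LanguageCode"]))
    # import boto3; boto3.client("kendra").create_data_source(**request)  # needs AWS credentials
```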
You can also search or query documents in a supported language. For more information, see Searching in languages.
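One way to restrict a query to a single language is an attribute filter on the _language_code field. The sketch below builds such a Query request and, assuming boto3 and a hypothetical index ID, returns the titles of the result items.

```python
def build_query_request(index_id: str, query_text: str, language_code: str) -> dict:
    """Keyword arguments for a Query call limited to one document language."""
    return {
        "IndexId": index_id,
        "QueryText": query_text,
        "AttributeFilter": {
            "EqualsTo": {
                "Key": "_language_code",
                "Value": {"StringValue": language_code},
            }
        },
    }


def search_titles(index_id: str, query_text: str, language_code: str) -> list:
    # Imported here so build_query_request can be used without boto3 installed.
    import boto3

    client = boto3.client("kendra")
    response = client.query(**build_query_request(index_id, query_text, language_code))
    return [item.get("DocumentTitle", {}).get("Text") for item in response.get("ResultItems", [])]


if __name__ == "__main__":
    print(build_query_request("your-index-id", "réinitialiser le mot de passe", "fr"))
```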
The following table lists the supported languages and their codes. English (en) is supported by default if you don't specify a language. The table includes languages that Amazon Kendra supports with full semantic search, as well as languages that support only simple keyword matching. Languages that support full semantic search are marked with an asterisk. English, the default language, also supports full semantic search.
Note
Advanced search queries that use search keywords such as AND and OR aren't supported for Japanese.
Language name | Language code |
---|---|
Arabic | ar |
Armenian | hy |
Basque | eu |
Bengali | bn |
Bulgarian | bg |
Catalan | ca |
Chinese – simplified and traditional* | zh |
Czech | cs |
Danish | da |
Dutch | nl |
Finnish | fi |
French – includes French (Canada)* | fr |
Galician | gl |
German* | de |
Greek | el |
Hindi | hi |
Hungarian | hu |
Indonesian | id |
Irish | ga |
Italian | it |
Japanese* | ja |
Korean* | ko |
Latvian | lv |
Lithuanian | lt |
Norwegian | no |
Persian | fa |
Portuguese | pt |
Portuguese (Brazil)* | pt-BR |
Romanian | ro |
Russian | ru |
Sorani | ckb |
Spanish – includes Spanish (Mexico)* | es |
Swedish | sv |
Turkish | tr |
*Semantic search is supported for the language.
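If you validate language codes before indexing, the table can be transcribed into two sets. This is a hypothetical client-side check, not an Amazon Kendra API; it treats codes as exact, case-sensitive strings, so pt (keyword matching only) and pt-BR (semantic search) are distinct.

```python
# Transcribed from the table, plus English (en), which supports semantic search.
SEMANTIC = {"en", "zh", "fr", "de", "ja", "ko", "pt-BR", "es"}
KEYWORD_ONLY = {
    "ar", "hy", "eu", "bn", "bg", "ca", "cs", "da", "nl", "fi", "gl", "el",
    "hi", "hu", "id", "ga", "it", "lv", "lt", "no", "fa", "pt", "ro", "ru",
    "ckb", "sv", "tr",
}


def search_mode(code: str) -> str:
    """Return 'semantic' or 'keyword' for a supported code; raise for anything else."""
    if code in SEMANTIC:
        return "semantic"
    if code in KEYWORD_ONLY:
        return "keyword"
    raise ValueError(f"unsupported language code: {code!r}")


if __name__ == "__main__":
    for code in ("pt", "pt-BR", "ckb"):
        print(code, search_mode(code))
```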
For languages that support semantic search, the following features are supported.
- Document relevance beyond simple keyword matching.
- FAQs beyond simple keyword matching.
- Extracting answers from documents based on Amazon Kendra's reading comprehension.
- Confidence buckets (very high, high, medium, and low) of the search results.
For languages that don't support semantic search, simple keyword matching is supported for document relevance and FAQs.
Synonyms (including custom synonyms), incremental learning and feedback, and query suggestions are only supported for English (default language).
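The feature rules above can be summarized as a lookup from language code to available features. The feature labels below are hypothetical names chosen for this sketch, not Amazon Kendra identifiers, and the function assumes the code has already been checked against the supported-language table.

```python
# Languages with semantic search: the starred table entries plus English.
SEMANTIC_CODES = {"en", "zh", "fr", "de", "ja", "ko", "pt-BR", "es"}

# Hypothetical labels for the features described in this section.
SEMANTIC_FEATURES = {
    "semantic_document_relevance",
    "semantic_faq_matching",
    "answer_extraction",
    "confidence_buckets",
}
KEYWORD_FEATURES = {"keyword_document_relevance", "keyword_faq_matching"}
ENGLISH_ONLY_FEATURES = {"synonyms", "incremental_learning", "query_suggestions"}


def available_features(code: str) -> set:
    """Features available for a supported language code."""
    features = set(SEMANTIC_FEATURES if code in SEMANTIC_CODES else KEYWORD_FEATURES)
    if code == "en":
        features |= ENGLISH_ONLY_FEATURES
    return features


if __name__ == "__main__":
    print(sorted(available_features("ja")))
```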