DB cluster parameters for collation and locale settings Deterministic and nondeterministic collations in Babelfish Collations supported at database level in Babelfish Server and object Collations Default collation behavior

Understanding Collations in Babelfish for Aurora PostgreSQL

When you create an Aurora PostgreSQL DB cluster with Babelfish, you choose a collation for your data. A collation specifies the sort order and bit patterns that produce the text or characters in a given written human language. A collation includes rules comparing data for a given set of bit patterns. Collation is related to localization. Different locales affect character mapping, sort order, and the like. Collation attributes are reflected in the names of various collations. For information about attributes, see the Babelfish collation attributes table.

Babelfish maps SQL Server collations to comparable collations provided by Babelfish. Babelfish predefines Unicode collations with culturally sensitive string comparisons and sort orders. Babelfish also provides a way to translate the collations in your SQL Server DB to the closest-matching Babelfish collation. Locale-specific collations are provided for different languages and regions.

Some collations specify a code page that corresponds to a client-side encoding. Babelfish automatically translates from the server encoding to the client encoding depending on the collation of each output column.

Babelfish supports the collations listed in the Babelfish supported collations table. Babelfish maps SQL Server collations to comparable collations provided by Babelfish.

Babelfish uses version 153.80 of the International Components for Unicode (ICU) collation library. For more information about ICU collations, see Collation in the ICU documentation. To learn more about PostgreSQL and collation, see Collation Support in the the PostgreSQL documentation.

Topics

DB cluster parameters that control collation and locale
Deterministic and nondeterministic collations in Babelfish
Collations supported at database level in Babelfish
Server and object Collations in Babelfish
Default Collation behavior in Babelfish
Managing collations
Collation limitations and behavior differences

DB cluster parameters that control collation and locale

The following parameters affect collation behavior.

babelfishpg_tsql.default_locale

This parameter specifies the default locale used by the collation. This parameter is used in combination with attributes listed in the Babelfish collation attributes table to customize collations for a specific language and region. The default value for this parameter is en-US.

The default locale applies to all Babelfish collation names that start with "BBF" and to all SQL Server collations that are mapped to Babelfish collations. Changing the setting for this parameter on an existing Babelfish DB cluster doesn't affect the locale of existing collations. For the list of collations, see the Babelfish supported collations table.

babelfishpg_tsql.server_collation_name

This parameter specifies the default collation for the server (Aurora PostgreSQL DB cluster instance) and the database. The default value is sql_latin1_general_cp1_ci_as. The server_collation_name has to be a CI_AS collation because in T-SQL, the server collation determines how identifiers are compared.

When you create your Babelfish DB cluster, you choose the Collation name from the selectable list. These include the collations listed in the Babelfish supported collations table. Don't modify the server_collation_name after the Babelfish database is created.

The settings you choose when you create your Babelfish for Aurora PostgreSQL DB cluster are stored in the DB cluster parameter group associated with the cluster for these parameters and set its collation behavior.

Deterministic and nondeterministic collations in Babelfish

Babelfish supports deterministic and nondeterministic collations:

A deterministic collation evaluates characters that have identical byte sequences as equal. That means that x and X aren't equal in a deterministic collation. Deterministic collations can be case-sensitive (CS) and accent-sensitive (AS).
A nondeterministic collation doesn't need an identical match. A nondeterministic collation evaluates x and X as equal. Nondeterministic collations are case-insensitive (CI) or accent-insensitive (AI), or both.

In the table following, you can find some behavior differences between Babelfish and PostgreSQL when using nondeterministic collations.

Babelfish	PostgreSQL
Supports the LIKE clause for CI_AS collations.	Doesn't support the LIKE clause on nondeterministic collations.
Supports the LIKE clause only on the following AI collations from Babelfish version 4.2.0: bbf_unicode_cp1250_ci_ai bbf_unicode_cp1250_cs_ai bbf_unicode_cp1257_ci_ai bbf_unicode_cp1257_cs_ai bbf_unicode_cp1_ci_ai bbf_unicode_cp1_cs_ai estonian_ci_ai finnish_swedish_ci_ai french_ci_ai latin1_general_ci_ai latin1_general_cs_ai modern_spanish_ci_ai polish_ci_ai sql_latin1_general_cp1_ci_ai sql_latin1_general_cp1_cs_ai traditional_spanish_ci_ai	Doesn't support the LIKE clause on nondeterministic collations.

Babelfish

PostgreSQL

Supports the LIKE clause for CI_AS collations.

Doesn't support the LIKE clause on nondeterministic collations.

Supports the LIKE clause only on the following AI collations from Babelfish version 4.2.0:

bbf_unicode_cp1250_ci_ai
bbf_unicode_cp1250_cs_ai
bbf_unicode_cp1257_ci_ai
bbf_unicode_cp1257_cs_ai
bbf_unicode_cp1_ci_ai
bbf_unicode_cp1_cs_ai
estonian_ci_ai
finnish_swedish_ci_ai
french_ci_ai
latin1_general_ci_ai
latin1_general_cs_ai
modern_spanish_ci_ai
polish_ci_ai
sql_latin1_general_cp1_ci_ai
sql_latin1_general_cp1_cs_ai
traditional_spanish_ci_ai

Doesn't support the LIKE clause on nondeterministic collations.

For a list of other limitations and behavior differences for Babelfish compared to SQL Server and PostgreSQL, see Collation limitations and behavior differences.

Babelfish and SQL Server follow a naming convention for collations that describe the collation attributes, as shown in the table following.

Attribute	Description
AI	Accent-insensitive.
AS	Accent-sensitive.
BIN2	BIN2 requests data to be sorted in code point order. Unicode code point order is the same character order for UTF-8, UTF-16, and UCS-2 encodings. Code point order is a fast deterministic collation.
CI	Case-insensitive.
CS	Case-sensitive.
PREF	To sort uppercase letters before lowercase letters, use a PREF collation. If comparison is case-insensitive, the uppercase version of a letter sorts before the lowercase version, if there is no other distinction. The ICU library supports uppercase preference with `colCaseFirst=upper`, but not for CI_AS collations. PREF can be applied only to `CS_AS` deterministic collations.

Collations supported at database level in Babelfish

The following collations are supported at database level in Babelfish:

bbf_unicode_bin2
bbf_unicode_cp1_ci_ai
bbf_unicode_cp1_ci_as
bbf_unicode_cp1250_ci_ai
bbf_unicode_cp1250_ci_as
bbf_unicode_cp1257_ci_ai
bbf_unicode_cp1257_ci_as
estonian_ci_ai
estonian_ci_as
finnish_swedish_ci_ai
finnish_swedish_ci_as
french_ci_ai
french_ci_as
latin1_general_bin2
latin1_general_ci_ai
latin1_general_ci_as
latin1_general_90_bin2
latin1_general_100_bin2
latin1_general_140_bin2
modern_spanish_ci_ai
modern_spanish_ci_as
polish_ci_ai
polish_ci_as
sql_latin1_general_cp1_ci_ai
sql_latin1_general_cp1_ci_as
sql_latin1_general_cp1250_ci_as
sql_latin1_general_cp1251_ci_as
sql_latin1_general_cp1257_ci_as
traditional_spanish_ci_ai
traditional_spanish_ci_as

Note

To use a different collation at the database level, ensure it matches the server-level collation. For more information, see Server and object Collations in Babelfish

Server and object Collations in Babelfish

Use the following collations as a server collation or an object collation.

Collation ID	Notes
bbf_unicode_general_ci_as	Supports case-insensitive comparison and the LIKE operator.
bbf_unicode_cp1_ci_as	Nondeterministic collation also known as CP1252.
bbf_unicode_CP1250_ci_as	Nondeterministic collation used to represent texts in Central European and Eastern European languages that use Latin script.
bbf_unicode_CP1251_ci_as	Nondeterministic collation for languages that use the Cyrillic script.
bbf_unicode_cp1253_ci_as	Nondeterministic collation used to represent modern Greek.
bbf_unicode_cp1254_ci_as	Nondeterministic collation that supports Turkish.
bbf_unicode_cp1255_ci_as	Nondeterministic collation that supports Hebrew.
bbf_unicode_cp1256_ci_as	Nondeterministic collation used to write languages that use Arabic script.
bbf_unicode_cp1257_ci_as	Nondeterministic collation used to support Estonian, Latvian, and Lithuanian languages.
bbf_unicode_cp1258_ci_as	Nondeterministic collation used to write Vietnamese characters.
bbf_unicode_cp874_ci_as	Nondeterministic collation used to write Thai characters.
sql_latin1_general_cp1250_ci_as	Nondeterministic single-byte character encoding used to represent Latin characters.
sql_latin1_general_cp1251_ci_as	Nondeterministic collation that supports Latin characters.
sql_latin1_general_cp1_ci_as	Nondeterministic collation that supports Latin characters.
sql_latin1_general_cp1253_ci_as	Nondeterministic collation that supports Latin characters.
sql_latin1_general_cp1254_ci_as	Nondeterministic collation that supports Latin characters.
sql_latin1_general_cp1255_ci_as	Nondeterministic collation that supports Latin characters.
sql_latin1_general_cp1256_ci_as	Nondeterministic collation that supports Latin characters.
sql_latin1_general_cp1257_ci_as	Nondeterministic collation that supports Latin characters.
sql_latin1_general_cp1258_ci_as	Nondeterministic collation that supports Latin characters.
chinese_prc_ci_as	Nondeterministic collation that supports Chinese (PRC).
cyrillic_general_ci_as	Nondeterministic collation that supports Cyrillic.
finnish_swedish_ci_as	Nondeterministic collation that supports Finnish.
french_ci_as	Nondeterministic collation that supports French.
japanese_ci_as	Nondeterministic collation that supports Japanese. Supported in Babelfish 2.1.0 and higher releases.
korean_wansung_ci_as	Nondeterministic collation that supports Korean (with dictionary sort).
latin1_general_ci_as	Nondeterministic collation that supports Latin characters.
modern_spanish_ci_as	Nondeterministic collation that supports Modern Spanish.
polish_ci_as	Nondeterministic collation that supports Polish.
thai_ci_as	Nondeterministic collation that supports Thai.
traditional_spanish_ci_as	Nondeterministic collation that supports Spanish (traditional sort).
turkish_ci_as	Nondeterministic collation that supports Turkish.
ukrainian_ci_as	Nondeterministic collation that supports Ukrainian.
vietnamese_ci_as	Nondeterministic collation that supports Vietnamese.

You can use the following collations as object collations.

Dialect	Deterministic options	Nondeterministic options
Arabic	Arabic_CS_AS	Arabic_CI_AS Arabic_CI_AI
Arabic script	BBF_Unicode_CP1256_CS_AS BBF_Unicode_Pref_CP1256_CS_AS	BBF_Unicode_CP1256_CI_AI BBF_Unicode_CP1256_CS_AI
Binary	latin1_general_bin2 BBF_Unicode_BIN2	–
Central European and Eastern European languages that use Latin script	BBF_Unicode_CP1250_CS_AS BBF_Unicode_Pref_CP1250_CS_AS	BBF_Unicode_CP1250_CI_AI BBF_Unicode_CP1250_CS_AI
Chinese	Chinese_PRC_CS_AS	Chinese_PRC_CI_AS Chinese_PRC_CI_AI
Cyrillic_General	Cyrillic_General_CS_AS	Cyrillic_General_CI_AS Cyrillic_General_CI_AI
Cyrillic script	BBF_Unicode_CP1251_CS_AS BBF_Unicode_Pref_CP1251_CS_AS	BBF_Unicode_CP1251_CI_AI BBF_Unicode_CP1251_CS_AI
Estonian	Estonian_CS_AS	Estonian_CI_AS Estonian_CI_AI
Estonian, Latvian, and Lithuanian	BBF_Unicode_CP1257_CS_AS BBF_Unicode_Pref_CP1257_CS_AS	BBF_Unicode_CP1257_CI_AI BBF_Unicode_CP1257_CS_AI
Finnish_Swedish	Finnish_Swedish_CS_AS	Finnish_Swedish_CI_AS Finnish_Swedish_CI_AI
French	French_CS_AS	French_CI_AS French_CI_AI
Greek	Greek_CS_AS	Greek_CI_AS Greek_CI_AI
Hebrew	BBF_Unicode_CP1255_CS_AS BBF_Unicode_Pref_CP1255_CS_AS Hebrew_CS_AS	BBF_Unicode_CP1255_CI_AI BBF_Unicode_CP1255_CS_AI Hebrew_CI_AS Hebrew_CI_AI
Japanese (Babelfish 2.1.0 and higher)	Japanese_CS_AS	Japanese_CI_AI Japanese_CI_AS
Korean_Wamsung	Korean_Wamsung_CS_AS	Korean_Wamsung_CI_AS Korean_Wamsung_CI_AI
Latin characters for code page CP1252	latin1_general_cs_as BBF_Unicode_General_CS_AS BBF_Unicode_General_Pref_CS_AS BBF_Unicode_Pref_CP1_CS_AS BBF_Unicode_CP1_CS_AS	latin1_general_ci_as latin1_general_ci_ai latin1_general_cs_ai BBF_Unicode_General_CI_AI BBF_Unicode_General_CS_AI BBF_Unicode_CP1_CI_AI BBF_Unicode_CP1_CS_AI
Modern Greek	BBF_Unicode_CP1253_CS_AS BBF_Unicode_Pref_CP1253_CS_AS	BBF_Unicode_CP1253_CI_AI BBF_Unicode_CP1253_CS_AI
Modern_Spanish	Modern_Spanish_CS_AS	Modern_Spanish_CI_AS Modern_Spanish_CI_AI
Mongolian	Mongolian_CS_AS	Mongolian_CI_AS Mongolian_CI_AI
Polish	Polish_CS_AS	Polish_CI_AS Polish_CI_AI
Thai	BBF_Unicode_CP874_CS_AS BBF_Unicode_Pref_CP874_CS_AS Thai_CS_AS	BBF_Unicode_CP874_CI_AI BBF_Unicode_CP874_CS_AI Thai_CI_AS, Thai_CI_AI
Traditional_Spanish	Traditional_Spanish_CS_AS	Traditional_Spanish_CI_AS Traditional_Spanish_CI_AI
Turkish	BBF_Unicode_CP1254_CS_AS BBF_Unicode_Pref_CP1254_CS_AS Turkish_CS_AS	BBF_Unicode_CP1254_CI_AI BBF_Unicode_CP1254_CS_AI Turkish_CI_AS, Turkish_CI_AI
Ukranian	Ukranian_CS_AS	Ukranian_CI_AS Ukranian_CI_AI
Vietnamese	BBF_Unicode_CP1258_CS_AS BBF_Unicode_Pref_CP1258_CS_AS Vietnamese_CS_AS	BBF_Unicode_CP1258_CI_AI BBF_Unicode_CP1258_CS_AI Vietnamese_CI_AS Vietnamese_CI_AI

Default Collation behavior in Babelfish

Earlier, the default collation of the collatable datatypes was pg_catalog.default. The datatypes and the objects that depends on these datatypes follows cases-sensitive collation. This condition potentially impacts the T-SQL objects of the data set with case-insensitive collation. Starting with Babelfish 2.3.0, the default collation for the collatable data types (except TEXT and NTEXT) is the same as the collation in the babelfishpg_tsql.server_collation_name parameter. When you upgrade to Babelfish 2.3.0, the default collation is picked automatically at the time of DB cluster creation, which doesn't create any visible impact.

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

DB cluster parameter group settings for Babelfish

Managing collations