Oracle and MySQL character sets
With AWS DMS, you can migrate databases between different platforms while preserving character set encodings and collations. Character sets define the encoding used to represent characters, while collations determine the sorting order and comparison rules. Properly configuring character sets and collations is crucial for applications that handle multilingual data or have specific sorting requirements.
Feature compatibility | AWS SCT / AWS DMS automation level | AWS SCT action code index | Key differences |
---|---|---|---|
|
|
N/A |
Different syntax. MySQL can have different collations for each database in the same instance. |
Oracle usage
Oracle supports most national and international encoded character set standards including extensive support for Unicode.
Oracle provides two scalar string-specific data types:
-
VARCHAR2
— Stores variable-length character strings with a length between 1 and 4000 bytes. The Oracle database can be configured to use theVARCHAR2
data type to store either Unicode or Non-Unicode characters. -
NVARCHAR2
— Scalar data type used to store Unicode data. Supports AL16UTF16 or UTF8 and id specified during database creation.
Character sets in Oracle are defined at the instance level (Oracle 11g) or the pluggable database level (Oracle 12c R2). In Pre-12cR2 Oracle databases, the character set for the root container and all pluggable databases were required to be identical.
Oracle 18c updates AL32UTF8 and AL16UTF16 character sets to Unicode standard version 9.0.
UTF8 Unicode
In Oracle, you can use the AL32UTF8 character set. Oracle provides encoding of ASCII characters as single-byte for Latin characters, two-bytes for some European and Middle-Eastern languages, and three-bytes for certain South and East-Asian characters. Therefore, Unicode storage requirements are usually higher when compared non-Unicode character sets.
Character set migration
Two options exist for modifying existing Instance-level or database-level character sets:
-
Export or import from the source Instance/PDB to a new Instance/PDB with a modified character set.
-
Use the Database Migration Assistant for Unicode (DMU), which simplifies the migration process to the Unicode character set.
As of 2012, use of the CSALTER
utility for character set migrations is deprecated.
Oracle Database 12c Release 1 (12.1.0.1) complies with version 6.1 of the Unicode standard.
Oracle Database 12c Release 2 (12.1.0.2) extends the compliance to version 6.2 of the Unicode standard.
UTF-8 is supported through the AL32UTF8 CS and is valid as both the client and database character sets.
UTF-16BE is supported through AL16UTF16 and is valid as the national (NCHAR) character set.
For more information, see Choosing a Character Set
MySQL usage
MySQL supports a variety of different character sets including support for both single-byte and multi-byte languages. The default character set is specified when initializing a MySQL database cluster with initdb
. Each individual database created on the MySQL cluster supports individual character sets defined as part of database creation.
To query the available character sets, use the INFORMATION_SCHEMA CHARACTER_SETS
table or the SHOW CHARACTER SET
statement.
All character sets have at least one collation, and most character sets have more. To list the display collations for a character set, use the INFORMATION_SCHEMA COLLATIONS
table or the SHOW COLLATION
statement.
Collations have these general characteristics:
-
Two different character sets cannot have the same collation.
-
Each character set has a default collation.
-
Collation names start with the name of the character set with which they are associated and are generally followed by one or more suffixes indicating other collation characteristics.
Examples
Create a database named test01 which uses the Korean EUC_KR Encoding the and the ko_KR locale.
CREATE DATABASE test01 CHARACTER SET = euckr COLLATE = euckr_korean_ci;
View the character sets configured for each database by querying the System Catalog.
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA;
Convert a character set and collation using the ALTER DATABASE
command.
ALTER DATABASE test01 CHARACTER SET = ucs2 COLLATE = ucs2_general_ci;
MySQL supports conversion of character sets between server and client for specific character set combinations with the parameter character_set_client
and character_set_connection
. For more information, see Connection Character Sets and Collations
In MySQL, you can specify the sort order and character classification behavior on a per-column level. Specify specific collations for individual table columns.
CREATE TABLE lang( latin1_col CHAR(10) CHARACTER SET latin1 COLLATE latin1_german1_ci, latin2_col CHAR(10) CHARACTER SET latin2);
Summary
Feature | Oracle | Aurora MySQL |
---|---|---|
View database character set |
SELECT * FROM NLS_DATABASE_PARAMETERS; |
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA; |
Modify the database character set |
Choose one of the following options:
|
ALTER DATABASE test01 CHARACTER SET = ucs2 COLLATE = ucs2_general_ci; |
Character set granularity |
Instance (11g + 12cR1) Database (Oracle 12cR2) |
Column |
UTF8 |
Supported by using |
Supported by using |
UTF16 |
Supported by using |
Supported by using |
|
Supported |
Supported |
For more information, see Character Sets, Collations, Unicode