Amazon Redshift
Database Developer Guide (API Version 2012-12-01)

Multibyte Character Load Errors

Columns with a CHAR data type only accept single-byte UTF-8 characters, up to byte value 127, or 7F hex, which is also the ASCII character set. VARCHAR columns accept multibyte UTF-8 characters, to a maximum of four bytes. For more information, see Character Types.

If a line in your load data contains a character that is invalid for the column data type, COPY returns an error and logs a row in the STL_LOAD_ERRORS system log table with error number 1220. The ERR_REASON field includes the byte sequence, in hex, for the invalid character.

An alternative to fixing invalid characters in your load data is to replace the invalid characters during the load process. To replace invalid UTF-8 characters, specify the ACCEPTINVCHARS option with the COPY command. For more information, see ACCEPTINVCHARS.

The following example shows the error reason when COPY attempts to load UTF-8 character e0 a1 c7a4 into a CHAR column:

Multibyte character not supported for CHAR (Hint: Try using VARCHAR). Invalid char: e0 a1 c7a4

If the error is related to a VARCHAR datatype, the error reason includes an error code as well as the invalid UTF-8 hex sequence. The following example shows the error reason when COPY attempts to load UTF-8 a4 into a VARCHAR field:

String contains invalid or unsupported UTF-8 codepoints. Bad UTF-8 hex sequence: a4 (error 3)

The following table lists the descriptions and suggested workarounds for VARCHAR load errors. If one of these errors occurs, replace the character with a valid UTF-8 code sequence or remove the character.

Error code Description
1 The UTF-8 byte sequence exceeds the four-byte maximum supported by VARCHAR.
2 The UTF-8 byte sequence is incomplete. COPY did not find the expected number of continuation bytes for a multibyte character before the end of the string.
3 The UTF-8 single-byte character is out of range. The starting byte must not be 254, 255 or any character between 128 and 191 (inclusive).
4 The value of the trailing byte in the byte sequence is out of range. The continuation byte must be between 128 and 191 (inclusive).
5 The UTF-8 character is reserved as a surrogate. Surrogate code points (U+D800 through U+DFFF) are invalid.
6 The character is not a valid UTF-8 character (code points 0xFDD0 to 0xFDEF).
7 The character is not a valid UTF-8 character (code points 0xFFFE and 0xFFFF).
8 The byte sequence exceeds the maximum UTF-8 code point.
9 The UTF-8 byte sequence does not have a matching code point.