Amazon Redshift
Database Developer Guide (API Version 2012-12-01)

Loading Multibyte Data from Amazon S3

If your data includes non-ASCII multibyte characters (such as Chinese or Cyrillic characters), you must load the data into VARCHAR columns. The VARCHAR data type supports four-byte UTF-8 characters, but the CHAR data type accepts only single-byte ASCII characters. You cannot load five-byte or longer characters into Amazon Redshift tables. For more information about CHAR and VARCHAR, see Data Types.
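To see which values need a VARCHAR column, you can measure how many bytes each character takes in UTF-8. The following Python sketch is illustrative only: the function name utf8_profile is hypothetical and is not part of Amazon Redshift or any AWS tool. It applies the rule described in this section, namely that any character wider than one byte is non-ASCII and cannot go into a CHAR column. A valid Unicode code point never needs more than four bytes in UTF-8, so the widest value this sketch reports is 4.

```python
def utf8_profile(value: str) -> tuple[bool, int, int]:
    """Return (needs_varchar, widest_char_bytes, total_bytes) for a string.

    Hypothetical helper for illustration. Raises UnicodeEncodeError if the
    string contains a lone surrogate, which has no valid UTF-8 encoding.
    """
    # Byte width of each character when encoded as UTF-8 (1 to 4 bytes).
    widths = [len(ch.encode("utf-8")) for ch in value]
    widest = max(widths, default=0)
    # Any character wider than one byte is non-ASCII, so CHAR cannot hold it.
    needs_varchar = widest > 1
    return needs_varchar, widest, sum(widths)


if __name__ == "__main__":
    # ASCII, Cyrillic (2 bytes/char), Chinese (3 bytes/char), emoji (4 bytes)
    for sample in ["orders", "Москва", "北京", "😀"]:
        print(sample, utf8_profile(sample))
```

The total byte count is worth checking as well as the per-character width: "Москва" is six characters but twelve bytes, and Amazon Redshift declares VARCHAR lengths in bytes, not characters.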

To check which encoding an input file uses, use the Linux file command:

$ file ordersdata.txt
ordersdata.txt: ASCII English text
$ file uni_ordersdata.dat
uni_ordersdata.dat: UTF-8 Unicode text
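The file command identifies an encoding heuristically. If you want a strict check that every byte of a file is valid before loading it, the following Python sketch decodes the whole file and reports whether it is pure ASCII, valid UTF-8, or neither. It is an assumption-laden illustration, not an AWS tool: the function name detect_encoding is hypothetical, and it only distinguishes ASCII from UTF-8; it does not try to name other encodings such as Latin-1.

```python
import codecs
import sys


def detect_encoding(path: str, chunk_size: int = 1 << 20) -> str:
    """Classify a file as 'ASCII', 'UTF-8', or 'not ASCII or UTF-8'.

    Hypothetical helper for illustration; reads the file in chunks.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    ascii_only = True
    try:
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                # bytes.isascii() is True only if every byte is below 0x80.
                ascii_only = ascii_only and chunk.isascii()
                # The incremental decoder buffers a multibyte sequence that is
                # split across two reads instead of reporting it as an error.
                decoder.decode(chunk)
        # final=True raises if the file ends partway through a sequence.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        # Invalid bytes, including the lead bytes of five-byte sequences.
        return "not ASCII or UTF-8"
    return "ASCII" if ascii_only else "UTF-8"


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(f"{name}: {detect_encoding(name)}")
```

Saved as a script (for example, a hypothetical check_encoding.py), it prints one "name: result" line per file passed on the command line. Reading in chunks with an incremental decoder keeps memory use flat for large input files while still catching characters that straddle a chunk boundary.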