Menu
Amazon Redshift
Database Developer Guide (API Version 2012-12-01)

System Tables for Troubleshooting Data Loads

The following Amazon Redshift system tables can be helpful in troubleshooting data load issues:

  • Query STL_LOAD_ERRORS to discover the errors that occurred during specific loads.

  • Query STL_FILE_SCAN to view load times for specific files or to see if a specific file was even read.

To find and diagnose load errors

  1. Create a view or define a query that returns details about load errors. The following example joins the STL_LOAD_ERRORS table to the STV_TBL_PERM table to match table IDs with actual table names.

    Copy
    create view loadview as (select distinct tbl, trim(name) as table_name, query, starttime, trim(filename) as input, line_number, colname, err_code, trim(err_reason) as reason from stl_load_errors sl, stv_tbl_perm sp where sl.tbl = sp.id);
  2. Set the MAXERRORS option in your COPY command to a large enough value to enable COPY to return useful information about your data. If the COPY encounters errors, an error message directs you to consult the STL_LOAD_ERRORS table for details.

  3. Query the LOADVIEW view to see error details. For example:

    Copy
    select * from loadview where table_name='venue';
    Copy
    tbl | table_name | query | starttime --------+------------+-------+---------------------------- 100551 | venue | 20974 | 2013-01-29 19:05:58.365391 | input | line_number | colname | err_code | reason +----------------+-------------+-------+----------+--------------------- | venue_pipe.txt | 1 | 0 | 1214 | Delimiter not found
  4. Fix the problem in the input file or the load script, based on the information that the view returns. Some typical load errors to watch for include:

    • Mismatch between data types in table and values in input data fields.

    • Mismatch between number of columns in table and number of fields in input data.

    • Mismatched quotes. Amazon Redshift supports both single and double quotes; however, these quotes must be balanced appropriately.

    • Incorrect format for date/time data in input files.

    • Out-of-range values in input files (for numeric columns).

    • Number of distinct values for a column exceeds the limitation for its compression encoding.