Amazon Neptune
User Guide (API Version 2017-11-29)

Gremlin Load Data Format

To load Apache TinkerPop Gremlin data using the CSV format, you must specify the vertices and the edges in separate files.

The loader can load from multiple vertex files and multiple edge files in a single load job.

For each load command, the set of files to be loaded must be in the same folder in the Amazon S3 bucket, and you specify the folder name for the source parameter. The file names and file name extensions are not important.

The Amazon Neptune CSV format follows the RFC 4180 CSV specification. For more information, see Common Format and MIME Type for CSV Files on the Internet Engineering Task Force (IETF) website.

Note

All files must be encoded in UTF-8 format.

Each file has a comma-separated header row. The header row consists of both system column headers and property column headers.

System Column Headers

The required and allowed system column headers are different for vertex files and edge files.

Each system column can appear only once in a header.

All labels are case sensitive.

Vertex headers

  • ~id - Required

    An ID for the vertex.

  • ~label

    A label for the vertex. Multiple label values are allowed. Separate values with a semicolon (;) character.

Edge headers

  • ~id - Required

    An ID for the edge.

  • ~from - Required

    The vertex ID of the from vertex.

  • ~to - Required

    The vertex ID of the to vertex.

  • ~label

    A label for the edge. Edges can only have a single label.
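You can check the header rules above before you submit a load. The following Python sketch shows one way to do that. It is not part of Neptune, and the loader's own validation may differ. The function name check_system_headers and its error messages are hypothetical. The sketch treats every column that starts with a tilde (~) as a system column and compares names case-sensitively.

```python
# Hypothetical pre-load check for the system column headers described above.
VERTEX_REQUIRED = {"~id"}
VERTEX_ALLOWED = {"~id", "~label"}
EDGE_REQUIRED = {"~id", "~from", "~to"}
EDGE_ALLOWED = {"~id", "~from", "~to", "~label"}


def check_system_headers(header: list[str], kind: str) -> list[str]:
    """Return the system columns in `header`; raise ValueError on a violation.

    `kind` is "vertex" or "edge". Columns starting with "~" are treated as
    system columns; all other columns are assumed to be property columns.
    """
    if kind == "vertex":
        required, allowed = VERTEX_REQUIRED, VERTEX_ALLOWED
    elif kind == "edge":
        required, allowed = EDGE_REQUIRED, EDGE_ALLOWED
    else:
        raise ValueError(f"unknown file kind: {kind!r}")

    system = [h.strip() for h in header if h.strip().startswith("~")]
    for name in system:
        # Exact, case-sensitive comparison: "~ID" does not match "~id".
        if name not in allowed:
            raise ValueError(f"{name!r} is not allowed in a {kind} file")
    # Each system column can appear only once in a header.
    duplicates = {n for n in system if system.count(n) > 1}
    if duplicates:
        raise ValueError(f"system columns repeated: {sorted(duplicates)}")
    missing = required - set(system)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    return system
```

For example, the header ~id, ~from, ~to, ~label, weight:Double passes as an edge header, while ~id, ~to fails because ~from is missing.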

Property Column Headers

You can specify a column for a property by using the following syntax. The type names are not case sensitive.

propertyname:type

Note

Spaces are not allowed in the column headers, so property names cannot include spaces.

You can specify a column for an array type by adding [] to the type.

propertyname:type[]

Note

Edge properties can have only a single value. Specifying an array type or a second value for an edge property causes an error.

The following example shows the column header for a property named age of type Int.

age:Int

Every row in the file must then either have an integer in that position or leave it empty.
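The following Python sketch splits a property column header into its name, type, and array flag. The function parse_property_header is hypothetical. It assumes that the type follows the last colon, that a type is always present, and that the header has already been trimmed by the CSV reader. It lowercases the type name because type names are not case sensitive, and it rejects array types in edge files, following the note on edge properties above.

```python
# Type names from the Data Types list, lowercased for case-insensitive lookup.
KNOWN_TYPES = {"bool", "boolean", "byte", "short", "int", "long",
               "float", "double", "string", "date"}


def parse_property_header(column: str, in_edge_file: bool = False) -> tuple[str, str, bool]:
    """Split 'name:type' or 'name:type[]' into (name, type, is_array)."""
    if " " in column:
        raise ValueError(f"spaces are not allowed in column headers: {column!r}")
    # Assumption: the type follows the last colon and is always present.
    name, sep, type_part = column.rpartition(":")
    if not sep or not name:
        raise ValueError(f"expected propertyname:type, got {column!r}")
    is_array = type_part.endswith("[]")
    if is_array:
        if in_edge_file:
            raise ValueError(f"edge properties cannot be arrays: {column!r}")
        type_part = type_part[:-2]
    type_name = type_part.lower()  # type names are not case sensitive
    if type_name not in KNOWN_TYPES:
        raise ValueError(f"unknown type {type_part!r} in {column!r}")
    return name, type_name, is_array
```

With this sketch, age:Int yields ("age", "int", False) and tags:String[] yields ("tags", "string", True).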

Arrays of strings are allowed, but strings in an array must not include the semicolon (;) character.

The following section lists all the available data types.

Data Types

This is a list of the allowed property types, with a description of each type.

Bool (or Boolean)

Indicates a Boolean field. Allowed values: false, true

Note

Any value other than true will be treated as false.
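As a minimal Python sketch, the Bool rule maps the text true to True and every other value to False. The function name parse_bool is hypothetical, and the sketch assumes the comparison is exact and case sensitive, which the rule above does not state.

```python
def parse_bool(field: str) -> bool:
    """Return True only for the text true; every other value is False."""
    # Assumption: the match is exact and case sensitive. The rule above only
    # says that any value other than true is treated as false.
    return field == "true"


print(parse_bool("true"), parse_bool("false"), parse_bool("yes"))  # True False False
```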

Whole Number Types

Values outside of the defined ranges result in an error.

Type    Range
Byte    -128 to 127
Short   -32768 to 32767
Int     -2^31 to 2^31-1
Long    -2^63 to 2^63-1
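The following Python sketch checks a whole number field against the range for its type. The function parse_whole_number is hypothetical; the ranges are the standard signed two's complement ranges for 8-, 16-, 32-, and 64-bit integers.

```python
# Inclusive ranges of the signed two's complement integer widths.
WHOLE_NUMBER_RANGES = {
    "byte": (-2**7, 2**7 - 1),     # -128 to 127
    "short": (-2**15, 2**15 - 1),  # -32768 to 32767
    "int": (-2**31, 2**31 - 1),    # 32-bit
    "long": (-2**63, 2**63 - 1),   # 64-bit
}


def parse_whole_number(field: str, type_name: str) -> int:
    """Parse a whole number field, rejecting values outside the type's range."""
    low, high = WHOLE_NUMBER_RANGES[type_name.lower()]
    value = int(field)  # raises ValueError for text such as "2.5" or "abc"
    if not low <= value <= high:
        raise ValueError(f"{value} is out of range for {type_name} ({low} to {high})")
    return value
```

For example, 127 is accepted as a Byte, but 128 raises an error, just as an out-of-range value causes an error during a load.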

Decimal Number Types

Supports both decimal notation and scientific notation. Also allows symbols such as (+/-) INFINITY and NaN. INF is not supported.

Type    Range
Float   32-bit IEEE 754 floating point
Double  64-bit IEEE 754 floating point

Float and double values with more digits than the type can represent are still loaded, rounded to the nearest representable value at 24-bit (float) or 53-bit (double) precision. A value exactly midway between two representable values is rounded so that its last remaining bit is 0 (round half to even).
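The following Python sketch illustrates these parsing and rounding rules. It is not Neptune's implementation. The function parse_decimal and its regular expression are hypothetical, and the sketch assumes the special values are spelled exactly INFINITY, +INFINITY, -INFINITY, and NaN. Python's float() rounds a decimal string to the nearest 64-bit double with ties going to the neighbor whose last bit is 0; packing that double with the struct module then rounds it to a 32-bit float.

```python
import math
import re
import struct

# Assumption: the special values are spelled exactly as in the text above.
SPECIAL_VALUES = {
    "INFINITY": math.inf,
    "+INFINITY": math.inf,
    "-INFINITY": -math.inf,
    "NaN": math.nan,
}
# Decimal or scientific notation, for example 0.4, -12, .5, or 6.02E23.
NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def parse_decimal(field: str, type_name: str) -> float:
    """Parse a Float or Double field and round it to the type's precision."""
    text = field.strip()
    if text in SPECIAL_VALUES:
        value = SPECIAL_VALUES[text]
    elif NUMBER.fullmatch(text):
        # float() rounds to the nearest 64-bit double (53-bit significand);
        # a midway value goes to the neighbor whose last bit is 0.
        value = float(text)
    else:
        # Rejects "INF", which Python's own float() would accept.
        raise ValueError(f"not a valid decimal value: {field!r}")
    if type_name.lower() == "float":
        # Round the double to the nearest 32-bit float (24-bit significand),
        # again ties to even. Going through a double first can, in rare cases,
        # differ from rounding the decimal text directly to 32 bits.
        # struct raises OverflowError if the value is too large for a float.
        value = struct.unpack("<f", struct.pack("<f", value))[0]
    return value
```

For example, 16777217 (2^24 + 1) lies exactly halfway between the 32-bit floats 16777216 and 16777218, so as a Float it rounds to 16777216, whose last significand bit is 0.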

String

Quotation marks are optional. Commas, newline, and carriage return characters are automatically escaped if they are included in a string surrounded by double quotation marks ("). Example: "Hello, World"

To include a quotation mark in a quoted string, escape it by using two quotation marks in a row. Example: "Hello ""World"""

Arrays of strings are allowed, but strings in an array must not include the semicolon (;) character.

If you want to surround strings in an array with quotation marks, you must surround the whole array with one set of quotation marks. Example: "String one; String 2; String 3"
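The following Python sketch uses the standard csv module to remove the single set of quotation marks around an array and then splits the field on semicolons. The function split_string_array is hypothetical. It assumes that spaces around each element are trimmed and that a blank field means an empty array; the text above does not say how the loader handles either case.

```python
import csv


def split_string_array(field: str) -> list[str]:
    """Split an already-unquoted String[] field on semicolons."""
    # Assumption: a blank field means an empty array.
    if field == "":
        return []
    # Assumption: spaces around each element are trimmed.
    return [element.strip() for element in field.split(";")]


# The csv module removes the one set of quotation marks around the whole array.
row = next(csv.reader(['a1,"String one; String 2; String 3"']))
print(split_string_array(row[1]))  # ['String one', 'String 2', 'String 3']
```

Because the semicolon is the separator, an element that itself contains a semicolon cannot be represented this way, which is why strings in an array must not include it.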

Date

Java date in ISO-8601 format. Supports the following formats: YYYY-MM-DD, YYYY-MM-DDTHH:mm, YYYY-MM-DDTHH:mm:SS, YYYY-MM-DDTHH:mm:SSZ
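The following Python sketch tries each of the four formats in turn with datetime.strptime. The function parse_date is hypothetical. It treats the Z suffix as UTC and, as an assumption, leaves values without Z as naive datetimes, because the time zone for those values is not stated here.

```python
from datetime import datetime, timezone

# The four ISO-8601 layouts listed above, as strptime patterns.
DATE_FORMATS = (
    "%Y-%m-%d",            # YYYY-MM-DD
    "%Y-%m-%dT%H:%M",      # YYYY-MM-DDTHH:mm
    "%Y-%m-%dT%H:%M:%S",   # YYYY-MM-DDTHH:mm:SS
    "%Y-%m-%dT%H:%M:%SZ",  # YYYY-MM-DDTHH:mm:SSZ
)


def parse_date(field: str) -> datetime:
    """Parse a Date field using the first format that matches exactly."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(field, fmt)
        except ValueError:
            continue  # strptime also fails if unparsed text remains
        if fmt.endswith("Z"):
            # A trailing Z marks the value as UTC.
            return parsed.replace(tzinfo=timezone.utc)
        # Assumption: values without Z are left without a time zone.
        return parsed
    raise ValueError(f"unsupported date format: {field!r}")
```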

Row format

Delimiters

Fields in a row are separated by a comma. Records are separated by a newline (\n) or by a carriage return followed by a newline (\r\n).

Blank Fields

Blank fields are allowed for non-required columns (such as user-defined properties). A blank field still requires a comma separator. The example at the end of this topic has a blank field in each vertex row.

Vertex IDs

~id values must be unique across all vertices in all vertex files. If multiple vertex rows have identical ~id values, they are applied to a single vertex in the graph.

Edge IDs

Likewise, ~id values must be unique across all edges in all edge files. If multiple edge rows have identical ~id values, they are applied to a single edge in the graph.

Labels

Labels are case sensitive.

String Values

Quotation marks are optional. Commas, newline, and carriage return characters are automatically escaped if they are included in a string surrounded by double quotation marks (").

CSV Specification

The Neptune CSV format follows the RFC 4180 CSV specification, including the following requirements.

  • Both Unix and Windows style line endings are supported (\n or \r\n).

  • Any field can be quoted (using double quotation marks).

  • Fields containing a line break, a double quotation mark, or a comma must be quoted. (If they are not, the load aborts immediately.)

  • A double quotation mark character (") in a field must be represented by two double quotation mark characters. For example, the string Hello "World" must appear as "Hello ""World""" in the data.

  • Spaces surrounding delimiters are ignored. For example, if a row contains value1, value2, the values are stored as "value1" and "value2".

  • Any other escape characters are stored verbatim. For example, "data1\tdata2" is stored as "data1\tdata2". No further escaping is needed as long as these characters are enclosed within quotation marks.

  • Blank fields are allowed. A blank field is considered an empty value.

  • Multiple values for a field are specified with a semicolon (;) between values.

For more information, see Common Format and MIME Type for CSV Files on the Internet Engineering Task Force (IETF) website.
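Python's standard csv module handles RFC 4180-style quoting by default, which makes it a convenient way to see how a file splits into fields. In the following sketch, read_rows is a hypothetical helper. The skipinitialspace=True option approximates the rule that spaces around delimiters are ignored, but it only drops spaces after a comma, not before one. Unlike the Neptune loader, csv.reader does not abort on an unquoted comma (it simply starts a new field), and it does not split semicolon-separated values.

```python
import csv
import io

# Sample data covering the rules above: both line-ending styles, a quoted
# comma, doubled quotation marks, a quoted line break, a literal backslash-t,
# and a blank field.
SAMPLE = (
    'value1, value2\r\n'
    '"Hello, World", "Hello ""World"""\n'
    '"line one\nline two", "data1\\tdata2"\n'
    'a, , c\n'
)


def read_rows(text: str) -> list[list[str]]:
    """Split CSV text into rows of unquoted field values."""
    # newline="" hands \r\n and quoted line breaks to the csv module as-is.
    stream = io.StringIO(text, newline="")
    # skipinitialspace drops spaces after a comma, so ' "x"' is still read as
    # a quoted field. It does not drop spaces that come before a comma.
    return list(csv.reader(stream, skipinitialspace=True))


for row in read_rows(SAMPLE):
    print(row)
```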

Example

The following diagram shows an example of two vertices and an edge taken from the TinkerPop Modern Graph.


[Diagram: two vertices and an edge. One vertex is marko, with age 29; the other is lop, software, with lang: java.]

The following is the graph in Neptune CSV load format.

Vertex file:

~id, name:String, age:Int, lang:String, ~label
v1, "marko", 29, , person
v2, "lop", , "java", software

Tabular view of the vertex file:

~id   name:String   age:Int   lang:String   ~label
v1    "marko"       29                      person
v2    "lop"                   "java"        software

Edge file:

~id, ~from, ~to, ~label, weight:Double
e1, v1, v2, created, 0.4

Tabular view of the edge file:

~id   ~from   ~to   ~label    weight:Double
e1    v1      v2    created   0.4
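The following Python sketch reads the vertex and edge files from this example into in-memory dictionaries keyed by ~id, using the standard csv module. It does not contact Neptune. The names load and CONVERTERS are hypothetical. The sketch handles only the String, Int, and Double types that appear in this example, assumes each ~id appears once, and skips blank property fields.

```python
import csv
import io

# The vertex and edge files from the example above.
VERTICES = '''~id, name:String, age:Int, lang:String, ~label
v1, "marko", 29, , person
v2, "lop", , "java", software
'''

EDGES = '''~id, ~from, ~to, ~label, weight:Double
e1, v1, v2, created, 0.4
'''

# Converters for the property types used in this example only.
CONVERTERS = {"string": str, "int": int, "double": float}


def load(text: str) -> dict[str, dict]:
    """Parse one vertex or edge file into {~id value: record}."""
    rows = csv.reader(io.StringIO(text, newline=""), skipinitialspace=True)
    header = next(rows)
    records: dict[str, dict] = {}
    for row in rows:
        record: dict = {"properties": {}}
        for column, field in zip(header, row):
            if column.startswith("~"):
                record[column] = field  # system column, kept as text
            elif field != "":  # a blank field leaves the property unset
                name, _, type_name = column.partition(":")
                record["properties"][name] = CONVERTERS[type_name.lower()](field)
        # Assumption: each ~id appears once; a repeated ~id would overwrite here.
        records[record["~id"]] = record
    return records


vertices = load(VERTICES)
edges = load(EDGES)
print(vertices["v1"]["properties"])  # {'name': 'marko', 'age': 29}
print(edges["e1"]["~from"], edges["e1"]["~to"])  # v1 v2
```

Note how the blank age field for v2 and the blank lang field for v1 simply leave those properties unset.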

Next Steps

Now that you know more about the loading formats, see Example: Loading Data into a Neptune DB Instance.