Amazon Neptune
User Guide (API Version 2017-11-29)

Neptune Loader Command

Loads data from an Amazon S3 bucket into a Neptune DB instance.

To load data, you must send an HTTP POST request to the https://your-neptune-endpoint:port/loader endpoint. The parameters for the loader request can be sent in the POST body or as URL-encoded parameters.

Important

The MIME type must be application/json.

The S3 bucket must be in the same AWS Region as the cluster.
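As a rough illustration of the request shape described above, the following Python sketch builds (but does not send) a POST request with a JSON body and the required MIME type. The helper name build_loader_request and the port value 8182 are assumptions made for this example; substitute the endpoint and port of your own cluster.

```python
import json
import urllib.request


def build_loader_request(endpoint, port, params):
    """Build (without sending) an HTTP POST request for the /loader endpoint."""
    url = f"https://{endpoint}:{port}/loader"
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # The MIME type must be application/json.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; the port 8182 is an assumption for illustration only.
request = build_loader_request(
    "your-neptune-endpoint",
    8182,
    {"source": "s3://bucket-name/object-key-name", "format": "csv"},
)
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; that step is left out here because it needs a reachable endpoint.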

Note

You can load encrypted data from Amazon S3 if it was encrypted using the Amazon S3 SSE-S3 mode. In that case, Neptune is able to impersonate your credentials and issue s3:GetObject calls on your behalf. However, Neptune does not currently support loading data encrypted using the SSE-KMS or SSE-C modes.

Neptune Loader Request Syntax

{
  "source" : "string",
  "format" : "string",
  "iamRoleArn" : "string",
  "mode" : "NEW|RESUME|AUTO",
  "region" : "us-east-1",
  "failOnError" : "string",
  "parallelism" : "string",
  "parserConfiguration" : {
    "baseUri" : "https://base-uri-string",
    "namedGraphUri" : "https://named-graph-string"
  }
}

Neptune Loader Request Parameters

source

An Amazon S3 URI.

The source parameter accepts an Amazon S3 URI that points to either a single file or a folder. If you specify a folder, Neptune loads every data file in the folder.

The folder can contain multiple vertex files and multiple edge files.

The URI can be in any of the following formats.

  • s3://bucket_name/object-key-name

  • https://s3.amazonaws.com/bucket_name/object-key-name

  • https://s3-us-east-1.amazonaws.com/bucket_name/object-key-name
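The three URI forms above all identify the same bucket and object key. The following sketch shows one way a client could split each form into those two parts before building a request. The function name split_s3_source is hypothetical, and the host-name check is a simplification that covers only the forms listed here.

```python
from urllib.parse import urlparse


def split_s3_source(uri):
    """Return (bucket, key) for the URI forms listed in this section."""
    parsed = urlparse(uri)
    if parsed.scheme == "s3":
        # s3://bucket_name/object-key-name
        return parsed.netloc, parsed.path.lstrip("/")
    host = parsed.netloc
    if parsed.scheme == "https" and host.startswith("s3") and host.endswith(".amazonaws.com"):
        # https://s3.amazonaws.com/bucket_name/object-key-name
        # https://s3-us-east-1.amazonaws.com/bucket_name/object-key-name
        bucket, _, key = parsed.path.lstrip("/").partition("/")
        return bucket, key
    raise ValueError(f"Unrecognized S3 URI form: {uri}")


for uri in (
    "s3://bucket_name/object-key-name",
    "https://s3.amazonaws.com/bucket_name/object-key-name",
    "https://s3-us-east-1.amazonaws.com/bucket_name/object-key-name",
):
    print(split_s3_source(uri))
```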

format

The format of the data. For more information about data formats for the Neptune Loader command, see Loading Data into Amazon Neptune.

Allowed values: csv (Gremlin); ntriples, nquads, rdfxml, turtle (RDF)

iamRoleArn

The Amazon Resource Name (ARN) for an IAM role to be assumed by the Neptune DB instance for access to the S3 bucket. For information about creating a role that has access to Amazon S3 and then associating it with a Neptune cluster, see Prerequisites: IAM Role and Amazon S3 Access.

region

The region parameter must match the AWS Region of the cluster and the S3 bucket.

Amazon Neptune is available in the following Regions:

  • us-east-1 - US East (N. Virginia)

  • us-east-2 - US East (Ohio)

  • us-west-2 - US West (Oregon)

  • eu-west-1 - EU (Ireland)

  • eu-west-2 - EU (London)

  • eu-central-1 - EU (Frankfurt)

  • ap-southeast-1 - Asia Pacific (Singapore)

  • ap-southeast-2 - Asia Pacific (Sydney)

  • ap-northeast-1 - Asia Pacific (Tokyo)

  • ap-south-1 - Asia Pacific (Mumbai)

  • ap-northeast-2 - Asia Pacific (Seoul)

mode

The load job mode.

RESUME mode determines whether there is a previous load for the source and resumes the load if one is found. If a previous load is not found, the load is stopped. The loader avoids reloading the files that successfully completed in a previous load. It only tries to process the failed files. If you dropped the previously loaded data from your Neptune cluster, the data is not reloaded in this mode. In the special case where the previous loads from the same source completed successfully, the new load is completed with nothing reloaded.

NEW mode creates a new load request regardless of any previous loads. You can use this mode to reload all the data from a source after dropping the previously loaded data from your Neptune cluster or to load new data available at the same source.

AUTO mode determines whether there is a previous load with the same source. It resumes the load if one is found, just like RESUME mode. If a previous load is not found, it loads data from the source, just like NEW mode.

Default: AUTO

Allowed values: NEW, RESUME, AUTO.
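The difference between the three modes comes down to what happens when a previous load for the same source is, or is not, found. The following sketch models only that decision as described above; it is not the loader's implementation, and the function name plan_load and the returned labels are made up for illustration.

```python
def plan_load(mode, previous_load_found):
    """Model the documented outcome for each mode (illustrative labels only)."""
    if mode == "NEW":
        # NEW ignores any previous loads and always starts a new load.
        return "start new load"
    if mode == "RESUME":
        # RESUME continues a previous load, or stops if there is none.
        return "resume previous load" if previous_load_found else "stop"
    if mode == "AUTO":
        # AUTO behaves like RESUME when a previous load exists, else like NEW.
        return "resume previous load" if previous_load_found else "start new load"
    raise ValueError(f"Unknown mode: {mode}")


for mode in ("NEW", "RESUME", "AUTO"):
    for found in (True, False):
        print(mode, found, "->", plan_load(mode, found))
```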

failOnError

A flag that determines whether the load stops completely when an error occurs.

When set to FALSE, the bulk loader tries to load all the data in the location specified and skips any entries with errors.

When set to TRUE, the bulk loader aborts if it encounters an error, and data loaded up to that point persists.

Default: TRUE

Allowed values: TRUE, FALSE
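To make the two settings concrete, the following toy model walks a list of entries and treats None as a stand-in for an entry with an error. It only mirrors the behavior described above (skip and continue, or abort and keep what was loaded so far); the function simulate_load is hypothetical and says nothing about how the bulk loader works internally.

```python
def simulate_load(entries, fail_on_error=True):
    """Toy model of failOnError; None stands in for an entry with an error."""
    loaded = []
    for entry in entries:
        if entry is None:
            if fail_on_error:
                # TRUE: abort, but data loaded up to this point persists.
                return loaded, "aborted"
            # FALSE: skip the bad entry and keep going.
            continue
        loaded.append(entry)
    return loaded, "completed"


entries = ["v1", None, "v2"]
print(simulate_load(entries, fail_on_error=True))
print(simulate_load(entries, fail_on_error=False))
```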

parallelism

This is an optional parameter that can be set to reduce the number of threads used by the bulk load process.

Default: HIGH

Allowed values: LOW, MEDIUM, and HIGH.
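Because a request with a syntax error is rejected, it can help to check enumerated parameters on the client side before sending. The following sketch collects the allowed values listed in this section for format, mode, failOnError, and parallelism. The helper invalid_parameters is a hypothetical name, and the check assumes the values are matched exactly as written here.

```python
# Allowed values as listed in this section.
ALLOWED_VALUES = {
    "format": {"csv", "ntriples", "nquads", "rdfxml", "turtle"},
    "mode": {"NEW", "RESUME", "AUTO"},
    "failOnError": {"TRUE", "FALSE"},
    "parallelism": {"LOW", "MEDIUM", "HIGH"},
}


def invalid_parameters(params):
    """Return the sorted names of parameters whose values are not allowed."""
    return sorted(
        name
        for name, allowed in ALLOWED_VALUES.items()
        if name in params and params[name] not in allowed
    )


params = {"format": "csv", "mode": "RESTART", "parallelism": "MEDIUM"}
print(invalid_parameters(params))
```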

parserConfiguration

An optional object with additional parser configuration values. Each child parameter is also optional.

  • namedGraphUri (example value: https://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph): The default graph for all RDF formats when no graph is specified (for non-quads formats and NQUAD entries with no graph). The default is https://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph.

  • baseUri (example value: https://aws.amazon.com/neptune/default): The base URI for RDF/XML and Turtle formats. The default is https://aws.amazon.com/neptune/default.

For more information, see SPARQL Default Graph and Named Graphs.
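Since each child of parserConfiguration is optional, a client can supply one, both, or neither. The following sketch shows the values that would be in effect under the defaults documented above when only some children are overridden. The function name and the example override URI are hypothetical.

```python
# Defaults as documented in this section.
DEFAULT_PARSER_CONFIGURATION = {
    "namedGraphUri": "https://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph",
    "baseUri": "https://aws.amazon.com/neptune/default",
}


def effective_parser_configuration(overrides=None):
    """Merge caller-supplied children over the documented defaults."""
    return {**DEFAULT_PARSER_CONFIGURATION, **(overrides or {})}


# Hypothetical override of baseUri only; namedGraphUri keeps its default.
config = effective_parser_configuration({"baseUri": "https://example.com/base"})
print(config)
```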

[deprecated] accessKey

The iamRoleArn parameter is recommended instead. For information about creating a role that has access to Amazon S3 and then associating it with a Neptune cluster, see Prerequisites: IAM Role and Amazon S3 Access.

An access key ID of an IAM role with access to the S3 bucket and data files.

For more information, see Access keys (access key ID and secret access key).

[deprecated] secretKey

The iamRoleArn parameter is recommended instead. For information about creating a role that has access to Amazon S3 and then associating it with a Neptune cluster, see Prerequisites: IAM Role and Amazon S3 Access.

For more information, see Access keys (access key ID and secret access key).

Neptune Loader Response Syntax

{
  "status" : "200 OK",
  "payload" : {
    "loadId" : "guid_as_string"
  }
}

200 OK

A load job that starts successfully returns a 200 code.

Neptune Loader Errors

When an error occurs, a JSON object is returned in the body of the response. The message object contains a description of the error.

Error 400

Syntax errors return a 400 bad request error. The message describes the error.

Error 500

A valid request that cannot be processed returns a 500 internal server error. The message describes the error.
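A client can use the status code to separate request problems from server-side failures and then surface the description. The following sketch assumes the description is available under a top-level "message" key of the JSON body; the exact layout of the error object is not shown in this section, so treat that key path, the sample body, and the function name as assumptions.

```python
import json

# Status codes and meanings as described in this section.
ERROR_KINDS = {400: "bad request", 500: "internal server error"}


def summarize_loader_error(http_status, body_text):
    """Return (kind, message) for a non-200 loader response."""
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        body = {}
    # Assumption: the description sits under a top-level "message" key.
    message = body.get("message", "") if isinstance(body, dict) else ""
    return ERROR_KINDS.get(http_status, "other error"), message


# Hypothetical body shape, used only to exercise the function.
sample_body = '{"message": "Max concurrent load limit breached"}'
print(summarize_loader_error(400, sample_body))
```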

Neptune Loader Error Messages

The following are possible error messages from the loader with a description of the error.

Max concurrent load limit breached (HTTP 400)

You can run only one load job at a time.

Couldn't find the AWS credential for iam_role_arn (HTTP 400)

The credentials were not found. Verify the supplied credentials against the IAM console or AWS CLI output.

S3 bucket not found for source (HTTP 400)

The S3 bucket does not exist. Check the name of the bucket.

The source source-uri does not exist/not reachable (HTTP 400)

No matching files were found in the S3 bucket.

Unable to connect to S3 endpoint. Provided source = source-uri and region = aws-region (HTTP 400)

Unable to connect to Amazon S3. Region must match the cluster Region. Ensure that you have a VPC endpoint. For information about creating a VPC endpoint, see Creating an Amazon S3 VPC Endpoint.

Bucket is not in provided Region (aws-region) (HTTP 400)

The bucket must be in the same AWS Region as your Neptune DB instance.

Unable to perform S3 list operation (HTTP 400)

The IAM user or role provided does not have List permissions on the bucket or the folder. Check the policy or the access control list (ACL) on the bucket.

Start new load operation not permitted on a read replica instance (HTTP 405)

Loading is a write operation. Retry the load on the read/write cluster endpoint.

Failed to start load because of unknown error from S3 (HTTP 500)

Amazon S3 returned an unknown error. Contact AWS Support.

Invalid S3 access key (HTTP 400)

Access key is invalid. Check the provided credentials.

Invalid S3 secret key (HTTP 400)

Secret key is invalid. Check the provided credentials.

Neptune Loader Examples

Example Request

The following is a request sent via HTTP POST using the curl command. It loads a file in the Neptune CSV format. For more information, see Gremlin Load Data Format.

curl -X POST \
    -H 'Content-Type: application/json' \
    https://your-neptune-endpoint:port/loader -d '
    {
      "source" : "s3://bucket-name/object-key-name",
      "format" : "csv",
      "iamRoleArn" : "ARN for the IAM role you are using",
      "region" : "region",
      "failOnError" : "FALSE"
    }'

Example Response

{
  "status" : "200 OK",
  "payload" : {
    "loadId" : "ef478d76-d9da-4d94-8ff1-08d9d4863aa5"
  }
}
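A caller usually wants to keep the loadId from a response like the one above. The following sketch parses the response text and checks that the identifier is a well-formed GUID before returning it. The function name extract_load_id is hypothetical, and the GUID check relies on the response syntax describing loadId as a GUID string.

```python
import json
import uuid


def extract_load_id(response_text):
    """Return the loadId from a successful loader response."""
    response = json.loads(response_text)
    if response.get("status") != "200 OK":
        raise ValueError(f"Load did not start: {response.get('status')}")
    load_id = response["payload"]["loadId"]
    uuid.UUID(load_id)  # raises ValueError if the value is not a GUID
    return load_id


example_response = """
{
  "status" : "200 OK",
  "payload" : { "loadId" : "ef478d76-d9da-4d94-8ff1-08d9d4863aa5" }
}
"""
print(extract_load_id(example_response))
```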