Neptune Loader Command
Loads data from an Amazon S3 bucket into a Neptune DB instance.
To load data, you must send an HTTP POST request to the http://your-neptune-endpoint:8182/loader endpoint. The parameters for the loader request can be sent in the POST body or as URL-encoded parameters.
Important
The MIME type must be application/json.
The S3 bucket must be in the same AWS Region as the cluster.
Note
You can load encrypted data from Amazon S3 if it was encrypted using the Amazon S3 SSE-S3 mode. In that case, Neptune is able to impersonate your credentials and issue s3:getObject calls on your behalf. However, Neptune does not currently support loading data encrypted using the SSE-KMS or SSE-C modes.
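As a rough illustration of the request described above, the following Python sketch builds (without sending) a POST request with the application/json MIME type. The helper name build_load_request, the role ARN, and the other body values are placeholders assumed for illustration; only the endpoint shape and the field names come from this page.

```python
import json
import urllib.request


def build_load_request(endpoint: str, body: dict) -> urllib.request.Request:
    """Build, but do not send, a POST request for the loader endpoint."""
    url = f"http://{endpoint}:8182/loader"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values: replace them with your own bucket, role, and Region.
example_body = {
    "source": "s3://bucket-name/object-key-name",
    "format": "csv",
    "iamRoleArn": "arn:aws:iam::123456789012:role/HypotheticalNeptuneLoadRole",
    "mode": "AUTO",
    "region": "us-east-1",
    "failOnError": "TRUE",
}
request = build_load_request("your-neptune-endpoint", example_body)

# To send it from a host that can reach the cluster, something like:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The request is only constructed here, so the sketch can be checked without network access to a Neptune cluster.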
Neptune Loader Request Syntax
{
  "source" : "string",
  "format" : "string",
  "iamRoleArn" : "string",
  "mode" : "NEW|RESUME|AUTO",
  "region" : "us-east-1",
  "failOnError" : "string",
  "parserConfiguration" : {
    "baseUri" : "http://base-uri-string",
    "namedGraphUri" : "http://named-graph-string"
  }
}
Neptune Loader Request Parameters
source
An Amazon S3 URI.
The source parameter accepts an Amazon S3 URI that points to either a single file or a folder. If you specify a folder, Neptune loads every data file in the folder. The folder may contain multiple vertex files and multiple edge files.
The URI can be in any of the following formats.
- s3://bucket_name/object-key-name
- https://s3.amazonaws.com/bucket_name/object-key-name
- https://s3-us-east-1.amazonaws.com/bucket_name/object-key-name
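To make the three URI shapes concrete, here is a minimal Python sketch that splits a source URI into a bucket name and an object key. The function split_s3_source is a hypothetical helper written for this illustration, not part of any Neptune or AWS library, and its checks are deliberately simple.

```python
from urllib.parse import urlparse


def split_s3_source(source: str) -> tuple[str, str]:
    """Return (bucket, key) for the URI shapes listed above."""
    parsed = urlparse(source)
    if parsed.scheme == "s3":
        # s3://bucket_name/object-key-name
        return parsed.netloc, parsed.path.lstrip("/")
    host = parsed.netloc
    if parsed.scheme == "https" and host.startswith("s3") and host.endswith(".amazonaws.com"):
        # https://s3.amazonaws.com/bucket_name/object-key-name (bucket is the first path segment)
        bucket, _, key = parsed.path.lstrip("/").partition("/")
        if bucket:
            return bucket, key
    raise ValueError(f"Unrecognized S3 source URI: {source!r}")


bucket, key = split_s3_source("https://s3-us-east-1.amazonaws.com/bucket_name/folder/file.csv")
```

A key that names a folder (for example "folder/") is returned unchanged, since the loader accepts either a single file or a folder.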
format
The format of the data. For more information about data formats for the Neptune Loader command, see Using the Bulk Loader to Load Data into Amazon Neptune.
Allowed values: csv (Gremlin); ntriples, nquads, rdfxml, turtle (RDF).
iamRoleArn
The Amazon Resource Name (ARN) for an IAM role to be assumed by the Neptune DB instance for access to the S3 bucket. For information about creating a role that has access to Amazon S3 and then associating it with a Neptune cluster, see Prerequisites: IAM Role and Amazon S3 Access.
region
The region parameter must match the region of the cluster and the S3 bucket.
Amazon Neptune is available in the following regions:
- us-east-1 - US East (N. Virginia)
- us-east-2 - US East (Ohio)
- us-west-2 - US West (Oregon)
- eu-west-1 - EU (Ireland)
- eu-west-2 - EU (London)
- eu-central-1 - EU (Frankfurt)
- ap-southeast-1 - Asia Pacific (Singapore)
- ap-southeast-2 - Asia Pacific (Sydney)
- ap-northeast-1 - Asia Pacific (Tokyo)
mode
Load job mode.
RESUME mode determines whether there is a previous load for the source and resumes the load if one is found. If a previous load is not found, the load is aborted. The loader avoids reloading files that completed successfully in a previous load and only attempts to process the failed files. If you dropped previously loaded data from your Neptune cluster, that data is not reloaded in this mode. In the special case where the previous loads from the same source completed successfully, the new load completes with nothing reloaded.
NEW mode creates a new load request regardless of any previous loads. This mode may be used to reload all the data from a source after dropping the previously loaded data from your Neptune cluster, or to load new data available at the same source.
AUTO mode determines whether there is a previous load with the same source and resumes the load if one is found, just like RESUME mode. If a previous load is not found, it loads data from the source, just like NEW mode.
Default: AUTO
Allowed values: NEW, RESUME, AUTO.
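The three modes differ only in how they react to a previous load for the same source. The following Python sketch restates that decision as a small function; resolve_load_action and its return labels are hypothetical names chosen for this illustration and do not reflect how the loader is implemented.

```python
def resolve_load_action(mode: str = "AUTO", previous_load_found: bool = False) -> str:
    """Summarize what each mode does, given whether a previous load exists."""
    mode = mode.upper()
    if mode == "NEW":
        # NEW always starts a fresh load, regardless of earlier loads.
        return "start_new"
    if mode == "RESUME":
        # RESUME continues a previous load, or aborts when there is none.
        return "resume" if previous_load_found else "abort"
    if mode == "AUTO":
        # AUTO behaves like RESUME when a previous load exists, else like NEW.
        return "resume" if previous_load_found else "start_new"
    raise ValueError(f"mode must be NEW, RESUME, or AUTO, got {mode!r}")


for m in ("NEW", "RESUME", "AUTO"):
    print(m, resolve_load_action(m, True), resolve_load_action(m, False))
```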
failOnError
Flag to toggle a complete stop on an error.
When set to FALSE, the bulk loader attempts to load all the data in the location specified and skips any entries with errors.
When set to TRUE, the bulk loader aborts as soon as it encounters an error. Data loaded up to that point persists.
Default: TRUE
Allowed values: TRUE, FALSE
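The difference between the two settings can be shown with a toy model. The Python sketch below is not Neptune's implementation; simulate_load and the "valid" flag on each entry are assumptions made purely to mirror the behavior described above.

```python
def simulate_load(entries, fail_on_error=True):
    """Toy model: return (loaded_ids, skipped_ids, aborted)."""
    loaded, skipped = [], []
    for entry in entries:
        if entry.get("valid", True):
            loaded.append(entry["id"])
        elif fail_on_error:
            # TRUE: stop at the first error; earlier entries stay loaded.
            return loaded, skipped, True
        else:
            # FALSE: skip the bad entry and keep going.
            skipped.append(entry["id"])
    return loaded, skipped, False


entries = [
    {"id": 1, "valid": True},
    {"id": 2, "valid": False},
    {"id": 3, "valid": True},
]
strict = simulate_load(entries, fail_on_error=True)
lenient = simulate_load(entries, fail_on_error=False)
```

With these entries, the strict run keeps only entry 1 and reports an abort, while the lenient run loads entries 1 and 3 and skips entry 2.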
parserConfiguration
An optional object with additional parser configuration values. Each child parameter is also optional.
Name | Example Value | Description |
---|---|---|
namedGraphUri | http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph | The default graph for all RDF formats when no graph is specified (for non-quads formats and NQUAD entries with no graph). Default is http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph |
baseUri | http://aws.amazon.com/neptune/default | The base URI for RDF/XML and Turtle formats. Default is http://aws.amazon.com/neptune/default |
For more information, see SPARQL Default Graph and Named Graphs.
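Because parserConfiguration and each of its child parameters are optional, a request body only needs to include the values you want to override. The Python sketch below shows one way to attach them; with_parser_configuration is a hypothetical helper, and the example URIs are placeholders.

```python
def with_parser_configuration(body, base_uri=None, named_graph_uri=None):
    """Return a copy of body, adding only the parser settings that were given."""
    config = {}
    if base_uri is not None:
        config["baseUri"] = base_uri
    if named_graph_uri is not None:
        config["namedGraphUri"] = named_graph_uri
    result = dict(body)
    if config:
        # Omit the object entirely when no child parameter is set.
        result["parserConfiguration"] = config
    return result


rdf_body = {"source": "s3://bucket-name/object-key-name", "format": "turtle"}
configured = with_parser_configuration(rdf_body, base_uri="http://example.com/base")
unchanged = with_parser_configuration(rdf_body)
```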
[deprecated] accessKey
The iamRoleArn parameter is recommended instead. For information about creating a role that has access to Amazon S3 and then associating it with a Neptune cluster, see Prerequisites: IAM Role and Amazon S3 Access.
An access key ID of an IAM role with access to the S3 bucket and data files.
For more information, see Access keys (access key ID and secret access key).
[deprecated] secretKey
The iamRoleArn parameter is recommended instead. For information about creating a role that has access to Amazon S3 and then associating it with a Neptune cluster, see Prerequisites: IAM Role and Amazon S3 Access.
For more information, see Access keys (access key ID and secret access key).
Neptune Loader Response Syntax
{
  "status" : "200 OK",
  "payload" : {
    "loadId" : "guid_as_string"
  }
}
200 OK
A load job that starts successfully returns a 200 code.
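A client typically needs the loadId from this response to refer to the job later. The following Python sketch pulls it out of a response body shaped like the syntax above; extract_load_id is a hypothetical helper, and the sample GUID is only an example value.

```python
import json


def extract_load_id(response_text: str) -> str:
    """Return payload.loadId from a successful loader response body."""
    response = json.loads(response_text)
    status = response.get("status", "")
    if not status.startswith("200"):
        raise RuntimeError(f"Load job did not start: status={status!r}")
    return response["payload"]["loadId"]


sample = '{ "status" : "200 OK", "payload" : { "loadId" : "ef478d76-d9da-4d94-8ff1-08d9d4863aa5" } }'
load_id = extract_load_id(sample)
```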
Loader Errors
When an error occurs, a JSON object is returned in the body of the response. The message object contains a description of the error.
Error 400
Syntax errors return a 400 bad request error. The message describes the error.
Error 500
A valid request that cannot be processed returns a 500 internal server error. The message describes the error.
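A caller can combine the status code with the error description when reporting a failure. The Python sketch below assumes the error body is a JSON object with a top-level "message" field, which is an interpretation of the description above rather than a confirmed schema; describe_loader_error is a hypothetical helper.

```python
import json


def describe_loader_error(status_code: int, body_text: str) -> str:
    """Format a loader error from its HTTP status code and response body."""
    try:
        # Assumed shape: {"message": "..."}; fall back to the raw body otherwise.
        message = json.loads(body_text).get("message", body_text)
    except (ValueError, AttributeError):
        message = body_text
    if status_code == 400:
        kind = "bad request"
    elif status_code == 500:
        kind = "internal server error"
    else:
        kind = "unexpected status"
    return f"{status_code} {kind}: {message}"


summary = describe_loader_error(400, '{"message": "S3 bucket not found for source"}')
```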
Neptune Loader Error Messages
The following are possible error messages from the loader with a description of the error.
Max concurrent load limit breached (HTTP 400)
You can only have 1 load job at a time.
Couldn't find the AWS credential for iam_role_arn (HTTP 400)
The credentials were not found. Verify the supplied credentials against the IAM console or AWS CLI output.
S3 bucket not found for source (HTTP 400)
The S3 bucket does not exist. Check the name of the bucket.
The source source-uri does not exist/not reachable (HTTP 400)
No matching files were found in the S3 bucket.
Unable to connect to S3 endpoint. Provided source = source-uri and region = aws-region (HTTP 400)
Unable to connect to Amazon S3. Region must match the cluster region. Ensure that you have a VPC endpoint. For information about creating a VPC endpoint, see Creating an Amazon S3 VPC Endpoint.
Bucket is not in provided region (aws-region) (HTTP 400)
The bucket must be in the same AWS Region as your Neptune DB instance.
Unable to perform S3 list operation (HTTP 400)
The IAM user or role provided does not have List permissions on the bucket or the folder. Check the policy and/or the access control list (ACL) on the bucket.
Start new load operation not permitted on a read-replica instance (HTTP 405)
Loading is a write operation. Retry load on the read/write cluster endpoint.
Failed to start load because of unknown error from S3 (HTTP 500)
Amazon S3 returned an unknown error. Contact AWS Support.
Invalid S3 access key (HTTP 400)
Access key is invalid. Check the provided credentials.
Invalid S3 secret key (HTTP 400)
Secret key is invalid. Check the provided credentials.
Neptune Loader Examples
Example Request
The following is a request sent via HTTP POST using the curl command.
It loads a file in the Neptune CSV format. For more information, see Gremlin Load Data Format.
curl -X POST \
    -H 'Content-Type: application/json' \
    http://your-neptune-endpoint:8182/loader -d '
    {
      "source" : "s3://bucket-name/object-key-name",
      "format" : "csv",
      "accessKey" : "access-key-id",
      "secretKey" : "secret-key",
      "region" : "region",
      "failOnError" : "FALSE"
    }'
Example Response
{
  "status" : "200 OK",
  "payload" : {
    "loadId" : "ef478d76-d9da-4d94-8ff1-08d9d4863aa5"
  }
}