Neptune Loader Command

Loads data from an Amazon S3 bucket into a Neptune DB instance.

To load data, you must send an HTTP POST request to the https://your-neptune-endpoint:port/loader endpoint. The parameters for the loader request can be sent in the POST body or as URL-encoded parameters.

Important

The MIME type must be application/json.

The S3 bucket must be in the same AWS Region as the cluster.

Note

You can load encrypted data from Amazon S3 if it was encrypted using the Amazon S3 SSE-S3 mode. In that case, Neptune is able to impersonate your credentials and issue s3:getObject calls on your behalf.

You can also load encrypted data from Amazon S3 that was encrypted using the SSE-KMS mode, as long as your IAM role includes the necessary permissions to access AWS KMS. Without proper AWS KMS permissions, the bulk load operation fails and returns a LOAD_FAILED response.

Neptune does not currently support loading Amazon S3 data encrypted using the SSE-C mode.

You don't have to wait for one load job to finish before you start another one. Neptune can queue up as many as 64 job requests at a time, provided that their queueRequest parameters are all set to "TRUE". If you don't want a load job to be queued up, you can set its queueRequest parameter to "FALSE" (the default), so that the load job fails if another one is already in progress.

You can use the dependencies parameter to queue up a job that must only be run after specified previous jobs in the queue have completed successfully. If you do that and any of those specified jobs fails, your job will not be run and its status will be set to LOAD_FAILED_BECAUSE_DEPENDENCY_NOT_SATISFIED.
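
For example, a request body that queues a load job behind two previously submitted jobs might include the following fields (a sketch using placeholder load IDs, bucket, and role values; the full request syntax and the individual parameters are described in the sections that follow):

{
  "source" : "s3://bucket-name/object-key-name",
  "format" : "csv",
  "iamRoleArn" : "ARN for the IAM role you are using",
  "region" : "us-east-1",
  "queueRequest" : "TRUE",
  "dependencies" : ["load_A_id", "load_B_id"]
}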

Neptune Loader Request Syntax

{
  "source" : "string",
  "format" : "string",
  "iamRoleArn" : "string",
  "mode" : "NEW|RESUME|AUTO",
  "region" : "us-east-1",
  "failOnError" : "string",
  "parallelism" : "string",
  "parserConfiguration" : {
    "baseUri" : "http://base-uri-string",
    "namedGraphUri" : "http://named-graph-string"
  },
  "updateSingleCardinalityProperties" : "string",
  "queueRequest" : "TRUE",
  "dependencies" : ["load_A_id", "load_B_id"]
}

Neptune Loader Request Parameters

  • source   –   An Amazon S3 URI.

    The source parameter accepts an Amazon S3 URI that points to either a single file or a folder. If you specify a folder, Neptune loads every data file in the folder.

    The folder can contain multiple vertex files and multiple edge files.

    The URI can be in any of the following formats.

    • s3://bucket_name/object-key-name

    • https://s3.amazonaws.com/bucket_name/object-key-name

    • https://s3-us-east-1.amazonaws.com/bucket_name/object-key-name

  • format   –   The format of the data. For more information about data formats for the Neptune Loader command, see Using the Amazon Neptune Bulk Loader to Ingest Data.

    Allowed values: csv (Gremlin); ntriples, nquads, rdfxml, and turtle (RDF).

  • iamRoleArn   –   The Amazon Resource Name (ARN) for an IAM role to be assumed by the Neptune DB instance for access to the S3 bucket. For information about creating a role that has access to Amazon S3 and then associating it with a Neptune cluster, see Prerequisites: IAM Role and Amazon S3 Access.

  • region   –   The region parameter must match the AWS Region of the cluster and the S3 bucket.

    Amazon Neptune is available in the following Regions:

    • US East (N. Virginia):   us-east-1

    • US East (Ohio):   us-east-2

    • US West (N. California):   us-west-1

    • US West (Oregon):   us-west-2

    • Canada (Central):   ca-central-1

    • South America (São Paulo):   sa-east-1

    • Europe (Stockholm):   eu-north-1

    • Europe (Ireland):   eu-west-1

    • Europe (London):   eu-west-2

    • Europe (Paris):   eu-west-3

    • Europe (Frankfurt):   eu-central-1

    • Middle East (Bahrain):   me-south-1

    • Asia Pacific (Hong Kong):   ap-east-1

    • Asia Pacific (Tokyo):   ap-northeast-1

    • Asia Pacific (Seoul):   ap-northeast-2

    • Asia Pacific (Singapore):   ap-southeast-1

    • Asia Pacific (Sydney):   ap-southeast-2

    • Asia Pacific (Mumbai):   ap-south-1

    • China (Ningxia):   cn-northwest-1

    • AWS GovCloud (US-West):   us-gov-west-1

    • AWS GovCloud (US-East):   us-gov-east-1

  • mode   –   The load job mode.

    Allowed values: RESUME, NEW, AUTO.

    Default value: AUTO

    • RESUME   –   In RESUME mode, the loader looks for a previous load from this source, and if it finds one, resumes that load job. If no previous load job is found, the loader stops.

      The loader avoids reloading files that were successfully loaded in a previous job. It only tries to process failed files. If you dropped previously loaded data from your Neptune cluster, that data is not reloaded in this mode. If a previous load job loaded all files from the same source successfully, nothing is reloaded, and the loader returns success.

    • NEW   –   In NEW mode, the loader creates a new load request regardless of any previous loads. You can use this mode to reload all the data from a source after dropping previously loaded data from your Neptune cluster, or to load new data available at the same source.

    • AUTO   –   In AUTO mode, the loader looks for a previous load job from the same source, and if it finds one, resumes that job, just as in RESUME mode.

      If the loader doesn't find a previous load job from the same source, it loads all data from the source, just as in NEW mode.

  • failOnError   –   A flag to toggle a complete stop on an error.

    Allowed values: "TRUE", "FALSE".

    Default value: "TRUE".

    When this parameter is set to "FALSE", the loader tries to load all the data in the location specified, skipping any entries with errors.

    When this parameter is set to "TRUE", the loader stops as soon as it encounters an error. Data loaded up to that point persists.

  • parallelism   –   This is an optional parameter that can be set to reduce the number of threads used by the bulk load process.

    Allowed values:

    • LOW –   The number of threads used is the number of cores divided by 8.

    • MEDIUM –   The number of threads used is the number of cores divided by 2.

    • HIGH –   The number of threads used is the same as the number of cores.

    • OVERSUBSCRIBE –   The number of threads used is the number of cores multiplied by 2. If this value is used, the bulk loader takes up all available resources.

    Default value: HIGH

  • parserConfiguration   –   An optional object with additional parser configuration values. Each of the child parameters is also optional:

    • namedGraphUri   –   The default graph for all RDF formats when no graph is specified (for non-quads formats and NQUAD entries with no graph).

      Example value: http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph

      Default value: http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph

    • baseUri   –   The base URI for RDF/XML and Turtle formats.

      Example value: http://aws.amazon.com/neptune/default

      Default value: http://aws.amazon.com/neptune/default

    For more information, see SPARQL Default Graph and Named Graphs. A combined example request that includes parserConfiguration appears after this parameter list.

  • updateSingleCardinalityProperties   –   This is an optional parameter that controls how the bulk loader treats a new value for single-cardinality vertex or edge properties.

    Allowed values: "TRUE", "FALSE".

    Default value: "FALSE".

    By default, or when updateSingleCardinalityProperties is explicitly set to "FALSE", the loader treats a new value as an error, because it violates single cardinality.

    When updateSingleCardinalityProperties is set to "TRUE", on the other hand, the bulk loader replaces the existing value with the new one. If multiple edge or single-cardinality vertex property values are provided in the source file(s) being loaded, the final value at the end of the bulk load could be any one of those new values. The loader only guarantees that the existing value has been replaced by one of the new ones.

  • queueRequest   –   This is an optional flag parameter that indicates whether the load request can be queued up or not.

    You don't have to wait for one load job to complete before issuing the next one, because Neptune can queue up as many as 64 jobs at a time, provided that their queueRequest parameters are all set to "TRUE".

    If the queueRequest parameter is omitted or set to "FALSE", the load request will fail if another load job is already running.

    Allowed values: "TRUE", "FALSE".

    Default value: "FALSE".

  • dependencies   –   This is an optional parameter that can make a queued load request contingent on the successful completion of one or more previous jobs in the queue.

    Neptune can queue up as many as 64 load requests at a time, if their queueRequest parameters are set to "TRUE". The dependencies parameter lets you make execution of such a queued request dependent on the successful completion of one or more specified previous requests in the queue.

    For example, if load Job-A and Job-B are independent of each other, but load Job-C needs Job-A and Job-B to be finished before it begins, proceed as follows:

    1. Submit load-job-A and load-job-B one after another in any order, and save their load-ids.

    2. Submit load-job-C with the load-ids of the two jobs in its dependencies field:

    "dependencies" : ["job_A_load_id", "job_B_load_id"]

    Because of the dependencies parameter, the bulk loader will not start Job-C until Job-A and Job-B have completed successfully. If either one of them fails, Job-C will not be executed, and its status will be set to LOAD_FAILED_BECAUSE_DEPENDENCY_NOT_SATISFIED.

    You can set up multiple levels of dependency in this way, so that the failure of one job will cause all requests that are directly or indirectly dependent on it to be cancelled.

  • accessKey   –   [deprecated] An access key ID of an IAM role with access to the S3 bucket and data files.

    The iamRoleArn parameter is recommended instead. For information about creating a role that has access to Amazon S3 and then associating it with a Neptune cluster, see Prerequisites: IAM Role and Amazon S3 Access.

    For more information, see Access keys (access key ID and secret access key).

  • secretKey   –   [deprecated] The iamRoleArn parameter is recommended instead. For information about creating a role that has access to Amazon S3 and then associating it with a Neptune cluster, see Prerequisites: IAM Role and Amazon S3 Access.

    For more information, see Access keys (access key ID and secret access key).
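
To show how the optional parameters fit together, the following request body sketches an RDF (Turtle) load that overrides the parser defaults and is queued behind any job already in progress. The bucket path, IAM role ARN, and URIs shown are placeholder values, not defaults:

{
  "source" : "s3://bucket-name/turtle-folder",
  "format" : "turtle",
  "iamRoleArn" : "ARN for the IAM role you are using",
  "region" : "us-east-1",
  "failOnError" : "TRUE",
  "parallelism" : "MEDIUM",
  "parserConfiguration" : {
    "baseUri" : "http://example.com/base",
    "namedGraphUri" : "http://example.com/named-graph"
  },
  "queueRequest" : "TRUE"
}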

Neptune Loader Response Syntax

{
  "status" : "200 OK",
  "payload" : {
    "loadId" : "guid_as_string"
  }
}

200 OK

A successfully started load job returns an HTTP 200 code.

Neptune Loader Errors

When an error occurs, a JSON object is returned in the BODY of the response. The message object contains a description of the error.

Error Categories

  • Error 400   –   Syntax errors return an HTTP 400 bad request error. The message describes the error.

  • Error 500   –   A valid request that cannot be processed returns an HTTP 500 internal server error. The message describes the error.

The following are possible error messages from the loader with a description of the error.

Loader Error Messages

  • Couldn't find the AWS credential for iam_role_arn  (HTTP 400)

    The credentials were not found. Verify the supplied credentials against the IAM console or AWS CLI output.

  • S3 bucket not found for source  (HTTP 400)

    The S3 bucket does not exist. Check the name of the bucket.

  • The source source-uri does not exist/not reachable  (HTTP 400)

    No matching files were found in the S3 bucket.

  • Unable to connect to S3 endpoint. Provided source = source-uri and region = aws-region  (HTTP 400)

    Unable to connect to Amazon S3. Region must match the cluster Region. Ensure that you have a VPC endpoint. For information about creating a VPC endpoint, see Creating an Amazon S3 VPC Endpoint.

  • Bucket is not in provided Region (aws-region)  (HTTP 400)

    The bucket must be in the same AWS Region as your Neptune DB instance.

  • Unable to perform S3 list operation  (HTTP 400)

    The IAM user or role provided does not have List permissions on the bucket or the folder. Check the policy or the access control list (ACL) on the bucket.

  • Start new load operation not permitted on a read replica instance  (HTTP 405)

    Loading is a write operation. Retry load on the read/write cluster endpoint.

  • Failed to start load because of unknown error from S3  (HTTP 500)

    Amazon S3 returned an unknown error. Contact AWS Support.

  • Invalid S3 access key  (HTTP 400)

    Access key is invalid. Check the provided credentials.

  • Invalid S3 secret key  (HTTP 400)

    Secret key is invalid. Check the provided credentials.

  • Max concurrent load limit breached  (HTTP 400)

    If a load request is submitted without "queueRequest" : "TRUE", and a load job is currently running, the request will fail with this error.

  • Failed to start new load for the source "source name". Max load task queue size limit breached. Limit is 64  (HTTP 400)

    Neptune supports queuing up as many as 64 loader jobs at a time. If an additional load request is submitted to the queue when it already contains 64 jobs, the request fails with this message.

Neptune Loader Examples

Example Request

The following is a request sent via HTTP POST using the curl command. It loads a file in the Neptune CSV format. For more information, see Gremlin Load Data Format.

curl -X POST \
    -H 'Content-Type: application/json' \
    https://your-neptune-endpoint:port/loader -d '
    {
      "source" : "s3://bucket-name/object-key-name",
      "format" : "csv",
      "iamRoleArn" : "ARN for the IAM role you are using",
      "region" : "region",
      "failOnError" : "FALSE",
      "parallelism" : "MEDIUM",
      "updateSingleCardinalityProperties" : "FALSE",
      "queueRequest" : "FALSE"
    }'

Example Response

{
  "status" : "200 OK",
  "payload" : {
    "loadId" : "ef478d76-d9da-4d94-8ff1-08d9d4863aa5"
  }
}
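
If you script a sequence of dependent loads, you can capture the loadId from the response and pass it in the dependencies field of the next request. The following shell commands are a sketch of that pattern rather than part of the loader API itself; they assume the jq utility is available to parse the JSON response, and use placeholder bucket, key, region, and role values:

# Submit the first (queued) load and keep its loadId.
loadId=$(curl -s -X POST \
    -H 'Content-Type: application/json' \
    https://your-neptune-endpoint:port/loader -d '
    {
      "source" : "s3://bucket-name/first-object-key",
      "format" : "csv",
      "iamRoleArn" : "ARN for the IAM role you are using",
      "region" : "region",
      "queueRequest" : "TRUE"
    }' | jq -r '.payload.loadId')

# Submit a second load that runs only if the first completes successfully.
curl -X POST \
    -H 'Content-Type: application/json' \
    https://your-neptune-endpoint:port/loader -d "
    {
      \"source\" : \"s3://bucket-name/second-object-key\",
      \"format\" : \"csv\",
      \"iamRoleArn\" : \"ARN for the IAM role you are using\",
      \"region\" : \"region\",
      \"queueRequest\" : \"TRUE\",
      \"dependencies\" : [\"$loadId\"]
    }"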