Amazon Neptune
User Guide (API Version 2017-11-29)

Example: Loading Data into a Neptune DB Instance

This example shows how to load data into Amazon Neptune. Unless stated otherwise, you must follow these steps from an Amazon Elastic Compute Cloud (Amazon EC2) instance in the same Amazon Virtual Private Cloud (VPC) as your Neptune DB instance.

Prerequisites

Before you begin, you must have the following:

  • A Neptune DB instance.

    For information about launching a Neptune DB instance, see Getting Started with Neptune.

  • An Amazon Simple Storage Service (Amazon S3) bucket to put the data files in.

    You can use an existing bucket. If you don't have an S3 bucket, see Create a Bucket in the Amazon S3 Getting Started Guide.

  • An IAM role that the Neptune DB instance can assume, with an IAM policy that allows access to the data files in the S3 bucket. The policy must grant Read and List permissions.

    For information about creating a role that has access to Amazon S3 and then associating it with a Neptune cluster, see Prerequisites: IAM Role and Amazon S3 Access.

    Note

    The Neptune Load API needs read access to the data files only. The IAM policy doesn't need to allow write access or access to the entire bucket.

  • An Amazon S3 VPC endpoint. For more information, see the Amazon S3 VPC Endpoint section.
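
As a rough illustration of the Read and List permissions described above, the following sketch prints a policy document scoped to one bucket and an optional key prefix. The function name neptune_s3_read_policy and the bucket name in the usage line are hypothetical. The choice of s3:ListBucket and s3:GetObject is an assumption about the narrowest actions that satisfy the requirement. Treat Prerequisites: IAM Role and Amazon S3 Access as the authoritative source.

```shell
# Print a read-only S3 policy for one bucket (illustrative sketch only).
# Arguments: BUCKET_NAME [KEY_PREFIX]
# Assumption: s3:ListBucket and s3:GetObject cover "List" and "Read".
neptune_s3_read_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::${1}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::${1}/${2:-}*"
    }
  ]
}
EOF
}
```

For example, neptune_s3_read_policy examplebucket mydirectory/ prints a policy that allows reading only the objects under mydirectory/, in line with the note that the loader does not need write access or access to the entire bucket.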

Amazon S3 VPC Endpoint

The Neptune loader requires a VPC endpoint for Amazon S3.

To set up access for Amazon S3

  1. Sign in to the AWS Management Console and open the Amazon VPC console at https://console.aws.amazon.com/vpc/.

  2. In the left navigation pane, choose Endpoints.

  3. Choose Create Endpoint.

  4. Choose the Service Name com.amazonaws.region.s3.

    Note

    If the Region shown in the service name is not the one you expect, check that the console is set to the correct AWS Region.

  5. Choose the VPC that contains your Neptune DB instance.

  6. Select the check box next to the route tables that are associated with the subnets related to your cluster. If you only have one route table, you must select that box.

  7. Choose Create Endpoint.

For information about creating the endpoint, see VPC Endpoints in the Amazon VPC User Guide. For information about the limitations of VPC endpoints, see VPC Endpoints for Amazon S3.
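
If you prefer the command line to the console, the same steps can be approximated with the AWS CLI. The following is a minimal sketch, not part of the original procedure. The function names and the IDs in the usage line are hypothetical, and the second function only prints the command so that you can review it before running it.

```shell
# Build the S3 service name for a Region, as it appears in the console.
s3_service_name() {
  printf 'com.amazonaws.%s.s3\n' "$1"
}

# Print (do not run) an AWS CLI command that creates the S3 endpoint.
# Arguments: REGION VPC_ID ROUTE_TABLE_ID...
print_create_endpoint_command() {
  region="$1"
  vpc_id="$2"
  shift 2
  # "$*" joins the remaining route table IDs with spaces.
  printf 'aws ec2 create-vpc-endpoint --region %s --vpc-id %s --service-name %s --route-table-ids %s\n' \
    "$region" "$vpc_id" "$(s3_service_name "$region")" "$*"
}
```

For example, print_create_endpoint_command us-east-1 vpc-0abc rtb-1 rtb-2 prints a command that uses the service name com.amazonaws.us-east-1.s3 and both route tables.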

To load data into a Neptune DB instance

  1. Copy the data files to an Amazon S3 bucket. The S3 bucket must be in the same AWS Region as the cluster that loads the data.

    You can use the following AWS CLI command to copy the files to the bucket.

    Note

    This command does not need to be run from the Amazon EC2 instance.

    aws s3 cp data-file-name s3://bucket-name/object-key-name

    Note

    In Amazon S3, an object key name is the entire path of a file, including the file name.

    Example: In the command aws s3 cp datafile.txt s3://examplebucket/mydirectory/datafile.txt, the object key name is mydirectory/datafile.txt.

    Alternatively, you can use the AWS Management Console to upload files to the S3 bucket. Open the Amazon S3 console at https://console.aws.amazon.com/s3/, and choose a bucket. In the upper-left corner, choose Upload to upload files.
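
To make the relationship between the bucket name and the object key name concrete, the following sketch splits an s3:// URI into its two parts. The function names are hypothetical, and the sketch assumes that the URI contains at least one slash after the bucket name.

```shell
# Print the bucket name of an s3:// URI.
s3_bucket_of() {
  rest="${1#s3://}"             # drop the scheme
  printf '%s\n' "${rest%%/*}"   # keep everything before the first slash
}

# Print the object key name of an s3:// URI.
# Assumption: the URI has the form s3://bucket/key.
s3_key_of() {
  rest="${1#s3://}"             # drop the scheme
  printf '%s\n' "${rest#*/}"    # keep everything after the first slash
}
```

With the URI from the example above, s3_bucket_of returns examplebucket and s3_key_of returns mydirectory/datafile.txt.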

  2. From a command line window, enter the following to run the Neptune loader, replacing the values for the endpoint, Amazon S3 path, format, IAM role ARN, and Region.

    The format parameter can be any of the following values: csv (Gremlin), ntriples, nquads, turtle, and rdfxml (RDF). For information about the other parameters, see Loader Command.

    For information about finding the hostname of your Neptune DB instance, see the Amazon Neptune Endpoints section.

    The region parameter must match the region of the cluster and the S3 bucket.

    Amazon Neptune is available in the following regions:

    • us-east-1 - US East (N. Virginia)

    • us-east-2 - US East (Ohio)

    • us-west-2 - US West (Oregon)

    • eu-west-1 - EU (Ireland)

    • eu-west-2 - EU (London)

    • eu-central-1 - EU (Frankfurt)

    curl -X POST \
        -H 'Content-Type: application/json' \
        http://your-neptune-endpoint:8182/loader -d '
        {
          "source" : "s3://bucket-name/object-key-name",
          "format" : "format",
          "iamRoleArn" : "arn:aws:iam::account-id:role/role-name",
          "region" : "region",
          "failOnError" : "FALSE"
        }'

    For information about creating and associating an IAM role with a Neptune cluster, see Prerequisites: IAM Role and Amazon S3 Access.

    Note

    The source parameter accepts an Amazon S3 URI that points to either a single file or a folder. If you specify a folder, Neptune loads every data file in the folder.

    The folder may contain multiple vertex files and multiple edge files.

    The URI can be in any of the following formats.

    • s3://bucket-name/object-key-name

    • https://s3.amazonaws.com/bucket-name/object-key-name

    • https://s3-us-east-1.amazonaws.com/bucket-name/object-key-name

    The format parameter can be one of the following:

    • CSV format (csv) for property graph / Gremlin

    • N-Triples (ntriples) format for RDF / SPARQL

    • N-Quads (nquads) format for RDF / SPARQL

    • RDF/XML (rdfxml) format for RDF / SPARQL

    • Turtle (turtle) format for RDF / SPARQL
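
The request in Step 2 can also be assembled by a small script that rejects an unsupported format before anything is sent. The following is a minimal sketch under stated assumptions: the function names and the NEPTUNE_ENDPOINT variable are hypothetical, and the values are assumed to contain no double quotation marks or backslashes, because no JSON escaping is performed.

```shell
# Succeed only if the argument is one of the formats listed above.
is_valid_format() {
  case "$1" in
    csv|ntriples|nquads|rdfxml|turtle) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the JSON request body for the loader.
# Arguments: SOURCE FORMAT IAM_ROLE_ARN REGION
# Assumption: no argument needs JSON escaping.
loader_payload() {
  is_valid_format "$2" || { echo "unsupported format: $2" >&2; return 1; }
  printf '{ "source" : "%s", "format" : "%s", "iamRoleArn" : "%s", "region" : "%s", "failOnError" : "FALSE" }\n' \
    "$1" "$2" "$3" "$4"
}

# Send the load request. NEPTUNE_ENDPOINT is a hypothetical variable that
# holds the hostname of your Neptune DB instance.
submit_load() {
  body="$(loader_payload "$@")" || return 1
  curl -X POST -H 'Content-Type: application/json' \
    "http://${NEPTUNE_ENDPOINT}:8182/loader" -d "$body"
}
```

Validating the format locally is a design choice for faster feedback. The loader itself remains the authority on which values it accepts.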

  3. The Neptune loader returns a job ID (the loadId) that you can use to check the status of the load or to cancel it. For example:

    {
      "status" : "200 OK",
      "payload" : {
        "loadId" : "ef478d76-d9da-4d94-8ff1-08d9d4863aa5"
      }
    }

  4. Type the following to get the status of the load with the loadId from Step 3:

    curl -G 'http://your-neptune-endpoint:8182/loader/ef478d76-d9da-4d94-8ff1-08d9d4863aa5'

    If the status of the load lists an error, you can request more detailed status and a list of the errors. For more information and examples, see Loader Get Status.

  5. (Optional) Cancel the load job.

    Type the following to delete the loader job with the job ID from Step 3:

    curl -X DELETE 'http://your-neptune-endpoint:8182/loader/ef478d76-d9da-4d94-8ff1-08d9d4863aa5'

    The DELETE command returns the HTTP code 200 OK upon successful cancellation.

    Data from files that finished loading before the job was canceled is not rolled back. The data remains in the Neptune DB instance.
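
Steps 3 through 5 all depend on the loadId, so it can help to capture it once instead of copying it by hand. The following sketch extracts the loadId from the loader response with sed and builds the URL used for both the status and cancel requests. The function names are hypothetical, and the sed pattern assumes that the loadId key and its value appear on the same line, as in the response shown in Step 3.

```shell
# Read a loader response on standard input and print the loadId value.
# Assumption: "loadId" and its value are on the same line of the response.
extract_load_id() {
  sed -n 's/.*"loadId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Print the loader URL for one job. Arguments: ENDPOINT LOAD_ID
loader_job_url() {
  printf 'http://%s:8182/loader/%s\n' "$1" "$2"
}
```

With the URL stored in a variable, for example url="$(loader_job_url your-neptune-endpoint "$load_id")", the status check becomes curl -G "$url" and the cancellation becomes curl -X DELETE "$url".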
