Using the neptune-export tool or Neptune-Export service to export data from Neptune for Neptune ML - Amazon Neptune

Neptune ML requires that you provide training data for the Deep Graph Library (DGL) to create and test models. To do this, you can export data from Neptune using an open-source tool named neptune-export. You can use the tool either as a service (the Neptune-Export service) or as the Java neptune-export command line tool. When you use the Neptune-Export service, you trigger and monitor export jobs through a REST API. When you run neptune-export as a command line tool, you do so in an environment where your Neptune DB cluster is accessible.
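
As a rough illustration of the REST workflow, the sketch below builds (but does not send) a POST request that would start an export job. The API URL, cluster endpoint, and bucket name are placeholders. The field names `command`, `outputS3Path`, `params`, `endpoint`, and `profile` are assumptions drawn from the Neptune-Export documentation rather than from this page, so verify them against the version you deploy. If IAM authentication is enabled on the API, the request would also need to be signed with Signature Version 4, which is omitted here.

```python
import json
import urllib.request

# Placeholder values (assumptions): replace with your own deployment's details.
EXPORT_API_URI = "https://example.execute-api.us-east-1.amazonaws.com/v1/neptune-export"
NEPTUNE_ENDPOINT = "my-cluster.cluster-example.us-east-1.neptune.amazonaws.com"
OUTPUT_S3_PATH = "s3://my-example-bucket/neptune-export"


def build_export_request(api_uri, neptune_endpoint, output_s3_path):
    """Build a POST request asking the service to export property-graph data."""
    payload = {
        "command": "export-pg",            # assumed command name for property-graph export
        "outputS3Path": output_s3_path,    # where the CSV files and config file are published
        "params": {
            "endpoint": neptune_endpoint,  # the Neptune DB cluster to export from
            "profile": "neptune_ml",       # assumed profile name for Neptune ML exports
        },
    }
    return urllib.request.Request(
        api_uri,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Construct the request only; sending it is left to the caller.
request = build_export_request(EXPORT_API_URI, NEPTUNE_ENDPOINT, OUTPUT_S3_PATH)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the resulting object to `urllib.request.urlopen` would submit the job. Keeping construction separate from sending makes the payload easy to inspect first.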

Both the Neptune-Export service and the neptune-export command line tool publish data to Amazon Simple Storage Service (Amazon S3) in a CSV format, encrypted using Amazon S3 server-side encryption (SSE-S3). The export job also creates and publishes an encrypted model-training configuration file along with the exported data.

When you trigger an export job, you can supply hints that specify labels and features you wish to include in the training configuration file. You can also modify that file once it has been created and published to Amazon S3.
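
To make the idea of hints concrete, here is a minimal sketch of how label and feature hints might be attached to an export request. The `additionalParams` and `neptune_ml` keys, and the `targets` and `features` lists, are assumptions based on the Neptune ML export documentation rather than on this page. The node labels and property names (`Movie`, `genre`, `Person`, `age`) are hypothetical.

```python
import json


def build_ml_hints(targets, features=()):
    """Wrap label (target) and feature hints in the assumed additionalParams layout."""
    return {
        "additionalParams": {
            "neptune_ml": {
                "version": "v2.0",            # assumed configuration version string
                "targets": list(targets),     # which properties the model should predict
                "features": list(features),   # how selected properties should be encoded
            }
        }
    }


# Hypothetical graph: predict the `genre` of `Movie` nodes, and treat the
# `age` of `Person` nodes as a numerical feature.
hints = build_ml_hints(
    targets=[{"node": "Movie", "property": "genre", "type": "classification"}],
    features=[{"node": "Person", "property": "age", "type": "numerical"}],
)
print(json.dumps(hints, indent=2))
```

The resulting dictionary would be merged into the top level of the export request body, next to the command and output settings.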

If the data in a Neptune DB cluster changes while an export is in progress, the consistency of the exported data is not guaranteed. In other words, if the cluster is servicing write traffic during the export job, the exported data may contain inconsistencies. This is true whether you export from the primary instance in the cluster or from one or more read replicas.

To guarantee that exported data is consistent, it is best to export from a clone of your DB cluster. This both provides the export tool with a static version of your data and ensures that the export job doesn't slow down queries in your original DB cluster.

To make this easier, you can indicate that you want to clone the source DB cluster when you trigger an export job. If you do, the export process automatically creates the clone, uses it for the export, and then deletes it when the export is finished.
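
The cloning option described above could be requested roughly as follows. This sketch assumes the option is a boolean `cloneCluster` field inside the request's `params` object, which is how the Neptune-Export documentation describes it. That name does not appear on this page, and the payload values are placeholders.

```python
import copy


def with_cluster_clone(export_payload):
    """Return a copy of an export payload that asks for a temporary cluster clone."""
    updated = copy.deepcopy(export_payload)  # leave the caller's payload untouched
    # Assumed flag name: ask the export process to create, use, and then delete a clone.
    updated.setdefault("params", {})["cloneCluster"] = True
    return updated


# Placeholder payload; the endpoint and bucket are not real resources.
base_payload = {
    "command": "export-pg",
    "outputS3Path": "s3://my-example-bucket/neptune-export",
    "params": {"endpoint": "my-cluster.cluster-example.us-east-1.neptune.amazonaws.com"},
}
cloned_payload = with_cluster_clone(base_payload)
print(cloned_payload["params"])
```

Returning a modified copy rather than mutating the input lets the same base payload be reused for exports with and without cloning.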