Top-level parameters for the Neptune ML export process - Amazon Neptune

Whether you are using the Neptune-Export service or the neptune-export command line utility, the parameters you use to control the export are mostly the same. You supply them as a JSON object that is passed to the Neptune-Export endpoint or to neptune-export on the command line.

The object passed in to the export process has up to four top-level fields:

-d '{
      "outputS3Path" : "s3://(your Amazon S3 bucket)/(path to the folder for exported data)",
      "jobSize" : "(for Neptune-Export service only)",
      "params" : { (a JSON object that contains export-process parameters) },
      "additionalParams": { (a JSON object that contains parameters for training configuration) }
    }'
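As an illustration of the structure above, the following Python sketch assembles the top-level object and serializes it to JSON. The helper name build_export_request, the bucket name, and the empty params and additionalParams objects are placeholders chosen for this example; they are not part of Neptune-Export.

```python
import json


def build_export_request(output_s3_path, job_size=None, params=None,
                         additional_params=None):
    """Assemble the top-level export object (hypothetical helper).

    Fields that are not supplied are left out of the result.
    """
    request = {"outputS3Path": output_s3_path}  # the only required field
    if job_size is not None:
        request["jobSize"] = job_size  # Neptune-Export service only
    if params is not None:
        request["params"] = params
    if additional_params is not None:
        request["additionalParams"] = additional_params
    return request


request = build_export_request(
    "s3://example-bucket/neptune-export",  # placeholder S3 location
    job_size="medium",
    params={},             # export-process parameters would go here
    additional_params={},  # training-configuration parameters would go here
)

# The resulting string is what would be passed as the JSON object.
print(json.dumps(request, indent=2))
```

Leaving out the optional arguments produces an object that contains only outputS3Path.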

Top-level fields in the JSON object

The outputS3Path parameter

The outputS3Path top-level parameter is required, and must contain the URI of an Amazon S3 location to which the exported files can be published:

"outputS3Path" : "s3://(your Amazon S3 bucket)/(path to the folder for exported data)"

The value must begin with s3://, followed by a valid bucket name and optionally a folder path within the bucket.
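A client-side check of this rule might look like the following Python sketch. The function name is_valid_output_s3_path is hypothetical, and the bucket-name pattern is a simplified approximation of the Amazon S3 naming rules (3 to 63 characters, lowercase letters, digits, dots, and hyphens, beginning and ending with a letter or digit), not a complete validator.

```python
import re

# Simplified pattern (an assumption for this sketch): "s3://", then a bucket
# name of 3-63 characters, then an optional folder path starting with "/".
_S3_URI = re.compile(r"^s3://([a-z0-9][a-z0-9.-]{1,61}[a-z0-9])(/.*)?$")


def is_valid_output_s3_path(value):
    """Return True if value looks like s3://bucket or s3://bucket/path."""
    return _S3_URI.match(value) is not None


print(is_valid_output_s3_path("s3://example-bucket/exports/run1"))  # bucket plus folder path
print(is_valid_output_s3_path("s3://example-bucket"))               # bucket only
print(is_valid_output_s3_path("s3:/example-bucket/exports"))        # single slash after "s3:"
```

A check like this catches a malformed prefix before the job is submitted, but it cannot confirm that the bucket exists or that the export has permission to write to it.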

The jobSize parameter

The jobSize top-level parameter is optional and is used only with the Neptune-Export service, not with the neptune-export command line utility. It lets you characterize the size of the export job you are starting, which helps determine the amount of compute resources devoted to the job and its maximum concurrency level.

"jobSize" : "(one of four size descriptors)"

The four valid size descriptors are:

  • small   –   Maximum concurrency: 8. Suitable for storage volumes up to 10 GB.

  • medium   –   Maximum concurrency: 32. Suitable for storage volumes up to 100 GB.

  • large   –   Maximum concurrency: 64. Suitable for storage volumes over 100 GB but less than 1 TB.

  • xlarge   –   Maximum concurrency: 96. Suitable for storage volumes over 1 TB.

By default, an export initiated on the Neptune-Export service runs as a small job.
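The size guidance above can be expressed as a simple lookup. The following Python sketch is one possible reading of it: the function name suggest_job_size is hypothetical, 1 TB is treated as 1000 GB, and a volume of exactly 1 TB is mapped to xlarge. The last two points are assumptions, because the guidance does not define them.

```python
def suggest_job_size(volume_gb):
    """Map a storage volume size in GB to a jobSize descriptor (hypothetical helper)."""
    if volume_gb < 0:
        raise ValueError("volume_gb must not be negative")
    if volume_gb <= 10:
        return "small"    # up to 10 GB
    if volume_gb <= 100:
        return "medium"   # up to 100 GB
    if volume_gb < 1000:
        return "large"    # over 100 GB but less than 1 TB (assumed 1000 GB)
    return "xlarge"       # 1 TB and above (the boundary case is an assumption)


for size_gb in (5, 50, 500, 5000):
    print(size_gb, suggest_job_size(size_gb))
```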

The performance of an export depends not only on the jobSize setting, but also on the number of Neptune instances that you're exporting from, the size of each instance, and the effective concurrency level of the job.

You can configure the number of database instances using the cloneClusterReplicaCount parameter, and you can configure the job's effective concurrency level using the concurrency parameter.
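Because each job size has a maximum concurrency, it can be useful to check a planned concurrency value against that maximum before starting an export. The following Python sketch does so using the limits listed above. The function name within_max_concurrency is hypothetical, and how the service itself treats a value above the maximum is not covered here.

```python
# Maximum concurrency for each jobSize descriptor, as listed above.
MAX_CONCURRENCY = {"small": 8, "medium": 32, "large": 64, "xlarge": 96}


def within_max_concurrency(concurrency, job_size="small"):
    """Return True if concurrency is between 1 and the maximum for job_size.

    job_size defaults to "small", the default size for the Neptune-Export service.
    """
    return 1 <= concurrency <= MAX_CONCURRENCY[job_size]


print(within_max_concurrency(16))            # checked against the small maximum of 8
print(within_max_concurrency(16, "medium"))  # checked against the medium maximum of 32
```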

The params object

The params top-level parameter is a JSON object that contains parameters that you use to control the export process itself, as explained in Export parameter fields in the params top-level JSON object.

The additionalParams object

The additionalParams top-level parameter is a JSON object that contains parameters you can use to control the training data configuration that the export process creates. See Using the additionalParams object to tune the Neptune ML export of model-training data.