Neptunedata: Start-NEPTLoaderJob Cmdlet | AWS Tools for PowerShell

Synopsis

Calls the Amazon NeptuneData StartLoaderJob API operation.

Syntax

Start-NEPTLoaderJob
-Dependency <String[]>
-EdgeOnlyLoad <Boolean>
-FailOnError <Boolean>
-Format <Format>
-IamRoleArn <String>
-Mode <Mode>
-Parallelism <Parallelism>
-ParserConfiguration <Hashtable>
-QueueRequest <Boolean>
-S3BucketRegion <S3BucketRegion>
-Source <String>
-UpdateSingleCardinalityProperty <Boolean>
-UserProvidedEdgeId <Boolean>
-Select <String>
-Force <SwitchParameter>
-ClientConfig <AmazonNeptunedataConfig>

Description

Starts a Neptune bulk loader job to load data from an Amazon S3 bucket into a Neptune DB instance. See Using the Amazon Neptune Bulk Loader to Ingest Data. When invoking this operation in a Neptune cluster that has IAM authentication enabled, the IAM user or role making the request must have a policy attached that allows the neptune-db:StartLoaderJob IAM action in that cluster.
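
As a minimal sketch of a call that supplies the four required parameters (Source, Format, IamRoleArn, S3BucketRegion), the parameters can be collected in a hashtable and splatted. The bucket name, role ARN, and endpoint below are hypothetical placeholders. The EndpointUrl entry is an assumption about how the cluster's data endpoint is addressed and may not apply to your environment.

```powershell
# Placeholder values: replace the bucket, role ARN, region, and endpoint with your own.
$loaderParams = @{
    Source         = 's3://amzn-s3-demo-bucket/graph-data/'                 # hypothetical S3 prefix
    Format         = 'csv'                                                   # Gremlin CSV data format
    IamRoleArn     = 'arn:aws:iam::123456789012:role/NeptuneLoadFromS3'      # hypothetical role
    S3BucketRegion = 'us-east-1'                                             # must match the DB cluster Region
    FailOnError    = $true                                                   # stop at the first error
    EndpointUrl    = 'https://my-neptune-cluster.example.com:8182'           # assumption: cluster data endpoint
}

# Submitting the job requires the AWS.Tools.Neptunedata module and network access to the cluster:
# $response = Start-NEPTLoaderJob @loaderParams
```

Splatting keeps the parameter set in one place, which makes it easier to reuse the same values across several load requests.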

Parameters

-ClientConfig <AmazonNeptunedataConfig>

Amazon.PowerShell.Cmdlets.NEPT.AmazonNeptunedataClientCmdlet.ClientConfig

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-Dependency <String[]>

This is an optional parameter that can make a queued load request contingent on the successful completion of one or more previous jobs in the queue.

Neptune can queue up as many as 64 load requests at a time, if their queueRequest parameters are set to "TRUE". The dependencies parameter lets you make execution of such a queued request dependent on the successful completion of one or more specified previous requests in the queue.

For example, if load Job-A and Job-B are independent of each other, but load Job-C needs Job-A and Job-B to be finished before it begins, proceed as follows:

Submit load-job-A and load-job-B one after another in any order, and save their load-ids.
Submit load-job-C with the load-ids of the two jobs in its dependencies field.

Because of the dependencies parameter, the bulk loader will not start Job-C until Job-A and Job-B have completed successfully. If either one of them fails, Job-C will not be executed, and its status will be set to LOAD_FAILED_BECAUSE_DEPENDENCY_NOT_SATISFIED.

You can set up multiple levels of dependency in this way, so that the failure of one job will cause all requests that are directly or indirectly dependent on it to be cancelled.

Starting with version 4 of the SDK this property will default to null. If no data for this property is returned from the service the property will also be null. This was changed to improve performance and allow the SDK and caller to distinguish between a property not set or a property being empty to clear out a value. To retain the previous SDK behavior set the AWSConfigs.InitializeCollections static property to true.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	Dependencies
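
The Job-A, Job-B, Job-C sequence described for -Dependency could be sketched as follows. New-LoadJobParams is a hypothetical helper written for this example, the S3 prefixes and role ARN are placeholders, and reading the load ID from a Payload entry named loadId is an assumption about the response shape rather than something this page states.

```powershell
# Parameters shared by all three jobs (placeholder values).
$common = @{
    Format         = 'csv'
    IamRoleArn     = 'arn:aws:iam::123456789012:role/NeptuneLoadFromS3'   # hypothetical role
    S3BucketRegion = 'us-east-1'
    QueueRequest   = $true    # every job must allow queueing
}

# Hypothetical helper: copies the shared parameters and adds per-job values.
function New-LoadJobParams {
    param(
        [Parameter(Mandatory)][hashtable]$Common,
        [Parameter(Mandatory)][string]$Source,
        [string[]]$Dependency
    )
    $p = $Common.Clone()
    $p.Source = $Source
    if ($Dependency) { $p.Dependency = $Dependency }
    return $p
}

$jobA = New-LoadJobParams -Common $common -Source 's3://amzn-s3-demo-bucket/job-a/'
$jobB = New-LoadJobParams -Common $common -Source 's3://amzn-s3-demo-bucket/job-b/'

# Against a live cluster, the load IDs would come from the responses
# (assumption: the ID is exposed under the 'loadId' key of Payload):
# $idA = (Start-NEPTLoaderJob @jobA).Payload['loadId']
# $idB = (Start-NEPTLoaderJob @jobB).Payload['loadId']
$idA = 'load-id-of-job-A'   # stand-in value for illustration
$idB = 'load-id-of-job-B'   # stand-in value for illustration

# Job-C is queued but does not start until Job-A and Job-B succeed.
$jobC = New-LoadJobParams -Common $common -Source 's3://amzn-s3-demo-bucket/job-c/' -Dependency $idA, $idB
# Start-NEPTLoaderJob @jobC
```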

-EdgeOnlyLoad <Boolean>

edgeOnlyLoad - A flag that controls file processing order during bulk loading.
Allowed values: "TRUE", "FALSE".
Default value: "FALSE".
When this parameter is set to "FALSE", the loader automatically loads vertex files first, then edge files afterwards. It does this by first scanning all files to determine their contents (vertices or edges). When this parameter is set to "TRUE", the loader skips the initial scanning phase and immediately loads all files in the order they appear.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-FailOnError <Boolean>

failOnError - A flag to toggle a complete stop on an error.
Allowed values: "TRUE", "FALSE".
Default value: "TRUE".
When this parameter is set to "FALSE", the loader tries to load all the data in the location specified, skipping any entries with errors.
When this parameter is set to "TRUE", the loader stops as soon as it encounters an error. Data loaded up to that point persists.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-Force <SwitchParameter>

This parameter overrides confirmation prompts to force the cmdlet to continue its operation. This parameter should always be used with caution.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-Format <Format>

The format of the data. For more information about data formats for the Neptune Loader command, see Load Data Formats.
Allowed values:

csv for the Gremlin CSV data format.
opencypher for the openCypher CSV data format.
ntriples for the N-Triples RDF data format.
nquads for the N-Quads RDF data format.
rdfxml for the RDF/XML RDF data format.
turtle for the Turtle RDF data format.

Required?	True
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-IamRoleArn <String>

The Amazon Resource Name (ARN) for an IAM role to be assumed by the Neptune DB instance for access to the S3 bucket. The IAM role ARN provided here should be attached to the DB cluster (see Adding the IAM Role to an Amazon Neptune Cluster).

Required?	True
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-Mode <Mode>

The load job mode.
Allowed values: RESUME, NEW, AUTO.
Default value: AUTO.
RESUME - In RESUME mode, the loader looks for a previous load from this source, and if it finds one, resumes that load job. If no previous load job is found, the loader stops. The loader avoids reloading files that were successfully loaded in a previous job. It only tries to process failed files. If you dropped previously loaded data from your Neptune cluster, that data is not reloaded in this mode. If a previous load job loaded all files from the same source successfully, nothing is reloaded, and the loader returns success.
NEW - In NEW mode, the loader creates a new load request regardless of any previous loads. You can use this mode to reload all the data from a source after dropping previously loaded data from your Neptune cluster, or to load new data available at the same source.
AUTO - In AUTO mode, the loader looks for a previous load job from the same source, and if it finds one, resumes that job, just as in RESUME mode. If the loader doesn't find a previous load job from the same source, it loads all data from the source, just as in NEW mode.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
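
The three modes can be summarized as a small decision table. The function below is only an illustrative model of the behavior described for -Mode: it does not call the service, and both its name and the action labels it returns are hypothetical.

```powershell
# Illustrative model of the documented mode behavior; it makes no service call.
function Get-ExpectedLoaderAction {
    param(
        [ValidateSet('RESUME', 'NEW', 'AUTO')][string]$Mode = 'AUTO',   # AUTO is the documented default
        [Parameter(Mandatory)][bool]$PreviousLoadExists                 # was a prior load found for this source?
    )
    switch ($Mode.ToUpperInvariant()) {
        'NEW'    { 'StartNewLoad' }
        'RESUME' { if ($PreviousLoadExists) { 'ResumePreviousLoad' } else { 'Stop' } }
        'AUTO'   { if ($PreviousLoadExists) { 'ResumePreviousLoad' } else { 'StartNewLoad' } }
    }
}

Get-ExpectedLoaderAction -Mode RESUME -PreviousLoadExists $false   # Stop
Get-ExpectedLoaderAction -Mode AUTO   -PreviousLoadExists $false   # StartNewLoad
Get-ExpectedLoaderAction -Mode AUTO   -PreviousLoadExists $true    # ResumePreviousLoad
```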

-Parallelism <Parallelism>

The optional parallelism parameter can be set to reduce the number of threads used by the bulk load process.
Allowed values:
LOW – The number of threads used is the number of available vCPUs divided by 8.
MEDIUM – The number of threads used is the number of available vCPUs divided by 2.
HIGH – The number of threads used is the same as the number of available vCPUs.
OVERSUBSCRIBE – The number of threads used is the number of available vCPUs multiplied by 2. If this value is used, the bulk loader takes up all available resources. This does not mean, however, that the OVERSUBSCRIBE setting results in 100% CPU utilization. Because the load operation is I/O bound, the highest CPU utilization to expect is in the 60% to 70% range.
Default value: HIGH
The parallelism setting can sometimes result in a deadlock between threads when loading openCypher data. When this happens, Neptune returns the LOAD_DATA_DEADLOCK error. You can generally fix the issue by setting parallelism to a lower setting and retrying the load command.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
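
The thread-count arithmetic above can be worked through with a short helper. Both functions are hypothetical and exist only for this example. Rounding down and using at least one thread are assumptions, since the description does not say how fractional results are handled.

```powershell
# Hypothetical helper: thread count implied by each setting for a given vCPU count.
function Get-LoaderThreadCount {
    param(
        [Parameter(Mandatory)][ValidateSet('LOW', 'MEDIUM', 'HIGH', 'OVERSUBSCRIBE')][string]$Parallelism,
        [Parameter(Mandatory)][ValidateRange(1, 4096)][int]$VCpuCount
    )
    switch ($Parallelism.ToUpperInvariant()) {
        'LOW'           { $threads = [math]::Floor([double]$VCpuCount / 8) }   # vCPUs / 8
        'MEDIUM'        { $threads = [math]::Floor([double]$VCpuCount / 2) }   # vCPUs / 2
        'HIGH'          { $threads = $VCpuCount }                              # same as vCPUs
        'OVERSUBSCRIBE' { $threads = $VCpuCount * 2 }                          # vCPUs * 2
    }
    # Assumption: never report fewer than one thread.
    [int][math]::Max(1, $threads)
}

# Hypothetical helper: next lower setting to try after a LOAD_DATA_DEADLOCK error.
function Get-LowerParallelism {
    param(
        [Parameter(Mandatory)][ValidateSet('LOW', 'MEDIUM', 'HIGH', 'OVERSUBSCRIBE')][string]$Parallelism
    )
    $order = 'LOW', 'MEDIUM', 'HIGH', 'OVERSUBSCRIBE'
    $index = [array]::IndexOf($order, $Parallelism.ToUpperInvariant())
    if ($index -le 0) { return $order[0] }   # already at the lowest setting
    $order[$index - 1]
}

Get-LoaderThreadCount -Parallelism LOW -VCpuCount 16            # 2
Get-LoaderThreadCount -Parallelism OVERSUBSCRIBE -VCpuCount 16  # 32
Get-LowerParallelism -Parallelism HIGH                          # MEDIUM
```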

-ParserConfiguration <Hashtable>

parserConfiguration – An optional object with additional parser configuration values. Each of the child parameters is also optional:
namedGraphUri - The default graph for all RDF formats when no graph is specified (for non-quads formats and NQUAD entries with no graph). The default is https://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph.
baseUri - The base URI for RDF/XML and Turtle formats. The default is https://aws.amazon.com/neptune/default.
allowEmptyStrings - Gremlin users need to be able to pass empty string values ("") as node and edge properties when loading CSV data. If allowEmptyStrings is set to false (the default), such empty strings are treated as nulls and are not loaded. If allowEmptyStrings is set to true, the loader treats empty strings as valid property values and loads them accordingly.
Starting with version 4 of the SDK this property will default to null. If no data for this property is returned from the service the property will also be null. This was changed to improve performance and allow the SDK and caller to distinguish between a property not set or a property being empty to clear out a value. To retain the previous SDK behavior set the AWSConfigs.InitializeCollections static property to true.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
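
Because -ParserConfiguration takes a Hashtable, the child parameters can be written as key/value pairs. The sketch below assumes a Turtle load. The URIs, bucket, and role ARN are hypothetical, and passing the values as strings is an assumption about what the cmdlet expects.

```powershell
# Parser settings for an RDF (Turtle) load; both URIs are hypothetical examples.
$parserConfig = @{
    namedGraphUri = 'https://example.com/graphs/inventory'   # graph used when the data names none
    baseUri       = 'https://example.com/base/'              # base URI for relative references
}

$rdfLoadParams = @{
    Source              = 's3://amzn-s3-demo-bucket/rdf/'                        # placeholder prefix
    Format              = 'turtle'
    IamRoleArn          = 'arn:aws:iam::123456789012:role/NeptuneLoadFromS3'     # hypothetical role
    S3BucketRegion      = 'us-east-1'
    ParserConfiguration = $parserConfig
}

# Start-NEPTLoaderJob @rdfLoadParams
```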

-QueueRequest <Boolean>

This is an optional flag parameter that indicates whether the load request can be queued up or not. You don't have to wait for one load job to complete before issuing the next one, because Neptune can queue up as many as 64 jobs at a time, provided that their queueRequest parameters are all set to "TRUE". The queue order of the jobs will be first-in-first-out (FIFO).
If the queueRequest parameter is omitted or set to "FALSE", the load request will fail if another load job is already running.
Allowed values: "TRUE", "FALSE".
Default value: "FALSE".

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-S3BucketRegion <S3BucketRegion>

The Amazon region of the S3 bucket. This must match the Amazon Region of the DB cluster.

Required?	True
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-Select <String>

Use the -Select parameter to control the cmdlet output. The default value is '*'. Specifying -Select '*' will result in the cmdlet returning the whole service response (Amazon.Neptunedata.Model.StartLoaderJobResponse). Specifying the name of a property of type Amazon.Neptunedata.Model.StartLoaderJobResponse will result in that property being returned. Specifying -Select '^ParameterName' will result in the cmdlet returning the selected cmdlet parameter value.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-Source <String>

The source parameter accepts an S3 URI that identifies a single file, multiple files, a folder, or multiple folders. Neptune loads every data file in any folder that is specified.
The URI can be in any of the following formats.
s3://(bucket_name)/(object-key-name)
https://s3.amazonaws.com/(bucket_name)/(object-key-name)
https://s3.us-east-1.amazonaws.com/(bucket_name)/(object-key-name)
The object-key-name element of the URI is equivalent to the prefix parameter in an S3 ListObjects API call. It identifies all the objects in the specified S3 bucket whose names begin with that prefix. That can be a single file or folder, or multiple files and/or folders.
The specified folder or folders can contain multiple vertex files and multiple edge files.

Required?	True
Position?	Named
Accept pipeline input?	True (ByPropertyName)
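
The three accepted URI shapes can be produced from a bucket name and object key prefix. Get-NeptuneSourceUri is a hypothetical helper for illustration. Substituting a Region other than us-east-1 into the third form is an assumption, since only the us-east-1 form is listed above.

```powershell
# Hypothetical helper: builds the three URI forms listed for -Source.
function Get-NeptuneSourceUri {
    param(
        [Parameter(Mandatory)][string]$BucketName,
        [Parameter(Mandatory)][string]$ObjectKeyName,   # acts as a prefix, as in S3 ListObjects
        [string]$Region = 'us-east-1'                   # assumption: other Regions follow the same pattern
    )
    [pscustomobject]@{
        S3Scheme       = "s3://${BucketName}/${ObjectKeyName}"
        GlobalEndpoint = "https://s3.amazonaws.com/${BucketName}/${ObjectKeyName}"
        RegionEndpoint = "https://s3.${Region}.amazonaws.com/${BucketName}/${ObjectKeyName}"
    }
}

$uris = Get-NeptuneSourceUri -BucketName 'amzn-s3-demo-bucket' -ObjectKeyName 'graph-data/'
$uris.S3Scheme         # s3://amzn-s3-demo-bucket/graph-data/
$uris.RegionEndpoint   # https://s3.us-east-1.amazonaws.com/amzn-s3-demo-bucket/graph-data/
```

Because the object key is treated as a prefix, a value such as graph-data/ selects every object whose key begins with that string.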

-UpdateSingleCardinalityProperty <Boolean>

updateSingleCardinalityProperties is an optional parameter that controls how the bulk loader treats a new value for single-cardinality vertex or edge properties. This is not supported for loading openCypher data.
Allowed values: "TRUE", "FALSE".
Default value: "FALSE".
By default, or when updateSingleCardinalityProperties is explicitly set to "FALSE", the loader treats a new value as an error, because it violates single cardinality.
When updateSingleCardinalityProperties is set to "TRUE", on the other hand, the bulk loader replaces the existing value with the new one. If multiple edge or single-cardinality vertex property values are provided in the source file(s) being loaded, the final value at the end of the bulk load could be any one of those new values. The loader only guarantees that the existing value has been replaced by one of the new ones.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	UpdateSingleCardinalityProperties

-UserProvidedEdgeId <Boolean>

This parameter is required only when loading openCypher data that contains relationship IDs. It must be included and set to True when openCypher relationship IDs are explicitly provided in the load data (recommended).
When userProvidedEdgeIds is absent or set to True, an :ID column must be present in every relationship file in the load.
When userProvidedEdgeIds is present and set to False, relationship files in the load must not contain an :ID column. Instead, the Neptune loader automatically generates an ID for each relationship.
It's useful to provide relationship IDs explicitly so that the loader can resume loading after errors in the CSV data have been fixed, without having to reload any relationships that have already been loaded. If relationship IDs have not been explicitly assigned, the loader cannot resume a failed load if any relationship file has had to be corrected, and must instead reload all the relationships.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	UserProvidedEdgeIds
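
The :ID column rule can be checked locally before a load is submitted. Test-RelationshipHeader is a hypothetical helper that does a naive comma split of a header line. It assumes no quoted commas, and the column names other than :ID are illustrative only.

```powershell
# Hypothetical pre-flight check: does a relationship file header agree with the flag?
function Test-RelationshipHeader {
    param(
        [Parameter(Mandatory)][string]$HeaderLine,
        [bool]$UserProvidedEdgeId = $true   # absent is treated like True, per the description above
    )
    # Naive split; assumes the header contains no quoted commas.
    $columns = $HeaderLine.Split(',') | ForEach-Object { $_.Trim() }
    $hasIdColumn = $columns -contains ':ID'
    # True requires an :ID column; False requires that there is none.
    return ($hasIdColumn -eq $UserProvidedEdgeId)
}

Test-RelationshipHeader -HeaderLine ':ID,:START_ID,:END_ID,:TYPE' -UserProvidedEdgeId $true    # True
Test-RelationshipHeader -HeaderLine ':START_ID,:END_ID,:TYPE'     -UserProvidedEdgeId $true    # False

# Parameters for an openCypher load with explicit relationship IDs (placeholder values).
$cypherLoadParams = @{
    Source             = 's3://amzn-s3-demo-bucket/opencypher/'
    Format             = 'opencypher'
    IamRoleArn         = 'arn:aws:iam::123456789012:role/NeptuneLoadFromS3'   # hypothetical role
    S3BucketRegion     = 'us-east-1'
    UserProvidedEdgeId = $true
}
# Start-NEPTLoaderJob @cypherLoadParams
```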

Common Credential and Region Parameters

-AccessKey <String>
The AWS access key for the user account. This can be a temporary access key if the corresponding session token is supplied to the -SessionToken parameter.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
Aliases AK
-Credential <AWSCredentials>
An AWSCredentials object instance containing access and secret key information, and optionally a token for session-based credentials.
Required? False
Position? Named
Accept pipeline input? True (ByValue, ByPropertyName)
-EndpointUrl <String>
The endpoint to make the call against. Note: This parameter is primarily for internal AWS use and is not required/should not be specified for normal usage. The cmdlets normally determine which endpoint to call based on the region specified to the -Region parameter or set as default in the shell (via Set-DefaultAWSRegion). Only specify this parameter if you must direct the call to a specific custom endpoint.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
-NetworkCredential <PSCredential>
Used with SAML-based authentication when ProfileName references a SAML role profile. Contains the network credentials to be supplied during authentication with the configured identity provider's endpoint. This parameter is not required if the user's default network identity can or should be used during authentication.
Required? False
Position? Named
Accept pipeline input? True (ByValue, ByPropertyName)
-ProfileLocation <String>
Used to specify the name and location of the ini-format credential file (shared with the AWS CLI and other AWS SDKs). If this optional parameter is omitted this cmdlet will search the encrypted credential file used by the AWS SDK for .NET and AWS Toolkit for Visual Studio first. If the profile is not found then the cmdlet will search in the ini-format credential file at the default location: (user's home directory)\.aws\credentials. If this parameter is specified then this cmdlet will only search the ini-format credential file at the location given. As the current folder can vary in a shell or during script execution it is advised that you specify a fully qualified path instead of a relative path.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
Aliases AWSProfilesLocation, ProfilesLocation
-ProfileName <String>
The user-defined name of an AWS credentials or SAML-based role profile containing credential information. The profile is expected to be found in the secure credential file shared with the AWS SDK for .NET and AWS Toolkit for Visual Studio. You can also specify the name of a profile stored in the .ini-format credential file used with the AWS CLI and other AWS SDKs.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
Aliases StoredCredentials, AWSProfileName
-Region <Object>
The system name of an AWS region or an AWSRegion instance. This governs the endpoint that will be used when calling service operations. Note that the AWS resources referenced in a call are usually region-specific.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
Aliases RegionToCall
-SecretKey <String>
The AWS secret key for the user account. This can be a temporary secret key if the corresponding session token is supplied to the -SessionToken parameter.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
Aliases SK, SecretAccessKey
-SessionToken <String>
The session token if the access and secret keys are temporary session-based credentials.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
Aliases ST

Outputs

Amazon.Neptunedata.Model.StartLoaderJobResponse
This cmdlet returns an Amazon.Neptunedata.Model.StartLoaderJobResponse object containing multiple properties.

Start-NEPTLoaderJob Cmdlet

Amazon NeptuneData
Available in AWS.Tools.Neptunedata, AWSPowerShell.NetCore and AWSPowerShell
