The targets field in a neptune_ml object - Amazon Neptune

The targets field in a JSON training data export configuration contains an array of target objects that specify a training task and the machine-learning class labels for training that task. The contents of the target objects vary depending on whether you are training on property-graph data or RDF data.

For property-graph node classification and regression tasks, target objects in the array can look like this:

{ "node": "(node property-graph label)", "property": "(property name)", "type" : "(used to specify classification or regression)", "split_rate": [0.8,0.2,0.0], "separator": "," }

For property-graph edge classification, regression or link prediction tasks, they can look like this:

{ "edge": "(edge property-graph label)", "property": "(property name)", "type" : "(used to specify classification, regression or link_prediction)", "split_rate": [0.8,0.2,0.0], "separator": "," }

For RDF classification and regression tasks, target objects in the array can look like this:

{ "node": "(node type of an RDF node)", "predicate": "(predicate IRI)", "type" : "(used to specify classification or regression)", "split_rate": [0.8,0.2,0.0] }

For RDF link prediction tasks, target objects in the array can look like this:

{ "subject": "(source node type of an edge)", "predicate": "(relation type of an edge)", "object": "(destination node type of an edge)", "type" : "link_prediction", "split_rate": [0.8,0.2,0.0] }

Target objects can contain the following fields:

Fields in a property-graph target object

The node (vertex) field in a target object

The property-graph label of a target node (vertex). A target object must contain a node element or an edge element, but not both.

A node can take either a single value, like this:

"node": "Movie"

Or, in the case of a multi-label vertex, it can take an array of values, like this:

"node": ["Content", "Movie"]

The edge field in a property-graph target object

Specifies a target edge by its start node label(s), its own label, and its end-node label(s). A target object must contain an edge element or a node element, but not both.

The value of an edge field is a JSON array of three strings that represent the start-node's property-graph label(s), the property-graph label of the edge itself, and the end-node's property-graph label(s), like this:

"edge": ["Person_A", "knows", "Person_B"]

If the start node and/or end node has multiple labels, enclose them in an array, like this:

"edge": [ ["Admin", "Person_A"], "knows", ["Admin", "Person_B"] ]

The property field in a property-graph target object

Specifies a property of the target vertex or edge, like this:

"property" : "rating"

This field is required, except when the target task is link prediction.

The type field in a property-graph target object

Indicates the type of target task to be performed on the node or edge, like this:

"type" : "regression"

The supported task types for nodes are:

  • classification

  • regression

The supported task types for edges are:

  • classification

  • regression

  • link_prediction

This field is required.
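
The rules stated so far for property-graph targets (exactly one of node or edge, a required type drawn from the supported list, and a property unless the task is link prediction) can be checked before an export is submitted. The following is a minimal sketch; check_target is a hypothetical helper, and the checks Neptune itself performs may differ.

```python
NODE_TASKS = {"classification", "regression"}
EDGE_TASKS = {"classification", "regression", "link_prediction"}


def check_target(target):
    """Return a list of problems found in a property-graph target object."""
    problems = []
    has_node, has_edge = "node" in target, "edge" in target
    if has_node == has_edge:
        problems.append('exactly one of "node" or "edge" is required')

    task = target.get("type")
    if task is None:
        problems.append('"type" is required')
    else:
        allowed = EDGE_TASKS if has_edge else NODE_TASKS
        if task not in allowed:
            problems.append(f'unsupported task type: {task!r}')

    if task != "link_prediction" and "property" not in target:
        problems.append('"property" is required unless the task is link prediction')
    return problems


print(check_target({"node": "Movie", "property": "rating", "type": "regression"}))
print(check_target({"node": "Movie", "type": "link_prediction"}))
```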

The split_rate field in a property-graph target object

(Optional) An estimate of the proportions of nodes or edges that the training, validation, and test stages will use, respectively. These proportions are represented by a JSON array of three numbers between zero and one that add up to one:

"split_rate": [0.7, 0.1, 0.2]

If you do not supply the optional split_rate field, the default estimated value is [0.9, 0.1, 0.0] for classification and regression tasks, and [0.9, 0.05, 0.05] for link prediction tasks.
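
The constraints and defaults above can be expressed as a short Python sketch. resolve_split_rate is a hypothetical helper; the sum is compared with a tolerance because floating-point values such as 0.7 + 0.1 + 0.2 do not add up to exactly 1.0.

```python
import math

# Defaults as documented for property-graph targets.
DEFAULT_SPLIT = {
    "classification": [0.9, 0.1, 0.0],
    "regression": [0.9, 0.1, 0.0],
    "link_prediction": [0.9, 0.05, 0.05],
}


def resolve_split_rate(target):
    """Return the target's split_rate, or the documented default if absent."""
    rate = target.get("split_rate")
    if rate is None:
        return DEFAULT_SPLIT[target["type"]]
    if len(rate) != 3 or any(not 0.0 <= r <= 1.0 for r in rate):
        raise ValueError("split_rate must be three numbers between 0 and 1")
    if not math.isclose(sum(rate), 1.0):
        raise ValueError("split_rate values must add up to 1")
    return list(rate)


print(resolve_split_rate({"type": "link_prediction"}))
print(resolve_split_rate({"type": "regression", "split_rate": [0.7, 0.1, 0.2]}))
```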

The separator field in a property-graph target object

(Optional) Used with a classification task.

The separator field specifies the character used to split a target property value into multiple categorical values when the property stores several categories in a single string. For example:

"separator": "|"

The presence of a separator field indicates that the task is a multi-target classification task.
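
To illustrate the effect of a separator, the sketch below splits one property value into its categories. It shows the idea only: split_labels is a hypothetical helper, the sample value is invented, and how Neptune ML treats surrounding whitespace or empty segments is not specified here.

```python
def split_labels(value, separator=None):
    """Split a property value into category labels.

    Without a separator the value is treated as a single label.
    """
    if separator is None:
        return [value]
    return value.split(separator)


print(split_labels("Action|Comedy|Drama", "|"))
print(split_labels("Action"))
```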

Fields in an RDF target object

The node field in an RDF target object

Defines the node type of target nodes. Used with node classification tasks or node regression tasks. The node type of a node in RDF is defined by:

node_id, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, node_type

An RDF node can only take a single value, like this:

"node": "http://aws.amazon.com/neptune/csv2rdf/class/Movie"

The subject field in an RDF target object

For link prediction tasks, subject defines the source node type of target edges:

"subject": "http://aws.amazon.com/neptune/csv2rdf/class/Director"

Note

For link prediction tasks, subject should be used together with predicate and object. If any of these three is not provided, all edges are treated as the training target.

The predicate field in an RDF target object

For node classification and node regression tasks, predicate defines what literal data is used as the target node feature of a target node:

"predicate": "http://aws.amazon.com/neptune/csv2rdf/datatypeProperty/genre"

Note

If the target nodes have only one predicate defining the target node feature, the predicate field can be omitted.

For link prediction tasks, predicate defines the relation type of target edges:

"predicate": "http://aws.amazon.com/neptune/csv2rdf/datatypeProperty/direct"

Note

For link prediction tasks, predicate should be used together with subject and object. If any of these three is not provided, all edges are treated as the training target.

The object field in an RDF target object

For link prediction tasks, object defines the destination node type of target edges:

"object": "http://aws.amazon.com/neptune/csv2rdf/class/Movie"

Note

For link prediction tasks, object should be used together with subject and predicate. If any of these three is not provided, all edges are treated as the training target.
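
The all-or-nothing behavior of subject, predicate, and object can be summarized in a few lines of Python. This is a sketch of the rule as stated in the notes, not of Neptune's implementation; link_prediction_scope is a hypothetical helper.

```python
def link_prediction_scope(target):
    """Return the (subject, predicate, object) pattern of an RDF link
    prediction target, or None when any of the three is missing, in
    which case all edges are treated as the training target."""
    keys = ("subject", "predicate", "object")
    if all(key in target for key in keys):
        return tuple(target[key] for key in keys)
    return None


full = {
    "subject": "http://aws.amazon.com/neptune/csv2rdf/class/Director",
    "predicate": "http://aws.amazon.com/neptune/csv2rdf/datatypeProperty/direct",
    "object": "http://aws.amazon.com/neptune/csv2rdf/class/Movie",
    "type": "link_prediction",
}
partial = {"subject": full["subject"], "type": "link_prediction"}

print(link_prediction_scope(full))
print(link_prediction_scope(partial))
```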

The type field in an RDF target object

Indicates the type of target task to be performed, like this:

"type" : "regression"

The supported task types for RDF data are:

  • link_prediction

  • classification

  • regression

This field is required.

The split_rate field in an RDF target object

(Optional) An estimate of the proportions of nodes or edges that the training, validation, and test stages will use, respectively. These proportions are represented by a JSON array of three numbers between zero and one that add up to one:

"split_rate": [0.7, 0.1, 0.2]

If you do not supply the optional split_rate field, the default estimated value is [0.9, 0.1, 0.0].