Neptune ML predicates used in Gremlin inference queries - Amazon Neptune

Neptune#ml.deterministic

This predicate is an option for inductive inference queries — that is, for queries that include the Neptune#ml.inductiveInference predicate.

When using inductive inference, the Neptune engine creates the appropriate subgraph to evaluate the trained GNN model, and the requirements of this subgraph depend on parameters of the final model. Specifically, the num-layer parameter determines the number of traversal hops from the target nodes or edges, and the fanouts parameter specifies how many neighbors to sample at each hop (see HPO parameters).

By default, inductive inference queries run in non-deterministic mode, in which Neptune builds the neighborhood by random sampling. Because of this random neighbor sampling, the same query can sometimes return different predictions.

When you include Neptune#ml.deterministic in an inductive inference query, the Neptune engine attempts to sample neighbors in a deterministic way so that multiple invocations of the same query return the same results every time. The results can't be guaranteed to be completely deterministic, however, because changes to the underlying graph and artifacts of distributed systems can still introduce fluctuations.

You include the Neptune#ml.deterministic predicate in a query like this:

.with("Neptune#ml.deterministic")

If the Neptune#ml.deterministic predicate is included in a query that doesn't also include Neptune#ml.inductiveInference, it is simply ignored.

Neptune#ml.disableInductiveInferenceMetadataCache

This predicate is an option for inductive inference queries — that is, for queries that include the Neptune#ml.inductiveInference predicate.

For inductive inference queries, Neptune uses a metadata file stored in Amazon S3 to decide the number of hops and the fanout while building the neighborhood. Neptune normally caches this model metadata to avoid fetching the file from Amazon S3 repeatedly. You can disable the cache by including the Neptune#ml.disableInductiveInferenceMetadataCache predicate in the query. Although fetching the metadata directly from Amazon S3 can be slower, disabling the cache is useful when the SageMaker endpoint has been updated after retraining or transformation and the cached metadata is stale.

You include the Neptune#ml.disableInductiveInferenceMetadataCache predicate in a query like this:

.with("Neptune#ml.disableInductiveInferenceMetadataCache")

Here is how a sample query might look in a Jupyter notebook:

%%gremlin
g.with("Neptune#ml.endpoint", "ep1")
 .with("Neptune#ml.iamRoleArn", "arn:aws:iam::123456789012:role/NeptuneMLRole")
 .with("Neptune#ml.disableInductiveInferenceMetadataCache")
 .V('101').properties("rating")
 .with("Neptune#ml.regression")
 .with("Neptune#ml.inductiveInference")

Neptune#ml.endpoint

The Neptune#ml.endpoint predicate is used in a with() step to specify the inference endpoint, if necessary:

.with("Neptune#ml.endpoint", "the model's SageMaker inference endpoint")

You can identify the endpoint either by its id or its URL. For example:

.with( "Neptune#ml.endpoint", "node-classification-movie-lens-endpoint" )

Or:

.with( "Neptune#ml.endpoint", "https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/node-classification-movie-lens-endpoint/invocations" )
Note

If you set the neptune_ml_endpoint parameter in your Neptune DB cluster parameter group to the endpoint id or URL, you don't need to include the Neptune#ml.endpoint predicate in each query.

Neptune#ml.iamRoleArn

Neptune#ml.iamRoleArn is used in a with() step to specify the ARN of the SageMaker execution IAM role, if necessary:

.with("Neptune#ml.iamRoleArn", "the ARN for the SageMaker execution IAM role")

For information about how to create the SageMaker execution IAM role, see Create a custom NeptuneSageMakerIAMRole role.

Note

If you set the neptune_ml_iam_role parameter in your Neptune DB cluster parameter group to the ARN of your SageMaker execution IAM role, you don't need to include the Neptune#ml.iamRoleArn predicate in each query.

Neptune#ml.inductiveInference

By default, Gremlin inference queries use transductive inference. To make a real-time inductive inference query instead, include the Neptune#ml.inductiveInference predicate like this:

.with("Neptune#ml.inductiveInference")

If your graph is dynamic, inductive inference is often the best choice, but if your graph is static, transductive inference is faster and more efficient.
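For example, a node-classification query that uses inductive inference might look like this (the endpoint name, node ID, and property key here are hypothetical):

g.with("Neptune#ml.endpoint", "node-classification-endpoint")
 .V("movie_1")
 .properties("genre").with("Neptune#ml.classification")
 .with("Neptune#ml.inductiveInference")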

Neptune#ml.limit

The Neptune#ml.limit predicate optionally limits the number of results returned per entity:

.with( "Neptune#ml.limit", 2 )

By default, the limit is 1, and the maximum number that can be set is 100.
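For example, to return the top two predicted classes for a node instead of only the highest-scoring one (the endpoint name, node ID, and property key here are hypothetical):

g.with("Neptune#ml.endpoint", "node-classification-endpoint")
 .with("Neptune#ml.limit", 2)
 .V("movie_1")
 .properties("genre").with("Neptune#ml.classification")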

Neptune#ml.threshold

The Neptune#ml.threshold predicate optionally establishes a cutoff threshold for result scores:

.with( "Neptune#ml.threshold", 0.5D )

This lets you discard all results with scores below the specified threshold.
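For example, to discard classification results whose confidence score falls below 0.5 (the endpoint name, node ID, and property key here are hypothetical):

g.with("Neptune#ml.endpoint", "node-classification-endpoint")
 .with("Neptune#ml.threshold", 0.5D)
 .V("movie_1")
 .properties("genre").with("Neptune#ml.classification")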

Neptune#ml.classification

The Neptune#ml.classification predicate is attached to the properties() step to establish that the properties need to be fetched from the SageMaker endpoint of the node classification model:

.properties( "property key of the node classification model" ).with( "Neptune#ml.classification" )

Neptune#ml.regression

The Neptune#ml.regression predicate is attached to the properties() step to establish that the properties need to be fetched from the SageMaker endpoint of the node regression model:

.properties( "property key of the node regression model" ).with( "Neptune#ml.regression" )

Neptune#ml.prediction

The Neptune#ml.prediction predicate is attached to in() and out() steps to indicate that this is a link-prediction query:

.in("edge label of the link prediction model").with("Neptune#ml.prediction").hasLabel("target node label")

Neptune#ml.score

The Neptune#ml.score predicate is used in Gremlin node or edge classification queries to fetch a machine-learning confidence score. Pass it together with the property key in the properties() step to obtain the confidence score for the classification prediction.

You can find a node classification example alongside the other node classification examples, and an edge classification example in the edge classification section.
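As a sketch, you can request the confidence score alongside a classification prediction by passing Neptune#ml.score as an additional key in the properties() step (the endpoint name, node ID, and property key here are hypothetical):

g.with("Neptune#ml.endpoint", "node-classification-endpoint")
 .V("movie_1")
 .properties("genre", "Neptune#ml.score").with("Neptune#ml.classification")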