Using Neptune workbench magics in your notebooks

The Neptune workbench provides a number of so-called magic commands in the notebooks that save a great deal of time and effort. They fall into two categories: line magics and cell magics.

Line magics are commands preceded by a single percent sign (%). They take only line input, not input from the rest of the cell body. The line magics that the Neptune workbench provides are described in the sections below.

Cell magics are preceded by two percent signs (%%) rather than one, and use the cell content as input, although they can also take line content as input. The cell magics that the Neptune workbench provides are described in the sections below.

There are also two magics, a line magic and a cell magic, for working with Neptune machine learning: %neptune_ml and %%neptune_ml.

Note

When working with Neptune magics, you can generally get help text using a --help or -h parameter. With a cell magic, the body cannot be empty, so when getting help, put filler text, even a single character, in the body. For example:

%%gremlin --help
x

The %seed line magic

The %seed line magic is a convenient way to add data to your Neptune endpoint that you can use to explore and experiment with Gremlin, openCypher, or SPARQL queries. It provides a form where you can select the data model you want to explore (property-graph or RDF) and then choose from among a number of different sample data sets that Neptune provides.

The %load line magic

The %load line magic generates a form that you can use to submit a bulk load request to Neptune (see Neptune Loader Command). The source must be an Amazon S3 path in the same region as the Neptune cluster.

The %load_ids line magic

The %load_ids line magic retrieves the load IDs that have been submitted to the notebook's host endpoint (see Neptune Loader Get-Status request parameters). The request takes this form:

GET https://your-neptune-endpoint:port/loader

The %load_status line magic

The %load_status line magic retrieves the load status of a particular load job that has been submitted to the notebook's host endpoint, specified by the line input (see Neptune Loader Get-Status request parameters). The request takes this form:

GET https://your-neptune-endpoint:port/loader?loadId=loadId

The line magic looks like this:

%load_status load id

The %cancel_load line magic

The %cancel_load line magic cancels a particular load job (see Neptune Loader Cancel Job). The request takes this form:

DELETE https://your-neptune-endpoint:port/loader?loadId=loadId

The line magic looks like this:

%cancel_load load id
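The three loader requests shown above differ only in the HTTP method and the optional loadId query parameter. The following is a minimal Python sketch of those request shapes, not the workbench's own implementation. The function name loader_request and the host value used with it are hypothetical. The sketch only builds the request objects and does not send them, and it does not sign them, so it would not work as-is against a cluster that requires IAM authentication.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def loader_request(host: str, port: int = 8182,
                   load_id: Optional[str] = None,
                   cancel: bool = False) -> Request:
    """Build (but do not send) a request against the /loader endpoint.

    Hypothetical helper that mirrors the request forms in the text:
      no load_id         -> GET    /loader               (like %load_ids)
      load_id            -> GET    /loader?loadId=...    (like %load_status)
      load_id and cancel -> DELETE /loader?loadId=...    (like %cancel_load)
    """
    if cancel and load_id is None:
        raise ValueError("cancelling requires a load ID")
    url = f"https://{host}:{port}/loader"
    if load_id is not None:
        url += "?" + urlencode({"loadId": load_id})
    return Request(url, method="DELETE" if cancel else "GET")
```

To send one of these requests, you could pass it to urllib.request.urlopen from a host that can reach the cluster.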

The %status line magic

Retrieves status information from the notebook's host endpoint (%graph_notebook_config shows the host endpoint).

The %gremlin_status line magic

Retrieves Gremlin query status information.

The %opencypher_status line magic (also %oc_status)

Retrieves query status for an openCypher query. This line magic takes the following optional arguments:

  • --queryId or -q   –   Specifies the ID of a specific running query for which to show the status.

  • --cancel_query or -c   –   Cancels a running query. Does not take a value.

  • --silent or -s   –   If --silent is set to true when cancelling a query, the running query is cancelled with an HTTP response code of 200. Otherwise, the HTTP response code would be 500.

  • --store-to   –   Specifies the name of a variable in which to store the query results.

The %sparql_status line magic

Retrieves SPARQL query status information.

The %graph_notebook_config line magic

This line magic displays a JSON object containing the configuration that the notebook is using to communicate with Neptune. The configuration includes:

  • host: The endpoint to which to connect and issue commands.

  • port: The port used when issuing commands to Neptune. The default is 8182.

  • auth_mode: The mode of authentication to use when issuing commands to Neptune. Must be either DEFAULT or IAM.

  • load_from_s3_arn: Specifies an Amazon S3 ARN for the %load magic to use. If this value is empty, the ARN must be specified in the %load command.

  • ssl: A Boolean value indicating whether or not to connect to Neptune using TLS. The default value is true.

  • aws_region: The region where this notebook is deployed. This information is used for IAM authentication and for %load requests.

You can change the configuration by copying the %graph_notebook_config output into a new cell and making changes to it there. Then, when you run the %%graph_notebook_config cell magic on the new cell, the configuration changes accordingly.
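If you would rather generate the configuration than edit it by hand, the fields listed above can be assembled and checked in ordinary Python before you paste the result into a cell. The sketch below is illustrative only. The helper names build_config and as_cell are hypothetical. The port and ssl defaults come from the list above, while the auth_mode and load_from_s3_arn defaults are assumptions.

```python
import json

# port and ssl defaults are documented above; the other two are assumptions.
DEFAULTS = {"port": 8182, "auth_mode": "DEFAULT", "load_from_s3_arn": "", "ssl": True}


def build_config(host: str, aws_region: str, **overrides) -> dict:
    """Return a configuration dict using the documented field names."""
    config = {"host": host, **DEFAULTS, "aws_region": aws_region}
    unknown = set(overrides) - set(config)
    if unknown:
        raise ValueError(f"unknown configuration keys: {sorted(unknown)}")
    config.update(overrides)
    # The text states that auth_mode must be either DEFAULT or IAM.
    if config["auth_mode"] not in ("DEFAULT", "IAM"):
        raise ValueError("auth_mode must be DEFAULT or IAM")
    return config


def as_cell(config: dict) -> str:
    """Render the text of a %%graph_notebook_config cell for this config."""
    return "%%graph_notebook_config\n" + json.dumps(config, indent=2)
```

For example, print(as_cell(build_config("your-neptune-endpoint", "us-east-1", auth_mode="IAM"))) prints text that you can copy into a new cell and run.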

The %graph_notebook_host line magic

Sets the line input as the notebook's host.

The %graph_notebook_version line magic

The %graph_notebook_version line magic returns the Neptune workbench notebook release number. For example, graph visualization was introduced in version 1.27.

The %graph_notebook_vis_options line magic

The %graph_notebook_vis_options line magic displays the current visualization settings that the notebook is using. These options are explained in the vis.js documentation.

You can modify these settings by copying the output into a new cell, making the changes you want, and then running the %%graph_notebook_vis_options cell magic on the cell.

To restore the visualization settings to their default values, you can run the %graph_notebook_vis_options line magic with a reset parameter. This resets all the visualization settings:

%graph_notebook_vis_options reset

The %%graph_notebook_config cell magic

The %%graph_notebook_config cell magic uses a JSON object containing configuration information to modify the settings that the notebook is using to communicate with Neptune, if possible. The configuration takes the same form returned by the %graph_notebook_config line magic.

For example:

%%graph_notebook_config
{
  "host": "my-new-cluster-endpoint.amazon.com",
  "port": 8182,
  "auth_mode": "DEFAULT",
  "load_from_s3_arn": "",
  "ssl": true,
  "aws_region": "us-east-1"
}

The %%sparql cell magic

The %%sparql cell magic issues a SPARQL query to the Neptune endpoint. It accepts the following optional line input:

  • -h or --help   –   Returns help text about these parameters.

  • --path   –   Prefixes a path to the SPARQL endpoint. For example, if you specify --path "abc/def" then the endpoint called would be host:port/abc/def.

  • --expand-all   –   This is a query visualization hint that tells the visualizer to include all ?s ?p ?o results in the graph diagram regardless of binding type.

    By default, a SPARQL visualization only includes triple patterns where ?o is a URI or a bnode (blank node). All other ?o binding types, such as literal strings or integers, are treated as properties of the ?s node that can be viewed using the Details pane in the Graph tab.

    Use the --expand-all query hint when you want to include such literal values as vertices in the visualization instead.

    Don't combine this visualization hint with explain parameters, because explain queries are not visualized.

  • --explain-type   –   Used to specify the explain mode to use (one of: dynamic, static, or details).

  • --explain-format   –   Used to specify the response format for an explain query (one of text/csv or text/html).

  • --store-to   –   Used to specify a variable to which to store the query results.

Example of an explain query:

%%sparql explain
SELECT * WHERE {?s ?p ?o} LIMIT 10

Example of a visualization query with an --expand-all visualization hint parameter (see SPARQL visualization):

%%sparql --expand-all
SELECT * WHERE {?s ?p ?o} LIMIT 10

The %%gremlin cell magic

The %%gremlin cell magic issues a Gremlin query to the Neptune endpoint using WebSocket. It accepts an optional line input to switch to Gremlin explain mode or the Gremlin profile API, and a separate optional visualization hint input to modify visualization output behavior (see Gremlin visualization).

Example of an explain query:

%%gremlin explain
g.V().limit(10)

Example of a profile query:

%%gremlin profile
g.V().limit(10)

Example of a visualization query with a visualization query hint:

%%gremlin -p v,outv
g.V().out().limit(10)

The %%opencypher cell magic (also %%oc)

The %%opencypher cell magic (which also has the abbreviated %%oc form) issues an openCypher query to the Neptune endpoint. It accepts the following optional line input arguments:

  • mode   –   The query mode: either query or bolt. The default value if you don't supply this argument is query.

  • --group-by or -g   –   Specifies the property used to group nodes. For example, code, ~id. The default value if you don't supply this argument is ~labels.

  • --ignore-groups   –   If present, all grouping options are ignored.

  • --display-property or -d   –   Specifies the property whose value should be displayed for each vertex. The default value if you don't supply this argument is ~labels.

  • --edge-display-property or -de   –   Specifies the property whose value should be displayed for each edge. The default value if you don't supply this argument is ~labels.

  • --label-max-length or -l   –   Specifies the maximum number of characters of a vertex label to display. The default value if you don't supply this argument is 10.

  • --store-to or -s   –   Specifies the name of a variable in which to store the query results.
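To make the defaults above concrete, here is a small argparse sketch that models how such line input could be parsed. It is an illustration of the documented options and defaults, not the workbench's actual parser. The function name build_parser and the empty-string default for --store-to are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented %%opencypher line options."""
    parser = argparse.ArgumentParser(prog="opencypher")
    # Optional positional: the query mode, either "query" (default) or "bolt".
    parser.add_argument("mode", nargs="?", choices=["query", "bolt"], default="query")
    parser.add_argument("-g", "--group-by", default="~labels")
    parser.add_argument("--ignore-groups", action="store_true")
    parser.add_argument("-d", "--display-property", default="~labels")
    parser.add_argument("-de", "--edge-display-property", default="~labels")
    parser.add_argument("-l", "--label-max-length", type=int, default=10)
    # The default for --store-to is not documented; "" is an assumption.
    parser.add_argument("-s", "--store-to", default="")
    return parser


# Line input as it might follow the magic name, split into tokens.
args = build_parser().parse_args(["bolt", "-g", "code", "-d", "name", "-l", "20"])
print(args.mode, args.group_by, args.display_property, args.label_max_length)
```

Parsing an empty argument list yields the documented defaults: mode query, grouping and display properties ~labels, and a label length of 10.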

The %%graph_notebook_vis_options cell magic

The %%graph_notebook_vis_options cell magic lets you set visualization options for the notebook. You can copy the settings returned by the %graph_notebook_vis_options line magic into a new cell, make changes to them, and use the %%graph_notebook_vis_options cell magic to set the new values.

These options are explained in the vis.js documentation.

To restore the visualization settings to their default values, you can run the %graph_notebook_vis_options line magic with a reset parameter. This resets all the visualization settings:

%graph_notebook_vis_options reset

The %neptune_ml line magic

You can use the %neptune_ml line magic to initiate and manage various Neptune ML operations.

Note

You can also initiate and manage some Neptune ML operations using the %%neptune_ml cell magic.

  • %neptune_ml export start   –   Starts a new export job.

    Parameters

    • --export-url exporter-endpoint   –   (optional) The Amazon API Gateway endpoint where the exporter can be called.

    • --export-iam   –   (optional) Flag indicating that requests to the export url must be signed using SigV4.

    • --export-no-ssl   –   (optional) Flag indicating that SSL should not be used when connecting to the exporter.

    • --wait   –   (optional) Flag indicating that the operation should wait until the export has completed.

    • --wait-interval interval-to-wait   –   (optional) Sets the time, in seconds, between export status checks (Default: 60).

    • --wait-timeout timeout-seconds   –   (optional) Sets the time, in seconds, to wait for the export job to complete before returning the most recent status (Default: 3,600).

    • --store-to location-to-store-result   –   (optional) The variable in which to store the export result. If --wait is specified, the final status will be stored there.

  • %neptune_ml export status   –   Retrieves the status of an export job.

    Parameters

    • --job-id export job ID   –   The ID of the export job for which to retrieve status.

    • --export-url exporter-endpoint   –   (optional) The Amazon API Gateway endpoint where the exporter can be called.

    • --export-iam   –   (optional) Flag indicating that requests to the export url must be signed using SigV4.

    • --export-no-ssl   –   (optional) Flag indicating that SSL should not be used when connecting to the exporter.

    • --wait   –   (optional) Flag indicating that the operation should wait until the export has completed.

    • --wait-interval interval-to-wait   –   (optional) Sets the time, in seconds, between export status checks (Default: 60).

    • --wait-timeout timeout-seconds   –   (optional) Sets the time, in seconds, to wait for the export job to complete before returning the most recent status (Default: 3,600).

    • --store-to location-to-store-result   –   (optional) The variable in which to store the export result. If --wait is specified, the final status will be stored there.

  • %neptune_ml dataprocessing start   –   Starts the Neptune ML dataprocessing step.

    Parameters

    • --job-id ID for this job   –   (optional) ID to assign to this job.

    • --s3-input-uri S3 URI   –   (optional) The S3 URI at which to find the input for this dataprocessing job.

    • --config-file-name file name   –   (optional) Name of the configuration file for this dataprocessing job.

    • --store-to location-to-store-result   –   (optional) The variable in which to store the dataprocessing result.

    • --wait   –   (optional) Flag indicating that the operation should wait until the dataprocessing has completed.

    • --wait-interval interval-to-wait   –   (optional) Sets the time, in seconds, between dataprocessing status checks (Default: 60).

    • --wait-timeout timeout-seconds   –   (optional) Sets the time, in seconds, to wait for the dataprocessing job to complete before returning the most recent status (Default: 3,600).

  • %neptune_ml dataprocessing status   –   Retrieves the status of a dataprocessing job.

    Parameters

    • --job-id ID of the job   –   ID of the job for which to retrieve the status.

    • --store-to location-to-store-result   –   (optional) The variable in which to store the status result.

    • --wait   –   (optional) Flag indicating that the operation should wait until the dataprocessing has completed.

    • --wait-interval interval-to-wait   –   (optional) Sets the time, in seconds, between dataprocessing status checks (Default: 60).

    • --wait-timeout timeout-seconds   –   (optional) Sets the time, in seconds, to wait for the dataprocessing job to complete before returning the most recent status (Default: 3,600).

  • %neptune_ml training start   –   Starts the Neptune ML model-training process.

    Parameters

    • --job-id ID for this job   –   (optional) ID to assign to this job.

    • --data-processing-id dataprocessing job ID   –   (optional) ID of the dataprocessing job that created the artifacts to use for training.

    • --s3-output-uri S3 URI   –   (optional) The S3 URI at which to store the output from this model-training job.

    • --instance-type instance type   –   (optional) The instance size to use for this model-training job.

    • --store-to location-to-store-result   –   (optional) The variable in which to store the model-training result.

    • --wait   –   (optional) Flag indicating that the operation should wait until the model-training has completed.

    • --wait-interval interval-to-wait   –   (optional) Sets the time, in seconds, between model-training status checks (Default: 60).

    • --wait-timeout timeout-seconds   –   (optional) Sets the time, in seconds, to wait for the model-training job to complete before returning the most recent status (Default: 3,600).

  • %neptune_ml training status   –   Retrieves the status of a Neptune ML model-training job.

    Parameters

    • --job-id ID of the job   –   ID of the job for which to retrieve the status.

    • --store-to location-to-store-result   –   (optional) The variable in which to store the status result.

    • --wait   –   (optional) Flag indicating that the operation should wait until the model-training has completed.

    • --wait-interval interval-to-wait   –   (optional) Sets the time, in seconds, between model-training status checks (Default: 60).

    • --wait-timeout timeout-seconds   –   (optional) Sets the time, in seconds, to wait for the model-training job to complete before returning the most recent status (Default: 3,600).

  • %neptune_ml endpoint create   –   Creates a query endpoint for a Neptune ML model.

    Parameters

    • --job-id ID for this job   –   (optional) ID to assign to this job.

    • --model-job-id model-training job ID   –   (optional) ID of the model-training job for which to create a query endpoint.

    • --instance-type instance type   –   (optional) The instance size to use for the query endpoint.

    • --store-to location-to-store-result   –   (optional) The variable in which to store the result of the endpoint creation.

    • --wait   –   (optional) Flag indicating that the operation should wait until the endpoint creation has completed.

    • --wait-interval interval-to-wait   –   (optional) Sets the time, in seconds, between status checks (Default: 60).

    • --wait-timeout timeout-seconds   –   (optional) Sets the time, in seconds, to wait for the endpoint creation job to complete before returning the most recent status (Default: 3,600).

  • %neptune_ml endpoint status   –   Retrieves the status of a Neptune ML query endpoint.

    Parameters

    • --job-id endpoint creation ID   –   (optional) ID of an endpoint creation job for which to report status.

    • --store-to location-to-store-result   –   (optional) The variable in which to store the status result.

    • --wait   –   (optional) Flag indicating that the operation should wait until the endpoint creation has completed.

    • --wait-interval interval-to-wait   –   (optional) Sets the time, in seconds, between status checks (Default: 60).

    • --wait-timeout timeout-seconds   –   (optional) Sets the time, in seconds, to wait for the endpoint creation job to complete before returning the most recent status (Default: 3,600).
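The --wait, --wait-interval, and --wait-timeout parameters that recur in the commands above describe a polling pattern: check the job status at a fixed interval, and return the most recent status once the job finishes or the timeout elapses. The following Python sketch shows that pattern in isolation. It is not the workbench's implementation. The function name wait_for_job, the get_status callable, and the status strings "Completed" and "Failed" are all assumptions made for illustration.

```python
import time
from typing import Callable, Iterable


def wait_for_job(get_status: Callable[[], str],
                 interval: float = 60,
                 timeout: float = 3600,
                 done: Iterable[str] = ("Completed", "Failed"),
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> str:
    """Poll get_status() until it reports a terminal status or time runs out.

    interval and timeout play the roles of --wait-interval and --wait-timeout
    (both in seconds). The terminal status names are hypothetical.
    """
    done = tuple(done)
    deadline = clock() + timeout
    status = get_status()
    while status not in done and clock() < deadline:
        sleep(interval)
        status = get_status()
    # On timeout this is simply the most recent status that was observed.
    return status
```

The sleep and clock parameters exist only so that the loop can be exercised without real waiting, for example by passing a fake clock in a test.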

The %%neptune_ml cell magic

The %%neptune_ml cell magic ignores line inputs such as --job-id or --export-url. Instead, it lets you provide those inputs and others within the cell body.

You can also save such inputs in another cell, assigned to a Jupyter variable, and then inject them into the cell body using that variable. That way, you can use such inputs over and over without having to re-enter them all every time.

This only works if the injecting variable is the only content of the cell. You cannot use multiple variables in one cell, or a combination of text and a variable.

For example, the %%neptune_ml export start cell magic can consume a JSON document in the cell body that contains all the parameters described in Top-level parameters for the Neptune ML export process.

In the Neptune-ML-01-Introduction-to-Node-Classification-Gremlin notebook, under Configuring Features in the Export the data and model configuration section, you can see how the following cell holds export parameters in a document assigned to a Jupyter variable named export_params:

export_params = {
    "command": "export-pg",
    "params": {
        "endpoint": neptune_ml.get_host(),
        "profile": "neptune_ml",
        "useIamAuth": neptune_ml.get_iam(),
        "cloneCluster": False
    },
    "outputS3Path": f'{s3_bucket_uri}/neptune-export',
    "additionalParams": {
        "neptune_ml": {
            "targets": [
                {"node": "movie", "property": "genre"}
            ],
            "features": [
                {"node": "movie", "property": "title", "type": "word2vec"},
                {"node": "user", "property": "age", "type": "bucket_numerical", "range": [1, 100], "num_buckets": 10}
            ]
        }
    },
    "jobSize": "medium"
}

When you run this cell, Jupyter saves the parameters document under that name. Then, you can use ${export_params} to inject the JSON document into the body of a %%neptune_ml export start cell, like this:

%%neptune_ml export start --export-url {neptune_ml.get_export_service_host()} --export-iam --wait --store-to export_results
${export_params}
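The rule that the injected variable must be the only content of the cell can be pictured as a whole-body match on the ${name} pattern. The sketch below is one way to model that behavior in plain Python. It is an assumption about the general approach and not the workbench's code. The function name resolve_cell_body and the choice to serialize the variable with json.dumps are hypothetical.

```python
import json
import re

# Matches a body that consists of exactly one ${variable_name} reference.
_INJECT = re.compile(r"\$\{(\w+)\}")


def resolve_cell_body(body: str, namespace: dict) -> str:
    """Replace a cell body of the form ${name} with the JSON for that variable.

    If the body contains anything besides a single ${name} reference,
    it is returned unchanged, which models the restriction in the text.
    """
    match = _INJECT.fullmatch(body.strip())
    if match is None:
        return body
    return json.dumps(namespace[match.group(1)])


# Illustrative namespace standing in for the notebook's variables.
namespace = {"export_params": {"command": "export-pg", "jobSize": "medium"}}
print(resolve_cell_body("${export_params}", namespace))
```

A body such as "text ${export_params}" is returned as-is by this sketch, because the reference is not the only content.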

Available forms of the %%neptune_ml cell magic

The %%neptune_ml cell magic can be used in the following forms:

  • %%neptune_ml export start   –   Starts a Neptune ML export process.

  • %%neptune_ml dataprocessing start   –   Starts a Neptune ML dataprocessing job.

  • %%neptune_ml training start   –   Starts a Neptune ML model-training job.

  • %%neptune_ml endpoint create   –   Creates a Neptune ML query endpoint for a model.