Getting a quick summary report about your graph
The Neptune graph summary API retrieves the following information about your graph:
For property (PG) graphs, the graph summary API returns a read-only list of node and edge labels and property keys, along with counts of nodes, edges, and properties.
For resource description framework (RDF) graphs, the graph summary API returns a read-only list of classes and predicate keys, along with counts of quads, subjects, and predicates.
Note
The graph summary API was introduced in Neptune engine release 1.2.1.0.
With the graph summary API, you can quickly gain a high-level understanding of your graph data size and content. You can also use the API interactively within a Neptune notebook using the %summary Neptune Workbench magic. In a graph application, the API can be used to improve search results by providing discovered node or edge labels as part of the search.
Graph summary data is drawn from the DFE statistics computed by the Neptune DFE engine during runtime, and is available whenever DFE statistics are available. Statistics are enabled by default when you create a new Neptune DB cluster.
Note
Statistics generation is disabled on t3 and t4 instance types (that is, on db.t3.medium and db.t4g.medium instance types) to conserve memory. As a result, graph summary data is not available on those instance types either.
You can check the status of DFE statistics using the statistics status API. As long as auto-generation of statistics has not been disabled, statistics are automatically updated periodically.
If you want to be sure that statistics are as up to date as possible when you request a graph summary, you can manually trigger a statistics update right before retrieving the summary. If the graph is changing while the statistics are being computed, they will necessarily lag slightly behind, but not by much.
Using the graph summary API to retrieve graph summary information
For a property graph that you query using Gremlin or openCypher, you can retrieve a graph summary from the property-graph summary endpoint. There is both a long and a short URI for this endpoint:
https://your-neptune-host:port/propertygraph/statistics/summary
https://your-neptune-host:port/pg/statistics/summary
For an RDF graph that you query using SPARQL, you can retrieve a graph summary from the RDF summary endpoint:
https://your-neptune-host:port/rdf/statistics/summary
These endpoints are read-only and support only the HTTP GET operation.
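As an illustration of how the three endpoint paths above fit together, here is a minimal Python sketch that assembles a summary URL from a host, a port, and a graph kind. The function name summary_endpoint and the kind keys are hypothetical helpers introduced for this example, not part of any Neptune client library.

```python
from urllib.parse import urlunsplit

# Paths copied from the endpoints listed above. The dictionary keys are
# labels chosen for this sketch, not Neptune identifiers.
SUMMARY_PATHS = {
    "propertygraph": "/propertygraph/statistics/summary",  # long PG form
    "pg": "/pg/statistics/summary",                        # short PG form
    "rdf": "/rdf/statistics/summary",
}


def summary_endpoint(host: str, port: int, kind: str = "pg") -> str:
    """Return the graph summary URL for the given host, port, and graph kind."""
    if kind not in SUMMARY_PATHS:
        raise ValueError(f"unknown graph kind: {kind!r}")
    return urlunsplit(("https", f"{host}:{port}", SUMMARY_PATHS[kind], "", ""))
```

For example, summary_endpoint("your-neptune-host", 8182, "rdf") produces the RDF endpoint shown above with the port filled in. The port value 8182 is only a placeholder here; use whatever port your cluster listens on.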
If $GRAPH_SUMMARY_ENDPOINT is set to the address of whichever endpoint you want to query, you can retrieve the summary data using curl and HTTP GET as follows:
curl -G "$GRAPH_SUMMARY_ENDPOINT"
If no statistics are available when you try to retrieve a graph summary, the response looks like this:
{ "detailedMessage": "Statistics are not available. Summary can only be generated after statistics are available.", "requestId": "48c1f788-f80b-b69c-d728-3f6df579a5f6", "code": "StatisticsNotAvailableException" }
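A client may want to tell this case apart from a successful summary. The following Python sketch checks the code field of a response body for the value shown above. The function name statistics_unavailable is hypothetical, and the sketch assumes the body is the JSON text returned by the endpoint.

```python
import json


def statistics_unavailable(body: str) -> bool:
    """Return True if the body is the 'statistics not available' error response."""
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        # Not JSON at all, so it cannot be the documented error response.
        return False
    return isinstance(doc, dict) and doc.get("code") == "StatisticsNotAvailableException"
```

A caller could use this to decide whether to trigger a statistics update and retry later, rather than treating the response as a general failure.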
The mode URL query parameter for the graph summary API
The graph summary API accepts a URL query parameter named mode, which can take one of two values, namely basic (the default) and detailed. For an RDF graph, the detailed mode graph summary response contains an additional subjectStructures field. For a property graph, the detailed graph summary response contains two additional fields, namely nodeStructures and edgeStructures.
To request a detailed graph summary response, include the mode parameter as follows:
curl -G "$GRAPH_SUMMARY_ENDPOINT?mode=detailed"
If the mode parameter isn't present, basic mode is used by default, so specifying ?mode=basic explicitly is possible but not necessary.
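The handling of the mode parameter described above can be sketched in Python as follows. The helper name with_mode is hypothetical, and the sketch assumes the endpoint string carries no query string of its own.

```python
from urllib.parse import urlencode

# The two values the document lists for the mode parameter.
VALID_MODES = ("basic", "detailed")


def with_mode(endpoint: str, mode: str = "basic") -> str:
    """Append the mode query parameter to a summary endpoint when it is needed."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if mode == "basic":
        # basic is the default, so the parameter can be left out.
        return endpoint
    return f"{endpoint}?{urlencode({'mode': mode})}"
```

Rejecting unknown values on the client side avoids a round trip that would only return an error about an unsupported mode value.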
Graph summary response for a property graph (PG)
For an empty property graph, the detailed graph summary response looks like this:
{ "status" : "200 OK", "payload" : { "version" : "v1", "lastStatisticsComputationTime" : "2023-01-10T07:58:47.972Z", "graphSummary" : { "numNodes" : 0, "numEdges" : 0, "numNodeLabels" : 0, "numEdgeLabels" : 0, "nodeLabels" : [ ], "edgeLabels" : [ ], "numNodeProperties" : 0, "numEdgeProperties" : 0, "nodeProperties" : [ ], "edgeProperties" : [ ], "totalNodePropertyValues" : 0, "totalEdgePropertyValues" : 0, "nodeStructures" : [ ], "edgeStructures" : [ ] } } }
A property graph (PG) summary response has the following fields:
- status – The HTTP return code of the request. If the request succeeded, the code is 200. See Common graph summary errors for a list of common errors.
- payload
  - version – The version of this graph summary response.
  - lastStatisticsComputationTime – The timestamp, in ISO 8601 format, of the time at which Neptune last computed statistics.
  - graphSummary
    - numNodes – The number of nodes in the graph.
    - numEdges – The number of edges in the graph.
    - numNodeLabels – The number of distinct node labels in the graph.
    - numEdgeLabels – The number of distinct edge labels in the graph.
    - nodeLabels – List of distinct node labels in the graph.
    - edgeLabels – List of distinct edge labels in the graph.
    - numNodeProperties – The number of distinct node properties in the graph.
    - numEdgeProperties – The number of distinct edge properties in the graph.
    - nodeProperties – List of distinct node properties in the graph, along with the count of nodes where each property is used.
    - edgeProperties – List of distinct edge properties in the graph, along with the count of edges where each property is used.
    - totalNodePropertyValues – Total number of usages of all node properties.
    - totalEdgePropertyValues – Total number of usages of all edge properties.
    - nodeStructures – This field is only present when mode=detailed is specified in the request. It contains a list of node structures, each of which contains the following fields:
      - count – Number of nodes that have this specific structure.
      - nodeProperties – List of node properties present in this specific structure.
      - distinctOutgoingEdgeLabels – List of distinct outgoing edge labels present in this specific structure.
    - edgeStructures – This field is only present when mode=detailed is specified in the request. It contains a list of edge structures, each of which contains the following fields:
      - count – Number of edges that have this specific structure.
      - edgeProperties – List of edge properties present in this specific structure.
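To show how these fields can be combined, the following Python sketch computes, for each node property, the fraction of nodes that carry it, using numNodes and nodeProperties. The function name node_property_coverage is hypothetical, and the sketch assumes the response has already been parsed into a dictionary with the layout described above.

```python
def node_property_coverage(summary: dict) -> dict:
    """Map each node property name to the fraction of nodes that use it."""
    graph = summary["payload"]["graphSummary"]
    total = graph["numNodes"]
    coverage = {}
    # nodeProperties is a list of single-entry objects such as {"code": 3748}.
    for entry in graph["nodeProperties"]:
        for name, count in entry.items():
            # Guard against division by zero for an empty graph.
            coverage[name] = count / total if total else 0.0
    return coverage
```

A property with coverage close to 1.0 appears on nearly every node, while a low value suggests it belongs to only a few node types.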
Graph summary response for an RDF graph
For an empty RDF graph, the detailed graph summary response looks like this:
{ "status" : "200 OK", "payload" : { "version" : "v1", "lastStatisticsComputationTime" : "2023-01-10T07:58:47.972Z", "graphSummary" : { "numDistinctSubjects" : 0, "numDistinctPredicates" : 0, "numQuads" : 0, "numClasses" : 0, "classes" : [ ], "predicates" : [ ], "subjectStructures" : [ ] } } }
An RDF graph summary response has the following fields:
- status – The HTTP return code of the request. If the request succeeded, the code is 200. See Common graph summary errors for a list of common errors.
- payload
  - version – The version of this graph summary response.
  - lastStatisticsComputationTime – The timestamp, in ISO 8601 format, of the time at which Neptune last computed statistics.
  - graphSummary
    - numDistinctSubjects – The number of distinct subjects in the graph.
    - numDistinctPredicates – The number of distinct predicates in the graph.
    - numQuads – The number of quads in the graph.
    - numClasses – The number of classes in the graph.
    - classes – List of classes in the graph.
    - predicates – List of predicates in the graph, along with the predicate counts.
    - subjectStructures – This field is only present when mode=detailed is specified in the request. It contains a list of subject structures, each of which contains the following fields:
      - count – Number of occurrences of this specific structure.
      - predicates – List of predicates present in this specific structure.
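As one way of using the predicates field, the following Python sketch returns the most frequently used predicates in an RDF summary. The function name top_predicates is hypothetical, and the sketch assumes the response has already been parsed into a dictionary with the layout described above.

```python
def top_predicates(summary: dict, limit: int = 5) -> list:
    """Return up to `limit` (predicate IRI, count) pairs, highest count first."""
    graph = summary["payload"]["graphSummary"]
    # predicates is a list of single-entry objects such as {"<iri>": 50656}.
    pairs = [
        (iri, count)
        for entry in graph["predicates"]
        for iri, count in entry.items()
    ]
    # Sort by descending count, then by IRI so that ties have a stable order.
    pairs.sort(key=lambda pair: (-pair[1], pair[0]))
    return pairs[:limit]
```

The result gives a quick view of which predicates dominate the graph without running a SPARQL aggregation query.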
Sample property-graph (PG) summary response
Here is the detailed summary response for a property graph that contains the sample property-graph air routes dataset:
{ "status" : "200 OK", "payload" : { "version" : "v1", "lastStatisticsComputationTime" : "2023-03-01T14:35:03.804Z", "graphSummary" : { "numNodes" : 3748, "numEdges" : 51300, "numNodeLabels" : 4, "numEdgeLabels" : 2, "nodeLabels" : [ "continent", "country", "version", "airport" ], "edgeLabels" : [ "contains", "route" ], "numNodeProperties" : 14, "numEdgeProperties" : 1, "nodeProperties" : [ { "desc" : 3748 }, { "code" : 3748 }, { "type" : 3748 }, { "country" : 3503 }, { "longest" : 3503 }, { "city" : 3503 }, { "lon" : 3503 }, { "elev" : 3503 }, { "icao" : 3503 }, { "region" : 3503 }, { "runways" : 3503 }, { "lat" : 3503 }, { "date" : 1 }, { "author" : 1 } ], "edgeProperties" : [ { "dist" : 50532 } ], "totalNodePropertyValues" : 42773, "totalEdgePropertyValues" : 50532, "nodeStructures" : [ { "count" : 3471, "nodeProperties" : [ "city", "code", "country", "desc", "elev", "icao", "lat", "lon", "longest", "region", "runways", "type" ], "distinctOutgoingEdgeLabels" : [ "route" ] }, { "count" : 161, "nodeProperties" : [ "code", "desc", "type" ], "distinctOutgoingEdgeLabels" : [ "contains" ] }, { "count" : 83, "nodeProperties" : [ "code", "desc", "type" ], "distinctOutgoingEdgeLabels" : [ ] }, { "count" : 32, "nodeProperties" : [ "city", "code", "country", "desc", "elev", "icao", "lat", "lon", "longest", "region", "runways", "type" ], "distinctOutgoingEdgeLabels" : [ ] }, { "count" : 1, "nodeProperties" : [ "author", "code", "date", "desc", "type" ], "distinctOutgoingEdgeLabels" : [ ] } ], "edgeStructures" : [ { "count" : 50532, "edgeProperties" : [ "dist" ] } ] } } }
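The numbers in this sample can be cross-checked against one another. The Python sketch below copies the relevant counts from the response above and verifies three relationships: the nodeStructures counts add up to numNodes, the nodeProperties counts add up to totalNodePropertyValues, and the number of nodeProperties entries equals numNodeProperties. These relationships hold for this sample; the document does not state that they are guaranteed for every graph, so treat them as observations.

```python
# Values copied from the sample air routes response above.
num_nodes = 3748
num_node_properties = 14
total_node_property_values = 42773

node_property_counts = {
    "desc": 3748, "code": 3748, "type": 3748,
    "country": 3503, "longest": 3503, "city": 3503, "lon": 3503,
    "elev": 3503, "icao": 3503, "region": 3503, "runways": 3503,
    "lat": 3503, "date": 1, "author": 1,
}
node_structure_counts = [3471, 161, 83, 32, 1]

# Every node falls into exactly one structure in this sample.
structures_cover_all_nodes = sum(node_structure_counts) == num_nodes

# Summing the per-property usage counts reproduces the reported total.
property_total_matches = (
    sum(node_property_counts.values()) == total_node_property_values
)

# The number of listed properties matches numNodeProperties.
property_count_matches = len(node_property_counts) == num_node_properties
```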
Sample RDF graph summary response
Here is the detailed summary response for an RDF graph that contains the sample RDF air routes dataset:
{ "status" : "200 OK", "payload" : { "version" : "v1", "lastStatisticsComputationTime" : "2023-03-01T14:54:13.903Z", "graphSummary" : { "numDistinctSubjects" : 54403, "numDistinctPredicates" : 19, "numQuads" : 158571, "numClasses" : 4, "classes" : [ "http://kelvinlawrence.net/air-routes/class/Version", "http://kelvinlawrence.net/air-routes/class/Airport", "http://kelvinlawrence.net/air-routes/class/Continent", "http://kelvinlawrence.net/air-routes/class/Country" ], "predicates" : [ { "http://kelvinlawrence.net/air-routes/objectProperty/route" : 50656 }, { "http://kelvinlawrence.net/air-routes/datatypeProperty/dist" : 50656 }, { "http://kelvinlawrence.net/air-routes/objectProperty/contains" : 7004 }, { "http://kelvinlawrence.net/air-routes/datatypeProperty/code" : 3747 }, { "http://www.w3.org/2000/01/rdf-schema#label" : 3747 }, { "http://kelvinlawrence.net/air-routes/datatypeProperty/type" : 3747 }, { "http://kelvinlawrence.net/air-routes/datatypeProperty/desc" : 3747 }, { "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" : 3747 }, { "http://kelvinlawrence.net/air-routes/datatypeProperty/icao" : 3502 }, { "http://kelvinlawrence.net/air-routes/datatypeProperty/lat" : 3502 }, { "http://kelvinlawrence.net/air-routes/datatypeProperty/region" : 3502 }, { "http://kelvinlawrence.net/air-routes/datatypeProperty/runways" : 3502 }, { "http://kelvinlawrence.net/air-routes/datatypeProperty/longest" : 3502 }, { "http://kelvinlawrence.net/air-routes/datatypeProperty/elev" : 3502 }, { "http://kelvinlawrence.net/air-routes/datatypeProperty/lon" : 3502 }, { "http://kelvinlawrence.net/air-routes/datatypeProperty/country" : 3502 }, { "http://kelvinlawrence.net/air-routes/datatypeProperty/city" : 3502 }, { "http://kelvinlawrence.net/air-routes/datatypeProperty/author" : 1 }, { "http://kelvinlawrence.net/air-routes/datatypeProperty/date" : 1 } ], "subjectStructures" : [ { "count" : 50656, "predicates" : [ "http://kelvinlawrence.net/air-routes/datatypeProperty/dist" ] }, { "count" : 
3471, "predicates" : [ "http://kelvinlawrence.net/air-routes/datatypeProperty/city", "http://kelvinlawrence.net/air-routes/datatypeProperty/code", "http://kelvinlawrence.net/air-routes/datatypeProperty/country", "http://kelvinlawrence.net/air-routes/datatypeProperty/desc", "http://kelvinlawrence.net/air-routes/datatypeProperty/elev", "http://kelvinlawrence.net/air-routes/datatypeProperty/icao", "http://kelvinlawrence.net/air-routes/datatypeProperty/lat", "http://kelvinlawrence.net/air-routes/datatypeProperty/lon", "http://kelvinlawrence.net/air-routes/datatypeProperty/longest", "http://kelvinlawrence.net/air-routes/datatypeProperty/region", "http://kelvinlawrence.net/air-routes/datatypeProperty/runways", "http://kelvinlawrence.net/air-routes/datatypeProperty/type", "http://kelvinlawrence.net/air-routes/objectProperty/route", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://www.w3.org/2000/01/rdf-schema#label" ] }, { "count" : 238, "predicates" : [ "http://kelvinlawrence.net/air-routes/datatypeProperty/code", "http://kelvinlawrence.net/air-routes/datatypeProperty/desc", "http://kelvinlawrence.net/air-routes/datatypeProperty/type", "http://kelvinlawrence.net/air-routes/objectProperty/contains", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://www.w3.org/2000/01/rdf-schema#label" ] }, { "count" : 31, "predicates" : [ "http://kelvinlawrence.net/air-routes/datatypeProperty/city", "http://kelvinlawrence.net/air-routes/datatypeProperty/code", "http://kelvinlawrence.net/air-routes/datatypeProperty/country", "http://kelvinlawrence.net/air-routes/datatypeProperty/desc", "http://kelvinlawrence.net/air-routes/datatypeProperty/elev", "http://kelvinlawrence.net/air-routes/datatypeProperty/icao", "http://kelvinlawrence.net/air-routes/datatypeProperty/lat", "http://kelvinlawrence.net/air-routes/datatypeProperty/lon", "http://kelvinlawrence.net/air-routes/datatypeProperty/longest", "http://kelvinlawrence.net/air-routes/datatypeProperty/region", 
"http://kelvinlawrence.net/air-routes/datatypeProperty/runways", "http://kelvinlawrence.net/air-routes/datatypeProperty/type", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://www.w3.org/2000/01/rdf-schema#label" ] }, { "count" : 6, "predicates" : [ "http://kelvinlawrence.net/air-routes/datatypeProperty/code", "http://kelvinlawrence.net/air-routes/datatypeProperty/desc", "http://kelvinlawrence.net/air-routes/datatypeProperty/type", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://www.w3.org/2000/01/rdf-schema#label" ] }, { "count" : 1, "predicates" : [ "http://kelvinlawrence.net/air-routes/datatypeProperty/author", "http://kelvinlawrence.net/air-routes/datatypeProperty/code", "http://kelvinlawrence.net/air-routes/datatypeProperty/date", "http://kelvinlawrence.net/air-routes/datatypeProperty/desc", "http://kelvinlawrence.net/air-routes/datatypeProperty/type", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://www.w3.org/2000/01/rdf-schema#label" ] } ] } } }
Using AWS Identity and Access Management (IAM) authentication with graph summary endpoints
You can access graph summary endpoints securely with IAM authentication by using awscurl:
awscurl "$GRAPH_SUMMARY_ENDPOINT" \
    --region (your region) \
    --service neptune-db
Important
The IAM identity or role that creates the temporary credentials must have an IAM policy attached that allows the GetGraphSummary IAM action.
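For illustration, a policy document that allows this action could be assembled as in the following Python sketch. The action name neptune-db:GetGraphSummary and the resource ARN layout are taken from the access-denied message shown at the end of this page; the function name graph_summary_policy and its arguments are hypothetical, and a real policy would normally be scoped to match your own security requirements.

```python
import json


def graph_summary_policy(region: str, account_id: str, cluster_resource_id: str) -> dict:
    """Build an IAM policy document that allows the GetGraphSummary action."""
    # ARN layout as it appears in the sample AccessDeniedException message.
    resource = f"arn:aws:neptune-db:{region}:{account_id}:{cluster_resource_id}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "neptune-db:GetGraphSummary",
                "Resource": resource,
            }
        ],
    }


def policy_json(region: str, account_id: str, cluster_resource_id: str) -> str:
    """Serialize the policy as indented JSON text."""
    return json.dumps(
        graph_summary_policy(region, account_id, cluster_resource_id), indent=2
    )
```

Calling policy_json with your own region, account ID, and cluster resource ID yields JSON text that could be pasted into a policy editor.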
See IAM Authentication Errors for a list of common IAM errors that you may encounter.
Common error codes that a graph summary request may return
Neptune service error code | HTTP status | Message | Error Scenario | Mitigation |
---|---|---|---|---|
 | 403 | Missing Authentication Token. | An unsigned or incorrectly signed request was sent to a Neptune database with IAM enabled. | Sign the request with SigV4 before sending (see IAM and graph summaries). |
 | 403 | User: | The IAM policy does not allow the GetGraphSummary action, and the graph summary request was sent to a Neptune database with IAM enabled. | Make sure that the IAM policy attached to the user or role making the request allows the GetGraphSummary action. |
 | 400 | Statistics are disabled, so graph summary is also disabled. | Trying to fetch a summary on burstable instance types (db.t3.medium or db.t4g.medium). | Use an instance type where statistics generation is enabled (all supported instances except db.t3.medium and db.t4g.medium). |
 | 400 | Bad route: | The request was sent to an invalid path. | Use the correct route for the graph summary endpoint. |
 | 400 | Request contains unknown parameters: ' | An invalid parameter is specified in the request. | Only use valid parameters (such as mode). |
 | 400 | URI query parameter 'mode' has unsupported value ' | The URL parameter 'mode' in the request is followed by an invalid value. | Use valid values (such as basic or detailed). |
 | 405 | Method Not Allowed. | Calling the summary endpoint with any HTTP method other than GET. | Use HTTP GET. |
 | 400 | Statistics are not computed yet, graph summary will be available after statistics computation is complete. | There are no statistics available when the request is sent to the summary endpoint. | Wait until statistics generation is complete. You can check the status of statistics generation using the statistics status API. |
 | 400 | Statistics limit reached, thus graph summary is not available. | Statistics generation has stopped because it reached statistics size limits. | Graph summary is not available on this graph. |
For example, if you make a request to a graph summary endpoint in a Neptune database that has IAM authentication enabled, and the requestor's IAM policy lacks the necessary permissions, you get a response like the following:
{ "detailedMessage": "User: arn:aws:iam::(account ID):(user or user name) is not authorized to perform: neptune-db:GetGraphSummary on resource: arn:aws:neptune-db:(region):(account ID):(cluster resource ID)/*", "requestId": "7ac2b98e-b626-d239-1d05-74b4c88fce82", "code": "AccessDeniedException" }
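A client could map an error response back to the mitigations in the table of common errors. The Python sketch below does this by matching the start of the detailedMessage field against a few message prefixes copied from that table. The function name mitigation_for is hypothetical, only a subset of the table is covered, and the sketch assumes error bodies always carry the message in detailedMessage, as the examples on this page do.

```python
import json

# (message prefix, mitigation) pairs taken from the table of common errors.
MITIGATIONS = [
    ("Missing Authentication Token",
     "Sign the request with SigV4 before sending."),
    ("Bad route",
     "Use the correct route for the graph summary endpoint."),
    ("Statistics are not computed yet",
     "Wait until statistics generation is complete."),
    ("Statistics limit reached",
     "Graph summary is not available on this graph."),
]

FALLBACK = "No matching entry; see the table of common errors."


def mitigation_for(body: str) -> str:
    """Return the suggested mitigation for an error response body."""
    message = json.loads(body).get("detailedMessage", "")
    for prefix, advice in MITIGATIONS:
        if message.startswith(prefix):
            return advice
    return FALLBACK
```

Matching on message text is fragile if the wording changes, so a production client would more likely branch on the HTTP status and the code field and use the message only for logging.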