Using the Gremlin explain API in Neptune

The Amazon Neptune Gremlin explain API returns the query plan that would be executed if a specified query were run. Because the API doesn't actually run the query, the plan is returned almost instantaneously.

It differs from the TinkerPop .explain() step in that it can report information specific to the Neptune engine.

Information contained in a Gremlin explain report

An explain report contains the following information:

  • The query string as requested.

  • The original traversal. This is the TinkerPop Traversal object produced by parsing the query string into TinkerPop steps. It is equivalent to the original traversal that .explain() reports when the query is run against the TinkerPop TinkerGraph.

  • The converted traversal. This is the Neptune Traversal produced by converting the TinkerPop Traversal into the Neptune logical query plan representation. In many cases the entire TinkerPop traversal is converted into two Neptune steps: one that executes the entire query (NeptuneGraphQueryStep) and one that converts the Neptune query engine output back into TinkerPop Traversers (NeptuneTraverserConverterStep).

  • The optimized traversal. This is the optimized version of the Neptune query plan after it has been run through a series of static work-reducing optimizers that rewrite the query based on static analysis and estimated cardinalities. These optimizers do things like reorder operators based on range counts, prune unnecessary or redundant operators, rearrange filters, push operators into different groups, and so on.

  • The predicate count. Because of the Neptune indexing strategy, having a large number of different predicates can cause performance problems. This is especially true for queries that use reverse traversal operators with no edge label (.in() or .both()). If such operators are used and the predicate count is high enough, the explain report displays a warning message.

  • DFE information. When the DFE alternative engine is enabled, the following traversal components may show up in the optimized traversal:

    • DFENode   –   a node in the traversal that can contain one or more child DFEJoinGroupNodes. A DFENode represents a part of the query plan that will be executed in DFE.

    • DFEJoinGroupNode   –   Represents a join of one or more DFEPatternNodes.

    • DFEPatternNode   –   Encapsulates an underlying pattern that is used to project matching tuples out of the underlying database.

    An easy way to find out if your query is being evaluated by DFE is to check whether the explain output contains a DFENode. Any part of the traversal that is not part of the DFENode will not be executed by DFE and will continue to be executed by the existing Neptune Gremlin query engine.

    See Example of using explain when the DFE is enabled for a sample report.
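Because the report is plain text with underlined section headings, the parts listed above can be extracted programmatically. The following Python sketch is a minimal illustration, not a Neptune SDK feature: `split_sections`, `predicate_count`, and `uses_dfe` are hypothetical helpers, and they assume the headings and the `# of predicates:` line look exactly like the sample reports on this page. `uses_dfe` applies the DFENode check described above to the optimized traversal only.

```python
import re
from typing import Dict, Optional

# Assumption: each section heading is followed by a line made only of '='.
_UNDERLINE = re.compile(r"^=+$")


def split_sections(report: str) -> Dict[str, str]:
    """Split an explain report into {heading: body} pairs.

    Lines before the first underlined heading (the banner) are ignored.
    """
    lines = report.splitlines()
    sections: Dict[str, list] = {}
    current: Optional[str] = None
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if line and _UNDERLINE.match(nxt):
            current = line
            sections[current] = []
            i += 2  # skip the heading and its underline
            continue
        if current is not None:
            sections[current].append(lines[i])
        i += 1
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def predicate_count(report: str) -> Optional[int]:
    """Return N from the '# of predicates: N' line, or None if it is absent."""
    m = re.search(r"#\s*of predicates:\s*(\d+)", report)
    return int(m.group(1)) if m else None


def uses_dfe(report: str) -> bool:
    """True if the optimized traversal contains a DFENode."""
    return "DFENode" in split_sections(report).get("Optimized Traversal", "")
```

Because the parser keys on the `=` underlines rather than on fixed positions, extra blank lines between sections do not affect it.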

Gremlin explain syntax

The syntax of the explain API is the same as that for the HTTP API for query, except that it uses /gremlin/explain as the endpoint instead of /gremlin, as in the following example.

curl -X POST https://your-neptune-endpoint:port/gremlin/explain -d '{"gremlin":"g.V().limit(1)"}'

The preceding query would produce the following output.

*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============
g.V().limit(1)

Original Traversal
==================
[GraphStep(vertex,[]), RangeGlobalStep(0,1)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
        },
        finishers=[limit(1)], annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .], {estimatedCardinality=INFINITY}
        },
        finishers=[limit(1)], annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 18
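The same request can be made from code. The sketch below uses only the Python standard library to build the POST that the curl example sends. The helper names are hypothetical, sending `Content-Type: application/json` is an assumption, and the request is not SigV4-signed, so as written it only suits a cluster without IAM database authentication.

```python
import json
import urllib.request


def build_explain_request(host: str, port: int, query: str) -> urllib.request.Request:
    """Build, but do not send, a POST request to the /gremlin/explain endpoint.

    The JSON body has the same shape as the curl example: {"gremlin": <query>}.
    No SigV4 signing is added (hypothetical helper, not a Neptune SDK call).
    """
    url = f"https://{host}:{port}/gremlin/explain"
    body = json.dumps({"gremlin": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_explain(host: str, port: int, query: str, timeout: float = 30.0) -> str:
    """Send the request and return the explain report as text."""
    req = build_explain_request(host, port, query)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `print(fetch_explain("your-neptune-endpoint", 8182, "g.V().limit(1)"))` would print a report like the one above, provided the endpoint is reachable from where the code runs; 8182 is Neptune's default port.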

Unconverted TinkerPop Steps

Ideally, all TinkerPop steps in a traversal have native Neptune operator coverage. When this isn't the case, Neptune falls back on TinkerPop step execution for gaps in its operator coverage. If a traversal uses a step for which Neptune does not yet have native coverage, the explain report displays a warning showing where the gap occurred.

When a step without a corresponding native Neptune operator is encountered, the entire traversal from that point forward is run using TinkerPop steps, even if subsequent steps do have native Neptune operators.

The exception to this is when Neptune full-text search is invoked. The NeptuneSearchStep implements steps without native equivalents as full-text search steps.
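The fallback rule amounts to splitting the step list at the first step without native coverage. A minimal sketch, assuming you already know which step names have native operators: the `native_steps` set is a hypothetical input (Neptune does not publish such a set in this form), and the sketch ignores the full-text-search exception.

```python
from typing import List, Set, Tuple


def split_native_prefix(steps: List[str], native_steps: Set[str]) -> Tuple[List[str], List[str]]:
    """Split a step list at the first step that has no native Neptune operator.

    Steps before that point can run in the Neptune engine; that step and
    everything after it run as TinkerPop steps, even steps that would be
    native on their own.
    """
    for i, step in enumerate(steps):
        if step not in native_steps:
            return list(steps[:i]), list(steps[i:])
    # Every step has native coverage: nothing falls back to TinkerPop.
    return list(steps), []
```

With the steps of `g.V().fold().unfold().out()` and the native set `{"GraphStep", "VertexStep"}`, this returns `(["GraphStep"], ["FoldStep", "UnfoldStep", "VertexStep"])`, which matches the "not converted into Neptune steps" list in the fold/unfold example report.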

Example of explain output where all steps in a query have native equivalents

The following is an example explain report for a query where all steps have native equivalents:

*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============
g.V().out()

Original Traversal
==================
[GraphStep(vertex,[]), VertexStep(OUT,vertex)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
            PatternNode[(?1, ?5, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .]
            PatternNode[(?3, <~label>, ?4, <~>) . project ask .]
        },
        annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep], maxVarId=7}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, ?5, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=INFINITY}
        },
        annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep], maxVarId=7}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 18

Example where some steps in a query do not have native equivalents

Neptune handles both GraphStep and VertexStep natively, but if you introduce a FoldStep and UnfoldStep, the resulting explain output is different:

*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============
g.V().fold().unfold().out()

Original Traversal
==================
[GraphStep(vertex,[]), FoldStep, UnfoldStep, VertexStep(OUT,vertex)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
        },
        annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep
]
+ not converted into Neptune steps: [FoldStep, UnfoldStep, VertexStep(OUT,vertex)]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .], {estimatedCardinality=INFINITY}
        },
        annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep,
    NeptuneMemoryTrackerStep
]
+ not converted into Neptune steps: [FoldStep, UnfoldStep, VertexStep(OUT,vertex)]

WARNING: >> FoldStep << is not supported natively yet

In this case, the FoldStep breaks the traversal out of native execution. Even the subsequent VertexStep, which Neptune supports natively, is no longer handled natively because it appears downstream of the Fold/Unfold steps.

For performance and cost savings, formulate traversals so that as much of the work as possible is done natively inside the Neptune query engine rather than by the TinkerPop step implementations.
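To check a traversal for gaps in native coverage before running it, you can read the "not converted into Neptune steps" list and the WARNING lines from its explain report. A sketch, assuming those lines are formatted exactly as in the fold/unfold sample above; the helper names are hypothetical. Step arguments such as `VertexStep(OUT,vertex)` contain commas, so the list is split only at top-level commas.

```python
import re
from typing import List

# Assumed line formats, copied from the sample report on this page.
_UNCONVERTED = re.compile(r"\+ not converted into Neptune steps: \[(.*)\]\s*$", re.MULTILINE)
_WARNING = re.compile(r"WARNING: >> (.+?) <<")


def _split_top_level(text: str) -> List[str]:
    """Split on commas that are not nested inside (), [] or {}."""
    parts, buf, depth = [], [], 0
    for ch in text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    tail = "".join(buf).strip()
    if tail:
        parts.append(tail)
    return parts


def unconverted_steps(report: str) -> List[str]:
    """Steps from the first '+ not converted into Neptune steps: [...]' line."""
    m = _UNCONVERTED.search(report)
    return _split_top_level(m.group(1)) if m else []


def native_warnings(report: str) -> List[str]:
    """Step names flagged by 'WARNING: >> Step << ...' lines."""
    return _WARNING.findall(report)
```

An empty result from both functions suggests, but does not guarantee, that the whole traversal runs natively; the report itself remains the authority.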

Example of a query that uses Neptune full-text-search

The following query uses Neptune full-text search:

g.withSideEffect("Neptune#fts.endpoint", "some_endpoint")
  .V()
  .tail(100)
  .has("Neptune#fts mark*")
  -------
  .has("name", "Neptune#fts mark*")
  .has("Person", "name", "Neptune#fts mark*")

The .has("name", "Neptune#fts mark*") form limits the search to vertexes that have a name property, while .has("Person", "name", "Neptune#fts mark*") limits it to vertexes that have a name property and the label Person. This results in the following traversal in the explain report:

Final Traversal
[NeptuneGraphQueryStep(Vertex) {
    JoinGroupNode {
        PatternNode[(?1, termid(1,URI), ?2, termid(0,URI)) . project distinct ?1 .], {estimatedCardinality=INFINITY}
    }, annotations={path=[Vertex(?1):GraphStep], maxVarId=4}
},
 NeptuneTraverserConverterStep,
 NeptuneTailGlobalStep(10),
 NeptuneTinkerpopTraverserConverterStep,
 NeptuneSearchStep {
    JoinGroupNode {
        SearchNode[(idVar=?3, query=mark*, field=name) . project ask .], {endpoint=some_endpoint}
    }
    JoinGroupNode {
        SearchNode[(idVar=?3, query=mark*, field=name) . project ask .], {endpoint=some_endpoint}
    }
}]
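To confirm which field, search string, and endpoint a full-text search step will use, you can extract the SearchNode entries from the final traversal. A sketch, assuming the `SearchNode[(idVar=..., query=..., field=...)` layout shown above; `search_nodes` is a hypothetical helper, and a search string containing commas or parentheses would need a sturdier parser.

```python
import re
from typing import Dict, List

# Assumed layout, copied from the sample final traversal on this page.
_SEARCH_NODE = re.compile(
    r"SearchNode\[\(idVar=(\?\d+), query=([^,]*), field=([^)]*)\)"
    r"[^\]]*\], \{endpoint=([^}]*)\}"
)


def search_nodes(traversal: str) -> List[Dict[str, str]]:
    """Return idVar, query, field and endpoint for each SearchNode."""
    return [
        {"idVar": v, "query": q, "field": f, "endpoint": e}
        for v, q, f, e in _SEARCH_NODE.findall(traversal)
    ]
```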

Example of using explain when the DFE is enabled

The following is an example of an explain report when the DFE alternative query engine is enabled:

*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============
g.V("v1").out("created").in("created").id()

Original Traversal
==================
[GraphStep(vertex,[v1]), VertexStep(OUT,[created],vertex), VertexStep(IN,[created],vertex), IdStep]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(VertexId) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 . ContainsFilter(?1 in (<v1>)) .]
            PatternNode[(?1, ?5, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) . ContainsFilter(?5 in (<created>)) .]
            PatternNode[(?3, <~label>, ?4, <~>) . project ask .]
            PatternNode[(?7, ?9, ?3, ?10) . project ?3,?7 . IsEdgeIdFilter(?10) . ContainsFilter(?9 in (<created>)) .]
            PatternNode[(?7, <~label>, ?8, <~>) . project ask .]
        },
        annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, Vertex(?7):VertexStep, VertexId(?7):IdStep], maxVarId=11}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(VertexId) {
        JoinGroupNode {
            DFENode {
                DFEJoinGroupNode[ children={
                    DFEJoinGroupNode[ children={
                        DFEPatternNode[(?1, ?5, ?3, ?6) . project ALL[?1, ?3] subjectFilters=(in(<v1>) . ), predicateFilters=(in(<created>) . ), graphFilters=(!= <~> . ), {rangeCountEstimate=1}]
                    }, ],
                    DFEJoinGroupNode[ children={
                        DFEPatternNode[(?7, ?9, ?3, ?10) . project ALL[?3, ?7] predicateFilters=(in(<created>) . ), graphFilters=(!= <~> . ), {rangeCountEstimate=3}]
                    }, ]
                }, ]
            }
        },
        annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, Vertex(?7):VertexStep, VertexId(?7):IdStep], maxVarId=11}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 6

See Information contained in a Gremlin explain report for a description of the DFE-specific components of the report.
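In the optimized traversal, each DFEPatternNode carries a `rangeCountEstimate`, which appears to be the optimizer's range-count estimate for that pattern. The sketch below lists those values alongside each pattern's variables, assuming each DFEPatternNode and its estimate appear on a single line as in the sample report; `dfe_range_estimates` is a hypothetical helper, not a Neptune API.

```python
import re
from typing import List, Tuple

# Assumes a DFEPatternNode and its estimate do not span a line break.
_DFE_PATTERN = re.compile(r"DFEPatternNode\[\((.*?)\).*?\{rangeCountEstimate=(\d+)\}\]")


def dfe_range_estimates(report: str) -> List[Tuple[str, int]]:
    """Return (pattern variables, rangeCountEstimate) for each DFEPatternNode."""
    return [(pattern, int(n)) for pattern, n in _DFE_PATTERN.findall(report)]
```

Applied to the two DFEPatternNode entries in the report above, it returns `[("?1, ?5, ?3, ?6", 1), ("?7, ?9, ?3, ?10", 3)]`.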