How the SPARQL query engine works in Neptune - Amazon Neptune

To use the information that the SPARQL explain feature provides, you need to understand some details about how the Amazon Neptune SPARQL query engine works.

The engine translates every SPARQL query into a pipeline of operators. Starting from the first operator, intermediate solutions known as binding lists flow through this operator pipeline. You can think of a binding list as a table in which the table headers are a subset of the variables used in the query. Each row in the table represents a result, up to the point of evaluation.

Let's assume that two namespace prefixes have been defined for our data:

    @prefix ex: <> .
    @prefix foaf: <> .

The following would be an example of a simple binding list in this context:

    ?person       | ?firstName
    ---------------------------
    ex:JaneDoe    | "Jane"
    ex:JohnDoe    | "John"
    ex:RichardRoe | "Richard"

For each of three people, the list binds the ?person variable to an identifier of the person, and the ?firstName variable to the person's first name.

In general, a variable can remain unbound in a row. This happens, for example, when a query selects a variable inside an OPTIONAL pattern and the data contains no matching value for it.
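As a rough illustration (not Neptune's internal representation), a binding list can be modeled in Python as a list of dictionaries, one per row, keyed by variable name. In this sketch an unbound variable is simply absent from the row. The `ex:AnonymousPerson` row and the helper names `bound_variables` and `is_bound` are hypothetical and exist only to show the unbound case.

```python
# Illustrative model only: each row maps variable names to bound values.
binding_list = [
    {"?person": "ex:JaneDoe", "?firstName": "Jane"},
    {"?person": "ex:JohnDoe", "?firstName": "John"},
    {"?person": "ex:RichardRoe", "?firstName": "Richard"},
    # Hypothetical extra row: ?firstName is unbound, so the key is absent.
    {"?person": "ex:AnonymousPerson"},
]


def bound_variables(rows):
    """Return every variable bound in at least one row (the table headers)."""
    return set().union(*(row.keys() for row in rows))


def is_bound(row, variable):
    """Return True if the variable has a value in this row."""
    return variable in row


headers = bound_variables(binding_list)
```

Here `headers` contains `?person` and `?firstName`, while `is_bound(binding_list[3], "?firstName")` is false for the hypothetical fourth row.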

The PipelineJoin operator is an example of a Neptune query engine operator that appears in the explain output. It takes the incoming binding list from the previous operator as input and joins it against a triple pattern, for example (?person, foaf:lastName, ?lastName). To do so, it takes the bindings for the ?person variable from its input stream, substitutes them into the triple pattern, and looks up the matching triples in the database.

When executed in the context of the incoming bindings from the previous table, PipelineJoin would evaluate three lookups, namely the following:

    (ex:JaneDoe, foaf:lastName, ?lastName)
    (ex:JohnDoe, foaf:lastName, ?lastName)
    (ex:RichardRoe, foaf:lastName, ?lastName)

This approach is called as-bound evaluation. The solutions from this evaluation process are joined back against the incoming solutions, extending each incoming solution with the ?lastName value that was found. Assuming that a last name is found for all three people, the operator would produce an outgoing binding list that looks something like this:

    ?person       | ?firstName | ?lastName
    ---------------------------------------
    ex:JaneDoe    | "Jane"     | "Doe"
    ex:JohnDoe    | "John"     | "Doe"
    ex:RichardRoe | "Richard"  | "Roe"
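The as-bound evaluation described above can be sketched in Python. This is a toy model under stated assumptions, not Neptune's implementation: the in-memory `TRIPLES` set stands in for the database, and the `pipeline_join` and `is_variable` functions are hypothetical names chosen for illustration.

```python
# Toy in-memory triple store; the data mirrors the example tables above.
TRIPLES = {
    ("ex:JaneDoe", "foaf:lastName", "Doe"),
    ("ex:JohnDoe", "foaf:lastName", "Doe"),
    ("ex:RichardRoe", "foaf:lastName", "Roe"),
}


def is_variable(term):
    """SPARQL variables are written with a leading question mark."""
    return term.startswith("?")


def pipeline_join(incoming, pattern, triples=TRIPLES):
    """Join each incoming solution against a triple pattern, as-bound."""
    outgoing = []
    for solution in incoming:
        # Substitute already-known bindings into the pattern.
        bound = tuple(solution.get(term, term) for term in pattern)
        for triple in sorted(triples):
            extended = dict(solution)
            for pattern_term, value in zip(bound, triple):
                if is_variable(pattern_term):
                    # Bind the variable, or reject a conflicting binding.
                    if extended.setdefault(pattern_term, value) != value:
                        break
                elif pattern_term != value:
                    break  # a constant in the pattern does not match
            else:
                outgoing.append(extended)
    return outgoing


incoming = [
    {"?person": "ex:JaneDoe", "?firstName": "Jane"},
    {"?person": "ex:JohnDoe", "?firstName": "John"},
    {"?person": "ex:RichardRoe", "?firstName": "Richard"},
]
result = pipeline_join(incoming, ("?person", "foaf:lastName", "?lastName"))
```

Each incoming row triggers one lookup with ?person already substituted, and every match extends that row with a ?lastName binding. A row with no matching triple would produce no output row in this sketch.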

This outgoing binding list then serves as input for the next operator in the pipeline. At the end, the output of the last operator in the pipeline defines the query result.
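The way binding lists flow from one operator to the next can be sketched as function composition. The sketch assumes that each operator is a plain Python function from a binding list to a binding list. The operators `add_last_name` and `keep_doe`, the `run_pipeline` helper, and the sample data are hypothetical stand-ins, not Neptune operators.

```python
from functools import reduce


def run_pipeline(operators, initial_bindings):
    """Pass the binding list through each operator in order.

    The output of the last operator is the query result.
    """
    return reduce(
        lambda bindings, operator: operator(bindings),
        operators,
        initial_bindings,
    )


# Hypothetical operators: each takes a binding list and returns a new one.
def add_last_name(bindings):
    last_names = {"ex:JaneDoe": "Doe", "ex:RichardRoe": "Roe"}
    return [
        {**row, "?lastName": last_names[row["?person"]]}
        for row in bindings
        if row["?person"] in last_names
    ]


def keep_doe(bindings):
    return [row for row in bindings if row["?lastName"] == "Doe"]


initial = [{"?person": "ex:JaneDoe"}, {"?person": "ex:RichardRoe"}]
result = run_pipeline([add_last_name, keep_doe], initial)
```

In this linear arrangement every operator has exactly one successor, so a simple fold over the operator list is enough to model the data flow.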

Operator pipelines are often linear, in the sense that every operator emits solutions for a single connected operator. However, in some cases, they can have more complex structures. For example, a UNION operator in a SPARQL query is mapped to a Copy operation. This operation duplicates the bindings and forwards the copies into two subplans, one for the left side and the other for the right side of the UNION.
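A minimal sketch of this branching, under the same assumption that subplans are Python functions over binding lists. The `copy_operator` function and the two subplans are hypothetical. How Neptune recombines the two branches is not described here, so the sketch simply concatenates their results, which matches the SPARQL semantics of UNION.

```python
import copy


def copy_operator(incoming, left_subplan, right_subplan):
    """Duplicate the incoming bindings and forward one copy to each subplan."""
    left_input = copy.deepcopy(incoming)
    right_input = copy.deepcopy(incoming)
    # Assumption: both branches' solutions are concatenated (UNION semantics).
    return left_subplan(left_input) + right_subplan(right_input)


# Hypothetical subplans for the left and right sides of a UNION.
def bind_first_name(bindings):
    first_names = {"ex:JaneDoe": "Jane"}
    return [
        {**row, "?name": first_names[row["?person"]]}
        for row in bindings
        if row["?person"] in first_names
    ]


def bind_last_name(bindings):
    last_names = {"ex:JaneDoe": "Doe"}
    return [
        {**row, "?name": last_names[row["?person"]]}
        for row in bindings
        if row["?person"] in last_names
    ]


incoming = [{"?person": "ex:JaneDoe"}]
result = copy_operator(incoming, bind_first_name, bind_last_name)
```

Because each subplan receives its own copy, neither branch can affect the bindings seen by the other, and the input list itself is left unchanged.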

For more information about operators, see Neptune SPARQL explain operators.