Neptune Data Model for Elasticsearch Data - Amazon Neptune

Neptune Data Model for Elasticsearch Data

Amazon Neptune uses a unified JSON document structure for storing both SPARQL and Gremlin data in Elasticsearch. Each document in Elasticsearch corresponds to an entity and stores all the relevant information for that entity. For Gremlin, vertexes and edges are considered entities, so the corresponding Elasticsearch documents have information about vertexes, labels, and properties. For SPARQL, subjects can be considered entities, so corresponding Elasticsearch documents have information about all the predicate-object pairs in one document.


The Neptune-to-Elasticsearch replication implementation only stores string data. However, you can modify it to store other data types.

The unified JSON document structure looks like the following.

    {
      "entity_id": "Vertex Id/Edge Id/Subject URI",
      "entity_type": [List of Labels/rdf:type object value],
      "document_type": "vertex/edge/rdf-resource",
      "predicates": {
        "Property name or predicate URI": [
          {
            "value": "Property Value or Object Value",
            "graph": "(Only for SPARQL) Named graph in which the quad is present",
            "language": "(Only for SPARQL) Language tag of an rdf:langString"
          },
          {
            "value": "Property Value 2/Object Value 2"
          }
        ]
      }
    }

  • entity_id – Entity unique ID representing the document.

    • For SPARQL, this is the subject URI.

    • For Gremlin, this is the Vertex_ID or Edge_ID.

  • entity_type – Represents one or more labels for a vertex or edge, or zero or more rdf:type predicate values for a subject.

  • document_type – Used to specify whether the current document represents a vertex, edge, or rdf-resource.

  • predicates – For Gremlin, stores properties and values for a vertex or edge. For SPARQL, it stores predicate-object pairs.

    In Elasticsearch, each property is addressed by its full field path, which in the structure above is predicates.{Property Name}.value. To query a property, you have to name it in that form.

  • value – A property value for Gremlin or an object value for SPARQL.

  • graph – A named graph for SPARQL.

  • language – The language tag of an rdf:langString literal in SPARQL.
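As a rough illustration of how this field naming affects queries, the following sketch builds an Elasticsearch match query body as a plain Python dictionary. It assumes that a property is addressed as predicates.{Property Name}.value, which follows from the document structure above. The helper name property_match_query is hypothetical and is not part of Neptune or of any Elasticsearch client.

```python
import json


def property_match_query(property_name: str, text: str) -> dict:
    """Build a match query body that targets the values of one property.

    Assumption: the field path is predicates.<property name>.value,
    as implied by the unified document structure.
    """
    field = f"predicates.{property_name}.value"
    return {"query": {"match": {field: text}}}


# Hypothetical usage: find entities whose "likes" property matches "spaghetti".
query = property_match_query("likes", "spaghetti")
print(json.dumps(query, indent=2))
```

The resulting dictionary can be sent as the body of a search request by whichever Elasticsearch client you use.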

Sample SPARQL Elasticsearch Document


    @prefix dt: <> .
    @prefix ex: <> .
    @prefix xsd: <> .
    @prefix rdf: <> .

    ex:simone   rdf:type    ex:Person                     ex:g1
    ex:michael  rdf:type    ex:Person                     ex:g1
    ex:simone   ex:likes    "spaghetti"                   ex:g1

    ex:simone   ex:knows    ex:michael                    ex:g2   # Not stored in ES
    ex:simone   ex:likes    "spaghetti"                   ex:g2
    ex:simone   ex:status   "La vita è un sogno"@it       ex:g2

    ex:simone   ex:age      "40"^^xsd:int                 DG      # Not stored in ES
    ex:simone   ex:dummy    "testData"^^dt:newDataType    DG
    ex:simone   ex:hates    _:bnode                               # Not stored in ES
    _:bnode     ex:means    "coding"                      DG      # Not stored in ES


    {
      "entity_id": "",
      "entity_type": [""],
      "document_type": "rdf-resource",
      "predicates": {
        "": [
          { "value": "spaghetti", "graph": "" },
          { "value": "spaghetti", "graph": "" }
        ],
        "": [
          {
            "value": "La vita è un sogno",
            "language": "it"   // Only present for rdf:langString
          }
        ]
      }
    }

    {
      "entity_id": "",
      "entity_type": [""],
      "document_type": "rdf-resource"
    }
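The grouping of quads into one document per subject can be sketched as follows. This is a simplified illustration and not the Neptune replication implementation: the Quad type, its is_string flag, and the build_rdf_documents helper are hypothetical, and prefixed names such as ex:simone stand in for full URIs. Based on the sample above, the sketch assumes that rdf:type objects go into entity_type, that only string and language-tagged literals are stored as predicates, and that blank-node subjects are skipped.

```python
import json
from typing import NamedTuple, Optional

RDF_TYPE = "rdf:type"


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: Optional[str] = None      # None stands for the default graph
    is_string: bool = False          # True for string or language-tagged literals
    language: Optional[str] = None   # set only for rdf:langString literals


def build_rdf_documents(quads):
    """Group quads by subject into unified documents (assumed rules, see above)."""
    docs = {}

    def doc_for(subject):
        # Create the document lazily, only when something is actually stored.
        return docs.setdefault(subject, {
            "entity_id": subject,
            "entity_type": [],
            "document_type": "rdf-resource",
        })

    for q in quads:
        if q.subject.startswith("_:"):
            continue  # assumption: blank-node subjects are not stored
        if q.predicate == RDF_TYPE:
            types = doc_for(q.subject)["entity_type"]
            if q.obj not in types:
                types.append(q.obj)
        elif q.is_string:
            entry = {"value": q.obj}
            if q.graph is not None:
                entry["graph"] = q.graph
            if q.language is not None:
                entry["language"] = q.language
            predicates = doc_for(q.subject).setdefault("predicates", {})
            predicates.setdefault(q.predicate, []).append(entry)
        # Anything else (URI objects, non-string literals) is skipped.
    return docs


quads = [
    Quad("ex:simone", RDF_TYPE, "ex:Person", "ex:g1"),
    Quad("ex:michael", RDF_TYPE, "ex:Person", "ex:g1"),
    Quad("ex:simone", "ex:likes", "spaghetti", "ex:g1", is_string=True),
    Quad("ex:simone", "ex:knows", "ex:michael", "ex:g2"),
    Quad("ex:simone", "ex:likes", "spaghetti", "ex:g2", is_string=True),
    Quad("ex:simone", "ex:status", "La vita è un sogno", "ex:g2",
         is_string=True, language="it"),
    Quad("ex:simone", "ex:age", "40"),
    Quad("_:bnode", "ex:means", "coding", is_string=True),
]

docs = build_rdf_documents(quads)
print(json.dumps(docs, indent=2, ensure_ascii=False))
```

Creating each document lazily means that a subject whose statements are all skipped produces no document at all.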

Sample Gremlin Elasticsearch Document


    # Vertex 1
    simone label Person          <== Label
    simone likes "spaghetti"     <== Property
    simone likes "rice"          <== Property
    simone age 40                <== Property

    # Vertex 2
    michael label Person         <== Label

    # Edge 1
    simone knows michael         <== Edge
    e1 updated "2019-07-03"      <== Edge Property
    e1 through "company"         <== Edge Property
    e1 since 10                  <== Edge Property


    {
      "entity_id": "simone",
      "entity_type": ["Person"],
      "document_type": "vertex",
      "predicates": {
        "likes": [
          { "value": "spaghetti" },
          { "value": "rice" }
        ]
      }
    }

    {
      "entity_id": "michael",
      "entity_type": ["Person"],
      "document_type": "vertex"
    }

    {
      "entity_id": "e1",
      "entity_type": ["knows"],
      "document_type": "edge",
      "predicates": {
        "through": [
          { "value": "company" }
        ]
      }
    }
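A minimal sketch of how the two vertex documents above could be assembled from Gremlin data follows. The build_gremlin_document helper is hypothetical and is not the Neptune replication code. It assumes that only values held as strings are kept, in line with the statement that the replication implementation only stores string data, and that the predicates field is omitted when no string property remains.

```python
import json


def build_gremlin_document(entity_id, labels, document_type, properties):
    """Assemble a unified document for a vertex or an edge.

    properties maps each property name to a list of values, because a
    vertex property can hold several values (for example, two "likes").
    """
    if document_type not in ("vertex", "edge"):
        raise ValueError("document_type must be 'vertex' or 'edge'")

    doc = {
        "entity_id": entity_id,
        "entity_type": list(labels),
        "document_type": document_type,
    }

    predicates = {}
    for name, values in properties.items():
        # Assumption: keep string values only; numbers and other types are dropped.
        kept = [{"value": v} for v in values if isinstance(v, str)]
        if kept:
            predicates[name] = kept

    if predicates:
        doc["predicates"] = predicates
    return doc


simone = build_gremlin_document(
    "simone", ["Person"], "vertex",
    {"likes": ["spaghetti", "rice"], "age": [40]},
)
michael = build_gremlin_document("michael", ["Person"], "vertex", {})

print(json.dumps(simone, indent=2))
print(json.dumps(michael, indent=2))
```

The same helper accepts "edge" as the document type, with the edge label passed as the only entry in labels.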