Retrieving external data for a PDP in OPA
If all the data that OPA requires for an authorization decision can be provided as query input, either directly or inside a JSON Web Token (JWT) passed as a component of the query, no additional configuration is required. OPA can accept arbitrary JSON input in what is called the overload input approach, so it is relatively simple to pass JWTs and SaaS context data to OPA as part of query input. If a PDP requires data beyond what can be included as input or in a JWT, OPA provides several options for retrieving this data: bundling, pushing data (replication), and dynamic data retrieval.
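As a minimal sketch of the overload input approach, the following Python builds a query for OPA's Data API, which expects the caller's data under an `input` key. The OPA address, the policy path `saas/authz/allow`, and the input field names (`token`, `tenant_id`, `method`, `path`) are hypothetical and chosen only for illustration.

```python
import json
import urllib.request

# Hypothetical values: the OPA address and policy path depend on your deployment.
OPA_URL = "http://localhost:8181/v1/data/saas/authz/allow"


def build_opa_query(jwt: str, tenant_id: str, method: str, path: str) -> urllib.request.Request:
    """Build (but do not send) a Data API query that carries a JWT and tenant context as input."""
    body = {
        "input": {
            "token": jwt,            # raw JWT; the policy can decode or verify it
            "tenant_id": tenant_id,  # SaaS context supplied by the PEP
            "method": method,
            "path": path.strip("/").split("/"),
        }
    }
    return urllib.request.Request(
        OPA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_opa_query("<jwt>", "tenant-a", "GET", "/orders/42")
# To send it to a running OPA: urllib.request.urlopen(request)
```

Because everything the policy needs travels with the query, OPA needs no extra data configuration in this case.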
OPA bundling
The OPA bundling feature supports the following process for external data retrieval:
- The policy enforcement point (PEP) requests an authorization decision.
- OPA downloads new policy bundles, including external data.
- The bundling service replicates data from data source(s).
When you use the bundling feature, OPA periodically downloads policy and data bundles from a centralized bundle service. (OPA doesn't provide the implementation and setup of a bundle service.) All policies and external data that are pulled from the bundle service are stored in memory. This option is not suitable if the external data is too large to be stored in memory, or if the data changes more frequently than OPA downloads bundles.
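Because OPA does not supply the bundle service, something has to package policies and data for it. The following is a sketch, not a complete bundle service: it builds a gzipped tarball containing one Rego file and one `data.json` file, which is the archive layout OPA bundles use. The file name `authz/policy.rego`, the policy text, and the tenant data are hypothetical.

```python
import io
import json
import tarfile


def build_bundle(policy_rego: str, data: dict) -> bytes:
    """Package one policy and one data document as a gzipped tarball."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        files = {
            "authz/policy.rego": policy_rego.encode("utf-8"),  # hypothetical policy path
            "data.json": json.dumps(data).encode("utf-8"),     # external data at the bundle root
        }
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


policy = "package authz\n\ndefault allow := false\n"
bundle = build_bundle(policy, {"tenants": {"tenant-a": {"plan": "premium"}}})
# A bundle service would return these bytes from the endpoint that OPA polls.
```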
For more information about the bundling feature, see the OPA documentation.
OPA replication (pushing data)
The OPA replication approach supports the following process for external data retrieval:
- The PEP requests an authorization decision.
- The data replicator pushes data to OPA.
- The data replicator replicates data from data source(s).
In this alternative to the bundling approach, data is pushed to OPA instead of being periodically pulled by OPA. (OPA doesn't provide the implementation and setup of a replicator.) The push approach has the same data size limitations as the bundling approach, because OPA stores all the data in memory. The primary advantage of the push option is that you can update data in OPA with deltas instead of replacing all the external data each time. This makes the push option more appropriate for datasets that change frequently.
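The difference between a full replacement and a delta can be sketched against OPA's Data API, where `PUT` overwrites the document at a path and `PATCH` applies JSON Patch operations to it. The address, the `tenants` data path, and the tenant records below are hypothetical, and the requests are only built, not sent.

```python
import json
import urllib.request

OPA_DATA_URL = "http://localhost:8181/v1/data/tenants"  # hypothetical address and data path


def full_replace(document: dict) -> urllib.request.Request:
    """PUT creates or overwrites the whole document at the path."""
    return urllib.request.Request(
        OPA_DATA_URL,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def delta_update(operations: list) -> urllib.request.Request:
    """PATCH sends JSON Patch operations, so only the changed values travel."""
    return urllib.request.Request(
        OPA_DATA_URL,
        data=json.dumps(operations).encode("utf-8"),
        headers={"Content-Type": "application/json-patch+json"},
        method="PATCH",
    )


initial = full_replace({"tenant-a": {"plan": "basic"}})
delta = delta_update([{"op": "replace", "path": "/tenant-a/plan", "value": "premium"}])
# Send each with urllib.request.urlopen(...) against a running OPA.
```

A replicator would typically issue the full replacement once at startup and then send deltas as the source data changes.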
For more information about the replication option, see the OPA documentation.
OPA dynamic data retrieval
If the external data to be retrieved is too large to be cached in OPA's memory, the data can be dynamically pulled from an external source during the evaluation of an authorization decision. When you use this approach, data is always up to date. This approach has two drawbacks: network latency and accessibility. Currently, OPA can retrieve data at runtime only through an HTTP request. If an external data source cannot return data as an HTTP response, you need a custom API or some other mechanism to provide this data to OPA. Because OPA can retrieve data only through HTTP requests, and the speed of retrieving the data is pivotal, we recommend that you use an AWS service such as Amazon DynamoDB to hold external data when possible.
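When a data source cannot answer over HTTP, a small custom API can sit in front of it. The following is a minimal sketch of such an endpoint using only the Python standard library; a policy could call it at evaluation time with Rego's `http.send` built-in. The `/entitlements/<tenant>` route, the port, and the in-memory `ENTITLEMENTS` table (standing in for the real data source) are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory stand-in for the real data source (for example, a database table).
ENTITLEMENTS = {"tenant-a": {"plan": "premium", "max_users": 50}}


def lookup(path: str):
    """Map a path such as /entitlements/tenant-a to (status code, JSON-serializable body)."""
    parts = path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "entitlements" and parts[1] in ENTITLEMENTS:
        return 200, ENTITLEMENTS[parts[1]]
    return 404, {"error": "not found"}


class DataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = lookup(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Run the endpoint until interrupted; call serve() to start it."""
    ThreadingHTTPServer(("127.0.0.1", port), DataHandler).serve_forever()
```

Every authorization decision that depends on this data now includes a network round trip, which is the latency drawback described above.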
For more information about the pull approach, see the OPA documentation.
Using an authorization service for implementation with OPA
When you fetch external data by using bundling, replication, or a dynamic pull approach, we recommend that the authorization service facilitate this interaction. This is because the authorization service can retrieve external data and transform it into JSON for OPA to make authorization decisions. The following diagram shows how an authorization service can function with these three external data retrieval approaches.

Retrieving external data for OPA flow – bundle or dynamic data retrieval at decision time (illustrated with red numbered callouts in the diagram):
- OPA calls the local API endpoint for the authorization service, which is configured as a bundle endpoint or the endpoint for dynamic data retrieval during authorization decisions.
- The authorization service queries or calls the external data source to retrieve external data. (For a bundle endpoint, this data should also contain OPA policies and rules. Bundle updates replace everything in OPA's cache, both data and policies.)
- The authorization service performs any transformation necessary on the returned data to turn it into the expected JSON input.
- The data is returned to OPA. In a bundle configuration, it is cached in memory. In dynamic data retrieval, it is used immediately for the authorization decision.
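The transformation step in this flow can be sketched as follows, assuming the external source returns flat rows of tenant, user, and role. The row shape, the `roles` key, and the function name are hypothetical; the point is only that the authorization service reshapes source records into a JSON document a policy can index by key.

```python
import json


def to_opa_document(rows: list) -> dict:
    """Group flat (tenant, user, role) rows into a nested document keyed by tenant and user."""
    document = {}
    for row in rows:
        users = document.setdefault(row["tenant_id"], {})
        users.setdefault(row["user_id"], []).append(row["role"])
    return document


# Hypothetical rows as they might come back from the external data source.
rows = [
    {"tenant_id": "tenant-a", "user_id": "alice", "role": "admin"},
    {"tenant_id": "tenant-a", "user_id": "alice", "role": "viewer"},
    {"tenant_id": "tenant-b", "user_id": "bob", "role": "viewer"},
]
roles_document = to_opa_document(rows)
response_body = json.dumps({"roles": roles_document})  # JSON returned to OPA
```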
Retrieving external data for OPA flow – replicator (illustrated with blue numbered callouts in the diagram):
- The replicator (part of the authorization service) calls the external data source and retrieves any data to be updated in OPA. This can include policies, rules, and external data. This call can be on a set cadence, or it can happen in response to data updates in the external source.
- The authorization service performs any transformations necessary on the returned data to turn it into the expected JSON input.
- The authorization service calls OPA, which caches the pushed data in memory. The authorization service can selectively update data, policies, and rules.