Amazon EMR components
Amazon EMR enables fine-grained access control with Lake Formation through the following components:
- Proxy agent - The proxy agent is based on Apache Knox. It receives SAML-authenticated requests from users and translates SAML claims to temporary credentials. It also stores the temporary credentials in the secret agent. The proxy agent runs on the master node as the knox system user and writes logs to the /var/log/knox directory.
- Secret agent - The secret agent securely stores secrets and distributes secrets to other EMR components or applications. The secrets can include temporary user credentials, encryption keys, or Kerberos tickets. The secret agent runs on every node in the cluster and uses Lake Formation and AWS Glue APIs to retrieve temporary credentials and AWS Glue Data Catalog metadata. The secret agent runs as the emrsecretagent user, and writes logs to the /emr/secretagent/log directory. The process relies on a specific set of iptables rules to function. It is important to ensure iptables is not disabled, and, if you customize iptables configuration, the nat table rules must be preserved and left unaltered.
- Record server - The record server receives requests to access data. It then authorizes requests based on temporary credentials and table access control policies distributed by the secret agent. The record server reads data from Amazon S3 and returns column-level data that the user is authorized to access. The record server runs on every node in the cluster as the emr_record_server user and writes logs to the /var/log/emr-record-server directory.
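Because the secret agent depends on the nat table rules staying intact, it can help to confirm that a custom iptables change did not drop any of them. The following is a minimal sketch, not an Amazon EMR tool: the function names dump_nat_rules and missing_nat_rules are hypothetical, and the sketch assumes the standard iptables command is on the PATH and that the script runs with root privileges on a cluster node.

```python
import subprocess


def dump_nat_rules():
    """Return the current nat table rules, one rule per list entry.

    Assumption: runs the standard `iptables -t nat -S` command, which
    requires root privileges on the node.
    """
    result = subprocess.run(
        ["iptables", "-t", "nat", "-S"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


def missing_nat_rules(before, after):
    """Return the rules listed in `before` that no longer appear in `after`.

    Both arguments are lists of rule lines, such as the output of
    dump_nat_rules(). Blank lines are ignored. Rule order is not compared.
    """
    remaining = {line.strip() for line in after if line.strip()}
    return [
        line.strip()
        for line in before
        if line.strip() and line.strip() not in remaining
    ]
```

One way to use this is to call dump_nat_rules() before customizing iptables, call it again afterward, and pass both results to missing_nat_rules(). An empty list suggests that no pre-existing nat rule was removed. The check does not detect reordered rules, so it is a first sanity check, not proof that the table is unaltered.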