
Set up a minimum viable data space to share data between organizations

Created by Ramy Hcini (Think-it), Ismail Abdellaoui (Think-it), Malte Gasseling (Think-it), Jorge Hernandez Suarez (AWS), and Michael Miller (AWS)

Environment: PoC or pilot

Technologies: Analytics; Containers & microservices; Data lakes; Databases; Infrastructure

Workload: Open-source

AWS services: Amazon Aurora; AWS Certificate Manager (ACM); AWS CloudFormation; Amazon EC2; Amazon EFS; Amazon EKS; Elastic Load Balancing (ELB); Amazon RDS; Amazon S3; AWS Systems Manager

Summary

Data spaces are federated networks for data exchange with trust and control over one's data as core principles. They enable organizations to share, exchange, and collaborate on data at scale by offering a cost-effective and technology-agnostic solution.

Data spaces have the potential to significantly drive efforts for a sustainable future by using data-driven problem solving with an end-to-end approach that involves all relevant stakeholders.

This pattern guides you through an example of how two companies can use data space technology on Amazon Web Services (AWS) to drive their carbon emissions‒reduction strategy forward. In this scenario, company X provides carbon-emissions data, which company Y consumes. See the Additional information section for the following data space specification details:

  • Participants

  • Business case

  • Data space authority

  • Data space components

  • Data space services

  • Data to be exchanged

  • Data model

  • Tractus-X EDC connector

The pattern includes steps for the following:

  • Deploying the infrastructure needed for a basic data space with two participants running on AWS.

  • Exchanging carbon emissions‒intensity data by using the connectors in a secure way.

This pattern deploys a Kubernetes cluster that will host data space connectors and their services through Amazon Elastic Kubernetes Service (Amazon EKS).

The Eclipse Dataspace Components (EDC) control plane and data plane are both deployed on Amazon EKS. The official Tractus-X Helm chart deploys PostgreSQL and HashiCorp Vault services as dependencies.

In addition, the identity service is deployed on Amazon Elastic Compute Cloud (Amazon EC2) to replicate a real-life scenario of a minimum viable data space (MVDS).

Prerequisites and limitations

Prerequisites

  • An active AWS account to deploy the infrastructure in your chosen AWS Region

  • An AWS Identity and Access Management (IAM) user with access to Amazon S3, to be used temporarily as a technical user (The EDC connector currently doesn't support the use of roles. We recommend that you create one IAM user specifically for this demo and assign limited permissions to that user.)

  • AWS Command Line Interface (AWS CLI) installed and configured in your chosen AWS Region

  • AWS security credentials

  • eksctl on your workstation

  • Git on your workstation

  • kubectl

  • Helm

  • Postman

  • An AWS Certificate Manager (ACM) SSL/TLS certificate

  • A DNS name that will point to an Application Load Balancer (the DNS name must be covered by the ACM certificate)

  • HashiCorp Vault (For information about using AWS Secrets Manager to manage secrets, see the Additional information section.)

Product versions

  • Tractus-X EDC Connector version 0.4.1

Limitations

  • Connector selection ‒ This deployment uses an EDC-based connector. However, be sure to consider the strengths and functionalities of both the EDC and FIWARE True connectors to make an informed decision that aligns with the specific needs of the deployment.

  • EDC connector build ‒ The chosen deployment solution relies on the Tractus-X EDC Connector Helm chart, a well-established and extensively tested deployment option. The decision to use this chart is driven by its common usage and the inclusion of essential extensions in the provided build. While PostgreSQL and HashiCorp Vault are default components, you have the flexibility to customize your own connector build if needed.

  • Private cluster access ‒ Access to the deployed EKS cluster is restricted to private channels. Interaction with the cluster is performed exclusively through the use of kubectl and IAM. Public access to the cluster resources can be enabled by using load balancers and domain names, which must be implemented selectively to expose specific services to a broader network. However, we do not recommend providing public access.

  • Security focus ‒ Emphasis is placed on abstracting security configurations to default specifications so that you can concentrate on the steps involved in EDC connector data exchange. Although default security settings are maintained, it's imperative to enable secure communications before you expose the cluster to the public network. This precaution ensures a robust foundation for secure data handling.

  • Infrastructure cost ‒ An estimation of the infrastructure’s cost can be found by using the AWS Pricing Calculator. A simple calculation shows that costs can be up to 162.92 USD per month for the deployed infrastructure.

Architecture

The MVDS architecture comprises two virtual private clouds (VPCs), one for the Dynamic Attribute Provisioning System (DAPS) identity service and one for Amazon EKS.

DAPS architecture

The following diagram shows DAPS running on EC2 instances controlled by an Auto Scaling group. An Application Load Balancer and route table expose the DAPS servers. Amazon Elastic File System (Amazon EFS) synchronizes the data among the DAPS instances.

AWS Cloud architecture with VPC, availability zones, subnets, and DAPS servers in an auto-scaling group.

Amazon EKS architecture

Data spaces are designed to be technology-agnostic solutions, and multiple implementations exist. This pattern uses an Amazon EKS cluster to deploy the data space technical components. The following diagram shows the deployment of the EKS cluster. Worker nodes are installed in private subnets. The Kubernetes pods access the Amazon Relational Database Service (Amazon RDS) for PostgreSQL instance that is also in the private subnets. The Kubernetes pods store shared data in Amazon S3.

AWS Cloud architecture with VPC, public and private subnets, NAT gateways, and Kubernetes nodes across two availability zones.

Tools

AWS services

Other tools

  • eksctl is a command-line utility for creating and managing Kubernetes clusters on Amazon EKS.

  • Git is an open source, distributed version control system.

  • HashiCorp Vault provides secure storage with controlled access for credentials and other sensitive information.

  • Helm is an open source package manager for Kubernetes that helps you install and manage applications on your Kubernetes cluster.

  • kubectl is a command-line interface that helps you run commands against Kubernetes clusters.

  • Postman is an API platform.

Code repository

The Kubernetes configuration YAML files and Python scripts for this pattern are available in the GitHub aws-patterns-edc repository. The pattern also uses the Tractus-X EDC repository.

Best practices

Amazon EKS and isolation of participants’ infrastructures

In this pattern, Kubernetes namespaces separate the company X provider's infrastructure from the company Y consumer's infrastructure. For more information, see EKS Best Practices Guides.

In a more realistic situation, each participant would have a separate Kubernetes cluster running within their own AWS account. The shared infrastructure (DAPS in this pattern) would be accessible by data space participants while being completely separated from participants' infrastructures.

Epics

Task | Description | Skills required

Clone the repository.

To clone the repository to your workstation, run the following command:

git clone https://github.com/Think-iT-Labs/aws-patterns-edc

The workstation must have access to your AWS account.

DevOps engineer

Provision the Kubernetes cluster and set up namespaces.

To deploy a simplified default EKS cluster in your account, run the following eksctl command on the workstation where you cloned the repo:

eksctl create cluster

The command creates the VPC and private and public subnets that span three different Availability Zones. After the network layer is created, the command creates two m5.large EC2 instances within an Auto Scaling group.

For more information and example output, see the eksctl guide.
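
If you want more control over the cluster name, Region, or node group, you can pass explicit flags instead of accepting the defaults. The following sketch is illustrative only; the cluster name mvds and the node settings are assumptions, not values that this pattern requires:

eksctl create cluster --name mvds \
  --region eu-west-1 \
  --nodes 2 \
  --node-type m5.large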

After you provision the private cluster, add the new EKS cluster to your local Kubernetes configuration by running the following command:

aws eks update-kubeconfig --name <EKS CLUSTER NAME> --region <AWS REGION>

This pattern uses the eu-west-1 AWS Region to run all commands. However, you can run the same commands in your preferred AWS Region.

To confirm that your EKS nodes are running and are in the ready state, run the following command:

kubectl get nodes
DevOps engineer

Set up the namespaces.

To create namespaces for the provider and the consumer, run the following commands:

kubectl create ns provider
kubectl create ns consumer

In this pattern, it's important to use provider and consumer as the namespaces to fit the configurations in the next steps.
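
To confirm that both namespaces exist, you can run a quick check such as the following:

kubectl get ns provider consumer

Both namespaces should be listed with a status of Active.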

DevOps engineer
Task | Description | Skills required

Deploy DAPS by using AWS CloudFormation.

For ease of managing DAPS operations, the DAPS server is installed on EC2 instances.

To install DAPS, use the AWS CloudFormation template. You will need the ACM certificate and DNS name from the Prerequisites section. The template deploys and configures the following:

  • Application Load Balancer

  • Auto Scaling group

  • EC2 instances configured with user data to install all necessary packages

  • IAM roles

  • DAPS

You can deploy the AWS CloudFormation template by signing in to the AWS Management Console and using the AWS CloudFormation console. You can also deploy the template by using an AWS CLI command such as the following:

aws cloudformation create-stack --stack-name daps \
  --template-body file://aws-patterns-edc/cloudformation.yml \
  --parameters \
  ParameterKey=CertificateARN,ParameterValue=<ACM Certificate ARN> \
  ParameterKey=DNSName,ParameterValue=<DNS name> \
  ParameterKey=InstanceType,ParameterValue=<EC2 instance type> \
  ParameterKey=EnvironmentName,ParameterValue=<Environment Name> \
  --capabilities CAPABILITY_NAMED_IAM

The environment name is your own choice. We recommend using a meaningful term, such as DapsInfrastructure, because it will be reflected in the AWS resource tags.

For this pattern, t3.small is large enough to run the DAPS workflow, which has three Docker containers.

The template deploys the EC2 instances in private subnets. This means that the instances are not directly accessible through SSH (Secure Shell) from the internet. The instances are provisioned with the necessary IAM role and AWS Systems Manager Agent to enable access to the running instances through Session Manager, a capability of AWS Systems Manager.

We recommend using Session Manager for access. Alternatively, you could provision a bastion host to allow SSH access from the internet. When using the bastion host approach, the EC2 instance might take a few more minutes to start running.

After the AWS CloudFormation template is successfully deployed, point the DNS name to your Application Load Balancer DNS name. To confirm, run the following command:

dig <DNS NAME>

The output should be similar to the following:

; <<>> DiG 9.16.1-Ubuntu <<>> edc-pattern.think-it.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42344
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494

;; QUESTION SECTION:
;edc-pattern.think-it.io. IN A

;; ANSWER SECTION:
edc-pattern.think-it.io. 276 IN CNAME daps-alb-iap9zmwy3kn8-1328773120.eu-west-1.elb.amazonaws.com.
daps-alb-iap9zmwy3kn8-1328773120.eu-west-1.elb.amazonaws.com. 36 IN A 52.208.240.129
daps-alb-iap9zmwy3kn8-1328773120.eu-west-1.elb.amazonaws.com. 36 IN A 52.210.155.124
DevOps engineer

Register the participants’ connectors to the DAPS service.

From within any of the EC2 instances provisioned for DAPS, register participants:

  1. As the root user, change to the directory that contains the registration scripts:

    cd /srv/mvds/omejdn-daps
  2. Register the provider:

    bash scripts/register_connector.sh <provider_name>
  3. Register the consumer:

    bash scripts/register_connector.sh <consumer_name>

The choice of the names doesn't impact the next steps. We recommend using either provider and consumer or companyx and companyy.

The registration commands will also automatically configure the DAPS service with the needed information fetched from the created certificates and keys.

While you are logged in to a DAPS server, gather information needed for later steps in the installation:

  1. From omejdn-daps/config/clients.yml, get the client IDs for the provider and the consumer. The client ID values are long strings of hexadecimal digits.

  2. From the omejdn-daps/keys directory, copy the contents of the consumer.cert, consumer.key, provider.cert, and provider.key files.

We recommend copying and pasting the text into similarly named files prefixed with daps- on your workstation.

You should have the client IDs for the provider and consumer and should have four files in your working directory on your workstation:

  • Source file name consumer.cert becomes workstation file name daps-consumer.cert.

  • Source file name consumer.key becomes workstation file name daps-consumer.key.

  • Source file name provider.cert becomes workstation file name daps-provider.cert.

  • Source file name provider.key becomes workstation file name daps-provider.key.
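
If you access the DAPS servers through Session Manager, one simple way to transfer these small files is to print each file on the server and paste the output into the corresponding daps- file on your workstation. This sketch assumes the paths from the previous steps:

# On the DAPS server:
cat /srv/mvds/omejdn-daps/keys/provider.cert
# On your workstation, paste the output into daps-provider.cert.
# Repeat for provider.key, consumer.cert, and consumer.key.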

DevOps engineer
Task | Description | Skills required

Clone the Tractus-X EDC repository and use the 0.4.1 version.

The Tractus-X EDC connector’s build requires PostgreSQL (assets database) and HashiCorp Vault (secrets management) services to be deployed and available.

There are many different versions of Tractus-X EDC Helm charts. This pattern specifies version 0.4.1 because it uses the DAPS server.

The latest versions use Managed Identity Wallet (MIW) with a distributed implementation of the identity service.

On the workstation where you created the two Kubernetes namespaces, clone the tractusx-edc repository, and check out the release/0.4.1 branch.

git clone https://github.com/eclipse-tractusx/tractusx-edc
cd tractusx-edc
git checkout release/0.4.1
DevOps engineer

Configure the Tractus-X EDC Helm chart.

Modify the Tractus-X Helm chart template configuration to enable both connectors to interact together.

To do this, add the namespace to the DNS name of the service so that other services in the cluster can resolve it. These modifications go in the charts/tractusx-connector/templates/_helpers.tpl file. This pattern provides a final, modified version of this file for you to use; copy it into the daps section of charts/tractusx-connector/templates/_helpers.tpl.

Make sure to comment out all DAPS dependencies in charts/tractusx-connector/Chart.yaml:

dependencies:
  # IDS Dynamic Attribute Provisioning Service (IAM)
  # - name: daps
  #   version: 0.0.1
  #   repository: "file://./subcharts/omejdn"
  #   alias: daps
  #   condition: install.daps
DevOps engineer

Configure the connectors to use PostgreSQL on Amazon RDS.

(Optional) An Amazon Relational Database Service (Amazon RDS) instance is not required in this pattern. However, we highly recommend using Amazon RDS or Amazon Aurora because they provide features such as high availability, backup, and recovery.

To replace PostgreSQL on Kubernetes with Amazon RDS, do the following:

  1. Provision the Amazon RDS for PostgreSQL instance.

  2. In Chart.yaml, comment out the PostgreSQL section.

  3. In provider_values.yml and consumer_values.yml, configure the postgresql section as follows:

postgresql:
  auth:
    database: edc
    password: <RDS PASSWORD>
    username: <RDS Username>
  jdbcUrl: jdbc:postgresql://<RDS DNS NAME>:5432/edc
  username: <RDS Username>
  password: <RDS PASSWORD>
  primary:
    persistence:
      enabled: false
  readReplicas:
    persistence:
      enabled: false
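
If you don't already have an instance, the following AWS CLI sketch provisions a minimal Amazon RDS for PostgreSQL instance. The identifier, instance class, and storage size are illustrative assumptions. You must also place the instance in the EKS cluster's VPC (private subnets) and allow inbound traffic on port 5432 from the worker nodes:

aws rds create-db-instance \
  --db-instance-identifier edc-postgres \
  --engine postgres \
  --db-instance-class db.t3.micro \
  --allocated-storage 20 \
  --db-name edc \
  --master-username <RDS Username> \
  --master-user-password <RDS PASSWORD> \
  --no-publicly-accessible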
DevOps engineer

Configure and deploy the provider connector and its services.

To configure the provider connector and its services, do the following:

  1. To download the provider_edc.yaml file from the edc_helm_configs directory to the current Helm chart folder, run the following command:

    wget -q https://raw.githubusercontent.com/Think-iT-Labs/aws-patterns-edc/main/edc_helm_configs/provider_edc.yaml -P charts/tractusx-connector/

  2. Replace the following variables (also marked in the file) with their values:

    • CLIENT_ID ‒ The ID generated by DAPS. The CLIENT_ID should be in /srv/mvds/omejdn-daps/config/clients.yml on the DAPS server. It should be a string of hexadecimal characters.

    • DAPS_URL ‒ The URL of the DAPS server. It should be https://{DNS name} using the DNS name that you set up when you ran the AWS CloudFormation template.

    • VAULT_TOKEN ‒ The token to be used for Vault authorization. Choose any value.

    • vault.fullnameOverride ‒ vault-provider.

    • vault.hashicorp.url ‒ http://vault-provider:8200/.

    The previous values assume that the deployment name and the namespace name are provider.

  3. To run the Helm chart from your workstation, use the following commands:

    cd charts/tractusx-connector
    helm dependency build
    helm upgrade --install provider ./ -f provider_edc.yaml -n provider
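
Optionally, verify that the release deployed successfully before you continue. The control plane, data plane, and vault pods (and PostgreSQL, if you kept the in-cluster database) should reach the Running state within a few minutes:

    helm list -n provider
    kubectl get pods -n provider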
DevOps engineer

Add the certificate and keys to the provider vault.

To avoid confusion, produce the following certificates outside of the tractusx-edc/charts directory.

For example, run the following command to change to your home directory:

cd ~

You now need to add the secrets that are needed by the provider into the vault.

The names of the secrets within the vault are the values of the keys in the secretNames: section of the provider_edc.yaml file. By default, they are configured as follows:

secretNames:
  transferProxyTokenSignerPrivateKey: transfer-proxy-token-signer-private-key
  transferProxyTokenSignerPublicKey: transfer-proxy-token-signer-public-key
  transferProxyTokenEncryptionAesKey: transfer-proxy-token-encryption-aes-key
  dapsPrivateKey: daps-private-key
  dapsPublicKey: daps-public-key

An Advanced Encryption Standard (AES) key, private key, public key, and self-signed certificate are generated initially. These are subsequently added as secrets to the vault.

Furthermore, this directory should contain the daps-provider.cert and daps-provider.key files that you copied from the DAPS server.

  1. Run the following commands:

    # generate a private key
    openssl ecparam -name prime256v1 -genkey -noout -out provider-private-key.pem
    # generate corresponding public key
    openssl ec -in provider-private-key.pem -pubout -out provider-public-key.pem
    # create a self-signed certificate
    openssl req -new -x509 -key provider-private-key.pem -out provider-cert.pem -days 360
    # generate aes key
    openssl rand -base64 32 > provider-aes.key
  2. Before adding the secrets to the vault, convert them from multiple lines to single lines by replacing line breaks with \n:

    cat provider-private-key.pem | sed 's/$/\\n/' | tr -d '\n' > provider-private-key.pem.line
    cat provider-public-key.pem | sed 's/$/\\n/' | tr -d '\n' > provider-public-key.pem.line
    cat provider-cert.pem | sed 's/$/\\n/' | tr -d '\n' > provider-cert.pem.line
    cat provider-aes.key | sed 's/$/\\n/' | tr -d '\n' > provider-aes.key.line

    ## The following block is for the DAPS certificate and key
    openssl x509 -in daps-provider.cert -outform PEM | sed 's/$/\\n/' | tr -d '\n' > daps-provider.cert.line
    cat daps-provider.key | sed 's/$/\\n/' | tr -d '\n' > daps-provider.key.line
  3. To format the secrets that will be added to Vault, run the following commands:

    JSONFORMAT='{"content": "%s"}'
    # create a single line in JSON format
    printf "${JSONFORMAT}\n" "`cat provider-private-key.pem.line`" > provider-private-key.json
    printf "${JSONFORMAT}\n" "`cat provider-public-key.pem.line`" > provider-public-key.json
    printf "${JSONFORMAT}\n" "`cat provider-cert.pem.line`" > provider-cert.json
    printf "${JSONFORMAT}\n" "`cat provider-aes.key.line`" > provider-aes.json
    printf "${JSONFORMAT}\n" "`cat daps-provider.key.line`" > daps-provider.key.json
    printf "${JSONFORMAT}\n" "`cat daps-provider.cert.line`" > daps-provider.cert.json

    The secrets are now in JSON format and are ready to be added to the vault.

  4. To get the pod name for the vault, run the following command:

    kubectl get pods -n provider | egrep "vault|NAME"

    The pod name will be similar to "vault-provider-0". This name is used when creating a port forward to the vault. The port forward gives you access to the vault so that you can add the secrets. You should run this from a workstation that has AWS credentials configured.

  5. To access the vault, use kubectl to configure a port forward:

    kubectl port-forward <VAULT_POD_NAME> 8200:8200 -n provider

You should now be able to access the vault through your browser or the CLI.

Browser

  1. Using the browser, navigate to http://127.0.0.1:8200, which will use the port forward that you configured.

  2. Log in using the token that you configured previously in provider_edc.yaml. In the secrets engine, create the secrets that are listed in step 4. Each secret has a Path (the secret name shown in the list). Within the secret data section, the key name is content, and the value is the single line of text from the corresponding .line file.

  3. The secret names are sourced from the secretNames section in the provider_edc.yaml file.

  4. Create the following secrets:

    • Secret transfer-proxy-token-signer-private-key with file name provider-private-key.pem.line

    • Secret transfer-proxy-token-signer-public-key with file name provider-cert.pem.line

    • Secret transfer-proxy-token-encryption-aes-key with file name provider-aes.key.line

    • Secret daps-private-key with file name daps-provider.key.line

    • Secret daps-public-key with file name daps-provider.cert.line

Vault CLI

The CLI will also use the port forward that you configured.

  1. On your workstation, install Vault CLI by following the instructions in the HashiCorp Vault documentation.

  2. To log in to the vault by using the token that you set up in provider_edc.yaml, run the following command:

    vault login -address=http://127.0.0.1:8200

    With the correct token, you should see the message "Success! You are now authenticated."

  3. To create the secrets by using the JSON formatted files that you created previously, run the following code:

    vault kv put -address=http://127.0.0.1:8200 secret/transfer-proxy-token-signer-private-key @provider-private-key.json
    vault kv put -address=http://127.0.0.1:8200 secret/transfer-proxy-token-signer-public-key @provider-cert.json
    vault kv put -address=http://127.0.0.1:8200 secret/transfer-proxy-token-encryption-aes-key @provider-aes.json
    vault kv put -address=http://127.0.0.1:8200 secret/daps-private-key @daps-provider.key.json
    vault kv put -address=http://127.0.0.1:8200 secret/daps-public-key @daps-provider.cert.json
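
Optionally, read a secret back to confirm that it was stored correctly, as in the following sketch for the daps-private-key secret:

    vault kv get -address=http://127.0.0.1:8200 secret/daps-private-key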
DevOps engineer

Configure and deploy the consumer connector and its services.

The steps for configuring and deploying the consumer are similar to those you completed for the provider:

  1. To copy the consumer_edc.yaml from the aws-patterns-edc repo into the tractusx-edc/charts/tractusx-connector folder, run the following commands:

    cd tractusx-edc
    wget -q https://raw.githubusercontent.com/Think-iT-Labs/aws-patterns-edc/main/edc_helm_configs/consumer_edc.yaml -P charts/tractusx-connector/
  2. Update the following variables with their actual values:

    • CONSUMER_CLIENT_ID ‒ The ID generated by DAPS. The CONSUMER_CLIENT_ID should be in config/clients.yml on the DAPS server.

    • DAPS_URL ‒ The same DAPS URL that you used for the provider.

    • VAULT_TOKEN ‒ The token to be used for Vault authorization. Choose any value.

    • vault.fullnameOverride ‒ vault-consumer

    • vault.hashicorp.url ‒ http://vault-consumer:8200/

    The previous values assume that the deployment name and the namespace name are consumer.

  3. To run the Helm chart, use the following commands:

    cd charts/tractusx-connector
    helm upgrade --install consumer ./ -f consumer_edc.yaml -n consumer

Add the certificate and keys to the consumer vault.

From a security standpoint, we recommend regenerating the certificates and keys for each data space participant. This pattern regenerates certificates and keys for the consumer.

The steps are very similar to those for the provider. You can verify the secret names in the consumer_edc.yaml file.

The names of the secrets within the vault are the values of the keys in the secretNames: section of the consumer_edc.yaml file. By default, they are configured as follows:

secretNames:
  transferProxyTokenSignerPrivateKey: transfer-proxy-token-signer-private-key
  transferProxyTokenSignerPublicKey: transfer-proxy-token-signer-public-key
  transferProxyTokenEncryptionAesKey: transfer-proxy-token-encryption-aes-key
  dapsPrivateKey: daps-private-key
  dapsPublicKey: daps-public-key

The daps-consumer.cert and daps-consumer.key files that you copied from the DAPS server should already exist in this directory.

  1. Run the following commands:

    # generate a private key
    openssl ecparam -name prime256v1 -genkey -noout -out consumer-private-key.pem
    # generate corresponding public key
    openssl ec -in consumer-private-key.pem -pubout -out consumer-public-key.pem
    # create a self-signed certificate
    openssl req -new -x509 -key consumer-private-key.pem -out consumer-cert.pem -days 360
    # generate aes key
    openssl rand -base64 32 > consumer-aes.key
  2. Manually edit the files to replace line breaks with \n, or use commands similar to the following:

    cat consumer-private-key.pem | sed 's/$/\\n/' | tr -d '\n' > consumer-private-key.pem.line
    cat consumer-public-key.pem | sed 's/$/\\n/' | tr -d '\n' > consumer-public-key.pem.line
    cat consumer-cert.pem | sed 's/$/\\n/' | tr -d '\n' > consumer-cert.pem.line
    cat consumer-aes.key | sed 's/$/\\n/' | tr -d '\n' > consumer-aes.key.line
    cat daps-consumer.cert | sed 's/$/\\n/' | tr -d '\n' > daps-consumer.cert.line
    cat daps-consumer.key | sed 's/$/\\n/' | tr -d '\n' > daps-consumer.key.line
  3. To format the secrets that will be added to Vault, run the following commands:

    JSONFORMAT='{"content": "%s"}'
    # create a single line in JSON format
    printf "${JSONFORMAT}\n" "`cat consumer-private-key.pem.line`" > consumer-private-key.json
    printf "${JSONFORMAT}\n" "`cat consumer-public-key.pem.line`" > consumer-public-key.json
    printf "${JSONFORMAT}\n" "`cat consumer-cert.pem.line`" > consumer-cert.json
    printf "${JSONFORMAT}\n" "`cat consumer-aes.key.line`" > consumer-aes.json
    printf "${JSONFORMAT}\n" "`cat daps-consumer.key.line`" > daps-consumer.key.json
    printf "${JSONFORMAT}\n" "`cat daps-consumer.cert.line`" > daps-consumer.cert.json

    The secrets are now in JSON format and are ready to be added to the vault.

  4. To get the pod name for the consumer vault, run the following command:

    kubectl get pods -n consumer | egrep "vault|NAME"

    The pod name will be similar to "vault-consumer-0". This name is used when creating a port forward to the vault. The port forward gives you access to the vault so that you can add the secrets. You should run this from a workstation that has AWS credentials configured.

  5. To access the vault, use kubectl to configure a port forward:

    kubectl port-forward <VAULT_POD_NAME> 8201:8200 -n consumer

The local port is 8201 this time so that you can have port forwards in place for both the provider and the consumer.

Browser

You can use your browser to connect to http://localhost:8201/ to access the consumer vault and create the secrets with names and content as outlined.

The secrets and files that contain the content are the following:

  • Secret transfer-proxy-token-signer-private-key with file name consumer-private-key.pem.line

  • Secret transfer-proxy-token-signer-public-key with file name consumer-cert.pem.line

  • Secret transfer-proxy-token-encryption-aes-key with file name consumer-aes.key.line

  • Secret daps-private-key with file name daps-consumer.key.line

  • Secret daps-public-key with file name daps-consumer.cert.line

Vault CLI

Using Vault CLI, you can run the following commands to log in to the vault and create the secrets:

  1. Log in to the vault by using the token that you configured within consumer_edc.yaml:

    vault login -address=http://127.0.0.1:8201

    With the correct token, you should see the message "Success! You are now authenticated."

  2. To create the secrets using the JSON formatted files that you created previously, run the following code:

    vault kv put -address=http://127.0.0.1:8201 secret/transfer-proxy-token-signer-private-key @consumer-private-key.json
    vault kv put -address=http://127.0.0.1:8201 secret/transfer-proxy-token-signer-public-key @consumer-cert.json
    vault kv put -address=http://127.0.0.1:8201 secret/transfer-proxy-token-encryption-aes-key @consumer-aes.json
    vault kv put -address=http://127.0.0.1:8201 secret/daps-private-key @daps-consumer.key.json
    vault kv put -address=http://127.0.0.1:8201 secret/daps-public-key @daps-consumer.cert.json
DevOps engineer
Task | Description | Skills required

Set up port forwarding.

  1. To check the status of the pods, run the following commands:

    kubectl get pods -n provider
    kubectl get pods -n consumer
  2. To make sure that the Kubernetes deployments were successful, look at the logs of the provider and consumer Kubernetes pods by running the following commands:

    kubectl logs -n provider <provider control plane pod name>
    kubectl logs -n consumer <consumer control plane pod name>

The cluster is private and is not accessible publicly. To interact with the connectors, use the Kubernetes port-forwarding feature to forward traffic generated by your machine to the connector control plane.

  1. On the first terminal, forward the consumer’s requests to the management API through port 8300:

    kubectl port-forward deployment/consumer-tractusx-connector-controlplane 8300:8081 -n consumer
  2. On the second terminal, forward the provider’s requests to the management API through port 8400:

    kubectl port-forward deployment/provider-tractusx-connector-controlplane 8400:8081 -n provider
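
To confirm that both forwards are live, you can probe each local port; receiving any HTTP response code shows that traffic reaches the control planes. This is a minimal connectivity sketch, not a required step:

    curl -s -o /dev/null -w "consumer: %{http_code}\n" http://localhost:8300/management
    curl -s -o /dev/null -w "provider: %{http_code}\n" http://localhost:8400/management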
DevOps engineer

Create S3 buckets for the provider and the consumer.

The EDC connector currently doesn't use temporary AWS credentials, such as those provided by assuming a role. The EDC supports only the use of an IAM access key ID and secret access key combination.

Two S3 buckets are required for later steps. One S3 bucket is used for storing data made available by the provider. The other S3 bucket is for data received by the consumer.

The IAM user should have permission to read and write objects only in the two named buckets.

An access key ID and secret access key pair needs to be created and kept safe. After this MVDS has been decommissioned, the IAM user should be deleted.

The following code is an example IAM policy for the user:

{ "Version": "2012-10-17", "Statement": [ { "Sid": "Stmt1708699805237", "Action": [ "s3:GetObject", "s3:GetObjectVersion", "s3:ListAllMyBuckets", "s3:ListBucket", "s3:ListBucketMultipartUploads", "s3:ListBucketVersions", "s3:PutObject" ], "Effect": "Allow", "Resource": [ "arn:aws:s3:::<S3 Provider Bucket>", "arn:aws:s3:::<S3 Consumer Bucket>", "arn:aws:s3:::<S3 Provider Bucket>/*", "arn:aws:s3:::<S3 Consumer Bucket>/*" ] } ] }
DevOps engineer

Set up Postman to interact with the connector.

Use Postman as an HTTP client to interact with the connectors through the port forwards that you set up. Postman Collections are provided for both the provider and the consumer connectors.

Import the collections from the aws-patterns-edc repository into your Postman instance.

This pattern uses Postman collection variables to provide input to your requests.

App developer, Data engineer
Task | Description | Skills required

Prepare the carbon-emissions intensity data to be shared.

First, decide on the data asset to be shared. The data of company X represents the carbon-emissions footprint of its vehicle fleet. Weight is Gross Vehicle Weight (GVW) in tonnes, and emissions are in grams of CO2 per tonne-kilometer (g CO2 e/t-km) according to the Well-to-Wheel (WTW) measurement:

  • Vehicle type: Van; weight: < 3.5; emissions: 800

  • Vehicle type: Urban truck; weight: 3.5‒7.5; emissions: 315

  • Vehicle type: Medium goods vehicle (MGV); weight: 7.5‒20; emissions: 195

  • Vehicle type: Heavy goods vehicle (HGV); weight: > 20; emissions: 115

The example data is in the carbon_emissions_data.json file in the aws-patterns-edc repository.

Company X uses Amazon S3 to store objects.

Create the S3 bucket and store the example data object there. The following commands create an S3 bucket with default security settings. We highly recommend consulting Security best practices for Amazon S3.

aws s3api create-bucket --bucket <BUCKET_NAME> --region <AWS_REGION>
# Add '--create-bucket-configuration LocationConstraint=<AWS_REGION>'
# if you want to create the bucket outside of the us-east-1 Region.

aws s3api put-object --bucket <BUCKET_NAME> \
  --key <S3 OBJECT NAME> \
  --body <PATH OF THE FILE TO UPLOAD>

The S3 bucket name should be globally unique. For more information about naming rules, see the AWS documentation.

Data engineer, App developer

Register the data asset to the provider’s connector by using Postman.

An EDC connector data asset holds the name of the data and its location. In this case, the EDC connector data asset will point to the created object in the S3 bucket:

  • Connector: Provider

  • Request:  Create Asset

  • Collection Variables: Update ASSET_NAME. Choose a meaningful name that represents the asset.

  • Request Body: Update the request body with the S3 bucket that you created for the provider.

    "dataAddress": { "edc:type": "AmazonS3", "name": "Vehicle Carbon Footprint", "bucketName": "<REPLACE WITH THE SOURCE BUCKET NAME>", "keyName": "<REPLACE WITH YOUR OBJECT NAME>", "region": "<REPLACE WITH THE BUCKET REGION>", "accessKeyId": "<REPLACE WITH YOUR ACCESS KEY ID>", "secretAccessKey": "<REPLACE WITH SECRET ACCESS KEY>" }
  • Response: A successful request returns the created time and the asset ID of the newly created asset.

    { "@id": "c89aa31c-ec4c-44ed-9e8c-1647f19d7583" }
  • Collection variable ASSET_ID: Update the Postman collection variable ASSET_ID with the ID that was generated automatically by the EDC connector after creation.

App developer, Data engineer

Define the usage policy of the asset.

An EDC data asset must be associated with clear usage policies. First, create the Policy Definition in the provider connector.

The policy of company X is to allow participants of the data space to use the carbon-emissions footprint data.

  • Request Body:

    • Connector: Provider

    • Request: Create Policy

    • Collection Variables: Update the Policy Name variable with the name of the policy.

  • Response: A successful request returns the created time and the policy ID of the newly created policy. Update the collection variable POLICY_ID with the ID of the policy generated by the EDC connector after creation.

App developer, Data engineer

Define an EDC Contract Offer for the asset and its usage policy.

To allow other participants to request access to your data, offer it in a contract that specifies the usage conditions and permissions:

  • Connector: Provider

  • Request: Create Contract Definition

  • Collection Variables: Update the Contract Name variable with a name for the contract offer or definition.

App developer, Data engineer
Task | Description | Skills required

Request the data catalog shared by company X.

As a data consumer in the data space, company Y first needs to discover the data that is being shared by other participants.

In this basic setup, you can do this by asking the consumer connector to request the catalog of available assets from the provider connector directly.

  • Connector: Consumer

  • Request: Request Catalog

  • Response: All available data assets from the provider together with their attached usage policies. As a data consumer, look for the contract of your interest and update the following collection variables accordingly.

    • CONTRACT_OFFER_ID ‒ The ID of the contract offer the consumer wants to negotiate

    • ASSET_ID ‒ The ID of the asset the consumer wants to negotiate

    • PROVIDER_CLIENT_ID ‒ The ID of the provider connector to negotiate with
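
The Postman collection defines the exact request body and headers for this step. For orientation only, a catalog request through the consumer's forwarded management port might look like the following curl sketch; the management API path, the JSON-LD context, and the in-cluster provider DSP address are assumptions based on EDC 0.4.x and the service table in the Additional information section:

curl -s -X POST http://localhost:8300/management/v2/catalog/request \
  -H 'Content-Type: application/json' \
  -d '{
    "@context": { "edc": "https://w3id.org/edc/v0.0.1/ns/" },
    "protocol": "dataspace-protocol-http",
    "counterPartyAddress": "http://provider-tractusx-connector-controlplane.provider:8084/api/v1/dsp"
  }'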

App developer, Data engineer

Initiate a contract negotiation for the carbon-emissions intensity data from company X.

Now that you have identified the asset that you want to consume, initiate a contract negotiation process between the consumer and provider connectors.

  • Connector: Consumer

  • Request: Contract Negotiation

  • Collection Variables: Update the CONSUMER_CLIENT_ID variable with the ID of the consumer connector to negotiate with.

The process might take some time before reaching the VERIFIED state.

You can check the state of the Contract Negotiation and the corresponding Agreement ID by using the Get Negotiation request.

App developer, Data engineer
Task | Description | Skills required

Consume data from HTTP endpoints.

(Option 1) To use the HTTP data plane to consume data in the data space, you can use webhook.site to emulate an HTTP server and initiate the transfer process in the consumer connector:

  • Connector: Consumer

  • Request: Contract Negotiation

  • Collection Variables: Update the Contract Agreement ID variable with the ID of the contract agreement generated by the EDC connector.

  • Request Body: Update the request body to specify HTTP as a dataDestination alongside the webhook URL:

    { "dataDestination": { "type": "HttpProxy" }, "privateProperties": { "receiverHttpEndpoint": "<WEBHOOK URL>" } }

    The connector will send the information necessary to download the file directly to the webhook URL.

    The received payload is similar to the following:

    { "id": "dcc90391-3819-4b54-b401-1a005a029b78", "endpoint": "http://consumer-tractusx-connector-dataplane.consumer:8081/api/public", "authKey": "Authorization", "authCode": "<AUTH CODE YOU RECEIVE IN THE ENDPOINT>", "properties": { "https://w3id.org/edc/v0.0.1/ns/cid": "vehicle-carbon-footprint-contract:4563abf7-5dc7-4c28-bc3d-97f45e32edac:b073669b-db20-4c83-82df-46b583c4c062" } }

    Use the received credentials to get the S3 asset that was shared by the provider.

In this last step, you must send the request to the consumer data plane at the endpoint stated in the payload, so make sure that the relevant ports are forwarded.

App developer, Data engineer

Consume data from S3 buckets directly.

(Option 2) Use Amazon S3 integration with the EDC connector, and directly point to the S3 bucket in the consumer infrastructure as a destination:

  • Request Body: Update the request body to specify the S3 bucket as a dataDestination.

    This should be the S3 bucket that you previously created for storing data received by the consumer.

    { "dataDestination": { "type": "AmazonS3", "bucketName": "{{ REPLACE WITH THE DESTINATION BUCKET NAME }}", "keyName": "{{ REPLACE WITH YOUR OBJECT NAME }}", "region": "{{ REPLACE WITH THE BUCKET REGION }}", "accessKeyId": "{{ REPLACE WITH YOUR ACCESS KEY ID }}", "secretAccessKey": "{{ REPLACE WITH SECRET ACCESS KEY }}" } } }
Data engineer, App developer

Troubleshooting

Issue | Solution

The connector might raise an issue about the certificate PEM format.

Convert the contents of each file to a single line by replacing the line breaks with \n.
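
For example, the same transformation that this pattern uses elsewhere converts a PEM file into a single line with literal \n sequences:

cat provider-cert.pem | sed 's/$/\\n/' | tr -d '\n' > provider-cert.pem.line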

Related resources

Additional information

Data space specifications

Participants

Participant: Company X
Description: Operates a fleet of vehicles across Europe and South America to transport various goods.
Focus: Aims to make data-driven decisions to reduce its carbon-emissions footprint intensity.

Participant: Company Y
Description: An environmental regulatory authority.
Focus: Enforces environmental regulations and policies designed to monitor and mitigate the environmental impact of businesses and industries, including carbon-emissions intensity.

Business case

Company X uses data space technology to share carbon footprint data with a compliance auditor, company Y, to evaluate and address the environmental impact of company X’s logistics operations.

Data space authority

The data space authority is a consortium of the organizations governing the data space. In this pattern, both company X and company Y form the governance body and represent a federated data space authority.

Data space components

Component: Dataset exchange protocol
Chosen implementation: Dataspace Protocol version 0.8

Component: Data space connector
Chosen implementation: Tractus-X EDC Connector version 0.4.1

Component: Data exchange policies
Chosen implementation: Default USE Policy

Data space services

Service: Identity service
Implementation: Dynamic Attribute Provisioning System (DAPS)
Additional information: "A Dynamic Attribute Provisioning System (DAPS) has the intent to ascertain certain attributes to organizations and connectors. Hence, third parties do not need to trust the latter provided they trust the DAPS assertions." — DAPS
To focus on the connector's logic, the data space is deployed on an Amazon EC2 machine by using Docker Compose.

Service: Discovery service
Implementation: Gaia-X Federated Catalogue
Additional information: "The Federated Catalogue constitutes an indexed repository of Gaia-X Self-Descriptions to enable the discovery and selection of Providers and their service offerings. The Self-Descriptions are the information given by Participants about themselves and about their services in the form of properties and claims." — Gaia-X Ecosystem Kickstarter

Data to be exchanged

Data asset: Carbon emissions data
Description: Intensity values for different vehicle types in the specified regions (Europe and South America) from the entire fleet of vehicles
Format: JSON file

Data model

{ "region": "string", "vehicles": [ // Each vehicle type has its Gross Vehicle Weight (GVW) category and its emission intensity in grams of CO2 per Tonne-Kilometer (g CO2 e/t-km) according to the "Well-to-Wheel" (WTW) measurement. { "type": "string", "gross_vehicle_weight": "string", "emission_intensity": { "CO2": "number", "unit": "string" } } ] }

Tractus-X EDC connector

For documentation of each Tractus-X EDC parameter, see the original values file.

The following list shows all services, along with their corresponding exposed ports and endpoints, for reference.

Service name: Control plane
Ports and paths:
  • management ‒ Port: 8081; Path: /management
  • control ‒ Port: 8083; Path: /control
  • protocol ‒ Port: 8084; Path: /api/v1/dsp
  • metrics ‒ Port: 9090; Path: /metrics
  • observability ‒ Port: 8085; Path: /observability

Service name: Data plane
Ports and paths:
  • default ‒ Port: 8080; Path: /api
  • public ‒ Port: 8081; Path: /api/dataplane/control
  • proxy ‒ Port: 8186; Path: /proxy
  • metrics ‒ Port: 9090; Path: /metrics
  • observability ‒ Port: 8085; Path: /observability

Service name: Vault
Port: 8200

Service name: PostgreSQL
Port: 5432

Using AWS Secrets Manager

It's possible to use Secrets Manager instead of HashiCorp Vault as the secrets manager. To do so, you must use or build the AWS Secrets Manager EDC extension.

You will be responsible for creating and maintaining your own image, because Tractus-X doesn't provide support for Secrets Manager.

To accomplish that, you need to modify the Gradle build files of both the control plane and the data plane of the connector by introducing your AWS Secrets Manager EDC extension (see this Maven artifact for an example), and then build, maintain, and reference the Docker image.

For more insights on refactoring the Tractus-X connector Docker image, see Refactor Tractus-X EDC Helm charts.

For simplicity, this pattern avoids rebuilding the connector image and uses HashiCorp Vault.