Convert mainframe files from EBCDIC format to character-delimited ASCII format in Amazon S3 using AWS Lambda
Created by Luis Gustavo Dantas (AWS)
Code repository: Mainframe Data Utilities | Environment: PoC or pilot | Source: IBM EBCDIC files |
Target: Delimited ASCII files | R Type: Replatform | Workload: IBM |
Technologies: Mainframe | AWS services: AWS CloudShell; AWS Lambda; Amazon S3; Amazon CloudWatch |
Summary
This pattern shows you how to launch an AWS Lambda function that automatically converts mainframe EBCDIC (Extended Binary Coded Decimal Interchange Code) files to character-delimited ASCII (American Standard Code for Information Interchange) files. The Lambda function runs after the EBCDIC files are uploaded to an Amazon Simple Storage Service (Amazon S3) bucket. After the file conversion, you can read the ASCII files on x86-based workloads or load the files into modern databases.
The file conversion approach demonstrated in this pattern can help you overcome the challenges of working with EBCDIC files in modern environments. Files encoded in EBCDIC often contain data represented in a binary or packed decimal format, and their fields are fixed-length. These characteristics create obstacles because modern x86-based workloads and distributed environments generally work with ASCII-encoded data and can’t process EBCDIC files directly.
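To make these obstacles concrete, the following minimal sketch decodes the three kinds of fields mentioned above by using only the Python standard library. It is an illustration, not the Mainframe Data Utilities implementation: the function names are hypothetical, and it assumes the text was written with code page 037 (Python's `cp037` codec), which might not match your mainframe's code page.

```python
from decimal import Decimal


def decode_text(raw: bytes, codec: str = "cp037") -> str:
    """Decode a fixed-length EBCDIC text field and drop the trailing pad spaces."""
    return raw.decode(codec).rstrip()


def decode_binary(raw: bytes) -> int:
    """Decode a big-endian signed binary (COMP) field."""
    return int.from_bytes(raw, "big", signed=True)


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal (COMP-3) field.

    Every byte stores two decimal digits except the last one, whose low
    nibble is the sign: 0xB or 0xD means negative, anything else is
    treated as positive or unsigned.
    """
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)  # last byte: one digit plus the sign nibble
    value = int("".join(str(digit) for digit in digits))
    if raw[-1] & 0x0F in (0x0B, 0x0D):
        value = -value
    return Decimal(value).scaleb(-scale)  # apply the implied decimal point


# "HELLO" padded with two EBCDIC spaces (0x40)
name = decode_text(bytes.fromhex("C8C5D3D3D64040"))
# Packed 12345 with a positive sign and two implied decimal places
amount = unpack_comp3(bytes.fromhex("12345C"), scale=2)
# Two-byte signed binary value
counter = decode_binary(bytes.fromhex("FFFE"))
```

None of these fields is readable if you simply open the file as ASCII text, which is why the conversion needs the field offsets, lengths, and types from the copybook.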
Prerequisites and limitations
Prerequisites
An active AWS account
An S3 bucket
An AWS Identity and Access Management (IAM) user with administrative permissions
AWS CloudShell
Python 3.8.0 or later
A flat file encoded in EBCDIC and its corresponding data structure in a common business-oriented language (COBOL) copybook
Note: This pattern uses a sample EBCDIC file (CLIENT.EBCDIC.txt
Limitations
COBOL copybooks usually hold multiple layout definitions. The mainframe-data-utilities project can parse this kind of copybook but can't infer which layout to apply during data conversion. This is because copybooks don't hold this logic (which remains in the COBOL programs instead). Consequently, you must manually configure the rules for selecting layouts after you parse the copybook.
This pattern is subject to Lambda quotas.
Architecture
Source technology stack
IBM z/OS, IBM i, and other EBCDIC systems
Sequential files with data encoded in EBCDIC (such as IBM Db2 unloads)
COBOL copybook
Target technology stack
Amazon S3
Amazon S3 event notification
IAM
Lambda function
Python 3.8 or later
Mainframe Data Utilities
JSON metadata
Character-delimited ASCII files
Target architecture
The following diagram shows an architecture for converting mainframe EBCDIC files to ASCII files.
The diagram shows the following workflow:
The user runs the copybook parser script to convert the COBOL copybook into a JSON file.
The user uploads the JSON metadata to an S3 bucket. This makes the metadata readable by the data conversion Lambda function.
The user or an automated process uploads the EBCDIC file to the S3 bucket.
The S3 notification event triggers the data conversion Lambda function.
AWS verifies the S3 bucket read-write permissions for the Lambda function.
Lambda reads the file from the S3 bucket and locally converts the file from EBCDIC to ASCII.
Lambda logs the process status in Amazon CloudWatch.
Lambda writes the ASCII file back to Amazon S3.
Note: The copybook parser script runs only once per copybook, to convert the metadata to JSON before that data is uploaded to an S3 bucket. After the initial conversion, every EBCDIC file that shares the same layout reuses the JSON metadata that's stored in the S3 bucket.
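The event-driven part of this workflow can be sketched as follows. This is a minimal illustration of how a Lambda handler might read the bucket name and object key from an Amazon S3 event notification. It is not the handler that ships with Mainframe Data Utilities, and `convert_object` is a hypothetical placeholder for the read, convert, and write-back steps.

```python
from typing import List, Tuple
from urllib.parse import unquote_plus


def s3_objects_from_event(event: dict) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the event (spaces become "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def convert_object(bucket: str, key: str) -> str:
    """Hypothetical placeholder: a real function would read the object,
    convert it from EBCDIC to ASCII, and write the result back to S3."""
    return f"would convert s3://{bucket}/{key}"


def lambda_handler(event, context):
    """Entry point that Lambda calls when the S3 notification fires."""
    return [convert_object(bucket, key) for bucket, key in s3_objects_from_event(event)]


# Trimmed-down sample event; the bucket name is made up for illustration.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "input/my+file%281%29.txt"}}}
    ]
}
result = lambda_handler(sample_event, None)
```

Decoding the key with `unquote_plus` matters because a file name that contains spaces or parentheses would otherwise not match the object that's stored in the bucket.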
Tools
AWS tools
Amazon CloudWatch helps you monitor the metrics of your AWS resources and the applications that you run on AWS in real time.
Amazon Simple Storage Service (Amazon S3) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.
AWS CloudShell is a browser-based shell that you can use to manage AWS services by using the AWS Command Line Interface (AWS CLI) and a range of preinstalled development tools.
AWS Identity and Access Management (IAM) helps you securely manage access to your AWS resources by controlling who is authenticated and authorized to use them.
AWS Lambda is a compute service that helps you run code without needing to provision or manage servers. Lambda runs your code only when needed and scales automatically, so you pay only for the compute time that you use.
Other tools
Code
The code for this pattern is available in the GitHub mainframe-data-utilities repository.
Best practices
Consider the following best practices:
Set the required permissions at the Amazon Resource Name (ARN) level.
Always grant least-privilege permissions for IAM policies. For more information, see Security best practices in IAM in the IAM documentation.
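As an illustration of these two practices, the following sketch builds an IAM policy document whose statements are scoped to specific ARNs instead of using `"Resource": "*"`. The bucket name, prefixes, account ID, Region, and function name are hypothetical values, and the split into read prefixes and a write prefix is an assumption about how you organize the bucket.

```python
import json


def build_converter_policy(bucket, read_prefixes, write_prefix, region, account, function_name):
    """Build a least-privilege policy document for a conversion function."""
    log_group = f"arn:aws:logs:{region}:{account}:log-group:/aws/lambda/{function_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Read only the input data and the JSON metadata
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*" for prefix in read_prefixes],
            },
            {   # Write only to the output prefix
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/{write_prefix}*",
            },
            {   # Write log events only to this function's log group
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": f"{log_group}:*",
            },
        ],
    }


policy = build_converter_policy(
    bucket="example-bucket",
    read_prefixes=["input/", "layout/"],
    write_prefix="output/",
    region="us-east-1",
    account="111122223333",
    function_name="example-converter",
)
policy_json = json.dumps(policy, indent=2)
```

This sketch assumes that the log group already exists. If the function must create it, the role also needs `logs:CreateLogGroup`.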
Epics
Task | Description | Skills required |
---|---|---|
Create the environment variables. | Copy the following environment variables to a text editor, and then replace the <placeholder> values in the following example with your resource values:
Note: You will create references to your S3 bucket, AWS account, and AWS Region later. To define environment variables, open the CloudShell console Note: You must repeat this step every time the CloudShell session restarts. | General AWS |
Create a working folder. | To simplify the resource clean-up process later on, create a working folder in CloudShell by running the following command:
Note: You must change the directory to the working directory ( | General AWS |
Task | Description | Skills required |
---|---|---|
Create a trust policy for the Lambda function. | The EBCDIC converter runs in a Lambda function. The function must have an IAM role. Before you create the IAM role, you must define a trust policy document that allows the Lambda service to assume that role. From the CloudShell working folder, create a policy document by running the following command:
| General AWS |
Create the IAM role for Lambda conversion. | To create an IAM role, run the following AWS CLI command from the CloudShell working folder:
| General AWS |
Create the IAM policy document for the Lambda function. | The Lambda function must have read-write access to the S3 bucket and write permissions for Amazon CloudWatch Logs. To create an IAM policy, run the following command from the CloudShell working folder:
| General AWS |
Attach the IAM policy document to the IAM role. | To attach the IAM policy to the IAM role, run the following command from your CloudShell working folder:
| General AWS |
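For reference, a trust policy that lets the Lambda service assume a role has the following general shape. The sketch below writes it to a file by using Python. The file name is a hypothetical choice and might differ from the one used in this pattern's commands.

```python
import json
import os
import tempfile

# Trust policy: who is allowed to assume the role (here, the Lambda service).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Hypothetical file name, written to a temporary folder for this example.
policy_path = os.path.join(tempfile.mkdtemp(), "trust-policy.json")
with open(policy_path, "w", encoding="utf-8") as handle:
    json.dump(trust_policy, handle, indent=2)
```

The trust policy controls only who can assume the role. The permissions that the role grants (Amazon S3 access and CloudWatch Logs access) belong in a separate IAM policy that you attach to the role.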
Task | Description | Skills required |
---|---|---|
Download the EBCDIC conversion source code. | From the CloudShell working folder, run the following command to download the mainframe-data-utilities source code from GitHub:
| General AWS |
Create the ZIP package. | From the CloudShell working folder, run the following command to create the ZIP package that is used to create the Lambda function for EBCDIC conversion:
| General AWS |
Create the Lambda function. | From the CloudShell working folder, run the following command to create the Lambda function for EBCDIC conversion:
Note: The environment variable layout tells the Lambda function where the JSON metadata resides. | General AWS |
Create the resource-based policy for the Lambda function. | From the CloudShell working folder, run the following command to allow your Amazon S3 event notification to trigger the Lambda function for EBCDIC conversion:
| General AWS |
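The packaging step can also be sketched in Python with the standard `zipfile` module. This is only an illustration of what a Lambda deployment package looks like: the file names are hypothetical stand-ins for the project's source files. The point it shows is that archive paths are stored relative to the source folder, so the handler module sits at the root of the archive where Lambda can import it.

```python
import tempfile
import zipfile
from pathlib import Path


def build_package(source_dir, zip_path):
    """Zip every .py file under source_dir, keeping paths relative to it."""
    source = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*.py")):
            # The archive name is relative, so the handler ends up at the root.
            archive.write(path, path.relative_to(source).as_posix())
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()


# Demonstration with made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as workdir:
    src = Path(workdir) / "src"
    (src / "pkg").mkdir(parents=True)
    (src / "lambda_function.py").write_text(
        "def lambda_handler(event, context):\n    return 'ok'\n"
    )
    (src / "pkg" / "helper.py").write_text("VALUE = 1\n")
    names = build_package(src, Path(workdir) / "function.zip")
```

If the files were zipped together with their parent folder, the handler would be stored one level down and the function would fail to import it at invocation time.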
Task | Description | Skills required |
---|---|---|
Create the configuration document for the Amazon S3 event notification. | The Amazon S3 event notification initiates the EBCDIC conversion Lambda function when files are placed in the input folder. From the CloudShell working folder, run the following command to create the JSON document for the Amazon S3 event notification:
| General AWS |
Create the Amazon S3 event notification. | From the CloudShell working folder, run the following command to create the Amazon S3 event notification:
| General AWS |
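As a sketch of what such a configuration document contains, the following Python snippet builds a notification configuration that invokes a function for every object created under a given prefix. The structure follows the Amazon S3 notification configuration format. The function ARN, the `Id` value, and the `input/` prefix are hypothetical values chosen for illustration.

```python
import json


def build_notification(function_arn, prefix="input/"):
    """Build an S3 notification configuration that targets a Lambda function."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": "ebcdic-conversion",             # hypothetical identifier
                "LambdaFunctionArn": function_arn,
                "Events": ["s3:ObjectCreated:*"],      # fire on every new object
                "Filter": {
                    "Key": {
                        # Only keys that start with the prefix trigger the function.
                        "FilterRules": [{"Name": "prefix", "Value": prefix}]
                    }
                },
            }
        ]
    }


notification = build_notification(
    "arn:aws:lambda:us-east-1:111122223333:function:example-converter"
)
notification_json = json.dumps(notification, indent=2)
```

Restricting the trigger to an input prefix keeps objects written elsewhere in the bucket, such as the converted ASCII files, from invoking the function.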
Task | Description | Skills required |
---|---|---|
Parse the COBOL copybook. | From the CloudShell working folder, run the following command to parse a sample COBOL copybook into a JSON file (which defines how to read and slice the data file properly):
| General AWS |
Add the transformation rule. | The sample data file, which is described by its corresponding COBOL copybook, is a multi-layout file. This means that the conversion must slice data based on certain rules. In this case, the bytes at positions 3 and 4 of each row define the layout. From the CloudShell working folder, edit the
| General AWS, IBM Mainframe, COBOL |
Upload the JSON metadata to the S3 bucket. | From the CloudShell working folder, run the following AWS CLI command to upload the JSON metadata to your S3 bucket:
| General AWS |
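The idea behind a transformation rule can be sketched as follows: inspect a fixed position in each record, pick the matching layout, and slice the record into delimited fields. The rule format, layout names, field names, and record contents below are all hypothetical and do not reproduce the JSON structure that the copybook parser generates. The sketch also assumes that the layout indicator is EBCDIC text in code page 037.

```python
# Positions 3 and 4 (1-based) correspond to the zero-based slice [2:4].
RULES = [
    {"offset": 2, "size": 2, "value": "01", "layout": "header"},
    {"offset": 2, "size": 2, "value": "02", "layout": "detail"},
]

# Each field is (name, zero-based offset, length in bytes).
LAYOUTS = {
    "header": [("record_id", 0, 2), ("record_type", 2, 2), ("name", 4, 6)],
    "detail": [("record_id", 0, 2), ("record_type", 2, 2), ("city", 4, 4), ("zip", 8, 2)],
}


def select_layout(record: bytes, rules=RULES):
    """Return the layout name of the first rule that matches the record, or None."""
    for rule in rules:
        start = rule["offset"]
        chunk = record[start:start + rule["size"]]
        if chunk.decode("cp037") == rule["value"]:
            return rule["layout"]
    return None


def to_delimited(record: bytes, delimiter: str = "|") -> str:
    """Slice a fixed-length EBCDIC record into a delimited ASCII line."""
    layout = select_layout(record)
    if layout is None:
        raise ValueError("no transformation rule matches this record")
    fields = LAYOUTS[layout]
    return delimiter.join(
        record[offset:offset + length].decode("cp037").rstrip()
        for _, offset, length in fields
    )


header_line = to_delimited("AA01SMITH ".encode("cp037"))
detail_line = to_delimited("BB02ROMA99".encode("cp037"))
```

Without a rule of this kind, the converter has no way to know that the two records above must be sliced at different offsets, because the copybook describes both layouts but not the condition for choosing between them.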
Task | Description | Skills required |
---|---|---|
Send the EBCDIC file to the S3 bucket. | From the CloudShell working folder, run the following command to send the EBCDIC file to the S3 bucket:
Note: We recommend that you set different folders for input (EBCDIC) and output (ASCII) files to avoid calling the Lambda conversion function again when the ASCII file is uploaded to the S3 bucket. | General AWS |
Check the output. | From the CloudShell working folder, run the following command to check if the ASCII file is generated in your S3 bucket:
Note: The data conversion can take several seconds to complete. We recommend that you check for the ASCII file a few times. After the ASCII file is available, run the following command to download the file from the S3 bucket to the current folder:
Check the ASCII file content:
| General AWS |
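The recommendation to keep input and output files in different folders can be expressed as a small guard in the conversion code. The following sketch maps an input key to an output key and returns nothing for keys outside the input folder, so a function that's triggered by its own output would skip the object instead of converting it again. The `input/` and `output/` prefixes are hypothetical, and keeping the file name unchanged is an arbitrary choice for this example.

```python
from typing import Optional


def output_key_for(input_key: str,
                   input_prefix: str = "input/",
                   output_prefix: str = "output/") -> Optional[str]:
    """Map an input object key to its output key, or return None to skip it."""
    if not input_key.startswith(input_prefix):
        # Not an input file (for example, a converted file): do nothing.
        return None
    relative_name = input_key[len(input_prefix):]
    return output_prefix + relative_name


target = output_key_for("input/CLIENT.EBCDIC.txt")
skipped = output_key_for("output/CLIENT.EBCDIC.txt")
```

A guard like this complements a prefix filter on the event notification. Either one alone prevents the loop, and using both protects you if one of them is misconfigured.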
Task | Description | Skills required |
---|---|---|
(Optional) Prepare the variables and folder. | If you lose connection with CloudShell, reconnect and then run the following command to change the directory to the working folder:
Ensure that the environment variables are defined:
| General AWS |
Remove the notification configuration for the bucket. | From the CloudShell working folder, run the following command to remove the Amazon S3 event notification configuration:
| General AWS |
Delete the Lambda function. | From the CloudShell working folder, run the following command to delete the Lambda function for the EBCDIC converter:
| General AWS |
Delete the IAM role and policy. | From the CloudShell working folder, run the following command to remove the EBCDIC converter role and policy:
| General AWS |
Delete the files generated in the S3 bucket. | From the CloudShell working folder, run the following command to delete the files generated in the S3 bucket:
| General AWS |
Delete the working folder. | From the CloudShell working folder, run the following command to remove
| General AWS |
Related resources
Mainframe Data Utilities README
(GitHub) The EBCDIC character set
(IBM documentation) EBCDIC to ASCII
(IBM documentation) COBOL
(IBM documentation) Using an Amazon S3 trigger to invoke a Lambda function (AWS Lambda documentation)