Convert and unpack EBCDIC data to ASCII on AWS by using Python
Created by Luis Gustavo Dantas (AWS)
Code repository: Mainframe Data Utilities | Environment: PoC or pilot | Source: Mainframe EBCDIC data |
Target: Distributed or cloud modernized ASCII data | R Type: Replatform | Workload: IBM |
Technologies: Mainframe; Databases; Storage & backup; Modernization | AWS services: Amazon EBS; Amazon EC2 |
Summary
Because mainframes typically host critical business data, modernizing that data is one of the most important tasks in a migration to the Amazon Web Services (AWS) Cloud or another American Standard Code for Information Interchange (ASCII) environment. On mainframes, data is typically encoded in extended binary-coded decimal interchange code (EBCDIC) format. Exporting database, Virtual Storage Access Method (VSAM), or flat files generally produces packed, binary EBCDIC files, which are more complex to migrate. The most commonly used database migration solution is change data capture (CDC), which, in most cases, automatically converts data encoding. However, CDC mechanisms might not be available for these database, VSAM, or flat files. For these files, an alternative approach is required to modernize the data.
This pattern describes how to modernize EBCDIC data by converting it to ASCII format. After conversion, you can load the data into distributed databases or have applications in the cloud process the data directly. The pattern uses the conversion script and sample files in the mainframe-data-utilities GitHub repository.
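As a minimal illustration of what the encoding conversion involves, the following sketch translates EBCDIC text bytes with a codec that ships with Python. The code page (cp037) and the function name are assumptions made for this example; your mainframe might use a different EBCDIC code page. This is not the repository's conversion script.

```python
# Assumption: the data uses EBCDIC code page 037 (Python codec "cp037").
EBCDIC_CODEC = "cp037"


def ebcdic_text_to_ascii(raw: bytes) -> str:
    """Decode EBCDIC text bytes and confirm the result is plain ASCII."""
    text = raw.decode(EBCDIC_CODEC)
    # Raises UnicodeEncodeError if a character has no ASCII equivalent.
    text.encode("ascii")
    return text


# The bytes C8 C5 D3 D3 D6 40 F1 F2 F3 spell "HELLO 123" in code page 037.
sample = bytes.fromhex("c8c5d3d3d640f1f2f3")
print(ebcdic_text_to_ascii(sample))
```

A codec lookup like this only works for character fields. Packed and binary fields hold numeric values rather than characters, which is why the file layout from the copybook is needed to convert them.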
Prerequisites and limitations
Prerequisites
An active AWS account.
An EBCDIC input file and its corresponding common business-oriented language (COBOL) copybook. A sample EBCDIC file and COBOL copybook are included in the mainframe-data-utilities GitHub repository. For more information about COBOL copybooks, see Enterprise COBOL for z/OS 6.4 Programming Guide on the IBM website.
Limitations
File layouts defined inside COBOL programs are not supported. They must be made available separately.
Product versions
Python version 3.8 or later
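To confirm that an interpreter meets this requirement before you run the conversion scripts, you could use a check along the following lines. The function name is hypothetical and the check is not part of the repository.

```python
import sys

# Minimum interpreter version stated by this pattern.
MINIMUM_VERSION = (3, 8)


def meets_minimum(version_info=sys.version_info) -> bool:
    """Return True when the major.minor version is at least 3.8."""
    return tuple(version_info[:2]) >= MINIMUM_VERSION


if meets_minimum():
    print("Python version is supported.")
else:
    print("Python 3.8 or later is required.")
```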
Architecture
Source technology stack
EBCDIC data on a mainframe
COBOL copybook
Target technology stack
Amazon Elastic Compute Cloud (Amazon EC2) instance in a virtual private cloud (VPC)
Amazon Elastic Block Store (Amazon EBS)
Python and its required packages: json (JavaScript Object Notation), sys, and datetime
ASCII flat file ready to be read by a modern application or loaded in a relational database table
Target architecture
The architecture diagram shows the process of converting an EBCDIC file to an ASCII file on an EC2 instance:
Using the parse_copybook_to_json.py script, you convert the COBOL copybook to a JSON file.
Using the JSON file and the extract_ebcdic_to_ascii.py script, you convert the EBCDIC data to an ASCII file.
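The two steps can be pictured with the following sketch. The layout shown here is a hypothetical, simplified stand-in for the JSON file that parse_copybook_to_json.py produces; the attribute names, field names, code page (cp037), and helper functions are all assumptions made for illustration. See the repository README for the real layout format.

```python
import json
from decimal import Decimal

# Hypothetical layout standing in for the JSON generated from a copybook.
LAYOUT_JSON = """
[
  {"name": "CUST-NAME",    "type": "text",   "length": 5},
  {"name": "CUST-BALANCE", "type": "packed", "length": 3, "scale": 2}
]
"""


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Unpack a COBOL COMP-3 (packed decimal) field into a Decimal."""
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)    # high nibble
        nibbles.append(byte & 0x0F)  # low nibble
    sign = nibbles.pop()             # the last nibble holds the sign
    if any(n > 9 for n in nibbles):
        raise ValueError("invalid packed-decimal digit")
    value = int("".join(str(n) for n in nibbles))
    if sign in (0x0B, 0x0D):         # B and D mark negative values
        value = -value
    return Decimal(value).scaleb(-scale)


def convert_record(record: bytes, layout: list) -> dict:
    """Slice one fixed-length record by the layout and convert each field."""
    converted, offset = {}, 0
    for field in layout:
        raw = record[offset:offset + field["length"]]
        if field["type"] == "packed":
            converted[field["name"]] = str(unpack_comp3(raw, field.get("scale", 0)))
        else:
            converted[field["name"]] = raw.decode("cp037")  # assumed code page
        offset += field["length"]
    return converted


layout = json.loads(LAYOUT_JSON)
# Five EBCDIC characters followed by the packed value +123.45 (hex 12345C).
record = "ALICE".encode("cp037") + bytes.fromhex("12345c")
print(convert_record(record, layout))
```

Keeping the layout in a separate JSON file means the same conversion logic can be reused for any copybook. Only the layout changes from file to file.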
Automation and scale
After the resources needed for the first manual file conversions are in place, you can automate file conversion. This pattern doesn’t include instructions for automation. There are multiple ways to automate the conversion. The following is an overview of one possible approach:
Encapsulate the AWS Command Line Interface (AWS CLI) and Python script commands into a shell script.
Create an AWS Lambda function that asynchronously submits the shell script job into an EC2 instance. For more information, see Scheduling SSH jobs using AWS Lambda.
Create an Amazon Simple Storage Service (Amazon S3) trigger that invokes the Lambda function every time a legacy file is uploaded. For more information, see Using an Amazon S3 trigger to invoke a Lambda function.
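The Lambda side of this approach might look like the following sketch. The script path and function names are hypothetical, and the submit step is left as a placeholder callable because this pattern does not prescribe how the job reaches the instance. The handler reads the bucket and key from the standard S3 event notification structure.

```python
import shlex
from urllib.parse import unquote_plus

# Hypothetical path of the shell script on the EC2 instance.
CONVERT_SCRIPT = "/home/ec2-user/convert.sh"


def build_command(event: dict) -> str:
    """Build the shell command for the file named in an S3 event."""
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(s3_info["object"]["key"])
    uri = f"s3://{bucket}/{key}"
    return f"{CONVERT_SCRIPT} {shlex.quote(uri)}"


def lambda_handler(event, context, submit=print):
    """Entry point. Replace `submit` with a real job-submission call."""
    command = build_command(event)
    submit(command)  # placeholder: prints the command instead of running it
    return {"submitted": command}


# Local usage example with a hand-built event.
example_event = {
    "Records": [
        {"s3": {"bucket": {"name": "legacy-bucket"},
                "object": {"key": "input/LEGACY+FILE.txt"}}}
    ]
}
lambda_handler(example_event, None)
```

Quoting the S3 URI with shlex.quote keeps file names that contain spaces or shell metacharacters from breaking the command.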
Tools
AWS services
Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the AWS Cloud. You can launch as many virtual servers as you need, and quickly scale them up or down.
Amazon Elastic Block Store (Amazon EBS) provides block-level storage volumes for use with Amazon Elastic Compute Cloud (Amazon EC2) instances.
AWS Command Line Interface (AWS CLI) is an open-source tool that helps you interact with AWS services through commands in your command-line shell.
AWS Identity and Access Management (IAM) helps you securely manage access to your AWS resources by controlling who is authenticated and authorized to use them.
Other tools
Code repository
The code for this pattern is available in the mainframe-data-utilities GitHub repository.
Epics
Task | Description | Skills required |
---|---|---|
Launch an EC2 instance. | The EC2 instance must have outbound internet access. This allows the instance to access the Python source code available on GitHub. To create the instance: | General AWS |
Install Git. | | General AWS, Linux |
Install Python. | | General AWS, Linux |
Clone the GitHub repository. | | General AWS, GitHub |
Task | Description | Skills required |
---|---|---|
Parse the COBOL copybook into the JSON layout file. | Run the parse_copybook_to_json.py script to convert the COBOL copybook to a JSON file. The script prints the received arguments. For more information about the arguments, see the README file. | General AWS, Linux |
Inspect the JSON layout file. | For more information about the JSON layout file and its most important attributes, see the README file. | General AWS, JSON |
Create the ASCII file. | Run the extract_ebcdic_to_ascii.py script, which is included in the cloned GitHub repository. This script reads the EBCDIC file and writes a converted and readable ASCII file. As the script processes the EBCDIC data, it prints a message for every batch of 10,000 records. For information about how to change the print frequency, see the README file. | General AWS |
Examine the ASCII file. | If you used the sample EBCDIC file provided, the following is the first record in the ASCII file. | General AWS, Linux |
Evaluate the EBCDIC file. | In the Amazon EC2 console, enter the following command. This opens the first record of the EBCDIC file. If you used the sample EBCDIC file, the following is the result. To evaluate the equivalence between the source and target files, comprehensive knowledge of EBCDIC is required. For example, the first character of the sample EBCDIC file is a hyphen (-). | General AWS, Linux, EBCDIC |
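When you compare source and target files, a hexadecimal view of the EBCDIC bytes next to their decoded characters can help. The following sketch shows one way to produce such a view in Python. The helper name, the cp037 code page, and the stand-in bytes are assumptions; the bytes are not taken from the repository's sample file.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return an offset-prefixed hexadecimal dump of the given bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        lines.append(f"{offset:08x}  {hex_part}")
    return "\n".join(lines)


# Stand-in bytes for illustration only (assumed code page: cp037).
first_bytes = "-ABC".encode("cp037")
print(hex_dump(first_bytes))
print(first_bytes.decode("cp037"))
```

In code page 037, a hyphen is stored as the byte 0x60, whereas ASCII stores it as 0x2D. The two files can therefore match character for character even though their raw bytes differ.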
Related resources
References
The EBCDIC character set (IBM documentation)
EBCDIC to ASCII (IBM documentation)
COBOL (IBM documentation)
Basic JCL concepts (IBM documentation)
Connect to your Linux instance (Amazon EC2 documentation)
Tutorials
Scheduling SSH jobs using AWS Lambda (AWS blog post)
Using an Amazon S3 trigger to invoke a Lambda function (AWS Lambda documentation)