Convert and unpack EBCDIC data to ASCII on AWS by using Python - AWS Prescriptive Guidance


Created by Luis Gustavo Dantas (AWS)

Code repository: Mainframe Data Utilities

Environment: PoC or pilot

Source: Mainframe EBCDIC data

Target: Distributed or cloud modernized ASCII data

R Type: Replatform

Workload: IBM

Technologies: Mainframe; Databases; Storage & backup; Modernization

AWS services: Amazon EBS; Amazon EC2

Summary

Because mainframes typically host critical business data, modernizing data is one of the most important tasks when migrating to the Amazon Web Services (AWS) Cloud or to another American Standard Code for Information Interchange (ASCII) environment. On mainframes, data is typically encoded in Extended Binary Coded Decimal Interchange Code (EBCDIC) format. Exporting databases, Virtual Storage Access Method (VSAM) files, or flat files generally produces packed, binary EBCDIC files, which are more complex to migrate. The most commonly used database migration solution is change data capture (CDC), which, in most cases, converts data encoding automatically. However, CDC mechanisms might not be available for these database, VSAM, or flat files, so an alternative approach is required to modernize their data.
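Character fields can be translated with a code page, but packed and binary numeric fields have to be unpacked according to the record layout. The following minimal Python sketch shows what that involves for a COBOL packed-decimal (COMP-3) field and a big-endian binary (COMP) field. The function names are hypothetical and are not part of the mainframe-data-utilities repository, and the sketch ignores implied decimal positions (V in a PIC clause). The example byte sequences appear in the sample EBCDIC record shown later in this pattern.

```python
def unpack_comp3(data: bytes) -> int:
    """Decode a COBOL packed-decimal (COMP-3) field to an integer.

    Each byte holds two decimal digits (one per nibble), except that the low
    nibble of the last byte is the sign: X'D' or X'B' is negative, and
    X'C', X'F', X'A', or X'E' is positive.
    """
    if not data:
        raise ValueError("empty packed-decimal field")
    nibbles = []
    for byte in data:
        nibbles.extend((byte >> 4, byte & 0x0F))
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError(f"invalid digit in packed field {data.hex()}")
    value = int("".join(map(str, digits)))
    if sign in (0x0B, 0x0D):
        return -value
    if sign in (0x0A, 0x0C, 0x0E, 0x0F):
        return value
    raise ValueError(f"invalid sign nibble {sign:X} in packed field {data.hex()}")


def unpack_comp(data: bytes, signed: bool = True) -> int:
    """Decode a COBOL binary (COMP) field; mainframe integers are big-endian."""
    return int.from_bytes(data, byteorder="big", signed=signed)


if __name__ == "__main__":
    print(unpack_comp3(bytes.fromhex("100000000d")))  # -100000000
    print(unpack_comp(bytes.fromhex("fa0a1f00")))     # -100000000
```

Decoding either byte sequence with an EBCDIC code page would produce meaningless characters, which is why the conversion needs the copybook layout to know each field's type and length.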

This pattern describes how to modernize EBCDIC data by converting it to ASCII format. After conversion, you can load the data into distributed databases or have applications in the cloud process the data directly. The pattern uses the conversion script and sample files in the mainframe-data-utilities GitHub repository.

Prerequisites and limitations

Prerequisites

Limitations

  • File layouts defined inside COBOL programs are not supported. The layout must be provided separately as a COBOL copybook.

Product versions

  • Python version 3.8 or later

Architecture

Source technology stack

  • EBCDIC data on a mainframe

  • COBOL copybook

Target technology stack

  • Amazon Elastic Compute Cloud (Amazon EC2) instance in a virtual private cloud (VPC)

  • Amazon Elastic Block Store (Amazon EBS)

  • Python and the standard-library modules that the scripts require: json, sys, and datetime

  • ASCII flat file ready to be read by a modern application or loaded into a relational database table

Target architecture

EBCDIC data converted to ASCII on an EC2 instance by using Python scripts and a COBOL copybook

The architecture diagram shows the process of converting an EBCDIC file to an ASCII file on an EC2 instance:

  1. Using the parse_copybook_to_json.py script, you convert the COBOL copybook to a JSON file.

  2. Using the JSON file and the extract_ebcdic_to_ascii.py script, you convert the EBCDIC data to an ASCII file.
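The EBCDIC file contains no record delimiters, so a converter has to cut it into records of exactly lrecl bytes (the record length stored in the JSON layout file). A minimal sketch of that step, assuming fixed-length records; iter_records is a hypothetical helper, and variable-length files with record descriptor words are not handled:

```python
import io
from typing import BinaryIO, Iterator


def iter_records(stream: BinaryIO, lrecl: int) -> Iterator[bytes]:
    """Yield fixed-length records of exactly `lrecl` bytes from a binary stream."""
    if lrecl <= 0:
        raise ValueError("lrecl must be a positive number of bytes")
    while True:
        record = stream.read(lrecl)
        if not record:
            return  # end of file on a record boundary
        if len(record) != lrecl:
            raise ValueError(f"file ends with a partial record of {len(record)} bytes")
        yield record


if __name__ == "__main__":
    # Two 3-byte records; a real run would use open(path, "rb") and lrecl=150.
    demo = io.BytesIO(b"\xc1\xc2\xc3\xf1\xf2\xf3")
    print([r.hex() for r in iter_records(demo, 3)])  # ['c1c2c3', 'f1f2f3']
```

Raising an error on a trailing partial record, instead of silently dropping it, makes a wrong lrecl value visible immediately.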

Automation and scale

After the resources needed for the first manual file conversions are in place, you can automate file conversion. This pattern doesn’t include instructions for automation. There are multiple ways to automate the conversion. The following is an overview of one possible approach:

  1. Encapsulate the AWS Command Line Interface (AWS CLI) and Python script commands into a shell script.

  2. Create an AWS Lambda function that asynchronously runs the shell script on the EC2 instance. For more information, see Scheduling SSH jobs using AWS Lambda.

  3. Create an Amazon Simple Storage Service (Amazon S3) trigger that invokes the Lambda function every time a legacy file is uploaded. For more information, see Using an Amazon S3 trigger to invoke a Lambda function.
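As a sketch of steps 1 and 3, the following Python function takes an Amazon S3 event notification (the JSON document that Amazon S3 passes to the Lambda function) and builds the shell commands to run on the EC2 instance. The function name, the ~/mainframe-data-utilities working directory, and the converted/ output prefix are hypothetical, and the sketch assumes that the JSON layout file for the uploaded file already exists on the instance. Submitting the commands over SSH is not shown.

```python
import shlex
from typing import List
from urllib.parse import unquote_plus


def build_conversion_commands(event: dict, layout_json: str, input_path: str,
                              output_path: str, output_prefix: str = "converted/") -> List[str]:
    """Build the shell commands that download, convert, and upload one legacy file.

    `input_path` and `output_path` must match the "input" and "output"
    attributes of the JSON layout file named by `layout_json`.
    """
    s3 = event["Records"][0]["s3"]
    bucket = s3["bucket"]["name"]
    # Object keys in S3 event notifications are URL-encoded (a space arrives as '+').
    key = unquote_plus(s3["object"]["key"])
    source_uri = f"s3://{bucket}/{key}"
    target_uri = f"s3://{bucket}/{output_prefix}{key.rsplit('/', 1)[-1]}"
    q = shlex.quote  # protect keys that contain spaces or shell metacharacters
    return [
        "cd ~/mainframe-data-utilities",
        f"aws s3 cp {q(source_uri)} {q(input_path)}",
        f"python3 extract_ebcdic_to_ascii.py -local-json {q(layout_json)}",
        f"aws s3 cp {q(output_path)} {q(target_uri)}",
    ]
```

Joining the list with " && ".join(commands) into one shell script stops the run if the download or the conversion fails. If the converted file goes back to the same bucket, configure the S3 trigger with a prefix filter for the input folder (or use a separate output bucket) so that the upload doesn't invoke the Lambda function again.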

Tools

AWS services

Other tools

  • GitHub is a code-hosting service that provides collaboration tools and version control.

  • Python is a high-level programming language.

Code repository

The code for this pattern is available in the mainframe-data-utilities GitHub repository.

Epics

Task | Description | Skills required

Launch an EC2 instance.

The EC2 instance must have outbound internet access so that it can download the Python source code from GitHub. To create the instance:

  1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2.

  2. Launch an EC2 Linux instance. Use a public IP address and allow inbound SSH access on port 22. Make sure that the instance's storage is at least twice the size of the EBCDIC data file. For instructions, see the Amazon EC2 documentation.
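If you want to check the free space before copying a large file to the instance, a minimal Python sketch follows. It assumes that the input file and the converted file share one file system; has_room_for_conversion is a hypothetical helper. The factor of 2 mirrors the guidance above and is only a rule of thumb: the converted file can be larger than the source (in the sample data later in this pattern, the 150-byte first record becomes a 184-byte ASCII line).

```python
import shutil


def has_room_for_conversion(ebcdic_size_bytes: int, target_dir: str = ".",
                            factor: float = 2.0) -> bool:
    """Return True if the file system holding `target_dir` has at least
    `factor` times the EBCDIC file size free (room for input plus output)."""
    free_bytes = shutil.disk_usage(target_dir).free
    return free_bytes >= ebcdic_size_bytes * factor
```

For example, has_room_for_conversion(5 * 1024**3) checks whether the current directory's file system can hold a 5 GiB EBCDIC file and its converted copy.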

Skills required: General AWS

Install Git.

  1. Using a secure shell (SSH) client, connect to the EC2 instance you just launched. For more information, see Connect to your Linux instance.

  2. In your SSH session, run the following command. This installs Git on the EC2 instance.

    sudo yum install git
  3. Run the following command and confirm that Git has been successfully installed.

    git --version

Skills required: General AWS, Linux

Install Python.

  1. In your SSH session, run the following command. This installs Python 3 on the EC2 instance.

    sudo yum install python3
  2. Run the following command. This installs pip for Python 3 on the EC2 instance.

    sudo yum install python3-pip
  3. Run the following command. This installs the AWS SDK for Python (Boto3) on the EC2 instance.

    sudo pip3 install boto3
  4. Run the following command, replacing <us-east-1> with the code for your AWS Region. For a complete list of Region codes, see Available Regions in the Amazon EC2 documentation.

    export AWS_DEFAULT_REGION=<us-east-1>
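To confirm the Python version that this pattern requires (3.8 or later) and the Region variable in one step, you can run a short check such as the following sketch from the same shell session; exported variables are visible only to processes started from that session. check_environment is a hypothetical helper, and its placeholder test only catches an unedited <...> value, not a misspelled Region code.

```python
import os
import sys
from typing import List, Tuple


def check_environment(min_python: Tuple[int, int] = (3, 8)) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {min_python[0]}.{min_python[1]} or later is required; found {found}")
    region = os.environ.get("AWS_DEFAULT_REGION", "")
    # An empty value or a leftover placeholder such as <us-east-1> is rejected.
    if not region or "<" in region or ">" in region:
        problems.append("AWS_DEFAULT_REGION is not set to a Region code such as us-east-1")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    sys.exit(1 if issues else 0)
```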

Skills required: General AWS, Linux

Clone the GitHub repository.

  1. In your SSH session, run the following command. This clones the mainframe-data-utilities repository from GitHub into the current directory, which is your home folder when you first connect.

    git clone https://github.com/aws-samples/mainframe-data-utilities.git
  2. In the home folder, confirm that the mainframe-data-utilities folder is present.

Skills required: General AWS, GitHub

Task | Description | Skills required

Parse the COBOL copybook into the JSON layout file.

Inside the mainframe-data-utilities folder, run the parse_copybook_to_json.py script. This automation module reads the file layout from a COBOL copybook and creates a JSON file that contains the metadata required to interpret and extract the data from the source file.

The following command converts the COBOL copybook to a JSON file.

python3 parse_copybook_to_json.py \
  -copybook LegacyReference/COBPACK2.cpy \
  -output sample-data/cobpack2-list.json \
  -dict sample-data/cobpack2-dict.json \
  -ebcdic sample-data/COBPACK.OUTFILE.txt \
  -ascii sample-data/COBPACK.ASCII.txt \
  -print 10000

The script prints the received arguments.

-----------------------------------------------------------------------
Copybook file...............| LegacyReference/COBPACK2.cpy
Parsed copybook (JSON List).| sample-data/cobpack2-list.json
JSON Dict (documentation)...| sample-data/cobpack2-dict.json
ASCII file..................| sample-data/COBPACK.ASCII.txt
EBCDIC file.................| sample-data/COBPACK.OUTFILE.txt
Print each..................| 10000
-----------------------------------------------------------------------

For more information about the arguments, see the README file in the GitHub repository.
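The JSON layout records the size in bytes of every field, and those sizes determine where each field starts within a record. As a rough illustration of where the sizes come from, the following hypothetical Python helper applies the usual IBM COBOL storage rules to simple PIC clauses. It is not the repository's parser, and it doesn't handle clauses such as OCCURS, REDEFINES, SIGN SEPARATE, or COMP-1/COMP-2 floating point.

```python
import re


def _expand(picture: str) -> str:
    """Expand repetition factors, for example 'S9(4)V99' -> 'S9999V99'."""
    return re.sub(r"(.)\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), picture.upper())


def field_bytes(picture: str, usage: str = "DISPLAY") -> int:
    """Return the storage size in bytes of a simple COBOL elementary item.

    Supports the X, 9, S, and V picture symbols with DISPLAY, COMP/BINARY,
    and COMP-3/PACKED-DECIMAL usage.
    """
    pic = _expand(picture)
    usage = usage.upper()
    if not pic or set(pic) - set("X9SV"):
        raise ValueError(f"unsupported PIC clause: {picture!r}")
    if "X" in pic:
        return pic.count("X") + pic.count("9")  # alphanumeric: one byte per position
    digits = pic.count("9")  # S and V occupy no storage of their own here
    if digits == 0:
        raise ValueError(f"no digit positions in PIC clause: {picture!r}")
    if usage in ("COMP-3", "PACKED-DECIMAL"):
        return digits // 2 + 1  # two digits per byte plus a sign nibble
    if usage in ("COMP", "COMP-4", "BINARY"):
        if digits > 18:
            raise ValueError("this sketch handles binary items of up to 18 digits")
        return 2 if digits <= 4 else 4 if digits <= 9 else 8  # half, full, double word
    return digits  # zoned decimal: the sign is overpunched on the last digit
```

For example, field_bytes("X(19)") returns 19, field_bytes("S9(9)", "COMP-3") returns 5, and field_bytes("S9(9)", "COMP") returns 4.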

Skills required: General AWS, Linux

Inspect the JSON layout file.

  1. Navigate to the output path that you passed to the parse_copybook_to_json.py script with the -output argument.

  2. Check the creation time of the sample-data/cobpack2-list.json file to confirm that you have selected the appropriate JSON layout file.

  3. Examine the JSON file and confirm that the contents are similar to the following.

"input": "extract-ebcdic-to-ascii/COBPACK.OUTFILE.txt", "output": "extract-ebcdic-to-ascii/COBPACK.ASCII.txt", "max": 0, "skip": 0, "print": 10000, "lrecl": 150, "rem-low-values": true, "separator": "|", "transf": [ { "type": "ch", "bytes": 19, "name": "OUTFILE-TEXT" }

The most important attributes of the JSON layout file are:

  • input – Contains the path of the EBCDIC file to be converted

  • output – Defines the path where the ASCII file will be generated

  • lrecl – Specifies the logical record length, in bytes, of the EBCDIC file

  • transf – Lists all fields, with their data type and size in bytes

For more information about the JSON layout file, see the README file in the GitHub repository.
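To show how attributes such as lrecl, transf, and separator fit together, here is a minimal Python sketch that converts one record with a layout dictionary shaped like the example above. It assumes code page 037 (Python's cp037 codec), handles only "ch" fields, and assumes that rem-low-values means removing X'00' bytes. It is not the repository's implementation; convert_record and load_layout are hypothetical helpers.

```python
import json
from typing import List


def load_layout(path: str) -> dict:
    """Read a JSON layout file such as sample-data/cobpack2-list.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def convert_record(record: bytes, layout: dict, codepage: str = "cp037") -> str:
    """Convert one EBCDIC record into a delimited ASCII line (without newline).

    Only "ch" (character) fields are handled in this sketch.
    """
    if len(record) != layout["lrecl"]:
        raise ValueError(f"expected {layout['lrecl']} bytes, got {len(record)}")
    sep = layout["separator"]
    values: List[str] = []
    offset = 0
    for field in layout["transf"]:
        raw = record[offset:offset + field["bytes"]]
        offset += field["bytes"]
        if field["type"] != "ch":
            raise NotImplementedError(f"{field['name']}: type {field['type']!r} not handled here")
        if layout.get("rem-low-values"):
            raw = raw.replace(b"\x00", b"")  # assumption: strip X'00' low-values
        values.append(raw.decode(codepage))
    # A separator follows every field, like the trailing '|' in the sample output.
    return "".join(value + sep for value in values)
```

Checking the record length against lrecl first catches a layout that doesn't match the data file before any output is written.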

Skills required: General AWS, JSON

Create the ASCII file.

Run the extract_ebcdic_to_ascii.py script, which is included in the cloned GitHub repository. This script reads the EBCDIC file and writes a converted and readable ASCII file.

python3 extract_ebcdic_to_ascii.py -local-json sample-data/cobpack2-list.json

As the script processes the EBCDIC data, it prints a message for every batch of 10,000 records. See the following example.

------------------------------------------------------------------
2023-05-15 21:21:46.322253 | Local Json file | -local-json | sample-data/cobpack2-list.json
2023-05-15 21:21:47.034556 | Records processed | 10000
2023-05-15 21:21:47.736434 | Records processed | 20000
2023-05-15 21:21:48.441696 | Records processed | 30000
2023-05-15 21:21:49.173781 | Records processed | 40000
2023-05-15 21:21:49.874779 | Records processed | 50000
2023-05-15 21:21:50.705873 | Records processed | 60000
2023-05-15 21:21:51.609335 | Records processed | 70000
2023-05-15 21:21:52.292989 | Records processed | 80000
2023-05-15 21:21:52.938366 | Records processed | 89280
2023-05-15 21:21:52.938448 Seconds 6.616232

For information about how to change the print frequency, see the README file in the GitHub repository.

Skills required: General AWS

Examine the ASCII file.

  1. Check the creation time of the sample-data/COBPACK.ASCII.txt file to verify that it was recently created.

  2. In your SSH session, run the following command. This displays the first record of the ASCII file in hexadecimal.

    head -n 1 sample-data/COBPACK.ASCII.txt | xxd
  3. Examine the contents of the first record. Because EBCDIC files are usually binary, they don't contain carriage return and line feed (CRLF) characters to delimit records. The extract_ebcdic_to_ascii.py script ends each converted record with a line feed (hexadecimal 0a) and places the column separator defined by the separator attribute of the JSON layout file, a pipe (|) in this example, after each field.

If you used the sample EBCDIC file provided, the following is the first record in the ASCII file.

00000000: 2d30 3030 3030 3030 3030 3130 3030 3030  -000000000100000
00000010: 3030 307c 3030 3030 3030 3030 3031 3030  000|000000000100
00000020: 3030 3030 3030 7c2d 3030 3030 3030 3030  000000|-00000000
00000030: 3031 3030 3030 3030 3030 7c30 7c30 7c31  0100000000|0|0|1
00000040: 3030 3030 3030 3030 7c2d 3130 3030 3030  00000000|-100000
00000050: 3030 307c 3130 3030 3030 3030 307c 2d31  000|100000000|-1
00000060: 3030 3030 3030 3030 7c30 3030 3030 7c30  00000000|00000|0
00000070: 3030 3030 7c31 3030 3030 3030 3030 7c2d  0000|100000000|-
00000080: 3130 3030 3030 3030 307c 3030 3030 3030  100000000|000000
00000090: 3030 3030 3130 3030 3030 3030 307c 2d30  0000100000000|-0
000000a0: 3030 3030 3030 3030 3031 3030 3030 3030  0000000001000000
000000b0: 3030 7c41 7c41 7c0a                      00|A|A|.
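Because each converted record is one line with a separator after every field, a downstream application can read the file with Python's csv module. A minimal sketch, assuming that no field value contains the separator character; read_converted is a hypothetical helper:

```python
import csv
import io
from typing import Iterable, Iterator, List


def read_converted(lines: Iterable[str], separator: str = "|") -> Iterator[List[str]]:
    """Yield the field values of each converted record.

    QUOTE_NONE keeps quotation marks in the data as ordinary characters, and the
    empty value created by the trailing separator is dropped.
    """
    for row in csv.reader(lines, delimiter=separator, quoting=csv.QUOTE_NONE):
        if row and row[-1] == "":
            row = row[:-1]
        yield row


if __name__ == "__main__":
    # An abbreviated line in the same shape as the sample record.
    sample = io.StringIO("-000000000100000000|0|A|A|\n")
    print(list(read_converted(sample)))  # [['-000000000100000000', '0', 'A', 'A']]
```

To read the converted sample file, open it with open("sample-data/COBPACK.ASCII.txt", newline="") and pass the file object to read_converted.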

Skills required: General AWS, Linux

Evaluate the EBCDIC file.

In your SSH session, run the following command. This displays the first 150 bytes of the EBCDIC file, which is one record (the lrecl value in the JSON layout file), in hexadecimal.

head -c 150 sample-data/COBPACK.OUTFILE.txt | xxd

If you used the sample EBCDIC file, the following is the result.

00000000: 60f0 f0f0 f0f0 f0f0 f0f0 f1f0 f0f0 f0f0  `...............
00000010: f0f0 f0f0 f0f0 f0f0 f0f0 f0f0 f1f0 f0f0  ................
00000020: f0f0 f0f0 f0f0 f0f0 f0f0 f0f0 f0f0 f1f0  ................
00000030: f0f0 f0f0 f0f0 d000 0000 0005 f5e1 00fa  ................
00000040: 0a1f 0000 0000 0005 f5e1 00ff ffff fffa  ................
00000050: 0a1f 0000 000f 0000 0c10 0000 000f 1000  ................
00000060: 0000 0d00 0000 0000 1000 0000 0f00 0000  ................
00000070: 0000 1000 0000 0dc1 c100 0000 0000 0000  ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000090: 0000 0000 0000                           ......

Evaluating the equivalence between the source and target files requires a working knowledge of EBCDIC. For example, the first character of the sample EBCDIC file is a hyphen (-). In the EBCDIC file, this character is hexadecimal 60; in the ASCII file, it is hexadecimal 2D. For an EBCDIC-to-ASCII conversion table, see EBCDIC to ASCII on the IBM website.
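You can also spot-check individual values in Python. The following sketch uses the standard cp037 codec (EBCDIC code page 037, US/Canada; the code page of your source system is an assumption to verify) for character data, and a hypothetical decode_zoned helper for signed zoned-decimal numbers, where the sign is carried in the zone nibble of the last byte. Comparing the two dumps above suggests, for example, that the 18 bytes starting at offset X'25' of the EBCDIC record (F0 repeated, one F1, ending in D0) become -000000000100000000 in the ASCII record.

```python
def decode_zoned(data: bytes) -> str:
    """Decode a signed zoned-decimal (PIC S9(n) DISPLAY) EBCDIC field.

    The low nibble of each byte is a digit; the high (zone) nibble of the last
    byte carries the sign: X'D' or X'B' is negative, and this sketch treats
    any other zone as positive.
    """
    if not data:
        raise ValueError("empty zoned-decimal field")
    digits = []
    for byte in data:
        digit = byte & 0x0F
        if digit > 9:
            raise ValueError(f"invalid zoned-decimal byte {byte:02X}")
        digits.append(str(digit))
    text = "".join(digits)
    negative = (data[-1] >> 4) in (0x0B, 0x0D)
    return "-" + text if negative else text


if __name__ == "__main__":
    # The hyphen: EBCDIC X'60' decodes to '-', which is X'2D' in ASCII.
    print(bytes.fromhex("60").decode("cp037"), "-".encode("ascii").hex())  # - 2d
    # 18-byte signed zoned field from offset X'25' of the sample record.
    print(decode_zoned(bytes.fromhex("f0" * 9 + "f1" + "f0" * 7 + "d0")))  # -000000000100000000
```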

Skills required: General AWS, Linux, EBCDIC

Related resources

References

Tutorials