Transfer large-scale Db2 z/OS data to Amazon S3 in CSV files
Created by Bruno Sahinoglu (AWS), Ivan Schuster (AWS), and Abhijit Kshirsagar (AWS)
Code repository: Unload DB2 z/OS to S3 | Environment: Production | Source: Db2 |
Target: Amazon S3 | R Type: Replatform | Workload: IBM |
Technologies: Mainframe; Data lakes; Databases; DevelopmentAndTesting; Migration | AWS services: Amazon Aurora; AWS Glue; Amazon S3; AWS Transfer Family; Amazon Athena |
Summary
In many enterprises, a mainframe is still the system of record, holding a massive amount of data that includes master data entities together with current and historical business transactions. This data is often siloed and not easily accessed by the distributed systems within the same enterprise. With the emergence of cloud technology and big data democratization, enterprises want to use the insights hidden in the mainframe data to develop new business capabilities.
With that objective, enterprises are looking to open their mainframe Db2 data to their Amazon Web Services (AWS) Cloud environment. The business reasons vary, and the transfer methods differ from case to case. You might prefer to connect your application directly to the mainframe, or you might prefer to replicate your data in near real time. If the use case is to feed a data warehouse or a data lake, having an up-to-the-second copy is no longer a concern, and the procedure described in this pattern might suffice, especially if you want to avoid any third-party product licensing costs. Another use case might be the mainframe data transfer for a migration project. In a migration scenario, data is required for performing functional equivalence testing. The approach described in this pattern is a cost-effective way to transfer Db2 data to the AWS Cloud environment.
Because Amazon Simple Storage Service (Amazon S3) is one of the most integrated AWS services, you can access the data from there and gather insights directly by using other AWS services such as Amazon Athena, AWS Lambda functions, or Amazon QuickSight. You can also load the data into Amazon Aurora or Amazon DynamoDB by using AWS Glue or AWS Database Migration Service (AWS DMS). With that aim in mind, this pattern describes how to unload Db2 data into CSV files in ASCII format on the mainframe and transfer the files to Amazon S3.
For this purpose, this pattern provides mainframe scripts (JCL and REXX) that generate the Db2 unload jobs and transfer the resulting CSV files to Amazon S3 by using AWS Transfer Family.
Prerequisites and limitations
Prerequisites
An IBM z/OS operating system user with authorization to run Restructured Extended Executor (REXX) and JCL scripts.
Access to z/OS Unix System Services (USS) to generate SSH (Secure Shell) private and public keys.
A writable S3 bucket. For more information, see Create your first S3 bucket in the Amazon S3 documentation.
An AWS Transfer Family SSH File Transfer Protocol (SFTP)-enabled server using Service managed as the identity provider and Amazon S3 as the AWS storage service. For more information, see Create an SFTP-enabled server in the AWS Transfer Family documentation.
Limitations
This approach isn’t suitable for near real-time or real-time data synchronization.
Data can be moved only from Db2 z/OS to Amazon S3, not the other way around.
Architecture
Source technology stack
Mainframe running Db2 on z/OS
Target technology stack
AWS Transfer Family
Amazon S3
Amazon Athena
Amazon QuickSight
AWS Glue
Amazon Relational Database Service (Amazon RDS)
Amazon Aurora
Amazon Redshift
Source and target architecture
The following diagram shows the process for generating, extracting, and transferring Db2 z/OS data in ASCII CSV format to an S3 bucket.
A list of tables is selected from the Db2 catalog for data migration.
The list drives the generation of unload jobs, with the numeric and date columns unloaded in external (readable) format.
The data is then transferred over to Amazon S3 by using AWS Transfer Family.
An AWS Glue extract, transform, and load (ETL) job can transform the data and load it to a processed bucket in the specified format, or AWS Glue can feed the data directly into the database.
Amazon Athena and Amazon QuickSight can be used to query and render the data to drive analytics.
The following diagram shows a logical flow of the entire process.
The first JCL, called TABNAME, uses the Db2 DSNTIAUL sample program to extract and generate the list of tables that you plan to unload from Db2. To choose your tables, manually adapt the SQL input to select and add filter criteria that include one or more Db2 schemas.
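The TABNAME job itself ships in the pattern's code repository and is not reproduced here. For orientation only, the following is a minimal, hypothetical sketch of a DSNTIAUL invocation that builds such a table list; the job card, Db2 subsystem ID (DSN1), plan name, libraries, output dataset name, and schema filter are placeholders that you must replace with your site's values.

```
//TABLSTJ  JOB (ACCT),'DB2 TABLE LIST',CLASS=A,MSGCLASS=X
//*-------------------------------------------------------------------
//* HYPOTHETICAL SKETCH ONLY. REPLACE THE JOB CARD, THE DB2 SUBSYSTEM
//* ID (DSN1), THE DSNTIAUL PLAN NAME AND LIBRARIES, THE OUTPUT
//* DATASET NAME, AND THE SCHEMA FILTER WITH YOUR SITE'S VALUES.
//*-------------------------------------------------------------------
//EXTRACT  EXEC PGM=IKJEFT01,DYNAMNBR=20
//STEPLIB  DD DISP=SHR,DSN=DSNC10.SDSNLOAD
//SYSTSPRT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SYSPUNCH DD SYSOUT=*
//SYSREC00 DD DSN=HLQ.DB2.TABLIST,DISP=(NEW,CATLG,DELETE),
//            SPACE=(TRK,(15,15),RLSE)
//SYSTSIN  DD *
  DSN SYSTEM(DSN1)
  RUN PROGRAM(DSNTIAUL) PLAN(DSNTIAUL) -
      LIB('DSNC10.RUNLIB.LOAD') PARMS('SQL')
  END
/*
//SYSIN    DD *
  SELECT CREATOR, NAME
    FROM SYSIBM.SYSTABLES
   WHERE TYPE = 'T'
     AND CREATOR = 'MYSCHEMA';
/*
```

DSNTIAUL writes the result of the first SQL statement to the SYSREC00 dataset, which then becomes the input for the JCL generation step.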
The second JCL, called REXXEXEC, uses the provided JCL skeleton and REXX program to process the table list created by TABNAME and generate one JCL per table name. Each generated JCL contains one step for unloading the table and another step for sending the file to the S3 bucket by using the SFTP protocol.
The last step consists of running the generated JCLs to unload the tables and transfer the files to AWS. The entire process can be automated by using a scheduler on premises or on AWS.
Tools
AWS services
Amazon Athena is an interactive query service that helps you analyze data directly in Amazon Simple Storage Service (Amazon S3) by using standard SQL.
Amazon Aurora is a fully managed relational database engine that's built for the cloud and compatible with MySQL and PostgreSQL.
AWS Glue is a fully managed extract, transform, and load (ETL) service. It helps you reliably categorize, clean, enrich, and move data between data stores and data streams.
Amazon QuickSight is a cloud-scale business intelligence (BI) service that helps you visualize, analyze, and report your data in a single dashboard.
Amazon Redshift is a managed petabyte-scale data warehouse service in the AWS Cloud.
Amazon Relational Database Service (Amazon RDS) helps you set up, operate, and scale a relational database in the AWS Cloud.
Amazon Simple Storage Service (Amazon S3) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.
AWS Transfer Family is a secure transfer service that enables you to transfer files into and out of AWS storage services.
Mainframe tools
SSH File Transfer Protocol (SFTP) is a secure file transfer protocol that allows remote login to and file transfer between servers. SSH provides security by encrypting all traffic.
DSNTIAUL is a sample program provided by IBM for unloading data.
DSNUTILB is a utilities batch program provided by IBM for unloading data with options that differ from DSNTIAUL.
z/OS OpenSSH is a port of open source SSH software that runs on UNIX System Services under the IBM z/OS operating system. SSH is a secure, encrypted connection program between two computers running on a TCP/IP network. It provides multiple utilities, including ssh-keygen.
A REXX (Restructured Extended Executor) script is used to automate JCL generation with the Db2 unload and SFTP steps.
Code
The code for this pattern is available in the GitHub unloaddb2 repository.
Best practices
For the first unload, the generated JCLs should unload the entire table data.
After the first full unload, run incremental unloads for better performance and cost savings. Update the SQL query in the template JCL deck to accommodate any changes to the unload process.
You can convert the schema manually or by using a script on AWS Lambda with the Db2 SYSPUNCH output as input. For an industrial process, AWS Schema Conversion Tool (AWS SCT) is the preferred option.
Finally, use a mainframe-based scheduler or a scheduler on AWS with an agent on the mainframe to help manage and automate the entire process.
Epics
Task | Description | Skills required |
---|---|---|
Create the S3 bucket. | For instructions, see Create your first S3 bucket. | General AWS |
Task | Description | Skills required |
---|---|---|
Create an SFTP-enabled server. | On the AWS Transfer Family console, create an SFTP-enabled server that uses Amazon S3 as the storage service and Service managed as the identity provider. For instructions, see Create an SFTP-enabled server in the AWS Transfer Family documentation. | General AWS |
Create an IAM role for Transfer Family. | To create an AWS Identity and Access Management (IAM) role for Transfer Family to access Amazon S3, follow the instructions in Create an IAM role and policy. | AWS administrator |
Add an Amazon S3 service-managed user. | To add the Amazon S3 service-managed user, follow the instructions in the AWS documentation, and use your mainframe user ID. | General AWS |
Task | Description | Skills required |
---|---|---|
Create the SSH key. | Under your mainframe USS environment, run the ssh-keygen utility to generate an SSH key pair (a hypothetical batch equivalent is sketched after this table). Note: When prompted for a passphrase, keep it empty. | Mainframe developer |
Give the right authorization levels to the SSH folder and key files. | By default, the public and private keys are stored in the .ssh directory under your user's home directory. You must give the authorization 644 to the key files and 700 to the folder. | Mainframe developer |
Copy the public key content to your Amazon S3 service-managed user. | To copy the USS-generated public key content, open the AWS Transfer Family console, choose the server that you created, select the service-managed user, and add the content of the public key file under SSH public keys. | Mainframe developer |
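The key-generation command is run interactively under USS. If you prefer to run it in batch, the following is a rough, hypothetical sketch that uses BPXBATCH; the user home directory and the key type and length are assumptions, so adjust them to your site's security standards.

```
//SSHKEYS  JOB (ACCT),'GEN SSH KEYS',CLASS=A,MSGCLASS=X
//*-------------------------------------------------------------------
//* HYPOTHETICAL BATCH ALTERNATIVE TO RUNNING THE USS COMMANDS
//* INTERACTIVELY. REPLACE /u/mfuser WITH YOUR USER'S HOME DIRECTORY
//* AND ADJUST THE KEY TYPE AND LENGTH TO YOUR SECURITY STANDARDS.
//*-------------------------------------------------------------------
//KEYGEN   EXEC PGM=BPXBATCH,REGION=0M
//STDOUT   DD SYSOUT=*
//STDERR   DD SYSOUT=*
//STDPARM  DD *
SH mkdir -p /u/mfuser/.ssh;
chmod 700 /u/mfuser/.ssh;
ssh-keygen -t rsa -b 4096 -N '' -f /u/mfuser/.ssh/id_rsa
/*
```

The -N '' option corresponds to the empty passphrase mentioned in the task above.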
Task | Description | Skills required |
---|---|---|
Generate the in-scope Db2 table list. | Provide input SQL to create a list of the tables that are scoped for data migration. This step requires you to specify selection criteria that query the Db2 catalog table SYSIBM.SYSTABLES by using a SQL WHERE clause. Filters can be customized to include a specific schema or table names that start with a particular prefix, or they can be based on a timestamp for incremental unload. Output is captured in a physical sequential (PS) dataset on the mainframe, which acts as input for the next phase of JCL generation. Before you use the JCL TABNAME (you can rename it if necessary), adapt the job card, the library and dataset names, and the SQL selection criteria to your environment. The Db2 table list extraction job is provided in the pattern's code repository. | Mainframe developer |
Modify the JCL templates. | The JCL templates that are provided with this pattern contain a generic job card and library names. However, most mainframe sites have their own naming standards for dataset names, library names, and job cards. For example, a specific job class might be required to run Db2 jobs, the Job Entry Subsystem implementations JES2 and JES3 can impose additional changes, and standard load libraries might have a different first qualifier at your site. Adjust the job card, load library, and dataset names in the skeleton JCL UNLDSKEL to match your site standards; the unload and SFTP JCL skeleton is provided in the pattern's code repository (a hypothetical fragment is sketched after this table). | Mainframe developer |
Generate the mass unload JCL. | This step involves running a REXX script under an ISPF environment by using JCL. Provide the list of in-scope tables created in the first step as input for mass JCL generation against the JCL skeleton; the job generates one new JCL per table. Before you use the JCL REXXEXEC (you can change the name) and the ZSTEPS REXX script, adapt the job card, the dataset names, and the skeleton and REXX library names to your environment. Both the mass JCL generation job and the ZSTEPS REXX script are provided in the pattern's code repository. | Mainframe developer |
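The REXXEXEC job, the UNLDSKEL skeleton, and the ZSTEPS REXX script are provided in the pattern's code repository and are not reproduced here. Conceptually, the skeleton is ordinary JCL in which ISPF file-tailoring variables stand in for the values that change from table to table; the REXX script reads the table list, sets the variables, and typically calls the ISPF file-tailoring services (FTOPEN, FTINCL, FTCLOSE) once per entry to produce one LODnnnnn job. The following hypothetical fragment illustrates the idea; the variable names are illustrative and are not the ones used by UNLDSKEL.

```
)CM HYPOTHETICAL UNLDSKEL-STYLE FRAGMENT, NOT THE MEMBER THAT SHIPS
)CM WITH THE PATTERN. THE REXX SCRIPT FILLS IN THE VARIABLES FOR
)CM EACH TABLE AND WRITES THE TAILORED JCL TO AN OUTPUT LIBRARY.
//&JOBNAME. JOB (ACCT),'DB2 UNLOAD',CLASS=A,MSGCLASS=X
//UNLD     EXEC PGM=DSNUTILB,REGION=0M,PARM='&DB2SSID.,UNLDJOB'
//SYSREC   DD DSN=&UNLDDSN.,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(50,50),RLSE)
```

In the real skeleton, the UNLOAD control statements and the SFTP step follow; the sketch after the next table shows what a tailored job can look like.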
Task | Description | Skills required |
---|---|---|
Perform the Db2 unload step. | After the JCL generation, you will have as many JCLs as there are tables to unload. This story uses a generated JCL example to explain the structure and the most important steps. No action is required on your part; the following information is for reference only. If your intention is to submit the JCLs that you generated in the previous step, skip to the Submit the LODnnnnn JCLs task. When you unload Db2 data by using a JCL with the IBM-provided DSNUTILB Db2 utility, you must make sure that the unloaded data does not contain compressed numeric data. To accomplish this, use the DSNUTILB DELIMITED option, which produces delimited (CSV) output with the numeric and date columns in a readable external format. A hypothetical example of what the unload step in the generated JCL can look like, using the comma character as a delimiter, is sketched after this table. | Mainframe developer, System engineer |
Perform the SFTP step. | To use the SFTP protocol from a JCL, use the BPXBATCH utility. The SFTP utility can't access the MVS datasets directly. Use the copy command to copy the sequential unload dataset to a z/OS UNIX file, and then run the sftp command against that file by using the private key that you generated earlier. A hypothetical example is included in the sketch after this table. | Mainframe developer, System engineer |
Submit the LODnnnnn JCLs. | The previous step generated one LODnnnnn JCL for each table that needs to be unloaded, transformed into CSV, and transferred to the S3 bucket. Submit each LODnnnnn JCL manually, or run them by using your scheduler. | Mainframe developer, System engineer |
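The following is a hypothetical sketch of what a generated LODnnnnn job with the two steps described above might look like. It is not the JCL produced by the pattern's REXX script; the subsystem ID, table space, schema and table names, dataset names, USS paths, user ID, and SFTP endpoint are placeholders, and any code page or line-delimiter handling that your site might require is omitted.

```
//LOD00001 JOB (ACCT),'DB2 UNLOAD',CLASS=A,MSGCLASS=X
//*-------------------------------------------------------------------
//* HYPOTHETICAL SKETCH OF A GENERATED JOB, NOT THE OUTPUT OF THE
//* PATTERN'S REXX SCRIPT. THE SUBSYSTEM ID, TABLE SPACE, SCHEMA,
//* TABLE, DATASET NAMES, USS PATHS, USER ID, AND SFTP ENDPOINT ARE
//* PLACEHOLDERS. SITE-SPECIFIC CODE PAGE AND LINE-DELIMITER
//* HANDLING IS OMITTED.
//*-------------------------------------------------------------------
//* STEP 1: UNLOAD THE TABLE IN DELIMITED FORMAT. X'2C' AND X'22'
//* ARE THE ASCII COMMA AND DOUBLE QUOTE.
//UNLDSTEP EXEC PGM=DSNUTILB,REGION=0M,PARM='DSN1,UNLD0001'
//STEPLIB  DD DISP=SHR,DSN=DSNC10.SDSNLOAD
//SYSPRINT DD SYSOUT=*
//UTPRINT  DD SYSOUT=*
//SYSREC   DD DSN=HLQ.UNLOAD.MYSCHEMA.TABLE1,
//            DISP=(NEW,CATLG,DELETE),SPACE=(CYL,(100,100),RLSE)
//SYSPUNCH DD DSN=HLQ.SYSPUNCH.MYSCHEMA.TABLE1,
//            DISP=(NEW,CATLG,DELETE),SPACE=(TRK,(15,15),RLSE)
//SYSIN    DD *
  UNLOAD TABLESPACE MYDB.MYTS
    DELIMITED COLDEL X'2C' CHARDEL X'22'
    PUNCHDDN SYSPUNCH UNLDDN SYSREC
    ASCII CCSID(00367,00000,00000)
    FROM TABLE MYSCHEMA.TABLE1
/*
//* STEP 2: COPY THE UNLOAD DATASET TO A Z/OS UNIX FILE AND SEND IT
//* TO THE TRANSFER FAMILY ENDPOINT. ASSUMES THE HOST KEY IS ALREADY
//* IN known_hosts AND THE PRIVATE KEY IS AT ITS DEFAULT LOCATION.
//SFTPSTEP EXEC PGM=BPXBATCH,REGION=0M,COND=(4,LT)
//STDOUT   DD SYSOUT=*
//STDERR   DD SYSOUT=*
//STDPARM  DD *
SH cp "//'HLQ.UNLOAD.MYSCHEMA.TABLE1'" /tmp/table1.csv;
echo 'put /tmp/table1.csv' | sftp -b - mfuser@sftp-endpoint;
rm /tmp/table1.csv
/*
```

The unload step writes the delimited rows to SYSREC and the generated LOAD statements to SYSPUNCH; the SYSPUNCH output can later feed schema conversion, as noted in the Best practices section.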
Related resources
For more information about the different tools and solutions used in this document, see the following:
Additional information
After you have your Db2 data on Amazon S3, you have many ways to develop new insights. Because Amazon S3 integrates with AWS data analytics services, you can freely consume or expose this data on the distributed side. For example, you can do the following:
Build a data lake on Amazon S3, and extract valuable insights by using query-in-place, analytics, and machine learning tools without moving the data.
Initiate a Lambda function by setting up a post-upload processing workflow that is integrated with AWS Transfer Family.
Develop new microservices for accessing the data in Amazon S3 or in a fully managed database by using AWS Glue, which is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.
In a migration use case, because you can transfer any data from the mainframe to S3, you can do the following:
Retire physical infrastructure, and create a cost-effective data archival strategy with Amazon S3 Glacier and S3 Glacier Deep Archive.
Build scalable, durable, and secure backup and restore solutions with Amazon S3 and other AWS services, such as S3 Glacier and Amazon Elastic File System (Amazon EFS), to augment or replace existing on-premises capabilities.