Migrate and replicate VSAM files to Amazon RDS or Amazon MSK using Connect from Precisely
Created by Prachi Khanna (AWS) and Boopathy GOPALSAMY (AWS)
Environment: PoC or pilot | Source: VSAM | Target: Database |
R type: Re-architect | Workload: IBM | Technologies: Mainframe; Modernization |
AWS services: Amazon MSK; Amazon RDS; AWS Mainframe Modernization |
Summary
This pattern shows you how to migrate and replicate Virtual Storage Access Method (VSAM) files from a mainframe to a target environment in the AWS Cloud by using Connect from Precisely. The target can be a relational database such as Amazon Relational Database Service (Amazon RDS) or a streaming service such as Amazon Managed Streaming for Apache Kafka (Amazon MSK).
Prerequisites and limitations
Prerequisites
IBM z/OS V2R1 or later
CICS Transaction Server for z/OS (CICS TS) V5.1 or later (CICS/VSAM data capture)
IBM MQ 8.0 or later
Compliance with z/OS security requirements (for example, APF authorization for SQData load libraries)
VSAM recovery logs turned on
(Optional) CICS VSAM Recovery Version (CICS VR) to automatically capture CDC logs
An active AWS account
An Amazon Virtual Private Cloud (VPC) with a subnet that’s reachable by your legacy platform
A VSAM Connect license from Precisely
Limitations
Connect doesn’t support automatic target table creation based on source VSAM schemas or copybooks. You must define the target table structure yourself before the first run.
For non-streaming targets such as Amazon RDS, you must specify the source-to-target conversion mapping in the Apply Engine configuration script.
Logging, monitoring, and alerting functions are implemented through APIs and require external components (such as Amazon CloudWatch) to be fully operational.
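For example, if you surface Apply Engine record counts through your own scripts, a minimal sketch for publishing them as a custom Amazon CloudWatch metric with boto3 might look like the following. The namespace, metric name, and dimension are hypothetical and are not part of Connect.

```python
# Minimal sketch: publish an Apply Engine record count as a custom CloudWatch metric.
# The namespace, metric name, and dimension values are illustrative, not part of Connect.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

def publish_applied_record_count(count, subscriber="VSMTOKFK"):
    """Send the number of records applied by the Apply Engine to CloudWatch."""
    cloudwatch.put_metric_data(
        Namespace="ConnectCDC/ApplyEngine",          # hypothetical namespace
        MetricData=[
            {
                "MetricName": "AppliedRecords",      # hypothetical metric name
                "Dimensions": [{"Name": "Subscriber", "Value": subscriber}],
                "Value": float(count),
                "Unit": "Count",
            }
        ],
    )

publish_applied_record_count(100)
```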
Product versions
SQData 40134 for z/OS
SQData 4.0.43 for the Amazon Linux Amazon Machine Image (AMI) on Amazon Elastic Compute Cloud (Amazon EC2)
Architecture
Source technology stack
Job Control Language (JCL)
z/OS Unix shell and Interactive System Productivity Facility (ISPF)
VSAM utilities (IDCAMS)
Target technology stack
Amazon EC2
Amazon MSK
Amazon RDS
Amazon VPC
Target architecture
Migrating VSAM files to Amazon RDS
The following diagram shows how to migrate VSAM files to a relational database, such as Amazon RDS, in real time or near real time by using the CDC agent/publisher in the source environment (on-premises mainframe) and the Apply Engine in the target environment.
The diagram shows the following batch workflow:
Connect captures changes to a file by comparing VSAM files against backup copies of those files, and then sends the changes to the logstream.
The publisher consumes data from the system logstream.
The publisher communicates captured data changes to a target engine through TCP/IP. The Controller Daemon authenticates communication between the source and target environments.
The Apply Engine in the target environment receives the changes from the Publisher agent and applies them to a relational or non-relational database.
The diagram shows the following online workflow:
Connect captures changes in the online file by using a log replicate and then streams captured changes to a logstream.
The publisher consumes data from the system logstream.
The publisher communicates captured data changes to the target engine through TCP/IP. The Controller Daemon authenticates communication between the source and target environments.
The Apply Engine in the target environment receives the changes from the Publisher agent and then applies them to a relational or non-relational database.
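To make the apply step concrete, the following conceptual sketch shows how a captured change operation ('I', 'U', or 'D') can be translated into a SQL statement on a relational target. This is not the Precisely implementation; the account table and column names are assumptions used only for illustration.

```python
# Conceptual sketch only: how a captured change operation maps to SQL on a relational
# target. The Apply Engine performs this mapping internally; the account table and
# column names here are assumptions used for illustration.
def change_to_sql(change):
    """Return a parameterized SQL statement and values for one change record."""
    op = change["op"]                      # 'I' = insert, 'U' = update, 'D' = delete
    after = change.get("after", {})
    before = change.get("before", {})
    if op == "I":
        return ("INSERT INTO account (account_id, balance) VALUES (%s, %s)",
                (after["account_id"], after["balance"]))
    if op == "U":
        return ("UPDATE account SET balance = %s WHERE account_id = %s",
                (after["balance"], before["account_id"]))
    if op == "D":
        return ("DELETE FROM account WHERE account_id = %s",
                (before["account_id"],))
    raise ValueError("Unknown change operation: " + op)

# Example: an insert change record built from the AFTER image of a VSAM record.
print(change_to_sql({"op": "I", "after": {"account_id": 1001, "balance": 250.00}}))
```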
Migrating VSAM files to Amazon MSK
The following diagram shows how to stream VSAM data structures from a mainframe to Amazon MSK in high-performance mode and automatically generate JSON or AVRO schema conversions that integrate with Amazon MSK.
The diagram shows the following batch workflow:
Connect captures changes to a file by using CICS VR or by comparing VSAM files against backup copies to identify changes. Captured changes are sent to the logstream.
The publisher consumes data from the system logstream.
The publisher communicates captured data changes to the target engine through TCP/IP. The Controller Daemon authenticates communication between the source and target environments.
The Replicator Engine, which operates in parallel processing mode, splits the data into a unit-of-work cache.
Worker threads capture the data from the cache.
Data is published to Amazon MSK topics from the worker threads.
Users apply changes from Amazon MSK to targets such as Amazon DynamoDB, Amazon Simple Storage Service (Amazon S3), or Amazon OpenSearch Service by using connectors.
The diagram shows the following online workflow:
Changes in the online file are captured by using a log replicate. Captured changes are streamed to the logstream.
The publisher consumes data from the system logstream.
The publisher communicates captured data changes to the target engine through TCP/IP. The Controller Daemon authenticates communication between the source and target environments.
The Replicator Engine, which operates in parallel processing mode, splits the data into a unit-of-work cache.
Worker threads capture the data from the cache.
Data is published to Amazon MSK topics from the worker threads.
Users apply changes from Amazon MSK to targets such as DynamoDB, Amazon S3, or OpenSearch Service by using connectors.
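To illustrate the last step of both workflows, the following is a minimal consumer sketch that reads the JSON change records from the Amazon MSK topic by using the kafka-python library. The topic name comes from the example configuration in the Additional information section; the bootstrap broker address is a placeholder, and production consumers typically use MSK connectors instead.

```python
# Minimal consumer sketch (kafka-python) for the JSON change records that the
# Replicator Engine publishes. The topic name is taken from the example configuration
# in this pattern; the bootstrap server is a placeholder for your MSK brokers.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "MSKTutorialTopic",
    bootstrap_servers=["b-1.example.kafka.us-east-1.amazonaws.com:9094"],
    security_protocol="SSL",              # MSK TLS listener
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    change = message.value                # one replicated VSAM change record
    print(change)
```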
Tools
Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that helps you build and run applications that use Apache Kafka to process streaming data.
Amazon Relational Database Service (Amazon RDS) helps you set up, operate, and scale a relational database in the AWS Cloud.
Epics
Task | Description | Skills required |
---|---|---|
Install Connect CDC 4.1. | | IBM Mainframe Developer/Admin |
Set up the zFS directory. | To set up a zFS directory, follow the instructions from zFS variable directories. Note: Controller Daemon and Capture/Publisher agent configurations are stored in the z/OS UNIX Systems Services file system (referred to as zFS). The Controller Daemon, Capture, Storage, and Publisher agents require a predefined zFS directory structure for storing a small number of files. | IBM Mainframe Developer/Admin |
Configure TCP/IP ports. | To configure TCP/IP ports, follow the instructions from TCP/IP ports. Note: The Controller Daemon requires TCP/IP ports on source systems. The ports are referenced by the engines on the target systems (where captured change data is processed). After the ports are configured, you can test reachability from the target instance by using the sketch that follows this table. | IBM Mainframe Developer/Admin |
Create a z/OS logstream. | Create a z/OS logstream for Connect to use. Note: Connect uses the logstream to capture and stream data between your source environment and target environment during migration. For an example JCL that creates a z/OS logstream, see Create z/OS system logStreams. | IBM Mainframe Developer |
Identify and authorize IDs for zFS users and started tasks. | Use RACF to grant access to the OMVS zFS file system. For an example JCL, see Identify and authorize zFS user and started task IDs. | IBM Mainframe Developer/Admin |
Generate z/OS public/private keys and the authorized key file. | Run the JCL to generate the key pair. For an example, see Key pair example in the Additional information section of this pattern. For instructions, see Generate z/OS public and private keys and authorized key file. | IBM Mainframe Developer/Admin |
Activate the CICS VSAM Log Replicate and attach it to the logstream. | Run the JCL script that activates the CICS VSAM log replicate and attaches it to the logstream. | IBM Mainframe Developer/Admin |
Activate the VSAM File Recovery Log through an FCT. | Modify the File Control Table (FCT) to reflect the recovery log parameter changes for the VSAM file. | IBM Mainframe Developer/Admin |
Set up CDCzLog for the Publisher agent. | | IBM Mainframe Developer/Admin |
Activate the Controller Daemon. | | IBM Mainframe Developer/Admin |
Activate the publisher. | | IBM Mainframe Developer/Admin |
Activate the logstream. | | IBM Mainframe Developer/Admin |
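After the Controller Daemon ports are configured (see the Configure TCP/IP ports task), you can confirm from the target EC2 instance that a port is reachable. The following is a minimal sketch that uses Python's socket module; the IP address and port are taken from the example configuration in this pattern and are placeholders for your environment.

```python
# Quick reachability check from the target EC2 instance to the Controller Daemon
# port on the mainframe. The IP address and port below come from the example
# configuration in this pattern; replace them with your own values.
import socket

HOST, PORT = "10.81.148.4", 2626

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.settimeout(5)
    try:
        s.connect((HOST, PORT))
        print(f"Controller Daemon port {PORT} on {HOST} is reachable")
    except OSError as err:
        print(f"Cannot reach {HOST}:{PORT}: {err}")
```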
Task | Description | Skills required |
---|---|---|
Install Precisely on an EC2 instance. | To install Connect from Precisely on the Amazon Linux AMI for Amazon EC2, follow the instructions from Install Connect CDC (SQData) on UNIX. | General AWS |
Open TCP/IP ports. | To modify the security group to include the Controller Daemon ports for inbound and outbound access, follow the instructions from TCP/IP. For a scripted example, see the sketch that follows this table. | General AWS |
Create file directories. | To create file directories, follow the instructions from Prepare target apply environment. | General AWS |
Create the Apply Engine configuration file. | Create the Apply Engine configuration file in the working directory of the Apply Engine. For an example configuration file that uses Apache Kafka as the target, see the Configuration file example in the Additional information section of this pattern. Note: For more information, see Security. | General AWS |
Create scripts for Apply Engine processing. | Create the scripts for the Apply Engine to process source data and replicate source data to the target. For more information, see Create an apply engine script. | General AWS |
Run the scripts. | Use the Apply Engine to run the scripts that you created in the previous step. | General AWS |
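If you prefer to script the security group change from the Open TCP/IP ports task, the following is a minimal sketch that adds an inbound rule for the Controller Daemon port by using boto3. The security group ID and CIDR range are placeholders for your environment, and the port matches the example configuration later in this pattern.

```python
# Sketch: add an inbound rule for the Controller Daemon port to the EC2 security
# group used by the Apply Engine instance. The security group ID, port, and CIDR
# range are placeholders; use the values for your environment.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",         # placeholder security group ID
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 2626,                # Controller Daemon port from the example config
            "ToPort": 2626,
            "IpRanges": [{"CidrIp": "10.81.0.0/16", "Description": "Mainframe network"}],
        }
    ],
)
```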
Task | Description | Skills required |
---|---|---|
Validate the list of VSAM files and target tables for CDC processing. | | General AWS, Mainframe |
Verify that the Connect CDC SQData product is linked. | Run a testing job and verify that the return code from this job is 0 (Successful). Note: Connect CDC SQData Apply Engine status messages should show active connection messages. | General AWS, Mainframe |
Task | Description | Skills required |
---|---|---|
Run the batch job on the mainframe. | Run the batch application job by using a modified JCL. Include the steps in the modified JCL that are required for Connect to capture the file changes. | General AWS, Mainframe |
Check the logstream. | Check the logstream to confirm that you can see the change data for the completed mainframe batch job. | General AWS, Mainframe |
Validate the counts for the source delta changes and target table. | Confirm that the number of delta records captured from the source tallies with the number of records applied to the target table. You can query the target table by using an approach like the sketch that follows this table. | General AWS, Mainframe |
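The following is a minimal sketch of how you might count rows in the target table while validating the counts. It assumes a MySQL-compatible Amazon RDS target, the account table from the example copybook, and placeholder connection details; adjust it for your database engine and schema.

```python
# Count rows in the target table so that you can compare the total against the
# source delta count reported by the Apply Engine. The endpoint, credentials,
# database name, and table name are placeholders.
import pymysql

connection = pymysql.connect(
    host="mytargetdb.example.us-east-1.rds.amazonaws.com",  # placeholder RDS endpoint
    user="admin",
    password="your-password",
    database="vsam_target",
)
try:
    with connection.cursor() as cursor:
        cursor.execute("SELECT COUNT(*) FROM account")
        (row_count,) = cursor.fetchone()
        print(f"Target table row count: {row_count}")
finally:
    connection.close()
```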
Task | Description | Skills required |
---|---|---|
Run the online transaction in a CICS region. | | IBM Mainframe Developer |
Check the logstream. | Confirm that the logstream is populated with specific record level changes. | AWS Mainframe Developer |
Validate the count in the target database. | Monitor the Apply Engine for record level counts. | Precisely, Linux |
Validate the record counts and data records in the target database. | Query the target database to validate the record counts and data records. | General AWS |
Related resources
VSAM z/OS (Precisely documentation)
Apply engine (Precisely documentation)
Replicator engine (Precisely documentation)
The log stream (IBM documentation)
Additional information
Configuration file example
This is an example Apply Engine configuration file where the source is a mainframe logstream and the target is Amazon MSK:
-- JOBNAME -- PASS THE SUBSCRIBER NAME
-- REPORT  progress report will be produced after "n" (number) of Source records processed.
JOBNAME VSMTOKFK;
--REPORT EVERY 100;

-- Change Op has been 'I' for insert, 'D' for delete, and 'R' for Replace. For RDS it is 'U' for update.
-- Character Encoding on z/OS is Code Page 1047, on Linux and UNIX it is Code Page 819, and on Windows, Code Page 1252.
OPTIONS CDCOP('I', 'U', 'D'), PSEUDO NULL = NO, USE AVRO COMPATIBLE NAMES, APPLICATION ENCODING SCHEME = 1208;

-- SOURCE DESCRIPTIONS
BEGIN GROUP VSAM_SRC;
DESCRIPTION COBOL ../copybk/ACCOUNT AS account_file;
END GROUP;

-- TARGET DESCRIPTIONS
BEGIN GROUP VSAM_TGT;
DESCRIPTION COBOL ../copybk/ACCOUNT AS account_file;
END GROUP;

-- SOURCE DATASTORE (IP & Publisher name)
DATASTORE cdc://10.81.148.4:2626/vsmcdct/VSMTOKFK
OF VSAMCDC
AS CDCIN
DESCRIBED BY GROUP VSAM_SRC
ACCEPT ALL;

-- TARGET DATASTORE(s) - Kafka and topic name
DATASTORE 'kafka:///MSKTutorialTopic/key'
OF JSON
AS CDCOUT
DESCRIBED BY GROUP VSAM_TGT
FOR INSERT;

-- MAIN SECTION
PROCESS INTO CDCOUT
SELECT
{
  SETURL(CDCOUT, 'kafka:///MSKTutorialTopic/key')
  REMAP(CDCIN, account_file, GET_RAW_RECORD(CDCIN, AFTER), GET_RAW_RECORD(CDCIN, BEFORE))
  REPLICATE(CDCOUT, account_file)
}
FROM CDCIN;
Key pair example
This is an example of the JCL that you run to generate the key pair:
//SQDUTIL  EXEC PGM=SQDUTIL
//SQDPUBL  DD DSN=&USER..NACL.PUBLIC,
//         DCB=(RECFM=FB,LRECL=80,BLKSIZE=21200),
//         DISP=(,CATLG,DELETE),UNIT=SYSDA,
//         SPACE=(TRK,(1,1))
//SQDPKEY  DD DSN=&USER..NACL.PRIVATE,
//         DCB=(RECFM=FB,LRECL=80,BLKSIZE=21200),
//         DISP=(,CATLG,DELETE),UNIT=SYSDA,
//         SPACE=(TRK,(1,1))
//SQDPARMS DD keygen
//SYSPRINT DD SYSOUT=*
//SYSOUT   DD SYSOUT=*
//SQDLOG   DD SYSOUT=*
//*SQDLOG8 DD DUMMY