Migrate and replicate VSAM files to Amazon RDS or Amazon MSK using Connect from Precisely

Created by Prachi Khanna (AWS) and Boopathy GOPALSAMY (AWS)

Environment: PoC or pilot

Source: VSAM

Target: Database

R Type: Re-architect

Workload: IBM

Technologies: Mainframe; Modernization

AWS services: Amazon MSK; Amazon RDS; AWS Mainframe Modernization

Summary

This pattern shows you how to migrate and replicate Virtual Storage Access Method (VSAM) files from a mainframe to a target environment in the AWS Cloud by using Connect from Precisely. The target environments covered in this pattern are Amazon Relational Database Service (Amazon RDS) and Amazon Managed Streaming for Apache Kafka (Amazon MSK). Connect uses change data capture (CDC) to continuously monitor updates to your source VSAM files and then transfers those updates to one or more AWS target environments. You can use this pattern to meet your application modernization or data analytics goals. For example, you can use Connect to migrate your VSAM application files to the AWS Cloud with low latency, or to migrate your VSAM data to an AWS data warehouse or data lake for analytics workloads that can tolerate higher synchronization latencies than application modernization requires.

Prerequisites and limitations

Prerequisites

Limitations

  • Connect doesn’t support automatic target table creation based on source VSAM schemas or copybooks. You must define the target table structure yourself before the first run. (See the example table definition after this list.)

  • For non-streaming targets such as Amazon RDS, you must specify the source-to-target conversion mapping in the Apply Engine configuration script.

  • Logging, monitoring, and alerting functions are implemented through APIs and require external components (such as Amazon CloudWatch) to be fully operational.
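Because Connect doesn’t create target tables, the table that will receive the replicated records must exist before the Apply Engine runs. The following DDL is a hypothetical sketch only: the table name matches the account_file description used in the example scripts in this pattern, but the column names and types are assumptions for illustration and must be derived from your actual copybook and mapping.

CREATE TABLE account_file (
    account_id      CHAR(10)      NOT NULL,   -- key field of the source VSAM KSDS (assumed)
    account_name    VARCHAR(50),               -- illustrative attribute columns (assumed)
    account_balance DECIMAL(11,2),
    last_update_ts  TIMESTAMP,
    PRIMARY KEY (account_id)
);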

Product versions

  • SQData 40134 for z/OS

  • SQData 4.0.43 for the Amazon Linux Amazon Machine Image (AMI) on Amazon Elastic Compute Cloud (Amazon EC2)

Architecture

Source technology stack

  • Job Control Language (JCL)

  • z/OS Unix shell and Interactive System Productivity Facility (ISPF)

  • VSAM utilities (IDCAMS)

Target technology stack

  • Amazon EC2

  • Amazon MSK

  • Amazon RDS

  • Amazon VPC

Target architecture

Migrating VSAM files to Amazon RDS

The following diagram shows how to migrate VSAM files to a relational database, such as Amazon RDS, in real time or near real time by using the CDC agent/publisher in the source environment (on-premises mainframe) and the Apply Engine in the target environment (AWS Cloud).

Diagram showing data flow from on-premises mainframe to AWS Cloud, including VSAM files and Amazon RDS.

The diagram shows the following batch workflow:

  1. Connect identifies changes to a file by comparing the VSAM file with its backup copy, and then sends the changes to the logstream.

  2. The publisher consumes data from the system logstream.

  3. The publisher communicates captured data changes to a target engine through TCP/IP. The Controller Daemon authenticates communication between the source and target environments.

  4. The Apply Engine in the target environment receives the changes from the Publisher agent and applies them to a relational or non-relational database.

The diagram shows the following online workflow:

  1. Connect captures changes in the online file by using a log replicate and then streams captured changes to a logstream.

  2. The publisher consumes data from the system logstream.

  3. The publisher communicates captured data changes to the target engine through TCP/IP. The Controller Daemon authenticates communication between the source and target environments.

  4. The Apply Engine in the target environment receives the changes from the Publisher agent and then applies them to a relational or non-relational database.

Migrating VSAM files to Amazon MSK

The following diagram shows how to stream VSAM data structures from a mainframe to Amazon MSK in high-performance mode and automatically generate JSON or AVRO schema conversions that integrate with Amazon MSK.

Diagram showing data flow between on-premises mainframe and AWS Cloud services.

The diagram shows the following batch workflow:

  1. Connect captures changes to a file by using CICS VR or by comparing the VSAM file with its backup copy. Captured changes are sent to the logstream.

  2. The publisher consumes data from the system logstream.

  3. The publisher communicates captured data changes to the target engine through TCP/IP. The Controller Daemon authenticates communication between the source and target environments.

  4. The Replicator Engine, operating in parallel processing mode, splits the data into a unit-of-work cache.

  5. Worker threads capture the data from the cache.

  6. Data is published to Amazon MSK topics from the worker threads.

  7. Users apply changes from Amazon MSK to targets such as Amazon DynamoDB, Amazon Simple Storage Service (Amazon S3), or Amazon OpenSearch Service by using connectors.

The diagram shows the following online workflow:

  1. Changes in the online file are captured by using a log replicate. Captured changes are streamed to the logstream.

  2. The publisher consumes data from the system logstream.

  3. The publisher communicates captured data changes to the target engine through TCP/IP. The Controller Daemon authenticates communication between the source and target environments.

  4. The Replicator Engine, operating in parallel processing mode, splits the data into a unit-of-work cache.

  5. Worker threads capture the data from the cache.

  6. Data is published to Amazon MSK topics from the worker threads.

  7. Users apply changes from Amazon MSK to targets such as DynamoDB, Amazon S3, or OpenSearch Service by using connectors. (A sketch for verifying that records reach the topic follows this list.)
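After the workflow is running, you can spot-check that change records are reaching the topic by reading it with the console consumer that ships with Apache Kafka. This is a minimal sketch: the bootstrap broker string is a placeholder for your MSK cluster endpoint, the topic name matches the example script in the Additional information section of this pattern, and for a cluster that uses TLS or SASL you also need to pass --consumer.config with a client properties file.

bin/kafka-console-consumer.sh \
  --bootstrap-server <your-msk-bootstrap-brokers> \
  --topic MSKTutorialTopic \
  --from-beginning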

Tools

Epics

Task | Description | Skills required

Install Connect CDC 4.1.

  1. Contact the Precisely Support team to obtain a license and installation packages.

  2. Use example JCLs to install Connect CDC 4.1. For instructions, see Install Connect CDC (SQData) using JCL in the Precisely documentation.

  3. Run the SETPROG APF command to authorize the Connect load libraries SQDATA.V4nnn.LOADLIB.
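A minimal sketch of the SETPROG operator command, assuming an SMS-managed load library (replace V4NNN with your installed version; for a non-SMS data set, specify VOLUME=volser instead of SMS, and add the library to a PROGxx member if the authorization must persist across IPLs):

SETPROG APF,ADD,DSNAME=SQDATA.V4NNN.LOADLIB,SMS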

IBM Mainframe Developer/Admin

Set up the zFS directory.

To set up a zFS directory, follow the instructions from zFS variable directories in the Precisely documentation.

Note: Controller Daemon and Capture/Publisher agent configurations are stored in the z/OS UNIX Systems Services file system (referred to as zFS). The Controller Daemon, Capture, Storage, and Publisher agents require a predefined zFS directory structure for storing a small number of files.
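A hypothetical sketch of creating such a structure from the z/OS UNIX shell follows. The mount point and subdirectory names are assumptions for illustration; use the exact directory layout that the Precisely documentation prescribes.

mkdir -p /var/precisely/sqdata/daemon/cfg    # Controller Daemon configuration (path is assumed)
mkdir -p /var/precisely/sqdata/daemon/logs   # Controller Daemon logs (path is assumed)
mkdir -p /var/precisely/sqdata/capture       # Capture/Publisher working files (path is assumed)
chmod -R 750 /var/precisely/sqdata           # restrict access to the owning user and group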

IBM Mainframe Developer/Admin

Configure TCP/IP ports.

To configure TCP/IP ports, follow the instructions from TCP/IP ports in the Precisely documentation.

Note: The Controller Daemon requires TCP/IP ports on source systems. The ports are referenced by the engines on the target systems (where captured change data is processed).

IBM Mainframe Developer/Admin

Create a z/OS logstream.

To create a z/OS logstream, follow the instructions from Create z/OS system logStreams in the Precisely documentation.

Note: Connect uses the logstream to capture and stream data between your source environment and target environment during migration.

For an example JCL that creates a z/OS LogStream, see Create z/OS system logStreams in the Precisely documentation.
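On z/OS, logstreams are defined with the IXCMIAPU administrative data utility. The following JCL is a minimal sketch for a DASD-only logstream that uses the logstream name appearing later in this pattern; the sizing values are illustrative placeholders, and your installation's system logger policy may require different parameters.

//DEFLOG   EXEC PGM=IXCMIAPU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DATA TYPE(LOGR) REPORT(NO)
  DEFINE LOGSTREAM NAME(SQDATA.VSAMCDC.LOG1)
         DASDONLY(YES)
         MAXBUFSIZE(65532)
         STG_SIZE(4096)
         LS_SIZE(4096)
/*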

IBM Mainframe Developer

Identify and authorize IDs for zFS users and started tasks.

Use RACF to grant access to the OMVS zFS file system. For an example JCL, see Identify and authorize zFS user and started task IDs in the Precisely documentation.

IBM Mainframe Developer/Admin

Generate z/OS public/private keys and the authorized key file.

Run the JCL to generate the key pair. For an example, see Key pair example in the Additional information section of this pattern.

For instructions, see Generate z/OS public and private keys and authorized key file in the Precisely documentation.

IBM Mainframe Developer/Admin

Activate the CICS VSAM Log Replicate and attach it to the logstream.

Run the following JCL script:

//STEP1    EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  ALTER SQDATA.CICS.FILEA -
        LOGSTREAMID(SQDATA.VSAMCDC.LOG1) -
        LOGREPLICATE
IBM Mainframe Developer/Admin

Activate the VSAM File Recovery Log through an FCT.

Modify the File Control Table (FCT) to reflect the following parameter changes:

CEDA ALT FILE(name) GROUP(groupname)
  DSNAME(data set name)
  RECOVERY(NONE|BACKOUTONLY|ALL)
  FWDRECOVLOG(NO|1-99)
  BACKUPTYPE(STATIC|DYNAMIC)

Recovery parameters:
  RECOVery    : None | Backoutonly | All
  Fwdrecovlog : No | 1-99
  BAckuptype  : Static | Dynamic
IBM Mainframe Developer/Admin

Set up CDCzLog for the Publisher agent.

  1. Create the CDCzLog Publisher CAB file.

  2. Encrypt the published data.

  3. Prepare the CDCzLog Publisher Runtime JCL.

IBM Mainframe Developer/Admin

Activate the Controller Daemon.

  1. Open the ISPF panel and run the following command to open the Precisely menu: EXEC 'SQDATA.V4nnnnn.ISPFLIB(SQDC$STA)' 'SQDATA.V4nnnnn'

  2. To set up the Controller Daemon, choose option 2 from the menu.

IBM Mainframe Developer/Admin

Activate the publisher.

  1. Open the ISPF panel and run the following command to open the Precisely menu: EXEC 'SQDATA.V4nnnnn.ISPFLIB(SQDC$STA)' 'SQDATA.V4nnnnn'

  2. To set up the publisher, choose option 3 from the menu and I for insert.

IBM Mainframe Developer/Admin

Activate the logstream.

  1. Open the ISPF panel and run the following command to open the Precisely menu: EXEC 'SQDATA.V4nnnnn.ISPFLIB(SQDC$STA)' 'SQDATA.V4nnnnn'

  2. To set up the logstream, choose option 4 from the menu and I for insert. Then, enter the name of the logstream created in the preceding steps.

IBM Mainframe Developer/Admin

Task | Description | Skills required

Install Precisely on an EC2 instance.

To install Connect from Precisely on the Amazon Linux AMI for Amazon EC2, follow the instructions from Install Connect CDC (SQData) on UNIX in the Precisely documentation.

General AWS

Open TCP/IP ports.

To modify the security group to include the Controller Daemon ports for inbound and outbound access, follow the instructions from TCP/IP in the Precisely documentation.
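An illustrative sketch of adding the inbound rule with the AWS CLI follows. The security group ID is a placeholder; port 2626 matches the Controller Daemon port used in the example script in this pattern, and the source CIDR (shown here to match the example mainframe address) should be restricted to your mainframe's address range.

aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 2626 \
  --cidr 10.81.0.0/16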

General AWS

Create file directories.

To create file directories, follow the instructions from Prepare target apply environment in the Precisely documentation.

General AWS

Create the Apply Engine configuration file.

Create the Apply Engine configuration file in the working directory of the Apply Engine. The following example configuration file shows Apache Kafka as the target:

builtin.features=SASL_SCRAM
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.username=
sasl.password=
metadata.broker.list=

Note: For more information, see Security in the Apache Kafka documentation.

General AWS

Create scripts for Apply Engine processing.

Create the scripts for the Apply Engine to process source data and replicate source data to the target. For more information, see Create an apply engine script in the Precisely documentation.
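The Additional information section of this pattern shows a complete script for a Kafka target. For a relational target such as Amazon RDS, the script maps the source description to a table datastore instead. The following is a hypothetical sketch modeled on that example: the target datastore URL, the OF clause, and the subscriber name are assumptions for illustration, and the exact datastore syntax for your database engine is described in the Precisely documentation.

JOBNAME VSMTORDS;                              -- assumed subscriber name
OPTIONS CDCOP('I', 'U', 'D');

--       SOURCE DESCRIPTIONS
BEGIN GROUP VSAM_SRC;
DESCRIPTION COBOL ../copybk/ACCOUNT AS account_file;
END GROUP;

--       SOURCE DATASTORE (IP & Publisher name)
DATASTORE cdc://10.81.148.4:2626/vsmcdct/VSMTORDS
          OF VSAMCDC
          AS CDCIN
          DESCRIBED BY GROUP VSAM_SRC ACCEPT ALL;

--       TARGET DATASTORE - relational table (URL and OF clause are illustrative placeholders)
DATASTORE rdbms://<rds-endpoint>/<schema>/account_file
          OF RELATIONAL
          AS CDCOUT
          DESCRIBED BY GROUP VSAM_SRC;

--       MAIN SECTION
PROCESS INTO CDCOUT
SELECT
{
    REPLICATE(CDCOUT, account_file)
}
FROM CDCIN;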

General AWS

Run the scripts.

Use the SQDPARSE and SQDENG commands to run the script. For more information, see Parse a script for zOS in the Precisely documentation.
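A hypothetical invocation from the engine's working directory on the EC2 instance follows, shown in the lowercase form typical of UNIX installations; the script and parsed-output file names are assumptions, and the exact command arguments are described in the Precisely documentation.

# Parse the engine script into its runtime form (file names are assumed)
sqdparse ./VSMTOKFK.sqd ./VSMTOKFK.prc

# Run the Apply Engine against the parsed script
sqdeng ./VSMTOKFK.prc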

General AWS

Task | Description | Skills required

Validate the list of VSAM files and target tables for CDC processing.

  1. Validate VSAM files, including replication logs, recovery logs, FCT parameters, and the logstream.

  2. Validate the target database tables, including whether the tables were created according to the required schema definition, and verify table access and other criteria.

General AWS, Mainframe

Verify that the Connect CDC SQData product is linked.

Run a testing job and verify that the return code from this job is 0 (Successful).

Note: Connect CDC SQData Apply Engine status messages should show active connection messages.

General AWS, Mainframe

Task | Description | Skills required

Run the batch job in the mainframe.

Run the batch application job using a modified JCL. Include steps in the modified JCL that do the following:

  1. Take a backup of the data files. (A minimal IDCAMS sketch for this step follows the list.)

  2. Compare the backup file with the modified data files, generate the delta file, and then note the delta record count from the messages.

  3. Push the delta file to the z/OS logstream.

  4. Run the JCL. For an example JCL, see Prepare file compare capture JCL in the Precisely documentation.
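For step 1, the backup can be taken with a standard IDCAMS REPRO step. This is a minimal sketch; the data set names are placeholders for your VSAM file and its backup.

//BACKUP   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  REPRO INDATASET(YOUR.VSAM.ACCOUNT) -
        OUTDATASET(YOUR.VSAM.ACCOUNT.BACKUP)
/*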

General AWS, Mainframe

Check the logstream.

Check the logstream to confirm that you can see the change data for the completed mainframe batch job.

General AWS, Mainframe

Validate the counts for the source delta changes and target table.

To confirm that the record counts tally, do the following:

  1. Gather the source delta count from the batch JCL messages.

  2. Monitor the Apply Engine for record-level counts of the records inserted, updated, or deleted in the VSAM file.

  3. Query the target table for record counts. (See the query sketch after this list.)

  4. Compare and tally all the different record counts.
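For step 3, a simple count query suffices; the table name here matches the hypothetical account_file table defined earlier in this pattern, so substitute your actual target table.

SELECT COUNT(*) AS target_row_count
FROM account_file;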

General AWS, Mainframe

Task | Description | Skills required

Run the online transaction in a CICS region.

  1. Run the online transaction to validate the test case.

  2. Validate the transaction execution code (RC=0 – Success).

IBM Mainframe Developer

Check the logstream.

Confirm that the logstream is populated with specific record level changes.

IBM Mainframe Developer

Validate the count in the target database.

Monitor the Apply Engine for record level counts.

Precisely, Linux

Validate the record counts and data records in the target database.

Query the target database to validate the record counts and data records.

General AWS

Related resources

Additional information

Configuration file example

This is an example Apply Engine configuration file in which the source environment is a mainframe logstream and the target environment is Amazon MSK:

--  JOBNAME: pass the subscriber name
--  REPORT: a progress report is produced after "n" (number of) source records are processed
JOBNAME VSMTOKFK;
--REPORT EVERY 100;

--  Change op is 'I' for insert, 'D' for delete, and 'R' for replace; for RDS it is 'U' for update
--  Character encoding on z/OS is code page 1047; on Linux and UNIX it is code page 819; on Windows, code page 1252
OPTIONS
    CDCOP('I', 'U', 'D'),
    PSEUDO NULL = NO,
    USE AVRO COMPATIBLE NAMES,
    APPLICATION ENCODING SCHEME = 1208;

--       SOURCE DESCRIPTIONS
BEGIN GROUP VSAM_SRC;
DESCRIPTION COBOL ../copybk/ACCOUNT AS account_file;
END GROUP;

--       TARGET DESCRIPTIONS
BEGIN GROUP VSAM_TGT;
DESCRIPTION COBOL ../copybk/ACCOUNT AS account_file;
END GROUP;

--       SOURCE DATASTORE (IP & Publisher name)
DATASTORE cdc://10.81.148.4:2626/vsmcdct/VSMTOKFK
          OF VSAMCDC
          AS CDCIN
          DESCRIBED BY GROUP VSAM_SRC ACCEPT ALL;

--       TARGET DATASTORE(s) - Kafka and topic name
DATASTORE 'kafka:///MSKTutorialTopic/key'
          OF JSON
          AS CDCOUT
          DESCRIBED BY GROUP VSAM_TGT FOR INSERT;

--       MAIN SECTION
PROCESS INTO CDCOUT
SELECT
{
    SETURL(CDCOUT, 'kafka:///MSKTutorialTopic/key')
    REMAP(CDCIN, account_file, GET_RAW_RECORD(CDCIN, AFTER), GET_RAW_RECORD(CDCIN, BEFORE))
    REPLICATE(CDCOUT, account_file)
}
FROM CDCIN;

Key pair example

This is an example of how to run the JCL to generate the key pair:

//SQDUTIL  EXEC PGM=SQDUTIL
//SQDPUBL  DD DSN=&USER..NACL.PUBLIC,
//            DCB=(RECFM=FB,LRECL=80,BLKSIZE=21200),
//            DISP=(,CATLG,DELETE),UNIT=SYSDA,
//            SPACE=(TRK,(1,1))
//SQDPKEY  DD DSN=&USER..NACL.PRIVATE,
//            DCB=(RECFM=FB,LRECL=80,BLKSIZE=21200),
//            DISP=(,CATLG,DELETE),UNIT=SYSDA,
//            SPACE=(TRK,(1,1))
//SQDPARMS DD *
keygen
/*
//SYSPRINT DD SYSOUT=*
//SYSOUT   DD SYSOUT=*
//SQDLOG   DD SYSOUT=*
//*SQDLOG8 DD DUMMY