Change Python and Perl applications to support database migration from Microsoft SQL Server to Amazon Aurora PostgreSQL-Compatible Edition - AWS Prescriptive Guidance

Change Python and Perl applications to support database migration from Microsoft SQL Server to Amazon Aurora PostgreSQL-Compatible Edition

Created by Dwarika Patra (AWS) and Deepesh Jayaprakash (AWS)

Environment: PoC or pilot

Source: SQL Server

Target: Aurora PostgreSQL-Compatible

R Type: Replatform

Workload: Microsoft; Open-source

Technologies: Migration; Databases

AWS services: Amazon Aurora

Summary

This pattern describes changes to application repositories that might be required when you migrate databases from Microsoft SQL Server to Amazon Aurora PostgreSQL-Compatible Edition. The pattern assumes that these applications are Python-based or Perl-based, and provides separate instructions for these scripting languages.

Migrating SQL Server databases to Aurora PostgreSQL-Compatible involves schema conversion, database object conversion, data migration, and data loading. Because of the differences between PostgreSQL and SQL Server (relating to data types, connection objects, syntax, and logic), the most difficult migration task involves making the necessary changes to the code base so that it works correctly with PostgreSQL.

For a Python-based application, connection objects and classes are scattered throughout the system. Also, the Python code base might use multiple libraries to connect to the database. If the database connection interface changes, the objects that run the application’s inline queries also require changes.

For a Perl-based application, changes involve connection objects, database connection drivers, static and dynamic inline SQL statements, and how the application handles complex dynamic DML queries and results sets.

When you migrate your application, you can also consider possible enhancements on AWS, such as replacing the FTP server with Amazon Simple Storage Service (Amazon S3) access.

The application migration process involves the following challenges:

  • Connection objects. If connection objects are scattered in the code with multiple libraries and function calls, you might have to find a generalized way to change them to support PostgreSQL.

  • Error or exception handling during record retrieval or updates. If you have conditional create, read, update, and delete (CRUD) operations on the database that return variables, results sets, or data frames, any errors or exceptions might result in application errors with cascading effects. These should be handled carefully with proper validations and save points. One such save point is to call large inline SQL queries or database objects inside BEGIN...EXCEPTION...END blocks.

  • Controlling transactions and their validation. These includes manual and automatic commits and rollbacks. The PostgreSQL driver for Perl requires you to always explicitly set the auto-commit attribute.

  • Handling dynamic SQL queries. This requires a strong understanding of the query logic and iterative testing to ensure that queries work as expected.

  • Performance. You should ensure that code changes don’t result in degraded application performance.

This pattern explains the conversion process in detail.

Prerequisites and limitations

Prerequisites

  • Working knowledge of Python and Perl syntax.

  • Basic skills in SQL Server and PostgreSQL.

  • Understanding of your existing application architecture.

  • Access to your application code, SQL Server database, and PostgreSQL database.

  • Access to Windows or Linux (or other Unix) development environment with credentials for developing, testing, and validating application changes.

  • For a Python-based application, standard Python libraries that your application might require, such as Pandas to handle data frames, and psycopg2 or SQLAlchemy for database connections.

  • For a Perl-based application, required Perl packages with dependent libraries or modules. The Comprehensive Perl Archive Network (CPAN) module can support most application requirements.

  • All required dependent customized libraries or modules. 

  • Database credentials for read access to SQL Server and read/write access to Aurora.

  • PostgreSQL to validate and debug application changes with services and users.

  • Access to development tools during application migration such as Visual Studio Code, Sublime Text, or pgAdmin.

Limitations

  • Some Python or Perl versions, modules, libraries, and packages aren’t compatible with the cloud environment.

  • Some third-party libraries and frameworks used for SQL Server cannot be replaced to support PostgreSQL migration. 

  • Performance variations might require changes to your application, to inline Transact-SQL (T-SQL) queries, database functions, and stored procedures.

  • PostgreSQL supports lowercase names for table names, column names, and other database objects. 

  • Some data types, such as UUID columns, are stored in lowercase only. Python and Perl applications must handle such case differences. 

  • Character encoding differences must be handled with the correct data type for the corresponding text columns in the PostgreSQL database.                                

Product versions

  • Python 3.6 or later (use the version that supports your operating system)

  • Perl 5.8.3 or later (use the version that supports your operating system)

  • Aurora PostgreSQL-Compatible Edition 4.2 or later (see details)

Architecture

Source technology stack

  • Scripting (application programming) language: Python 2.7 or later, or Perl 5.8 

  • Database: Microsoft SQL Server version 13

  • Operating system: Red Hat Enterprise Linux (RHEL) 7 

Target technology stack  

  • Scripting (application programming) language: Python 3.6 or later, or Perl 5.8 or later 

  • Database: Aurora PostgreSQL-Compatible 4.2

  • Operating system: RHEL 7 

Migration architecture 

Migrating a Perl or Python application with SQL Server to Aurora PostgreSQL-Compatible

Tools

AWS services and tools

  • Aurora PostgreSQL–Compatible Edition is a fully managed, PostgreSQL-compatible, and ACID-compliant relational database engine that combines the speed and reliability of high-end commercial databases with the cost-effectiveness of open-source databases. Aurora PostgreSQL is a drop-in replacement for PostgreSQL and makes it easier and more cost-effective to set up, operate, and scale your new and existing PostgreSQL deployments.

  • AWS Command Line Interface (AWS CLI) is an open-source tool that enables you to interact with AWS services by using commands in your command-line shell.

Other tools

Epics

TaskDescriptionSkills required

Follow these code conversion steps to migrate your application to PostgreSQL.

  1. Set database-specific ODBC drivers and libraries for PostgreSQL. For example, you can use one of the CPAN modules for Perl and pyodbc, psycopg2, or SQLAlchemy for Python.

  2. Convert database objects by using these libraries to connect to Aurora PostgreSQL-Compatible.

  3. Apply code changes in existing application modules to get compatible T-SQL statements.

  4. Rewrite database-specific function calls and stored procedures in application code.

  5. Handle changes to your application's variables and their data types that are used for inline SQL queries.

  6. Handle incompatible database-specific functions.

  7. Complete end-to-end testing of converted application code for database migration.

  8. Compare results from Microsoft SQL Server against the application you migrated to PostgreSQL.

  9. Perform application performance benchmarking between Microsoft SQL Server and PostgreSQL.

  10. Revise stored procedures or inline T-SQL statements called by the application to improve performance.

The following epics provide detailed instructions for some of these conversion tasks for Python and Perl applications.

App developer

Use a checklist for each step of the migration.

Add the following to your checklist for each step of application migration, including the final step:

  • Review PostgreSQL documentation to ensure that all your changes are compatible with the PostgreSQL standard.

  • Check for integer and floating values for columns.

  • Identify the number of rows inserted, updated, and extracted, along with the column names and date/time stamps. You can use a diff utility or write a script to automate these checks.

  • Complete performance checks for large inline SQL statements, and check the overall performance of the application.

  • Check for correct error handling for database operations and graceful program exit by using multiple try/catch blocks.

  • Check to ensure that proper logging processes are in place.

App developer
TaskDescriptionSkills required

Analyze your existing Python code base.

Your analysis should include the following to facilitate the application migration process:

  • Identify all connection objects in the code.

  • Identify all incompatible inline SQL queries (such as T-SQL statements and stored procedures) and analyze required changes.

  • Review the documentation for your code and track the control flow to understand code functionality. This will be helpful later when you test the application for performance or load comparisons.

  • Understand the purpose of the application so you can test it effectively after database conversion. Most Python applications that are candidates for conversion with database migrations are either feeds that load data from other sources into database tables, or extractors that retrieve data from the tables and transform them into different output formats (such as CSV, JSON, or flat files) that are suitable for creating reports or for making API calls to perform validations. 

App developer

Convert your database connections to support PostgreSQL.

Most Python applications use the pyodbc library to connect with SQL Server databases as follows.

import pyodbc .... try: conn_string = "Driver=ODBC Driver 17 for SQL Server;UID={};PWD={};Server={};Database={}".format (conn_user, conn_password, conn_server, conn_database) conn = pyodbc.connect(conn_string) cur = conn.cursor() result = cur.execute(query_string) for row in result: print (row) except Exception as e: print(str(e))

Convert the database connection to support PostgreSQL as follows.

import pyodbc import psycopg2 .... try: conn_string = ‘postgresql+psycopg2://’+ conn_user+’:’+conn_password+’@’+conn_server+’/’+conn_database conn = pyodbc.connect(conn_string, connect_args={‘options’:’-csearch_path=dbo’}) cur = conn.cursor() result = cur.execute(query_string) for row in result: print (row) except Exception as e: print(str(e))
App developer

Change inline SQL queries to PostgreSQL.

Convert your inline SQL queries to a PostgreSQL-compatible format. For example, the following SQL Server query retrieves a string from a table.

dtype = “type1” stm = ‘“SELECT TOP 1 searchcode FROM TypesTable (NOLOCK) WHERE code=”’ + “’” + str(dtype) + “’” # For Microsoft SQL Server Database Connection engine = create_engine(‘mssql+pyodbc:///?odbc_connect=%s’ % urllib.parse.quote_plus(conn_string), connect_args={‘connect_timeout’:login_timeout}) conn = engine_connect() rs = conn.execute(stm) for row in rs: print(row)

After conversion, the PostgreSQL-compatible inline SQL query looks like the following.

dtype = “type1” stm = ‘“SELECT searchcode FROM TypesTable WHERE code=”’ + “’” + str(dtype) + “’ LIMIT 1” # For PostgreSQL Database Connection engine = create_engine(‘postgres+psycopg2://%s’ %conn_string, connect_args={‘connect_timeout’:login_timeout}) conn = engine.connect() rs = conn.execute(stm) for row in rs: print(row)
App developer

Handle dynamic SQL queries.

Dynamic SQL can be present in one script or in multiple Python scripts. Earlier examples showed how to use Python’s string replace function to insert variables for constructing dynamic SQL queries. An alternate approach is to append the query string with variables wherever applicable. 

In the following example, the query string is constructed on the fly based on the values returned by a function.

query = ‘“SELECT id from equity e join issues i on e.permId=i.permId where e.id’” query += get_id_filter(ids) + “ e.id is NOT NULL

These types of dynamic queries are very common during application migration. Follow these steps to handle dynamic queries:

  • Check the overall syntax (for example, the syntax for the SELECT statement with a JOIN clause).

  • Verify all variables or column names used in the query, such as i and id.

  • Check the functions, arguments, and return values used in the query (for example, get_id_filter and its argument ids).

App developer

Handle results sets, variables, and data frames.

For Microsoft SQL Server, you use Python methods such as fetchone() or fetchall() to retrieve the results set from the database. You can also use fetchmany(size) and specify the number of records to return from the results set. To do this, you can use the pyodbc connection object as shown in the following example.

pyodbc (Microsoft SQL Server)

import pyodbc server = 'tcp:myserver.database.windows.net' database = 'exampledb' username = 'exampleusername' password = 'examplepassword' conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password) cursor = conn.cursor() cursor.execute("SELECT * FROM ITEMS") row = cursor.fetchone() while row: print(row[0]) row = cursor.fetchone()

In Aurora, to perform similar tasks such as connecting to PostgreSQL and fetching results sets, you can use either psycopg2 or SQLAlchemy. These Python libraries provide the connection module and cursor object to traverse through the PostgreSQL database records, as shown in the following example.

psycopg2 (Aurora PostgreSQL-Compatible)

import psycopg2 query = "SELECT * FROM ITEMS;" //Initialize variables host=dbname=user=password=port=sslmode=connect_timeout="" connstring = "host='{host}' dbname='{dbname}' user='{user}' \ password='{password}'port='{port}'".format(host=host,dbname=dbname,\ user=user,password=password,port=port) conn = psycopg2.connect(connstring) cursor = conn.cursor() cursor.execute(query) column_names = [column[0] for column in cursor.description] print("Column Names: ", column_names) print("Column values: " for row in cursor: print("itemid :", row[0]) print("itemdescrption :", row[1]) print("itemprice :", row[3]))

SQLAlchemy (Aurora PostgreSQL-Compatible)

from sqlalchemy import create_engine from pandas import DataFrame conn_string = 'postgresql://core:database@localhost:5432/exampledatabase' engine = create_engine(conn_string) conn = engine.connect() dataid = 1001 result = conn.execute("SELECT * FROM ITEMS") df = DataFrame(result.fetchall()) df.columns = result.keys() df = pd.DataFrame() engine.connect() df = pd.read_sql_query(sql_query, engine, coerce_float=False) print(“df=”, df)
App developer

Test your application during and after migration.

Testing the migrated Python application is an ongoing process. Because the migration includes connection object changes (psycopg2 or SQLAlchemy), error handling, new features (data frames), inline SQL changes, bulk copy functionalities (bcp instead of COPY) and similar changes, it must be tested carefully during and after application migration. Check for:

  • Error conditions and handling 

  • Any record mismatches after migration

  • Record updates or deletions

  • Time required to run the application 

App developer
TaskDescriptionSkills required

Analyze your existing Perl code base.

Your analysis should include the following to facilitate the application migration process. You should identify:

  • Any INI or configuration-based code

  • Database-specific standard Open Database Connectivity (ODBC) Perl drivers or any customized drivers

  • Code changes required for inline and T-SQL queries

  • Interactions among various Perl modules (for example, a single Perl ODBC connection object that is called or used by multiple functional components)

  • Dataset and results set handling

  • External, dependent Perl libraries

  • Any APIs that are used in the application

  • Perl version compatibility and driver compatibility with Aurora PostgreSQL-Compatible

App developer

Convert the connections from the Perl application and DBI module to support PostgreSQL.

Perl-based applications generally use the Perl DBI module, which is a standard database access module for the Perl programming language. You can use the same DBI module with different drivers for SQL Server and PostgreSQL.

For more information about required Perl modules, installations, and other instructions, see the DBD::Pg documentation. The following example connects to Aurora PostgreSQL-Compatible at exampletest-aurorapg-database.cluster-sampleclusture.us-east-.rds.amazonaws.com.

#!/usr/bin/perl use DBI; use strict; my $driver = "Pg"; my $hostname = “exampletest-aurorapg-database-sampleclusture.us-east.rds.amazonaws.com” my $dsn = "DBI:$driver: dbname = $hostname;host = 127.0.0.1;port = 5432"; my $username = "postgres"; my $password = "pass123"; $dbh = DBI->connect("dbi:Pg:dbname=$hostname;host=$host;port=$port;options=$options", $username, $password, {AutoCommit => 0, RaiseError => 1, PrintError => 0} );
App developer

Change Inline SQL queries to PostgreSQL.

Your application might have inline SQL queries with SELECT, DELETE, UPDATE, and similar statements that include query clauses that PostgreSQL doesn’t support. For example, query keywords such as TOP and NOLOCK aren’t supported in PostgreSQL. The following examples show how you can handle TOP, NOLOCK, and Boolean variables.

In SQL Server:

$sqlStr = $sqlStr . "WHERE a.student_id in (SELECT TOP $numofRecords c_student_id \ FROM active_student_record b WITH (NOLOCK) \ INNER JOIN student_contributor c WITH (NOLOCK) on c.contributor_id = b.c_st)

For PostgreSQL, convert to:

$sqlStr = $sqlStr . "WHERE a.student_id in (SELECT TOP $numofRecords c_student_id \ FROM active_student_record b INNER JOIN student_contributor c \ on c.contributor_id = b.c_student_contr_id WHERE b_current_1 is true \ LIMIT $numofRecords)"
App developer

Handle dynamic SQL queries and Perl variables.

Dynamic SQL queries are SQL statements that are built at application runtime. These queries are constructed dynamically when the application is running, depending on certain conditions, so the full text of the query isn’t known until runtime. An example is a financial analytics application that analyzes the top 10 shares on a daily basis, and these shares change every day. The SQL tables are created based on top performers, and the values aren’t known until runtime.

Let’s say that the inline SQL queries for this example are passed to a wrapper function to get the results set in a variable, and then a variable uses a condition to determine whether the table exists:

  • If the table exists, don't create it; do some processing.

  • If the table doesn’t exist, create the table and also do some processing.

Here’s an example of variable handling, followed by the SQL Server and PostgreSQL queries for this use case.

my $tableexists = db_read( arg 1, $sql_qry, undef, 'writer'); my $table_already_exists = $tableexists->[0]{table_exists}; if ($table_already_exists){ # do some thing } else { # do something else }

SQL Server:

my $sql_qry = “SELECT OBJECT_ID('$backendTable', 'U') table_exists", undef, 'writer')";

PostgreSQL:

my $sql_qry = “SELECT TO_REGCLASS('$backendTable', 'U') table_exists", undef, 'writer')";

The following example uses a Perl variable in inline SQL, which runs a SELECT statement with a JOIN to fetch the primary key of the table and position of the key column.

SQL Server:

my $sql_qry = "SELECT column_name', character_maxi mum_length \ FROM INFORMATION_SCHEMA.COLUMNS \ WHERE TABLE_SCHEMA='$example_schemaInfo' \ AND TABLE_NAME='$example_table' \ AND DATA_TYPE IN ('varchar','nvarchar');";

PostgreSQL:

my $sql_qry = "SELECT c1.column_name, c1.ordinal_position \ FROM information_schema.key_column_usage AS c LEFT \ JOIN information_schema.table_constraints AS t1 \ ON t1.constraint_name = c1.constraint_name \ WHERE t1.table_name = $example_schemaInfo'.'$example_table’ \ AND t1.constraint_type = 'PRIMARY KEY' ;";
App developer
TaskDescriptionSkills required

Convert additional SQL Server constructs to PostgreSQL.

The following changes apply to all applications, regardless of programming language.

  • Qualify database objects that your application uses with new and appropriate schema names.

  • Handle LIKE operators for case-sensitive matching with the collation feature in PostgreSQL.

  • Handle unsupported database specific functions such as DATEDIFF, DATEADD, GETDATE, CONVERT, and CAST operators. For equivalent PostgreSQL-compatible functions, see  Native or built-in SQL functions in the Additional information section. 

  • Handle Boolean values in comparison statements.

  • Handle return values from functions. These could be record sets, data frames, variables, and Boolean values. Handle these according to the requirements of your application and to support PostgreSQL.

  • Handle anonymous blocks (such as BEGIN TRAN) with new, user-defined PostgreSQL functions.

  • Convert bulk inserts for rows. The PostgreSQL equivalent of the SQL Server bulk copy (bcp) utility, which is called from inside the application, is COPY.

  • Convert column concatenation operators. SQL Server uses + for string concatenation, but PostgreSQL uses ||.

App developer
TaskDescriptionSkills required

Take advantage of AWS services to make performance enhancements.

When you migrate to the AWS Cloud, you can refine your application and database design to take advantage of AWS services. For example, if the queries from your Python application, which is connected to an Aurora PostgreSQL-Compatible database server, is taking more time than your original Microsoft SQL Server queries, you could consider creating a feed of historical data directly to an Amazon Simple Storage Service (Amazon S3) bucket from the Aurora server, and use Amazon Athena-based SQL queries to generate reports and analytic data queries for your user dashboards.

App developer, Cloud architect

Related resources

Additional information

Both Microsoft SQL Server and Aurora PostgreSQL-Compatible are ANSI SQL-complaint. However, you should still be aware of any incompatibilities in syntax, column data types, native database-specific functions, bulk inserts, and case sensitivity when you migrate your Python or Perl application from SQL Server to PostgreSQL.

The following sections provide more information about possible inconsistencies.

Data type comparison

Data type changes from SQL Server to PostgreSQL can lead to significant differences in the resulting data that applications operate on. For a comparison of data types, see the table on the Sqlines website.

Native or built-in SQL functions

The behavior of some functions differs between SQL Server and PostgreSQL databases. The following table provides a comparison.

Microsoft SQL Server

Description

PostgreSQL

CAST 

Converts a value from one data type to another.

PostgreSQL type :: operator

GETDATE()

Returns the current database system date and time, in a YYYY-MM-DD hh:mm:ss.mmm format.

CLOCK_TIMESTAMP

DATEADD

Adds a time/date interval to a date.

INTERVAL expression

CONVERT

Converts a value to a specific data format.

TO_CHAR

DATEDIFF

Returns the difference between two dates.

DATE_PART

TOP

Limits the number of rows in a SELECT results set.

LIMIT/FETCH

Anonymous blocks

A structured SQL query is organized into sections such as declaration, executables, and exception handling. The following table compares the Microsoft SQL Server and PostgreSQL versions of a simple anonymous block. For complex anonymous blocks, we recommend that you call a custom database function within your application.

Microsoft SQL Server

PostgreSQL

my $sql_qry1= my $sql_qry2 = my $sqlqry = "BEGIN TRAN $sql_qry1 $sql_qry2 if @\@error !=0 ROLLBACK TRAN else COMIT TRAN";
my $sql_qry1= my $sql_qry2 = my $sql_qry = " DO \$\$ BEGIN $header_sql $content_sql END \$\$";

 

Other differences

  • Bulk inserts of rows: The PostgreSQL equivalent of the Microsoft SQL Server bcp utility is COPY.

  • Case sensitivity:  Column names are case-sensitive in PostgreSQL, so you have to convert your SQL Server column names to lowercase or uppercase. This becomes a factor when you extract or compare data, or place column names in results sets or variables. The following example identifies columns where values might be stored in uppercase or lowercase.

my $sql_qry = "SELECT $record_id FROM $exampleTable WHERE LOWER($record_name) = \'failed transaction\'";
  • Concatenation: SQL Server uses + as an operator for string concatenation, whereas PostgreSQL uses ||.

  • Validation: You should test and validate inline SQL queries and functions before you use them in application code for PostgreSQL.

  • ORM Library inclusion : You can also look for including or replace existing database connection library with Python ORM libraries such as SQLAlchemy and PynomoDB. This will help to easily query and manipulate data from a database using an object-oriented paradigm.