Task 2: Defining processes for identifying, collecting, and storing metadata - AWS Prescriptive Guidance

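In the previous task, you validated the initial discovery data, the migration strategies, and the migration patterns for your large migration. In this task, you identify what metadata is required and decide how you will collect it. This task consists of the following steps:

  • Step 1: Define the required metadata

  • Step 2: Build the metadata storage and collection processes

  • Step 3: Document metadata requirements and collection processes in a runbook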

As you complete the steps in this section, consider the entire migration cycle from a metadata perspective. Consider portfolio assessment, wave planning, migration, testing, and post-cutover activities, and then analyze the use cases for each. Thinking about the information that you need to complete the full migration process helps you identify all of the metadata for each pattern.

Step 1: Define the required metadata

Before you can determine the required metadata attributes, you must understand the migration pattern. For example, you need different metadata for migrating a server to Amazon EC2 and for migrating a database to Amazon RDS. Most patterns are made up of many small tasks. In order to perform the migration pattern, you need to know what metadata attributes are required and then collect the metadata for that application. You must determine and gather the required metadata in the initialization stage so that you can perform the migration efficiently and without delay in the implementation stage.

The person or team that defines the metadata attributes begins by defining the steps and tasks needed to perform the migration pattern. The tasks determine what metadata is needed, so working through each task builds a comprehensive collection of the required metadata. The person who determines what metadata is required typically needs to have a comprehensive understanding of how to complete the migration pattern. Coordination with the person writing the migration runbook might be required. For more information, see the Migration playbook for AWS large migrations.

During a large migration, many processes across all workstreams depend on metadata. Timely and accurate metadata has a broad and significant impact on the success of a large migration.

In this step, you define the pattern or task and then use the definition to identify the metadata required.

Identify the key components of the migration patterns and supporting tasks

In this step, for each migration pattern or supporting task, you define the key components, such as the action, source object, target object, and tools used. You then name the pattern or task based on your answers.

Supporting tasks include the operational activities that the portfolio and migration workstreams need to perform during the migration, such as wave planning, application prioritization, dependency analysis, governance, disaster recovery, performance testing, or user-acceptance testing. Because you need metadata to support these tasks, you perform these steps for both the migration patterns and the supporting tasks.

  1. Action – Identify the migration strategy or supporting task. Remember that one action might have other actions associated with it. For example, you might want to define operations for migration. Example actions include:

    • Migration strategy, such as rehost, replatform, or relocate

    • Wave planning

    • Application prioritization and dependency analysis

    • Operation

    • Governance

    • Disaster recovery

    • Testing, such as performance testing or user-acceptance testing (UAT)

  2. Source object – Identify the source object on which the action will be performed. Example source objects include:

    • Waves

    • Server

    • Database

    • File share

    • Application

  3. Tools – Identify the services or tools used to perform the action. You might use more than one tool or service. Example tools include:

    • AWS Application Migration Service

    • AWS DataSync

    • AWS Database Migration Service (AWS DMS)

    • AWS Backup

    • Performance monitoring tools

  4. Target object – Identify the target object, service, or location where the source will reside when the action is complete. Example objects, services, or locations include:

    • Amazon Elastic Compute Cloud (Amazon EC2)

    • Amazon Relational Database Service (Amazon RDS)

    • Amazon Elastic File System (Amazon EFS)

    • Amazon Elastic Container Service (Amazon ECS)

    • Wave plan

  5. Pattern name – Combine your answers to the previous steps as follows:

    <action> <source object> on/to <target object> using <tool>

    The following are examples:

    • Rehost (action) waves, applications, or servers (source object) to Amazon EC2 (target object) using Application Migration Service or Cloud Migration Factory (tools)

    • Replatform (action) file shares (source object) to Amazon EFS (target object) using DataSync (tool)

    • Replatform (action) databases (source object) to Amazon RDS (target object) using AWS DMS (tool)

    • Performance monitoring (action) of applications (source object) on Amazon EC2 (target object) using Amazon CloudWatch (tool)

    • Back up (action) servers (source object) on Amazon EC2 (target object) using AWS Backup (tool) after migration

    • Wave planning (action) waves, applications, or servers (source object) to create a wave plan (target object)

The following is an example of how you might record Pattern 1: Rehost to Amazon EC2 using Application Migration Service or Cloud Migration Factory from the migration patterns table.

Pattern ID | 1
Pattern name | Rehost to Amazon EC2 using Application Migration Service or Cloud Migration Factory
Action | Rehost migration
Source object | Waves, applications, or servers
Tools | Application Migration Service or Cloud Migration Factory
Target object | Amazon EC2
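As a minimal sketch, the pattern record and the naming convention above could be captured in code as follows. The class name `MigrationPattern`, its field names, and the helper `join_with_or` are illustrative assumptions and are not part of any AWS tool or template.

```python
from dataclasses import dataclass
from typing import Tuple


def join_with_or(items):
    """Join items as 'a', 'a or b', or 'a, b, or c'."""
    items = list(items)
    if len(items) <= 1:
        return "".join(items)
    if len(items) == 2:
        return f"{items[0]} or {items[1]}"
    return ", ".join(items[:-1]) + f", or {items[-1]}"


@dataclass(frozen=True)
class MigrationPattern:
    """One record in a migration patterns table (hypothetical field names)."""
    pattern_id: int
    action: str
    source_objects: Tuple[str, ...]
    target_object: str
    tools: Tuple[str, ...] = ()
    preposition: str = "to"  # "on" or "to", depending on the action

    def name(self) -> str:
        # <action> <source object> on/to <target object> using <tool>
        parts = [self.action, join_with_or(self.source_objects),
                 self.preposition, self.target_object]
        if self.tools:
            parts += ["using", join_with_or(self.tools)]
        return " ".join(parts)


pattern_1 = MigrationPattern(
    pattern_id=1,
    action="Rehost",
    source_objects=("waves", "applications", "servers"),
    target_object="Amazon EC2",
    tools=("Application Migration Service", "Cloud Migration Factory"),
)
print(pattern_1.name())
```

Keeping the components as separate fields, and deriving the name from them, means that the name cannot drift from the action, source object, target object, and tools recorded for the pattern.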

Determine the metadata required for each pattern or task

Now that you have defined the pattern or task, you determine the metadata required for the source object, target object, tools, and other business information. To explain this process, this playbook uses Pattern 1: Rehost to Amazon EC2 using Application Migration Service or Cloud Migration Factory from the migration patterns table as an example. Note that for some patterns or tasks, some steps might not apply.

  1. Analyze the target object – Working backwards from the target object, manually create the object and identify the metadata needed to support it. Capture the metadata as demonstrated in the following table.

    For example, when you create an EC2 instance, you must choose an instance type, storage type, storage size, subnet, security group, and tags. The following table includes examples of metadata attributes that you might need if your target object is an EC2 instance.

    Attribute name | Object type | Description or purpose
    target_subnet | Target EC2 instance | Subnet of the target EC2 instance
    target_subnet_test | Target EC2 instance | Test subnet of the target EC2 instance
    target_security_group | Target EC2 instance | Security group of the target EC2 instance
    target_security_group_test | Target EC2 instance | Test security group of the target EC2 instance
    IAM_role | Target EC2 instance | AWS Identity and Access Management (IAM) role of the target EC2 instance
    instance_type | Target EC2 instance | Instance type of the target EC2 instance
    AWS_account_ID | Target EC2 instance | AWS account to host the target EC2 instance
    AWS_Region | Target EC2 instance | AWS Region to host the target EC2 instance

  2. Analyze the tools – Use the tool to create a target object and check for differences. Capture the tool-specific metadata as demonstrated in the following table, and remove any attributes from the previous table that the migration tool does not support. For example, you cannot customize the OS type and storage size for Application Migration Service because the rehost migration tool is like-for-like. Therefore, you would remove target OS and target disk size if these attributes were included in the previous table. In the previous example table, all attributes are supported by the tool, so no action is required.

    The following table includes examples of metadata that you might need for the tools.

    Attribute name | Object type | Description or purpose
    AWS_account_ID | Tools (Application Migration Service) | AWS account ID for AWS Application Migration Service
    AWS_Region | Tools (Application Migration Service) | AWS Region for Application Migration Service
    replication_server_subnet | Tools (Application Migration Service) | Subnet for the Application Migration Service replication server
    replication_server_security_group | Tools (Application Migration Service) | Security group for the Application Migration Service replication server

  3. Analyze the source object – Determine the required metadata for the source object by assessing the actions as follows:

    • To migrate servers, you need to know the source server name and fully qualified domain name (FQDN) in order to connect to the server.

    • To migrate applications along with their servers, you need to know the application name, application environment, and application-to-server mapping.

    • To perform a portfolio assessment, prioritize applications, or define a move group, you need to know the application-to-server mapping, application-to-database mapping, and application-to-application dependencies.

    • To manage waves, you need to know the wave ID and the start and end times of the wave.

    The following table includes examples of metadata that you might need for the source object.

    Attribute name | Object type | Description or purpose
    wave_ID | Source wave | ID of the wave (for example: wave 10)
    wave_start_date | Source wave | Start date for the wave
    wave_cutover_date | Source wave | Cutover date for the wave
    wave_owner | Source wave | Owner of the wave
    app_name | Source application | Source application name
    app_to_server_mapping | Source application | Application-to-server relationship
    app_to_DB_mapping | Source application | Application-to-database relationship
    app_to_app_dependencies | Source application | External dependencies of the application
    server_name | Source server | Source server name
    server_FQDN | Source server | Fully qualified domain name of the source server
    server_OS_family | Source server | Operating system (OS) family of the source server (for example: Windows or Linux)
    server_OS_version | Source server | OS version of the source server (for example: Windows Server 2003)
    server_environment | Source server | Environment of the source server (for example: development, production, or test)
    server_tier | Source server | Tier of the source server (for example: web, database, or application)
    CPU | Source server | Number of CPUs in the source server
    RAM | Source server | RAM size of the source server
    disk_size | Source server | Disk size of the source server

  4. Consider other attributes – In addition to the primary action, consider other actions and attributes related to the target object or application. For the example pattern, Pattern 1: Rehost to Amazon EC2 using Application Migration Service or Cloud Migration Factory, the action is rehost, and the target object is Amazon EC2. Other related actions for this target object might include backing up the EC2 instance, monitoring the EC2 instance after the migration, and using tags to manage costs associated with the EC2 instance. You might also want to consider other application attributes that help you manage the migration, such as the application owner, whom you might need to contact with questions or for cutover purposes.

    The following table includes examples of additional metadata that are commonly used. This table includes tags for your target EC2 instance. For more information about tags and how to use them, see Tag your Amazon EC2 resources in the Amazon EC2 documentation.

    Attribute name | Object type | Description or purpose
    Name | Target EC2 instance (tag) | Tag to define the name of a target EC2 instance
    app_owner | Source application | The owner of a source application
    business_unit | Target EC2 instance (tag) | Tag to identify the business unit for a target EC2 instance (for example: HR, finance, or IT)
    cost_center | Target EC2 instance (tag) | Tag to identify the cost center for a target EC2 instance

  5. Create a table – Combine all of the metadata identified in the previous steps into a single table.

    Attribute name | Object type | Description or purpose
    wave_ID | Source wave | ID of the wave (for example: wave 10)
    wave_start_date | Source wave | Start date for the wave
    wave_cutover_date | Source wave | Cutover date for the wave
    wave_owner | Source wave | Owner of the wave
    app_name | Source application | Source application name
    app_to_server_mapping | Source application | Application-to-server relationship
    app_to_DB_mapping | Source application | Application-to-database relationship
    app_to_app_dependencies | Source application | External dependencies of the application
    AWS_account_ID | Tools (Application Migration Service) | AWS account to host the target EC2 instance
    AWS_Region | Tools (Application Migration Service) | AWS Region to host the target EC2 instance
    replication_server_subnet | Tools (Application Migration Service) | Subnet for the Application Migration Service replication server
    replication_server_security_group | Tools (Application Migration Service) | Security group for the Application Migration Service replication server
    server_name | Source server | Source server name
    server_FQDN | Source server | Fully qualified domain name of the source server
    server_OS_family | Source server | Operating system (OS) family of the source server (for example: Windows or Linux)
    server_OS_version | Source server | OS version of the source server (for example: Windows Server 2003)
    server_environment | Source server | Environment of the source server (for example: development, production, or test)
    server_tier | Source server | Tier of the source server (for example: web, database, or application)
    CPU | Source server | Number of CPUs in the source server
    RAM | Source server | RAM size of the source server
    disk_size | Source server | Disk size of the source server
    target_subnet | Target server | Subnet of the target EC2 instance
    target_subnet_test | Target server | Test subnet of the target EC2 instance
    target_security_group | Target server | Security group of the target EC2 instance
    target_security_group_test | Target server | Test security group of the target EC2 instance
    instance_type | Target server | Instance type of the target EC2 instance
    IAM_role | Target server | AWS Identity and Access Management (IAM) role of the target EC2 instance
    Name | Target server (tag) | Tag to define the name of a target EC2 instance
    app_owner | Source application | The owner of a source application
    business_unit | Target server (tag) | Tag to identify the business unit for a target EC2 instance (for example: HR, finance, or IT)
    cost_center | Target server (tag) | Tag to identify the cost center for a target EC2 instance

  6. Repeat – Repeat this process until you have documented the required metadata for each pattern.
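The following is a minimal sketch of how the per-step attribute lists might be merged into a single table in code, under stated assumptions. The `Attribute` tuple, the `combine_attributes` function, and the attribute name `target_disk_size` are hypothetical; `target_disk_size` stands in for an attribute that the migration tool does not support and that is therefore dropped, as described in the tools analysis.

```python
from typing import NamedTuple


class Attribute(NamedTuple):
    name: str
    object_type: str
    description: str


def combine_attributes(*groups, unsupported=frozenset()):
    """Merge attribute lists into one table.

    Skips any attribute whose name is in `unsupported` and keeps only the
    first occurrence of each (name, object_type) pair.
    """
    seen = set()
    combined = []
    for group in groups:
        for attr in group:
            key = (attr.name, attr.object_type)
            if attr.name in unsupported or key in seen:
                continue
            seen.add(key)
            combined.append(attr)
    return combined


# Small illustrative subsets of the tables above
target_attrs = [
    Attribute("target_subnet", "Target server", "Subnet of the target EC2 instance"),
    Attribute("target_disk_size", "Target server", "Hypothetical unsupported attribute"),
]
tool_attrs = [
    Attribute("replication_server_subnet", "Tools (Application Migration Service)",
              "Subnet for the replication server"),
]
source_attrs = [
    Attribute("server_name", "Source server", "Source server name"),
    Attribute("server_name", "Source server", "Duplicate entry, ignored"),
]

table = combine_attributes(source_attrs, tool_attrs, target_attrs,
                           unsupported={"target_disk_size"})
for attr in table:
    print(" | ".join(attr))
```

Keying on both the attribute name and the object type allows the same name, such as AWS_Region, to appear once for the target and once for the tools if you decide to track them separately.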

Step 2: Build the metadata storage and collection processes

In the previous step, you defined the metadata required to support your migration. In this step, you build a process for collecting and storing the metadata. This step consists of two tasks:

  1. Analyze the required metadata from the previous step and identify the source.

  2. Define a process for efficiently storing and collecting the metadata.

Analyze the metadata sources

There are many common metadata sources. Usually, the first thing you can access is a high-level asset inventory, which is typically exported from a configuration management database (CMDB) or from another existing tool. However, you need to collect metadata from other sources as well, using both automated and manual processes.

The following table contains common sources, the standard collection process for that source, and the common metadata types that you can expect to find from that source.

Metadata source | Collection type | Metadata type
Discovery tools | Automated | Source server
CMDB | Automated | Source server
Inventory from other tools, such as RVTools for VMware vSphere | Automated | Source server
Application owner questionnaire | Manual | Source server, target server, wave
Application owner interview | Manual | Source server, target server, wave
Application design documentation | Manual | Target server
Landing zone design documentation | Manual | Target server, tools

After listing all the possible sources of your metadata, you analyze the metadata type and map each source to the metadata attributes that you identified in the previous step.

  1. Get a complete list of metadata attributes from Step 1: Define the required metadata.

  2. Analyze each metadata type and determine which types cannot be retrieved by using an automated process. These are usually the target server and wave metadata types because they require decisions from the application owners. For example, which subnet and security group will you use for the target EC2 instances?

  3. Analyze each metadata attribute and map it to a metadata source in the previous table. It is common to have a combination of multiple sources. You can use discovery tools to collect some source server metadata. For information about using discovery tools to collect metadata, see Get started with automated portfolio discovery on the AWS Prescriptive Guidance website.

  4. Create a table to map the metadata attribute to its type and source. The following table is an example.

    Metadata attribute | Metadata type | Metadata sources
    app_name | Source application | CMDB
    app_owner | Source application | CMDB
    app_to_server_mapping | Source application | CMDB, discovery tools, or application owner questionnaire
    app_to_DB_mapping | Source application | CMDB, discovery tools, or application owner questionnaire
    app_to_app_dependencies | Source application | CMDB, discovery tools, or application owner questionnaire
    server_name | Source server | CMDB
    server_FQDN | Source server | CMDB
    server_OS_family | Source server | CMDB
    server_IP | Source server | Discovery tools
    disk_size | Source server | Discovery tools
    instance_type | Target server | Discovery tools
    target_subnet | Target server | Application owner questionnaire
    target_security_group | Target server | Application owner questionnaire
    AWS_Region | Target server | Application owner questionnaire
    AWS_account_ID | Target server | Application owner questionnaire
    replication_server_subnet | Tools (Application Migration Service) | Landing zone design documentation
    replication_server_security_group | Tools (Application Migration Service) | Landing zone design documentation
    Name | Target server (tag) | Application owner questionnaire
    business_unit | Target server (tag) | Application owner questionnaire
    cost_center | Target server (tag) | Application owner questionnaire
    wave_ID | Wave planning | Application owner interview
    wave_start_date | Wave planning | Application owner interview
    wave_cutover_date | Wave planning | Application owner interview
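One way to sanity-check an attribute-to-source mapping like the one above is sketched here. It assumes that you keep the mapping as a dictionary; the function names `unmapped` and `manual_only` are illustrative, and the manual sources follow the collection types in the metadata sources table.

```python
# Sources that the metadata sources table lists as manual collection
MANUAL_SOURCES = {
    "Application owner questionnaire",
    "Application owner interview",
    "Application design documentation",
    "Landing zone design documentation",
}

# A small illustrative subset of the mapping table
attribute_sources = {
    "app_name": ["CMDB"],
    "app_to_server_mapping": ["CMDB", "Discovery tools",
                              "Application owner questionnaire"],
    "server_IP": ["Discovery tools"],
    "target_subnet": ["Application owner questionnaire"],
    "wave_ID": ["Application owner interview"],
}


def unmapped(required, mapping):
    """Return required attributes that have no source yet."""
    return sorted(attr for attr in required if not mapping.get(attr))


def manual_only(mapping):
    """Return attributes that can be collected only from manual sources."""
    return sorted(
        attr for attr, sources in mapping.items()
        if sources and all(source in MANUAL_SOURCES for source in sources)
    )


required = ["app_name", "server_IP", "server_tier", "target_subnet"]
print("No source yet:", unmapped(required, attribute_sources))
print("Manual only:", manual_only(attribute_sources))
```

Attributes that appear in the manual-only list are the ones to plan questionnaires or interviews for; attributes with no source point to a gap in the mapping table.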

Define a single metadata store

After mapping each metadata attribute to its source, you define where to store the metadata. Regardless of how and where you store the metadata, you need to choose only one repository. This ensures that you have a single source of truth. Storing metadata in multiple places is a common mistake in large migrations.

Option 1: Store metadata in a spreadsheet in a shared repository

Although this option might sound like a very manual process, it is the most common data store for large migrations. It is also common to store the spreadsheet in a shared repository, such as a Microsoft SharePoint site.

A Microsoft Excel spreadsheet is easy to customize and doesn’t take a long time to build. The disadvantages are that it will get very complex if you have a lot of metadata and that it can be difficult to manage the relationships between assets, such as between the server, application, and database. The other challenge is version management. You need to limit write access to only a few people, or you need to use an automated process to update the spreadsheet.

In the portfolio playbook templates, you can use the Dashboard template for wave planning and migration (Excel format) as a starting point for building your own data store spreadsheet.

Option 2: Store metadata in a purpose-built tool

You can use a prebuilt tool, such as TDS Transition Manager (TDS website), to store your data, or you can build your own tool. If you build your own tool, you need database tables that are similar to the Excel spreadsheet tabs in option 1. For example:

  • Server table

  • Application table

  • Database table

  • Application-to-server and application-to-database mapping table

  • Wave-planning table

  • Application owner questionnaire table
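If you build your own tool, the tables might look something like the following sketch, which uses Python's built-in sqlite3 module. The schema is an assumption for illustration only: it covers just the server, application, and application-to-server mapping tables, and the column names reuse attribute names from this guide. The function names `create_store` and `servers_for_app` are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE server (
    server_name        TEXT PRIMARY KEY,
    server_FQDN        TEXT,
    server_OS_family   TEXT,
    server_environment TEXT
);
CREATE TABLE application (
    app_name  TEXT PRIMARY KEY,
    app_owner TEXT
);
CREATE TABLE app_to_server_mapping (
    app_name    TEXT NOT NULL REFERENCES application(app_name),
    server_name TEXT NOT NULL REFERENCES server(server_name),
    PRIMARY KEY (app_name, server_name)
);
"""


def create_store(path=":memory:"):
    """Create the metadata store and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the relationships
    conn.executescript(SCHEMA)
    return conn


def servers_for_app(conn, app_name):
    """Return the names of the servers mapped to an application."""
    rows = conn.execute(
        "SELECT server_name FROM app_to_server_mapping "
        "WHERE app_name = ? ORDER BY server_name",
        (app_name,),
    )
    return [row[0] for row in rows]


conn = create_store()
conn.execute("INSERT INTO application VALUES (?, ?)", ("payroll", "jane.doe"))
conn.executemany(
    "INSERT INTO server (server_name, server_FQDN) VALUES (?, ?)",
    [("web01", "web01.example.com"), ("db01", "db01.example.com")],
)
conn.executemany(
    "INSERT INTO app_to_server_mapping VALUES (?, ?)",
    [("payroll", "web01"), ("payroll", "db01")],
)
print(servers_for_app(conn, "payroll"))  # ['db01', 'web01']
```

A separate mapping table with foreign keys addresses the relationship-management difficulty noted for spreadsheets: a mapping row cannot reference a server or application that is not in the store.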

Define the metadata collection processes

In the previous steps, you mapped the metadata to its source and defined a data store where you will collect the metadata. In this step, you build processes to effectively collect the metadata. You should minimize the manual copy-and-paste process and use automation to collect the metadata from each source. There are three steps:

  1. Build an extract, transform, and load (ETL) script for each metadata source based on the metadata mapping table.

  2. Build a scheduled task that imports metadata from each source automatically on a regular basis.

  3. Build an export process or provide application programming interface (API) access to the metadata stored in the repository.

The following table is an example of the metadata attributes collected by each ETL script. The metadata is stored in the location you defined in the previous section, such as a spreadsheet or purpose-built tool.

Metadata attribute | Metadata type | Metadata source | Collection process
app_name | Source application | CMDB | ETL script – CMDB
app_owner | Source application | CMDB | ETL script – CMDB
app_to_server_mapping | Source application | CMDB | ETL script – CMDB
app_to_DB_mapping | Source application | CMDB | ETL script – CMDB
app_to_app_dependencies | Source application | Discovery tool | ETL script – discovery tool
server_name | Source server | CMDB | ETL script – CMDB
server_FQDN | Source server | CMDB | ETL script – CMDB
server_OS_family | Source server | CMDB | ETL script – CMDB
server_OS_version | Source server | CMDB | ETL script – CMDB
disk_size | Source server | Discovery tool | ETL script – discovery tool
instance_type | Target server | Discovery tool | ETL script – discovery tool
target_subnet | Target server | Application owner questionnaire | ETL script – application owner
target_security_group | Target server | Application owner questionnaire | ETL script – application owner
AWS_Region | Target server | Application owner questionnaire | ETL script – application owner
AWS_account_ID | Target server | Application owner questionnaire | ETL script – application owner
Name | Target server (tag) | Application owner questionnaire | ETL script – application owner
business_unit | Target server (tag) | Application owner questionnaire | ETL script – application owner
cost_center | Target server (tag) | Application owner questionnaire | ETL script – application owner
wave_ID | Wave planning | Application owner questionnaire | ETL script – application owner
wave_start_date | Wave planning | Application owner questionnaire | ETL script – application owner
wave_cutover_date | Wave planning | Application owner questionnaire | ETL script – application owner
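As a rough illustration of what an ETL script for the CMDB source could look like, the following sketch reads a CSV export, renames the columns to the metadata attribute names, and merges the records into a store. The export column names (Hostname, FQDN, and so on), the in-memory dictionary used as the store, and the function names are all assumptions; a real CMDB export and metadata store will differ.

```python
import csv
import io

# Hypothetical CMDB export columns mapped to metadata attribute names
CMDB_COLUMN_MAP = {
    "Hostname": "server_name",
    "FQDN": "server_FQDN",
    "OS Family": "server_OS_family",
    "OS Version": "server_OS_version",
}


def extract(csv_text):
    """Extract: parse the CSV export into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows, column_map, key="server_name"):
    """Transform: rename columns, trim whitespace, and drop rows without a key."""
    records = []
    for row in rows:
        record = {attr: (row.get(col) or "").strip()
                  for col, attr in column_map.items()}
        if record[key]:
            records.append(record)
    return records


def load(records, store, key="server_name"):
    """Load: merge non-empty values into the store, keyed by server name."""
    for record in records:
        values = {attr: value for attr, value in record.items() if value}
        store.setdefault(record[key], {}).update(values)
    return store


export = (
    "Hostname,FQDN,OS Family,OS Version\n"
    "web01,web01.example.com,Linux,\n"
    ",,Windows,\n"
)
# The store already holds a value collected from another source
store = {"web01": {"server_name": "web01", "disk_size": "100"}}
load(transform(extract(export), CMDB_COLUMN_MAP), store)
print(store)
```

Because the load step merges only non-empty values, a scheduled re-import from one source does not erase attributes that were collected from another source, such as disk_size from a discovery tool.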

Step 3: Document metadata requirements and collection processes in a runbook

In this step, you document your decisions in a metadata management runbook. During the migration, your portfolio workstream follows this runbook as the standard procedure for collecting and storing metadata.

  1. In the portfolio playbook templates, open the Runbook template for metadata management (Microsoft Word format). This serves as a starting point for building your own runbook.

  2. In the Metadata attributes section, create a metadata attributes table for each migration pattern, and populate the tables with the metadata attributes identified in Step 1: Define the required metadata.

  3. In the Source locations section, document the sources you identified in Analyze the metadata sources.

  4. In the Source location access instructions section, document the steps a user would need to follow in order to access the metadata source locations.

  5. In the Metadata store section, document the steps a user would need to follow in order to access the metadata store you created in Define a single metadata store.

  6. In the Data collection types section, identify the data collection process that you will use for each metadata source. Ideally, you should automate all metadata collection by using automation scripts.

  7. In the Data collection by metadata attribute section, for each metadata attribute, identify the following according to the instructions in Define the metadata collection processes:

    1. Metadata type

    2. Metadata source

    3. Metadata store

    4. Collection type

  8. In the Collect metadata section, update the process as needed for your use case. This is the process the portfolio workstream follows in the implementation stage when they collect metadata for waves.

  9. Verify that your runbook is complete and accurate. This runbook should be a source of truth during the migration.

  10. Share your metadata management runbook with the team for review.

Task exit criteria

Continue to the next task when you have completed the following:

  • You have prepared a single repository for storing the collected metadata.

  • In your metadata management runbook, you have defined and documented the following:

    • The metadata attributes required for each migration pattern

    • Metadata sources and detailed instructions for how to access each source

    • The metadata store and detailed instructions for how to access it

    • The processes used to collect metadata

    • A mapping table that maps metadata attributes to the metadata sources and collection processes