
Understanding initial assessment data requirements

Data collection can take a significant amount of time and can easily become a blocker when it is unclear what data is needed and when it is needed. The key is to strike a balance between collecting too little and too much data for the outcomes of this stage. To focus on the data and the fidelity level required for this early stage of portfolio assessment, adopt an iterative approach to data collection.

Data sources and data requirements

The first step is to identify your sources of data. Start by identifying the key stakeholders within your organization who can fulfill the data requirements. These are typically members of the service management, operations, capacity planning, monitoring, and support teams, and the application owners. Establish working sessions with members of these groups. Communicate the data requirements, and obtain a list of tools and existing documentation that can provide the data.

To guide these conversations, use the following set of questions:

  • How accurate and up to date is the current infrastructure and application inventory? For example, for the company configuration management database (CMDB), do we already know where the gaps are?

  • Do we have active tools and processes that keep the CMDB (or equivalent) updated? If so, how frequently is it updated? What is the latest refresh date?

  • Does the current inventory, such as the CMDB, contain application-to-infrastructure mapping? Is each infrastructure asset associated with an application? Is each application mapped to infrastructure?

  • Does the inventory contain a catalog of licenses and licensing agreements for each product?

  • Does the inventory contain dependency data? Note the existence of communication data such as server to server, application to application, application or server to database.

  • What other tools are available in the environment that can provide application and infrastructure information? Note the existence of performance, monitoring, and management tools that can be used as a source of data.

  • What are the different locations, such as data centers, that host our applications and infrastructure?

After these questions have been answered, list your identified sources of data. Then assign a level of fidelity, or level of trust, to each of them. Data validated recently (within 30 days) from active programmatic sources, such as tools, has the highest level of fidelity. Static data is considered lower fidelity and less trusted. Examples of static data include documents, workbooks, manually updated CMDBs, and any other dataset that is not programmatically maintained or whose last refresh date is older than 60 days.

The data fidelity levels in the following table are provided as examples. We recommend that you assess the requirements of your organization in terms of maximum tolerance to assumptions and associated risk to determine what is an appropriate level of fidelity. In the table, institutional knowledge refers to any information about applications and infrastructure that is not documented.

| Data sources | Fidelity level | Portfolio coverage | Comments |
| --- | --- | --- | --- |
| Institutional knowledge | Low: up to 25% accurate data; 75% assumed values or data older than 150 days | Low | Scarce, focused on critical applications |
| Knowledge base | Medium-low: 35-40% accurate data; 60-65% assumed values or data 120-150 days old | Medium | Manually maintained, inconsistent levels of detail |
| CMDB | Medium: ~50% accurate data; ~50% assumed values or data 90-120 days old | Medium | Contains data from mixed sources, several data gaps |
| VMware vCenter exports | Medium-high: 75-80% accurate data; 20-25% assumed values or data 60-90 days old | High | Covers 90% of the virtualized estate |
| Application performance monitoring | High: mostly accurate data; ~5% assumed values or data 0-60 days old | Low | Limited to critical production systems (covers 15% of the application portfolio) |
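To illustrate how such a rubric might be applied, the following Python sketch assigns a fidelity label to a data source based on its last refresh date and the estimated share of validated (non-assumed) values. The bands and names are assumptions loosely derived from the example table above, not a prescribed formula; tune them to your organization's tolerance for assumptions and risk.

```python
from datetime import date

# Illustrative bands loosely derived from the example table above; adjust to
# your organization's tolerance for assumptions and associated risk.
FIDELITY_BANDS = [
    # (maximum age in days, minimum share of accurate data, fidelity label)
    (60, 0.95, "High"),
    (90, 0.75, "Medium-high"),
    (120, 0.50, "Medium"),
    (150, 0.35, "Medium-low"),
]

def classify_fidelity(last_refresh: date, accurate_share: float,
                      as_of: date | None = None) -> str:
    """Return a fidelity label for a data source.

    last_refresh   -- date the source was last validated or refreshed
    accurate_share -- estimated fraction of validated (non-assumed) values
    """
    as_of = as_of or date.today()
    age_days = (as_of - last_refresh).days
    for max_age, min_accuracy, label in FIDELITY_BANDS:
        if age_days <= max_age and accurate_share >= min_accuracy:
            return label
    return "Low"

# Example: a manually updated CMDB refreshed 100 days ago with ~50% validated data
print(classify_fidelity(date(2024, 1, 10), 0.50, as_of=date(2024, 4, 19)))  # "Medium"
```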

The following tables specify the required and optional data attributes for each asset class (applications, infrastructure, networks, and migration), the specific activity (inventory or business case), and the recommended data fidelity for this stage of assessment. The tables use the following abbreviations:

  • R, for required

  • (D), for directional business case, required for total cost of ownership (TCO) comparisons and directional business cases

  • (F), for full directional business case, required for TCO comparison and directional business cases that include migration and modernization costs

  • O, for optional

  • N/A, for not applicable

Applications

| Attribute name | Description | Inventory and prioritization | Business case | Recommended fidelity level (minimum) |
| --- | --- | --- | --- | --- |
| Unique identifier | For example, application ID. Typically available in existing CMDBs or other internal inventories and control systems. Consider creating unique IDs whenever these are not defined in your organization. | R | R (D) | High |
| Application name | Name by which this application is known to your organization. Include the commercial off-the-shelf (COTS) vendor and product name when applicable. | R | R (D) | Medium-high |
| Is COTS? | Yes or No, to denote whether this is a commercial application or an internal development | R | R (D) | Medium-high |
| COTS product and version | Commercial software product name and version | R | R (D) | Medium |
| Description | Primary application function and context | R | O | Medium |
| Criticality | For example, strategic or revenue-generating application, or supporting a critical function | R | O | Medium-high |
| Type | For example, database, customer relationship management (CRM), web application, multimedia, IT shared service | R | O | Medium |
| Environment | For example, production, pre-production, development, test, sandbox | R | R (D) | Medium-high |
| Compliance and regulatory | Frameworks applicable to the workload (for example, HIPAA, SOX, PCI-DSS, ISO, SOC, FedRAMP) and regulatory requirements | R | R (D) | Medium-high |
| Dependencies | Upstream and downstream dependencies on internal and external applications or services, plus non-technical dependencies such as operational elements (for example, maintenance cycles) | O | O | Medium-low |
| Infrastructure mapping | Mapping to the physical and/or virtual assets that make up the application | O | O | Medium |
| License | Commodity software license type (for example, Microsoft SQL Server Enterprise) | O | R | Medium-high |
| Cost | Costs for software license, software operations, and maintenance | N/A | O | Medium |
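To make these attributes concrete, the following sketch shows one possible shape for an application inventory record. The field names are illustrative assumptions, not a prescribed schema; map them to whatever your CMDB or inventory tooling actually uses.

```python
from dataclasses import dataclass, field

@dataclass
class ApplicationRecord:
    """Illustrative application inventory record; field names are examples only."""
    app_id: str                        # Unique identifier (required)
    name: str                          # Application name (required)
    is_cots: bool                      # Commercial product or internal development
    environment: str                   # production, pre-production, development, ...
    compliance: list[str] = field(default_factory=list)  # e.g., ["PCI-DSS"]
    cots_product: str = ""             # Commercial product name and version
    description: str = ""              # Primary function and context
    criticality: str = ""              # e.g., "revenue-generating"
    app_type: str = ""                 # e.g., "CRM", "web application"
    dependencies: list[str] = field(default_factory=list)    # optional at this stage
    infrastructure: list[str] = field(default_factory=list)  # mapped server IDs (optional)
    license: str = ""                  # required for the business case
    annual_cost: float | None = None   # optional, business case only

# Example record with only the attributes required for inventory and prioritization
portal = ApplicationRecord(
    app_id="APP-0042",
    name="Customer Portal",
    is_cots=False,
    environment="production",
    compliance=["PCI-DSS"],
)
```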

Infrastructure

| Attribute name | Description | Inventory and prioritization | Business case | Recommended fidelity level (minimum) |
| --- | --- | --- | --- | --- |
| Unique identifier | For example, server ID. Typically available in existing CMDBs or other internal inventories and control systems. Consider creating unique IDs whenever these are not defined in your organization. | R | R | High |
| Network name | Asset name in the network (for example, hostname) | R | O | Medium-high |
| DNS name (fully qualified domain name, or FQDN) | DNS name | O | O | Medium |
| IP address and netmask | Internal and/or public IP addresses | R | O | Medium-high |
| Asset type | Physical or virtual server, hypervisor, container, device, database instance, and so on | R | R | Medium-high |
| Product name | Commercial vendor and product name (for example, VMware ESXi, IBM Power Systems, Exadata) | R | R | Medium |
| Operating system | For example, RHEL 8, Windows Server 2019, AIX 6.1 | R | R | Medium-high |
| Configuration | Allocated CPU, number of cores, threads per core, total memory, storage, network cards | R | R | Medium-high |
| Utilization | CPU, memory, and storage peak and average; database instance throughput | R | O | Medium-high |
| License | Commodity license type (for example, RHEL Standard) | R | R | Medium |
| Is shared infrastructure? | Yes or No, to denote infrastructure that provides shared services such as authentication providers, monitoring systems, backup services, and similar services | R | R (D) | Medium |
| Application mapping | Applications or application components that run on this infrastructure | O | O | Medium |
| Cost | Fully loaded costs for bare-metal servers, including hardware, maintenance, operations, storage (SAN, NAS, object), operating system license, share of rack space, and data center overheads | N/A | O | Medium-high |

Networks

| Attribute name | Description | Inventory and prioritization | Business case | Recommended fidelity level (minimum) |
| --- | --- | --- | --- | --- |
| Size of pipe (Mb/s), redundancy (Y/N) | Current WAN link specifications (for example, 1000 Mb/s redundant) | O | R | Medium |
| Link utilization | Peak and average utilization, outbound data transfer (GB/month) | O | R | Medium |
| Latency (ms) | Current latency between connected locations | O | O | Medium |
| Cost | Current cost per month | N/A | O | Medium |

Migration

| Attribute name | Description | Inventory and prioritization | Business case | Recommended fidelity level (minimum) |
| --- | --- | --- | --- | --- |
| Rehost | Customer and partner effort for each workload (person-days), customer and partner cost rates per day, tool cost, number of workloads | N/A | R (F) | Medium-high |
| Replatform | Customer and partner effort for each workload (person-days), customer and partner cost rates per day, number of workloads | N/A | R (F) | Medium-high |
| Refactor | Customer and partner effort for each workload (person-days), customer and partner cost rates per day, number of workloads | N/A | O | Medium-high |
| Retire | Number of servers, average decommission cost | N/A | O | Medium-high |
| Landing zone | Re-use existing (Y/N), list of AWS Regions needed, cost | N/A | R (F) | Medium-high |
| People and change | Number of staff to train in cloud operations and development, cost of training per person, cost of training time per person | N/A | R (F) | Medium-high |
| Duration | Duration of in-scope workload migration (months) | O | R (F) | Medium-high |
| Parallel cost | Time frame and rate at which as-is costs can be removed during migration | N/A | O | Medium-high |
| Parallel cost | Time frame and rate at which AWS products and services, and other infrastructure costs, are introduced during migration | N/A | O | Medium-high |

Evaluating the need for discovery tooling

Does your organization need discovery tooling? Portfolio assessment requires high-confidence, up-to-date data about applications and infrastructure. Initial stages of portfolio assessment can use assumptions to fill data gaps.

However, as progress is made, high-fidelity data enables the creation of successful migration plans and the correct estimation of target infrastructure to reduce cost and maximize benefits. It also reduces risk by enabling implementations that consider dependencies and avoid migration pitfalls. The primary use case for discovery tooling in cloud migration programs is to reduce risk and increase confidence levels in data through the following:

  • Automated or programmatic data collection, resulting in validated, highly trusted data

  • Acceleration of the rate at which data is obtained, improving project speed and reducing costs

  • Increased levels of data completeness, including communication data and dependencies not typically available in CMDBs

  • Obtaining insights such as automated application identification, TCO analysis, projected run rates, and optimization recommendations

  • High-confidence migration wave planning

When there is uncertainty about whether systems exist in a given location, most discovery tools can scan network subnets and discover those systems that respond to ping or Simple Network Management Protocol (SNMP) requests. Note that not all network or systems configurations will allow ping or SNMP traffic. Discuss these options with your network and technical teams.
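As a minimal illustration of this kind of sweep, the following Python sketch sends a single ICMP echo request to every address in a subnet and reports the responders. The subnet is hypothetical, the flags assume a Linux-style ping command, and real discovery tools combine this with SNMP and other methods; agree on any scanning activity with your network and security teams first.

```python
import ipaddress
import subprocess

def ping_sweep(cidr: str) -> list[str]:
    """Return the hosts in a subnet that answer a single ICMP echo request.

    Flags assume a Linux-style ping (-c count, -W timeout in seconds);
    adjust them for other platforms.
    """
    responders = []
    for host in ipaddress.ip_network(cidr, strict=False).hosts():
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "1", str(host)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            responders.append(str(host))
    return responders

# Hypothetical subnet; scan only ranges agreed with your network team.
print(ping_sweep("10.0.42.0/28"))
```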

Further stages of application portfolio assessment and migration heavily rely on accurate dependency-mapping information. Dependency mapping provides an understanding of the infrastructure and configuration that will be required in AWS (such as security groups, instance types, account placement, and network routing). It also helps with grouping applications that must move at the same time (such as applications that must communicate over low latency networks). In addition, dependency mapping provides information for evolving the business case.
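One common way to turn communication data into candidate move groups is to treat observed server-to-server (or application-to-application) connections as an undirected graph and take its connected components. The following sketch shows the idea with illustrative host names; it is a simplification, because real wave planning also weighs business constraints, latency sensitivity, and shared services.

```python
from collections import defaultdict

def move_groups(dependencies: list[tuple[str, str]]) -> list[set[str]]:
    """Group assets that communicate, directly or transitively, into move groups.

    dependencies -- observed (source, target) communication pairs, for example
                    from a discovery tool's dependency-mapping export.
    """
    graph = defaultdict(set)
    for src, dst in dependencies:
        graph[src].add(dst)
        graph[dst].add(src)

    groups, seen = [], set()
    for node in graph:
        if node in seen:
            continue
        group, stack = set(), [node]   # depth-first traversal of one component
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(graph[current] - group)
        seen |= group
        groups.append(group)
    return groups

# Illustrative data: the CRM servers and a reporting service form one move group.
deps = [("crm-web-01", "crm-db-01"), ("report-svc", "crm-db-01"), ("hr-app-01", "hr-db-01")]
print(move_groups(deps))  # [{crm-web-01, crm-db-01, report-svc}, {hr-app-01, hr-db-01}]
```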

When deciding on a discovery tool, it is important to consider all stages of the assessment process and to anticipate data requirements. Data gaps have the potential to become blockers, so it is key to anticipate them by analyzing future data requirements and data sources. Experience in the field shows that most stalled migration projects have a limited dataset in which the applications in scope, the associated infrastructure, and their dependencies are not clearly identified. This lack of clarity can lead to incorrect metrics, poor decisions, and delays. Obtaining up-to-date data is the first step toward a successful migration project.

How to select a discovery tool?

Several discovery tools on the market provide different features and capabilities. Consider your requirements, and decide on the most appropriate option for your organization. The most common factors to weigh when deciding on a discovery tool for migrations are the following:

Security

  • What is the authentication method to access the tool data repository or analytics engines?

  • Who can access the data, and what are the security controls to access the tool?

  • How does the tool collect data? Does it need dedicated credentials?

  • What credentials and access level does the tool need to access my systems and obtain data?

  • How is data transferred between the tool components?

  • Does the tool support data encryption at rest and in transit?

  • Is data centralized in a single component inside or outside of my environment?

  • What are the network and firewall requirements?

Ensure that security teams are involved in early conversations about discovery tooling.

Data sovereignty

  • Where is the data stored and processed?

  • Does the tool use a software as a service (SaaS) model?

  • Can it retain all data within the boundaries of my environment?

  • Can data be screened before it leaves the boundaries of my organization?

Consider your organization's needs in terms of data residency requirements.

Architecture

  • What infrastructure is required and what are the different components?

  • Is more than one architecture available?

  • Does the tool support installing components in air-gapped security zones?

Performance

  • What is the impact of data collection on my systems?

Compatibility and scope

  • Does the tool support all or most of my products and versions? Review the tool documentation to verify supported platforms against the current information about your scope.

  • Are most of my operating systems supported for data collection? If you don't know your operating system versions, try to narrow the list of discovery tools to those with the widest range of supported systems.

Collection methods

  • Does the tool require installing an agent on each targeted system?

  • Does it support agentless deployments?

  • Do agent-based and agentless collection methods provide the same features?

  • What is the collection process?

Features

  • What features are available?

  • Can it calculate total cost of ownership (TCO) and estimated AWS Cloud run rate?

  • Does it support migration planning?

  • Does it measure performance?

  • Can it recommend target AWS infrastructure?

  • Does it perform dependency mapping?

  • What level of dependency mapping does it provide?

Consider tools with strong application and infrastructure dependency-mapping functions and those that can infer applications from communication patterns.

Cost

  • What is the licensing model?

  • How much does the licensing cost?

  • Is the pricing for each server? Is it tiered pricing?

  • Are there any options with limited features that can be licensed on-demand?

Discovery tools are typically used throughout the entire lifecycle of a migration project. If your budget is limited, consider licensing the tool for at least 6 months. However, the absence of discovery tooling typically leads to higher manual effort and internal costs.

Support model

  • What levels of support are provided by default?

  • Is any support plan available?

  • What are the incident response times?

Professional services

  • Does the vendor offer professional services to analyze discovery outputs?

  • Can they cover the elements of this guide?

  • Are there any discounts or bundles for tooling + services?

Recommended features for the discovery tool

To avoid provisioning and combining data from multiple tools over time, a discovery tool should cover the following minimum features:

  • Software – The discovery tool should be able to identify running processes and installed software.

  • Dependency mapping – It should be able to collect network connection information and build inbound and outbound dependency maps of the servers and running applications. Also, the discovery tool should be able to infer applications from groups of infrastructure based on communication patterns.

  • Profile and configuration discovery – It should be able to report the infrastructure profile such as CPU family (for example, x86, PowerPC), the number of CPU cores, memory size, number of disks and size, and network interfaces.

  • Network storage discovery – It should be able to detect and profile network shares from network-attached storage (NAS).

  • Performance – It should be able to report peak and average utilization of CPU, memory, disk, and network.

  • Gap analysis – It should be able to provide insights on data quantity and fidelity (see the sketch after this list).

  • Network scanning – It should be able to scan network subnets and discover unknown infrastructure assets.

  • Reporting – It should be able to provide collection and analysis status.
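As an example of the kind of gap analysis and reporting described above, the following sketch computes how complete an exported inventory is for a handful of required attributes. The column names and file name are assumptions; align them with the attribute tables earlier in this section and with your tool's actual export format.

```python
import csv
from collections import Counter

# Attributes required for inventory and prioritization (see the tables above);
# adjust the names to match the columns of your actual export.
REQUIRED = ["unique_id", "name", "asset_type", "operating_system", "configuration"]

def completeness_report(inventory_csv: str) -> dict[str, float]:
    """Return the share of inventory rows that have a value for each required attribute."""
    filled, total = Counter(), 0
    with open(inventory_csv, newline="") as f:
        for row in csv.DictReader(f):
            total += 1
            for attr in REQUIRED:
                if row.get(attr, "").strip():
                    filled[attr] += 1
    return {attr: (filled[attr] / total if total else 0.0) for attr in REQUIRED}

# Example: flag attributes below an 80% completeness threshold as data gaps.
for attr, share in completeness_report("inventory_export.csv").items():
    status = "OK" if share >= 0.8 else "GAP"
    print(f"{attr}: {share:.0%} complete [{status}]")
```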

Additional features to consider

  • TCO analysis to provide a cost comparison between current on-premises cost and projected AWS cost.

  • Licensing analysis and optimization recommendations for Microsoft SQL Server and Oracle systems in rehost and replatform scenarios.

  • Migration strategy recommendation (for example, can the discovery tool make default migration strategy, or R type, recommendations based on the current technology?)

  • Inventory export (to CSV or a similar format)

  • Right-sizing recommendation (for example, can it map a recommended target AWS infrastructure?)

  • Dependency visualization (for example, can dependency mapping be visualized in a graphical mode?)

  • Architectural view (for example, can architectural diagrams be automatically produced?)

  • API access (for example, can it be programmatically accessed to refresh data in your CMDB? See the sketch after this list.)

  • Application prioritization (Can it assign weight or relevance to application and infrastructure attributes to create prioritization criteria for migration?)

  • Wave planning (for example, recommended groups of applications and the ability to create migration wave plans)

  • Migration cost estimation (estimation of effort to migrate)
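To illustrate the API-access item above, the following sketch pulls a server inventory from a discovery tool's REST API and upserts each record into a CMDB. The endpoints, authentication scheme, and field names are entirely hypothetical; consult the API documentation of your actual discovery tool and CMDB.

```python
import requests

# Hypothetical endpoints for illustration only.
DISCOVERY_API = "https://discovery.example.com/api/v1/servers"
CMDB_API = "https://cmdb.example.com/api/v1/ci"

def refresh_cmdb(discovery_token: str, cmdb_token: str) -> int:
    """Pull the latest server inventory from a discovery tool and upsert it into a CMDB."""
    servers = requests.get(
        DISCOVERY_API,
        headers={"Authorization": f"Bearer {discovery_token}"},
        timeout=30,
    ).json()

    updated = 0
    for server in servers:
        payload = {                      # illustrative field names
            "unique_id": server["id"],
            "hostname": server["hostname"],
            "os": server["operating_system"],
            "last_seen": server["last_seen"],
        }
        response = requests.put(
            f"{CMDB_API}/{payload['unique_id']}",
            json=payload,
            headers={"Authorization": f"Bearer {cmdb_token}"},
            timeout=30,
        )
        response.raise_for_status()
        updated += 1
    return updated
```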

Deployment considerations

After you have selected and procured a discovery tool, consider the following questions to drive conversations with the teams responsible for deploying the tool in your organization:

  • Are servers or applications operated by a third party? This could dictate the teams to involve and processes to follow.

  • What is the high-level process for gaining approval to deploy discovery tools?

  • What is the main authentication process to access systems such as servers, containers, storage, and databases? Are server credentials local or centralized? What is the process to obtain credentials? Credentials will be required to collect data from your systems (for example, containers, virtual or physical servers, hypervisors, and databases). Obtaining credentials for the discovery tool to connect to each asset can be challenging, especially when these assets are not centralized.

  • What is the outline of the network security zones? Are network diagrams available?

  • What is the process for requesting firewall rules in the data centers?

  • What are the current support service-level agreements (SLAs) in relation to data center operations (discovery tool installation, firewall requests)?