Selecting and deploying a discovery tool - AWS Prescriptive Guidance

Selecting and deploying a discovery tool

This section is intended for technical leaders and architects who are responsible for planning migration projects at any scale.

When planning a migration to the cloud, it’s crucial to have a holistic view of your environment, from logical communication to hardware capacity. These details might seem basic, but they make a large difference when you are determining how and where to migrate your environment.

A discovery tool is designed to provide you with information about your environment, such as the following:

  • Lifecycle status

  • Capacity utilization

  • Application dependencies

  • Technical standards

  • General information about each asset in your environment

In addition to mapping the dependencies, it’s important to find patterns in your environment. By finding those patterns, you can see how to reuse a migration approach. For example, you can reuse a single migration approach for multiple applications that have similar versions, hardware, communication, and other components.

To find those patterns, you need documented information about your infrastructure. A discovery tool can help you find and document that information.
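As an illustration of pattern finding, the following is a minimal sketch that groups servers sharing the same OS and middleware version so that one migration approach can be reused per group. The inventory records and field names are hypothetical, not the export format of any specific discovery tool.

```python
from collections import defaultdict

# Hypothetical inventory records, as a discovery tool might export them.
inventory = [
    {"host": "app01", "os": "RHEL 7.1", "middleware": "WebSphere 8.5"},
    {"host": "app02", "os": "RHEL 7.1", "middleware": "WebSphere 8.5"},
    {"host": "web01", "os": "Windows 2008", "middleware": ".NET 4.0"},
]

def group_by_pattern(servers):
    """Group servers that share an OS and middleware version,
    so a single migration approach can cover the whole group."""
    groups = defaultdict(list)
    for server in servers:
        groups[(server["os"], server["middleware"])].append(server["host"])
    return dict(groups)

patterns = group_by_pattern(inventory)
```

In this example, the two RHEL/WebSphere servers fall into one group and can share a migration approach, while the Windows/.NET server forms its own group.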

Phase 1: Initial assessment

In the initial assessment phase, be sure to involve the right stakeholders in the discussion. You want to assess what kind of tool can provide valuable insights to support your migration journey to the cloud.

As part of this phase, answer the questions in the following table as completely as you can. The more data gaps you identify at the beginning, the better positioned you will be to select a tool that addresses your needs.

The following table lists each question together with example answers.

Do you have a configuration management database (CMDB) tool today? Is it reliable? How can it help your journey?

  • We have CMDB Y tool installed.

  • 80% of the baseline is up to date.

  • The CMDB tracks hostname, application name, support contact, IP address, and operating system (OS) version (OS versions are not up to date).

Do you have a monitoring and performance tool? Is it reliable? Does it support all your assets? What does it have and what doesn’t it have?

  • We use a home-built application that gathers server performance (CPU, memory, and disk usage).

  • It doesn’t work for our SUSE Linux Enterprise Server (SLES) 11.4, or for Windows 2003 and 2008.

  • We don’t think the data is accurate, and we don't know which servers are communicating with each other.

What do you want to know about your environment that you don't know today?

  • Application mapping. We don’t have data about the packages installed on each server, and we want to know whether IBM WebSphere, Java, .NET, or any other middleware version is installed on the server.

  • The servers are grouped manually. Sometimes we don’t know whether the grouping is correct or up to date. We want to know which servers, IP addresses, and ports each server talks to, for both inbound and outbound traffic.

What is your goal with a discovery tool? Why do you think you need one?

  • We want help with our migration planning.

  • We have legacy applications that lack documentation.

  • We need to know the performance of all the servers to be able to size a target environment.

Can any of your existing tools give you the information you want?

  • Our CMDB and home-built application can help with CPU metrics (average and peak usage), memory metrics (average and peak usage), total disk usage, tools installed on each server, application version, and team contact information.

What size is the baseline on which you intend to run the discovery tool?

  • 500 Windows servers (400 virtual and 100 physical)

  • 1,200 Linux servers (800 virtual and 400 physical)

  • 200 Linux containers

What operating systems and versions are running in the environment?

  • Windows 2003 SP2

  • IBM AIX 7.2 and 5.3

  • Red Hat Enterprise Linux (RHEL) 7.1

  • SLES 11.4 and 15.2

What hypervisors and versions are running in the environment?

  • IBM Power8 and Power7

  • VMware vCenter Server 6.5

What container orchestration is running in the environment?

  • Kubernetes

  • Red Hat OpenShift

Do you have budget allocated for this activity? Take note of the amount if you know.

  • Yes, we have $X available through the end of next year.

How are you going to use the discovery tool output to help you with your migration journey?

  • Performance: We expect to use the performance information to size the assets in the target environment.

  • TCO: With the inputs from performance, we want to be able to calculate the total cost of ownership (TCO).

  • Application: We expect to know what packages and versions are running on the servers.

  • Network: We expect to see which servers each server communicates with (inbound and outbound) so that we can plan a smooth migration to the cloud.
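To show how the performance output feeds sizing, here is a minimal sketch that right-sizes a target server from observed peak CPU and memory plus a headroom factor. The 20% headroom value and the function itself are illustrative assumptions, not part of any sizing tool.

```python
import math

def size_target(peak_cpu_pct, peak_mem_gb, current_vcpus, headroom=0.2):
    """Estimate target capacity from observed peaks plus headroom.
    The 20% headroom factor is an illustrative assumption."""
    needed_vcpus = math.ceil(current_vcpus * (peak_cpu_pct / 100) * (1 + headroom))
    needed_mem_gb = math.ceil(peak_mem_gb * (1 + headroom))
    return {"vcpus": max(needed_vcpus, 1), "memory_gb": needed_mem_gb}

# A server peaking at 55% CPU on 8 vCPUs and 12 GB of memory.
target = size_target(peak_cpu_pct=55, peak_mem_gb=12, current_vcpus=8)
```

A server that never exceeds 55% of 8 vCPUs can be sized down (here to 6 vCPUs with headroom), which in turn feeds the TCO calculation.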

Phase 2: Tool assessment

In the second phase, tool assessment, you already understand what you are looking for and why, so you can make better decisions during tool selection. To evaluate the options, you can compare your existing solutions with the tools that are available from AWS and AWS Partners.

It’s important to consider the following aspects, which can make a difference during the next phase, rollout planning:

  • License

  • SaaS or customer-deployed

  • Agentless, agent-based, or login-based

  • Supported operating systems

To ensure that a tool will provide the outcomes that you expect, we recommend asking the software providers for a demo of the tool.
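One common way to compare the options against these aspects is a weighted scoring matrix. The following sketch uses hypothetical criteria weights and ratings; adjust both to your own requirements.

```python
# Illustrative weights for the evaluation aspects above -- adjust as needed.
WEIGHTS = {
    "license": 0.2,      # licensing cost and model
    "deployment": 0.2,   # SaaS or customer-deployed
    "collection": 0.3,   # agentless, agent-based, or login-based
    "os_support": 0.3,   # coverage of your operating systems
}

def score_tool(ratings):
    """Compute a weighted score from per-criterion ratings (0-5)."""
    return round(sum(WEIGHTS[c] * r for c, r in ratings.items()), 2)

tool_a = score_tool({"license": 4, "deployment": 3, "collection": 5, "os_support": 4})
```

Scoring each candidate the same way makes the comparison repeatable and easy to review with stakeholders.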

Phase 3: Rollout planning

The rollout planning phase is often underestimated, because evaluation demos and testing on a lab server can appear simple compared with a full migration.

When rolling out the discovery tool, it’s important to have a well-defined process and know which teams need to be involved. It’s critical to have the environment mapped and set up as early as possible so that you can run the tool long enough to gain insights into your environment. You can then use those insights when planning the migration.

Your rollout strategy might depend on your tooling selection. You can use solutions such as shell scripting, PowerShell, AWS Systems Manager, Ansible, Chef, or any other configuration management tool. We recommend rolling out first on all your non-production servers. When you are confident that the rollout has not affected any systems, deploy the tool in the production environment.
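The non-production-first recommendation can be sketched as a simple wave planner. The inventory format and batch size below are hypothetical; the point is only the ordering and batching logic.

```python
def plan_waves(servers, batch_size=50):
    """Order the rollout so non-production servers come first,
    then production servers, split into batches of batch_size."""
    ordered = [s for s in servers if s["env"] != "prod"] + \
              [s for s in servers if s["env"] == "prod"]
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

# Hypothetical two-server inventory, batch size of 1 for illustration.
waves = plan_waves(
    [{"host": "p1", "env": "prod"}, {"host": "n1", "env": "nonprod"}],
    batch_size=1,
)
```

Each wave can then be handed to whichever deployment mechanism you chose (AWS Systems Manager, Ansible, and so on), with a validation checkpoint between waves.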

Create clear and complete documentation that explains the following:

  • Prerequisites

  • Installation

  • Reinstallation

  • Uninstallation

  • Log files

  • Validation process

  • Known issues

  • Point of contact

Include any other technical information about the discovery tool that you have selected.

Phase 4: Analyzing the outcome

After the discovery tool is running and reporting as designed, a common question is, “How long should I keep my assets reporting to provide the right output?” The answer to that question is influenced by the following factors:

  • Peak usage: If your system has a peak usage only in the beginning of the month, you might need to wait up to 30 days to obtain the peak usage metrics.

  • Network mapping: If your application has a job scheduled to run one time a month, you might need to wait up to 30 days.

  • Scheduled activities: If you have a specific scheduled activity that runs one time every quarter, you might need to wait up to 3 months. Treat these as exceptions. You don’t need to wait 3 months for all your applications.

Normally, companies collect at least 2 weeks of data to plan migrations. For better results, we recommend collecting data for at least 4 weeks. That range works for most cases and application behaviors. However, be sure to plan the number of weeks based on your application needs.
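The factors above can be captured as a small helper that recommends a collection window. The specific week counts mirror the guidance in this section (roughly 30 days for monthly cycles, roughly 3 months for quarterly exceptions); the function itself is a hypothetical sketch.

```python
def collection_weeks(has_monthly_cycle=False, has_quarterly_job=False, minimum=4):
    """Recommend how many weeks to keep assets reporting.
    Defaults to the 4-week baseline recommended above."""
    weeks = minimum
    if has_monthly_cycle:
        weeks = max(weeks, 5)   # capture at least one full monthly peak
    if has_quarterly_job:
        weeks = max(weeks, 13)  # roughly one quarter; treat as an exception
    return weeks
```

For example, an application with a month-end billing peak would collect for 5 weeks, while the rare quarterly batch job extends only its own collection window to about 13 weeks.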