Data collection - AWS Prescriptive Guidance

Data collection

Begin the project by discovering your environment. The level of detail in the data you collect depends on your business needs. If you need to support a business case or provide financial estimates for other purposes, start by collecting the necessary data for creating annual run rate and migration cost estimates. If you don’t require financial estimates, focus on the required application and infrastructure data. This data supplements the core data needed to create cost estimates and will be used in the analysis phase.

Understanding the dependencies between applications and infrastructure (that is, application to application and application to infrastructure) is critical to determining the impact of moving workloads. The amount of data required can also vary based on factors such as the business impact, or scope of impact, if an application is unavailable, because migration typically requires a planned outage window. It’s rare to collect all of the data you want, so use good judgment to decide when you have enough data to proceed to the next stage.
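To illustrate why dependency data matters, the following minimal sketch finds everything that would be affected by moving a single item. The application and server names, the record layout, and the `impacted_by` function are hypothetical and are not taken from any discovery tool.

```python
from collections import defaultdict, deque

# Hypothetical dependency records in the form (dependent, dependency).
DEPENDENCIES = [
    ("web-portal", "orders-api"),    # application to application
    ("orders-api", "db-server-01"),  # application to infrastructure
    ("reporting", "db-server-01"),   # application to infrastructure
    ("hr-app", "db-server-02"),      # application to infrastructure
]


def impacted_by(target, dependencies):
    """Return everything that depends on target, directly or transitively."""
    # Reverse index: for each item, which items depend on it.
    dependents = defaultdict(set)
    for dependent, dependency in dependencies:
        dependents[dependency].add(dependent)

    # Breadth-first walk up the dependency chain.
    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)

    seen.discard(target)  # only relevant if the data contains a cycle
    return sorted(seen)


print(impacted_by("db-server-01", DEPENDENCIES))
```

In this sample data, moving db-server-01 affects orders-api and reporting directly, and web-portal indirectly through orders-api. A dependency that is missing from the dataset would silently drop an application from that list.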

Decide what data needs to be gathered

Depending on the use case for the data, you might need anything from a minimal dataset to a complete one. For example, if you are exploring migration costs, you need nothing more than a high-level understanding of your on-premises environment (for example, the number of servers with a breakdown by operating system). The accuracy of the resulting estimates will closely track the accuracy of the data inputs, so keep your output requirements in mind.
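As a rough illustration of that high-level view, the sketch below counts servers by operating system. The inventory rows and field names are hypothetical; in practice they might come from a spreadsheet or CMDB export.

```python
from collections import Counter

# Hypothetical inventory rows; real data would come from an export.
SERVERS = [
    {"hostname": "app01", "os": "Windows Server 2016"},
    {"hostname": "app02", "os": "Windows Server 2016"},
    {"hostname": "db01", "os": "Red Hat Enterprise Linux 7"},
    {"hostname": "web01", "os": "Ubuntu 20.04"},
]


def os_breakdown(servers):
    """Count servers per operating system."""
    return Counter(server["os"] for server in servers)


breakdown = os_breakdown(SERVERS)
print(f"Total servers: {sum(breakdown.values())}")
for os_name, count in breakdown.most_common():
    print(f"{os_name}: {count}")
```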

If you need to decide how and when applications are going to migrate, you will need a complete and accurate dataset that includes detailed documentation of the applications and infrastructure to be migrated. We strongly recommend that you use an automated application and infrastructure discovery tool to ensure completeness and accuracy. See Resources for a list of these tools.

Consider your objective and your drivers, and then determine the data that you will need to collect. A key consideration when deciding whether to collect specific data is how much time it will take to collect.

Decide how to gather the data

When you have determined what data must be collected, determine how to get it. Remember to consider how much time and effort it will take to get the data.

Your primary decision at this point is whether to install a data collection tool to help you gather data rapidly. Unless you have a compelling reason not to use one, we recommend that you use a discovery tool because it can significantly accelerate discovery. Here are a few questions to ask that will help justify your decision:

  • Do my subject matter experts know the answers to the questions?

  • Do I have legacy workloads where people who know about those workloads are no longer with the organization?

  • What will the discovery tool collect? Does this align with the data I have decided to collect?

  • What data will I need to gather manually?

  • How long should I expect data collection to take? How long do I have?

  • What is the security review process for installing a tool? Can I install agents to discover the workloads?

  • How long will procurement take? Can I shortcut this by using free tools or AWS Marketplace offerings?

  • How accurate does my dataset have to be? Can I take someone’s word for it or should I gather more accurate and precise empirical data?

The last question points to a key decision that must be made by the leadership team: What is the risk tolerance for making a wrong decision? Wrong decisions happen when you have incomplete or inaccurate data.

When you have decided whether to use a discovery tool, you must define the processes for collating your data sources. Discovery tools are beneficial, but they can’t give you everything you need. Understand what the tool will and will not provide. It generally takes two to four weeks for a discovery tool to collect good data. While you wait for the discovery tool to collect data, gather the supplemental information you will use in future phases of migration. Here are some examples of data to gather outside a discovery tool:

  • Who owns or supports the application?

  • What business units does this application support?

  • What is the relative importance (criticality or tier) of the application to the business?
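One way to collate these sources is to join the discovery tool output with the manually gathered answers and flag what is still missing. The sketch below assumes both datasets are keyed by application name; the field names (`owner`, `business_unit`, `criticality`) and the `collate` function are hypothetical.

```python
# Hypothetical fields that come from people rather than from a tool.
REQUIRED_MANUAL_FIELDS = ("owner", "business_unit", "criticality")


def collate(discovered, manual):
    """Join tool output with manual answers, keyed by application name."""
    merged = {}
    for app, tool_data in discovered.items():
        record = dict(tool_data)            # copy so inputs are not modified
        record.update(manual.get(app, {}))  # add manual answers, if any
        # Track gaps so they can be followed up with the application owners.
        record["missing"] = [
            field for field in REQUIRED_MANUAL_FIELDS if not record.get(field)
        ]
        merged[app] = record
    return merged


# Hypothetical sample data.
discovered = {
    "orders-api": {"servers": 3},
    "reporting": {"servers": 1},
}
manual = {
    "orders-api": {"owner": "J. Doe", "business_unit": "Sales", "criticality": "Tier 1"},
    "reporting": {"owner": "A. Smith"},
}

for app, record in collate(discovered, manual).items():
    print(app, "missing:", record["missing"])
```

Here orders-api has no gaps, while reporting is still missing its business unit and criticality.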


If you aren’t careful, discovery can be an infinite resource drain. At some point, you must decide to move forward with incomplete data. It is nearly impossible to get 100 percent accurate and complete data during discovery. The goal for this phase is not complete accuracy but data that is good enough to get you to the next phase while minimizing the churn you will experience during upcoming phases. After the first one to two months of discovery investment, the amount of new data you discover decreases rapidly.
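A simple way to notice these diminishing returns is to track how many new items each week of discovery adds. The sketch below is one possible stopping heuristic, not a prescribed rule: the 2 percent threshold, the two-week window, and the sample counts are all assumptions chosen for illustration.

```python
def discovery_has_plateaued(new_items_per_week, threshold=0.02, weeks=2):
    """Return True when each of the last `weeks` weeks added less than
    `threshold` (as a fraction) of the running total of discovered items."""
    if len(new_items_per_week) < weeks:
        return False

    total = 0
    growth = []
    for count in new_items_per_week:
        total += count
        # With nothing discovered yet, treat growth as 100 percent so that
        # an empty start is never mistaken for a plateau.
        growth.append(count / total if total else 1.0)

    return all(rate < threshold for rate in growth[-weeks:])


# Hypothetical counts of newly discovered servers and dependencies per week.
weekly_new_items = [400, 250, 120, 40, 10, 5]
print(discovery_has_plateaued(weekly_new_items))
```

With these sample counts, the last two weeks each add less than 2 percent to the running total, so the function reports a plateau. The same data cut off after three weeks would not.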