FAQs about using machine learning to forecast new product demand - AWS Prescriptive Guidance


The following are frequently asked questions related to implementing an ML model that forecasts demand for new product introductions.

Who should I mobilize to start the process?

Your organization's readiness depends directly on how much support you have from upper management. We recommend that you get approval from managers in the data science or analytics, supply chain, marketing, and IT departments. Request support from other stakeholders and leaders as appropriate for your organization.

What kind of team should I assemble?

To successfully deliver the initiative and produce measurable outcomes, assemble a team that includes:

  • Data scientists for model development

  • Data engineers for data collection and ingestion

  • Machine learning engineers for model deployment and for building a self-service dashboard

  • Subject matter experts for domain expertise

What historical data do I need and how much?

Consider acquiring the following data:

  • Sales data for all similar products, from product launch to discontinuation.

  • Metadata that describes the product features and attributes. For consumer electronics (CE) products, examples of these attributes might be Bluetooth capability, wireless features, USB type, and color.

  • Related time-series data that accompanies the sales data, such as marketing, holiday, review, and rating data.

    Note

    If possible, extend the related time-series data into the forecasting horizon so that the model can use it during inference. For example, holiday data can be extended into the future because you know the holidays in advance.
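As an illustration of the note above, the following minimal sketch builds a weekly holiday-flag series that covers the sales history and then continues into the forecasting horizon. It assumes weekly periodicity with weeks starting on Monday; the holiday list and the function name `weekly_holiday_flags` are hypothetical and are not part of any AWS API.

```python
from datetime import date, timedelta

# Hypothetical holiday calendar; replace with the holidays relevant to your markets.
HOLIDAYS = {
    date(2024, 11, 28),  # example: US Thanksgiving 2024
    date(2024, 12, 25),
    date(2025, 1, 1),
}


def weekly_holiday_flags(history_start, history_end, horizon_weeks, holidays):
    """Return (week_start, flag) pairs covering the sales history plus the horizon.

    flag is 1 if any holiday falls in the 7-day week beginning at week_start,
    otherwise 0. Because holidays are known in advance, the series can extend
    past history_end by horizon_weeks.
    """
    # Align the first week to the Monday on or before history_start.
    week_start = history_start - timedelta(days=history_start.weekday())
    end = history_end + timedelta(weeks=horizon_weeks)
    series = []
    while week_start <= end:
        week_days = {week_start + timedelta(days=d) for d in range(7)}
        series.append((week_start, int(bool(week_days & holidays))))
        week_start += timedelta(weeks=1)
    return series


# Sales history ends on 2024-12-16; extend the flags 4 weeks into the horizon.
for week, flag in weekly_holiday_flags(date(2024, 11, 4), date(2024, 12, 16), 4, HOLIDAYS):
    print(week, flag)
```

The weeks after 2024-12-16 have no sales yet, but their holiday flags are already known, which is what lets the model use this series at inference time.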

When should I start generating a demand forecast for a new product?

This is a business decision that each organization needs to make. Ideally, the forecast should help your organization meet demand for the new product. We recommend that you generate a weekly or monthly new product introduction (NPI) demand forecast before you start manufacturing the new product, so that you can properly estimate the parts and labor required.
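To show how a demand forecast can feed parts planning, here is a minimal sketch that converts weekly unit forecasts into weekly part quantities. The bill of materials, its part names, and the function name `parts_requirements` are hypothetical; fractional forecasts are rounded up to whole units before multiplying.

```python
import math

# Hypothetical bill of materials: quantity of each part needed per finished unit.
BILL_OF_MATERIALS = {"battery": 1, "usb_c_port": 1, "screw": 6}


def parts_requirements(weekly_forecast, bom):
    """Convert forecast units per week into part quantities per week."""
    requirements = []
    for units in weekly_forecast:
        # Round fractional forecasts up, because only whole units can be built.
        units_to_build = math.ceil(units)
        requirements.append({part: units_to_build * qty for part, qty in bom.items()})
    return requirements


# Example: two weeks of forecast demand for the new product.
print(parts_requirements([100, 249.2], BILL_OF_MATERIALS))
```

Rounding the units first, rather than the part quantities, keeps every part count consistent with a whole number of finished products.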

What third-party data should I collect?

To get a more accurate forecast, consider adding the following third-party data: consumer index, cost-of-living proxies, and competitor sales history. This third-party data is considered related time-series data. Collect it for the same time period as your sales data and at the same periodicity (such as daily or weekly).
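Matching the period and periodicity of the sales data usually means trimming and resampling the third-party data. The following minimal sketch assumes daily third-party observations and weekly sales periods that start on Monday (both assumptions); the function name `to_weekly` and the sample values are hypothetical.

```python
import statistics
from collections import defaultdict
from datetime import date, timedelta


def to_weekly(daily_values, start, end, agg=statistics.mean):
    """Aggregate daily (date, value) observations into Monday-starting weeks.

    Only observations between start and end (inclusive) are kept, so the result
    covers the same period as the sales history. Use agg=sum for additive data
    such as competitor unit sales, and the default mean for index-style data.
    """
    buckets = defaultdict(list)
    for day, value in daily_values:
        if start <= day <= end:
            week_start = day - timedelta(days=day.weekday())  # Monday of that week
            buckets[week_start].append(value)
    return {week: agg(values) for week, values in sorted(buckets.items())}


# Hypothetical daily index values; 2024-01-01 is a Monday.
daily_index = [
    (date(2023, 12, 31), 99.0),  # before the sales period, so it is dropped
    (date(2024, 1, 1), 100.0),
    (date(2024, 1, 2), 102.0),
    (date(2024, 1, 8), 104.0),
]
print(to_weekly(daily_index, date(2024, 1, 1), date(2024, 1, 14)))
```

Choosing the aggregation per source matters: averaging an index preserves its scale, while summing competitor sales preserves weekly totals.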

What is the minimum infrastructure that I need?

At a minimum, the infrastructure should support the following:

  • Data ingestion pipelines that collect data in batch or streaming mode

  • A preprocessing ETL pipeline that extracts and transforms the raw data into standardized input formats for ML modeling

  • A development environment for model development, experimentation, and validation

  • A continuous integration and continuous deployment (CI/CD) pipeline that pushes the ML model into production

  • Mechanisms for model registry, monitoring, and retraining

  • A security layer that encrypts data in transit and data at rest and provides fine-grained access control
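The preprocessing ETL step in the list above can be sketched as a function that maps raw records into one standardized format. This minimal example assumes a hypothetical raw CSV export with `SKU`, `SaleDate`, and `Qty` columns and a hypothetical target schema of `item_id`, `timestamp`, and `demand`; your source systems and the input format your modeling tools expect may differ.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw export; the column names and MM/DD/YYYY date format are assumptions.
RAW_CSV = """SKU,SaleDate,Qty
abc-100,03/04/2024,12
ABC-100,03/05/2024,
xyz-200,03/04/2024,7
"""


def standardize(raw_text):
    """Map raw sales rows to hypothetical standardized (item_id, timestamp, demand) records."""
    records = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        qty = row["Qty"].strip()
        if not qty:
            # Skip rows with a missing quantity instead of guessing a value.
            continue
        records.append({
            "item_id": row["SKU"].strip().upper(),  # normalize inconsistent SKU casing
            "timestamp": datetime.strptime(row["SaleDate"].strip(), "%m/%d/%Y").date().isoformat(),
            "demand": float(qty),
        })
    return records


for record in standardize(RAW_CSV):
    print(record)
```

Skipping rows with missing quantities is one possible policy; treating them as zero demand is another, and the right choice depends on what a blank means in your source system.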

How do I validate that my data-driven approach is effective? What are the KPIs?

Every data science initiative or data-driven solution needs to be validated against a set of key performance indicators (KPIs). One KPI can measure how close the model's forecast is to the actual demand. You can compute this metric for different forecast horizons, such as forecasts made 1 week or 1 month in advance. You can also directly measure how many parts were over-ordered or under-ordered based on the forecast that the model generated. Stakeholders and upper management should carefully define a set of KPIs that track model performance, and then use those KPIs to determine whether the return on investment (ROI) meets expectations.
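Both kinds of KPI can be computed with a few lines of code. The following minimal sketch uses weighted absolute percentage error (WAPE) as one common accuracy metric, evaluated separately per forecast horizon, plus a count of over-ordered and under-ordered parts. WAPE is a choice made for this example rather than a metric the guidance prescribes, and all of the data and function names are hypothetical.

```python
def wape(actuals, forecasts):
    """Weighted absolute percentage error: sum(|actual - forecast|) / sum(|actual|)."""
    total_actual = sum(abs(a) for a in actuals)
    if total_actual == 0:
        raise ValueError("WAPE is undefined when all actual values are zero")
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / total_actual


def order_gap(ordered, needed):
    """Return (over_ordered, under_ordered) part counts across periods."""
    over = sum(max(o - n, 0) for o, n in zip(ordered, needed))
    under = sum(max(n - o, 0) for o, n in zip(ordered, needed))
    return over, under


# Hypothetical evaluation data: the same four weeks forecast 1 week and 4 weeks ahead.
actual_units = [120, 95, 140, 110]
forecasts_by_horizon = {
    "1 week ahead": [115, 100, 135, 112],
    "4 weeks ahead": [100, 110, 120, 130],
}
for horizon, forecast in forecasts_by_horizon.items():
    print(f"{horizon}: WAPE = {wape(actual_units, forecast):.3f}")

print(order_gap(ordered=[500, 300, 200], needed=[450, 320, 200]))
```

Reporting the error per horizon makes it visible when accuracy degrades for longer lead times, which is where parts ordering decisions are usually made.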

How frequently should I generate the forecasts?

The forecast frequency depends on two factors: how closely you want the forecast to track the available time-series datasets, and how variable the data in the related time-series datasets is. In general, generating forecasts frequently can help your organization properly prepare to meet demand for the new product.

How do I enable self-service?

As capacity grows, the organization should develop a self-service infrastructure that automates data ingestion, preprocessing, and the model training pipeline for forecast generation. Measure the ML model's results and business impact, and publish them to a dashboard that stakeholders can access on demand.

How does AWS pricing work?

For more information, see AWS Pricing.