Best practices
To enable subsequent machine learning (ML) or reinforcement learning (RL), it is crucial to follow best practices in various areas, including data ingestion, asset management, telemetry storage, and visualization.
Data ingestion plays a vital role in the project's success. It involves uploading data generated by edge assets to AWS or the cloud of your choice, allowing for cloud-scale interactions. To streamline the process and facilitate scalability, an edge-side component for automated onboarding of new sites should be implemented. This ensures that new assets can seamlessly integrate with the existing infrastructure as they come online.
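For example, the following sketch shows one way an edge-side component might publish a reading to AWS IoT Core so that a cloud-side rule can route it into the telemetry store. The topic layout, payload fields, and asset identifiers are illustrative assumptions, not part of a prescribed interface.

```python
# Minimal edge-side ingestion sketch: publish one telemetry reading to AWS IoT Core.
# Assumes AWS credentials and Region are configured; you might also need to pass your
# account-specific IoT data endpoint through endpoint_url.
import json
import boto3

iot_data = boto3.client("iot-data")

def publish_reading(site_id: str, asset_id: str, reading: dict) -> None:
    """Publish one asset reading so a cloud-side rule can route it to the telemetry store."""
    topic = f"sites/{site_id}/assets/{asset_id}/telemetry"  # hypothetical topic layout
    iot_data.publish(topic=topic, qos=1, payload=json.dumps(reading))

publish_reading(
    "site-001",
    "ahu-07",
    {"supply_temp_c": 14.2, "zone_temp_c": 22.8, "timestamp": 1690000000},
)
```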
Asset management is another critical aspect that needs
careful consideration. By mapping asset metadata to a standardized ontology such as the Brick ontology, you can gain a holistic view
of assets and their properties, hierarchies, and relationships. The following diagram shows an
example mapping that is adapted from the Brick ontology.

Storing this metadata in a graph database such as Amazon Neptune makes these hierarchies and relationships directly queryable, so downstream analytics and ML applications can traverse the asset model programmatically.
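The following sketch illustrates one way to record and query a Brick-style relationship in Neptune through its Gremlin HTTP endpoint. The endpoint, vertex labels, and property names are placeholder assumptions, and production code would also need to handle authentication (for example, IAM SigV4 signing) if it is enabled on the cluster.

```python
# Sketch: write and query a Brick-style asset relationship in Amazon Neptune
# through the Gremlin HTTP endpoint. Labels and property names are illustrative.
import requests

NEPTUNE_GREMLIN_URL = "https://<your-neptune-endpoint>:8182/gremlin"

def run_gremlin(query: str) -> dict:
    """Send a Gremlin query string to Neptune and return the JSON response."""
    response = requests.post(NEPTUNE_GREMLIN_URL, json={"gremlin": query}, timeout=30)
    response.raise_for_status()
    return response.json()

# An air-handling unit (Brick class AHU) feeds an HVAC zone (Brick class HVAC_Zone).
run_gremlin(
    "g.addV('AHU').property('assetId','ahu-07').as('a')"
    ".addV('HVAC_Zone').property('assetId','zone-3F').as('z')"
    ".addE('feeds').from('a').to('z')"
)

# Query the hierarchy: which zones does this AHU feed?
zones = run_gremlin("g.V().has('AHU','assetId','ahu-07').out('feeds').values('assetId')")
```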
The telemetry store is responsible for storing the ingested
data in real time and employing lifecycle management to reduce costs and minimize risk. The
telemetry store uses both hot and cold storage mechanisms to enable efficient and reliable data
storage. Implementing a data catalog such as AWS Glue
To provide insights and enable informed decision-making, we recommend that you develop a
visualization component. This is a dashboard that enables
users to visualize the uploaded asset data, and provides a clear and intuitive representation of
the information collected. Presenting data in a user-friendly manner can help stakeholders to
easily grasp the current status of the energy optimization project and make data-driven
decisions. After you establish this data foundation, you can use RL to enable energy
optimization. For a sample implementation, see the GitHub repository Amazon Neptune and
AWS IoT SiteWise for industrial machine learning applications.
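As one illustration of how such a dashboard might be fed, the following sketch queries the cataloged telemetry with Amazon Athena and aggregates hourly power consumption per asset. The database, table, column, and bucket names are hypothetical.

```python
# Sketch: aggregate cataloged telemetry with Amazon Athena to feed a dashboard.
# Database, table, column, and bucket names are placeholders.
import time
import boto3

athena = boto3.client("athena")

query = """
    SELECT asset_id, date_trunc('hour', ts) AS hour, avg(power_kw) AS avg_power_kw
    FROM energy_telemetry.readings
    WHERE ts > current_timestamp - interval '7' day
    GROUP BY 1, 2
"""

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "energy_telemetry"},
    ResultConfiguration={"OutputLocation": "s3://<your-results-bucket>/athena/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query leaves the QUEUED/RUNNING states, then fetch the rows.
while athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"] in ("QUEUED", "RUNNING"):
    time.sleep(1)
rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```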
External conditions play a crucial role in the RL environment. You should consider variables such as atmospheric pressure, constant air flow, supply temperature, supply relative humidity, zone temperature, zone relative humidity, outside air temperature, outside air relative humidity, cooling setpoint, and minimum outside air percentage. These conditions form the state representation and provide the necessary context for the RL agent to make decisions.
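As a sketch of how this state representation might look in code, the following dataclass bundles the variables listed above into the observation vector consumed by the RL agent. The field names and units are illustrative assumptions.

```python
# Sketch of a state representation built from the external conditions listed above.
# Field names and units are illustrative, not a prescribed schema.
from dataclasses import dataclass, astuple
import numpy as np

@dataclass
class HvacState:
    atmospheric_pressure_kpa: float
    constant_air_flow_cfm: float
    supply_temp_c: float
    supply_rh_pct: float
    zone_temp_c: float
    zone_rh_pct: float
    outside_air_temp_c: float
    outside_air_rh_pct: float
    cooling_setpoint_c: float
    min_outside_air_pct: float

    def to_observation(self) -> np.ndarray:
        """Flatten the state into the observation vector consumed by the RL agent."""
        return np.asarray(astuple(self), dtype=np.float32)
```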
The RL solution should make certain assumptions, such as a constant airflow and constant supply air temperature or relative humidity, to simplify the problem. These assumptions help constrain the environment for the RL agent, and enable the agent to learn and optimize its actions faster.
The RL agent's actions are defined by the economizer enabling setpoints. These setpoints, such as the economizer maximum enabling temperature and the economizer maximum enabling enthalpy, determine the behavior of the system and its power-saving potential. The RL agent learns to select appropriate setpoints based on the observed state to maximize the power-saving rewards.
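For illustration, the following sketch defines a discrete action space over the two enabling setpoints. The candidate setpoint values are placeholders, not recommended operating values.

```python
# Sketch: a discrete action space over economizer enabling setpoints.
# The candidate values below are illustrative placeholders.
import itertools

max_enabling_temps_c = [16.0, 18.0, 20.0, 22.0]       # economizer maximum enabling temperature
max_enabling_enthalpies_kj_kg = [47.0, 55.0, 63.0]    # economizer maximum enabling enthalpy

# Each action is one (temperature, enthalpy) setpoint pair the agent can apply.
ACTIONS = list(itertools.product(max_enabling_temps_c, max_enabling_enthalpies_kj_kg))

def apply_action(action_index: int) -> tuple:
    """Map the agent's discrete action index to concrete economizer setpoints."""
    return ACTIONS[action_index]
```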
The reward function is a crucial aspect of RL. In this case, the reward is calculated based on the power-saving logic while maintaining human comfort. The RL agent aims to minimize power consumption, and the reward is determined by comparing the power consumption with and without the selected economizer enabling setpoints. By incentivizing power reduction, the RL agent learns to optimize its actions over time.
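The following sketch shows one possible shape for such a reward: the power saving relative to a baseline without the selected setpoints, reduced by a penalty when zone comfort bounds are violated. The comfort bounds and penalty weight are illustrative assumptions.

```python
# Sketch of a reward based on the power-saving logic described above.
# Comfort bounds and the penalty weight are illustrative assumptions.

def reward(baseline_power_kw: float,
           power_with_setpoints_kw: float,
           zone_temp_c: float,
           comfort_low_c: float = 21.0,
           comfort_high_c: float = 24.0,
           comfort_penalty: float = 10.0) -> float:
    """Reward the power saved versus the baseline, penalizing comfort violations."""
    power_saving = baseline_power_kw - power_with_setpoints_kw
    if zone_temp_c < comfort_low_c or zone_temp_c > comfort_high_c:
        # Discourage setpoints that save energy at the expense of occupant comfort.
        return power_saving - comfort_penalty
    return power_saving
```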
The following diagram shows an example of an energy optimization RL loop. For more
information about this workflow and sample code, see the GitHub repository Guidance for Monitoring and Optimizing Energy Usage on AWS.

Developing an RL solution by following best practices involves striking a balance between exploration and exploitation. Techniques such as epsilon-greedy exploration or Thompson sampling help the agent decide when to try new actions and when to exploit the actions it already estimates to be best, so training converges in a reasonable number of iterations.
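For example, epsilon-greedy action selection can be sketched as follows; the exploration rate epsilon is typically decayed over the course of training.

```python
# Sketch of epsilon-greedy action selection: with probability epsilon explore a
# random action, otherwise exploit the current Q-value estimates.
import numpy as np

rng = np.random.default_rng()

def select_action(q_values: np.ndarray, epsilon: float) -> int:
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # explore
    return int(np.argmax(q_values))               # exploit
```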
Careful RL algorithm selection, such as Q-learning or Deep Q-Network (DQN), along with hyperparameter tuning, ensures optimal learning and convergence. Employing techniques such as experience replay improves sample efficiency by reusing past transitions, which is useful when real-world experience for the agent is limited. Target networks improve the stability of training by computing learning targets from a separate, periodically updated copy of the Q-network, so the targets do not shift with every gradient step. Overall, these practices facilitate effective RL solution development for maximizing rewards and optimizing performance.
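The following sketch outlines both techniques in minimal form: a replay buffer that stores and resamples past transitions, and a hard update that copies the online network's weights into the target network at a fixed interval. Network parameters are represented here as plain dictionaries of arrays for illustration.

```python
# Sketch of experience replay and a target-network update in minimal form.
# Parameters are modeled as dictionaries of NumPy arrays for illustration.
import random
from collections import deque

class ReplayBuffer:
    """Stores past transitions so the agent can reuse them during training."""

    def __init__(self, capacity: int = 50_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done) -> None:
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int) -> list:
        return random.sample(self.buffer, batch_size)

def sync_target_network(online_params: dict, target_params: dict) -> None:
    """Hard update: copy the online network's weights into the target network."""
    for name, value in online_params.items():
        target_params[name] = value.copy()
```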
In summary, developing an RL solution for a power-saving simulator requires considering external conditions, defining assumptions, selecting meaningful actions, and designing a suitable reward function. Best practices include proper exploration-exploitation trade-offs, algorithm selection, hyperparameter tuning, and employing stability-enhancing techniques such as experience replay and target networks. Cloud technologies provide cost efficiency, durability, and scalability for analytics and machine learning. Adhering to best practices in data ingestion, asset management, telemetry storage, visualization, and machine learning development enables seamless integration, efficient data handling, and valuable insights, leading to a successful project delivery.