RL Environments in Amazon SageMaker
Amazon SageMaker RL uses environments to mimic real-world scenarios. Given the current state of the environment and an action taken by the agent or agents, the simulator processes the impact of the action, and returns the next state and a reward. Simulators are useful in cases where it is not safe to train an agent in the real world (for example, flying a drone) or if the RL algorithm takes a long time to converge (for example, when playing chess).
The following diagram shows an example of the interactions with a simulator for a car racing game.
The simulation environment consists of an agent and a simulator. Here, a convolutional
neural network (CNN) consumes images from the simulator and generates actions that drive
the game controller. Over many simulation runs, this environment generates training data
of the form (state_t, action, state_t+1, reward_t+1). Defining the reward is not trivial,
and the choice of reward directly affects the quality of the RL model. We want to provide
a few example reward functions while keeping the reward user-configurable.
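The following is a minimal sketch of how such transition tuples could be collected. The reward function is passed in as a parameter, so it stays user-configurable. The 1-D toy_simulator, distance_reward, and collect_transitions are hypothetical illustrations, not SageMaker RL APIs. In the car racing example, the simulator would be the game and the policy would be the CNN.

```python
from typing import Callable, List, Tuple

# A transition tuple: (state_t, action_t, state_t+1, reward_t+1).
Transition = Tuple[float, int, float, float]


def toy_simulator(state: float, action: int) -> float:
    """Hypothetical 1-D simulator: action 1 moves right, any other action moves left."""
    return state + (1.0 if action == 1 else -1.0)


def distance_reward(next_state: float, target: float = 5.0) -> float:
    """One example reward: the closer the next state is to the target, the higher the reward."""
    return -abs(target - next_state)


def collect_transitions(
    policy: Callable[[float], int],
    reward_fn: Callable[[float], float],
    num_steps: int,
    start_state: float = 0.0,
) -> List[Transition]:
    """Run the simulator and record (state_t, action_t, state_t+1, reward_t+1) tuples."""
    transitions: List[Transition] = []
    state = start_state
    for _ in range(num_steps):
        action = policy(state)                      # the agent chooses an action
        next_state = toy_simulator(state, action)   # the simulator applies it
        reward = reward_fn(next_state)              # user-configurable reward
        transitions.append((state, action, next_state, reward))
        state = next_state
    return transitions


if __name__ == "__main__":
    always_right = lambda s: 1  # a fixed placeholder policy
    for t in collect_transitions(always_right, distance_reward, num_steps=3):
        print(t)
```

Because reward_fn is only a parameter, trying a different reward, such as a sparse reward that pays out only when the target is reached, requires no change to the simulator or to the data-collection loop.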
Use OpenAI Gym Interface for Environments in SageMaker RL
To use OpenAI Gym environments in SageMaker RL, use the following API elements. For
more information about OpenAI Gym, see the Gym Documentation.
- env.action_space: Defines the actions the agent can take, specifies whether each action is continuous or discrete, and specifies the minimum and maximum if the action is continuous.
- env.observation_space: Defines the observations the agent receives from the environment, as well as the minimum and maximum for continuous observations.
- env.reset(): Initializes a training episode. The reset() function returns the initial state of the environment, and the agent uses the initial state to take its first action. The agent's actions are then passed to step() repeatedly until the episode reaches a terminal state. When step() returns done = True, the episode ends. The RL toolkit re-initializes the environment by calling reset().
- env.step(): Takes the agent action as input and outputs the next state of the environment, the reward, whether the episode has terminated, and an info dictionary to communicate debugging information. The environment is responsible for validating its inputs.
- env.render(): Used for environments that have visualization. The RL toolkit calls this function to capture visualizations of the environment after each call to the step() function.
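The following is a minimal sketch of a custom environment that exposes the API elements listed above. So that it runs without Gym installed, DiscreteSpace and BoxSpace are hypothetical stand-ins for gym.spaces.Discrete and gym.spaces.Box, and LineWorldEnv is an invented example, not a SageMaker RL class. The sketch follows the four-value step() return described in this section. Newer Gym releases (0.26 and later) and Gymnasium instead return (observation, info) from reset() and split done into terminated and truncated in step().

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class DiscreteSpace:
    """Stand-in for gym.spaces.Discrete: valid actions are the integers 0..n-1."""
    n: int

    def contains(self, x: Any) -> bool:
        return isinstance(x, int) and 0 <= x < self.n


@dataclass
class BoxSpace:
    """Stand-in for a 1-D gym.spaces.Box: a continuous value in [low, high]."""
    low: float
    high: float


class LineWorldEnv:
    """Hypothetical environment: walk along a line from position 0 to a goal."""

    def __init__(self, goal: int = 3, max_steps: int = 10):
        self.action_space = DiscreteSpace(n=2)  # 0 = move left, 1 = move right
        self.observation_space = BoxSpace(low=0.0, high=float(goal))
        self.goal = goal
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self) -> float:
        """Start a new episode and return the initial observation."""
        self.position = 0
        self.steps = 0
        return float(self.position)

    def step(self, action: int) -> Tuple[float, float, bool, Dict[str, Any]]:
        """Apply an action and return (observation, reward, done, info)."""
        # The environment is responsible for validating its inputs.
        if not self.action_space.contains(action):
            raise ValueError(f"invalid action: {action!r}")
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.goal)  # stay on the line
        self.steps += 1
        reached_goal = self.position == self.goal
        reward = 1.0 if reached_goal else -0.1  # small penalty for every non-goal step
        done = reached_goal or self.steps >= self.max_steps  # goal reached or time limit
        info = {"steps": self.steps}  # debugging information
        return float(self.position), reward, done, info

    def render(self) -> str:
        """Text visualization: A marks the agent, G marks the goal."""
        cells = ["."] * (self.goal + 1)
        cells[self.goal] = "G"
        cells[self.position] = "A"
        return "".join(cells)


if __name__ == "__main__":
    env = LineWorldEnv()
    obs, done, total_reward = env.reset(), False, 0.0
    while not done:  # step until done, rendering after each step
        obs, reward, done, info = env.step(1)  # placeholder policy: always move right
        total_reward += reward
        print(env.render(), reward, done)
    print("episode return:", total_reward)
```

The __main__ loop imitates the cycle that the RL toolkit drives: call reset(), pass actions to step() until done is True, and call render() after each step. Replacing the fixed policy with a learned one changes only how the action is chosen.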
Use Open-Source Environments
You can use open-source environments, such as EnergyPlus and RoboSchool, in SageMaker
RL by building your own container. For more information about EnergyPlus, see https://energyplus.net/.
Use Commercial Environments
You can use commercial environments, such as MATLAB and Simulink, in SageMaker RL by building your own container. You need to manage your own licenses.