AWS DeepRacer
Developer Guide

This is prerelease documentation for a service in preview release. It is subject to change.

Create a Reward Function

For a reinforcement learning model, the reward function prescribes an immediate reward or penalty when the agent takes an action in a given state. Its purpose is to encourage actions that help the agent accomplish its goals and to discourage actions that work against them. The reward function is how the environment provides feedback on the agent's actions; it shapes the agent's behavior and directly affects training performance. When you use the AWS DeepRacer console to train a model with a supported framework, the reward function is the only application-specific part and depends on your input.

Constructing a reward function is like creating an incentive plan. If not carefully considered, it can lead to unintended consequences, including the opposite of the desired effect. This is possible because the reward function is local in time, whereas the final task depends on expected rewards in the future. Real-world behaviors are rarely representable by linear functions, and short-term incentives are not guaranteed to lead to long-term rewards. A good practice is to start with a simple reward function that covers basic scenarios and then iteratively enhance it to handle more behaviors, until all the desired behaviors are considered.

For example, to train an AWS DeepRacer agent to drive autonomously on a well-marked track, we create the reward function with the following signature:

def reward_function(self, on_track, x, y, distance_from_center, car_orientation, progress, steps, throttle, steering, track_width, waypoints, closest_waypoint):

where the input parameters, as described in the following table, represent the state in which an action is to be taken, and the output is a real value in the range [-100000.0, 100000.0].

Note

When calculating the reward, make sure the result stays within the output value range and do not return a value of exactly zero (0).
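For example, a small helper like the following can keep a computed reward inside the valid range and away from an exact zero. This is only a minimal sketch; the clamp_reward name and the 0.001 fallback value are illustrative choices, not part of the AWS DeepRacer API.

def clamp_reward(reward):
    # Keep the reward within the allowed output range of [-100000.0, 100000.0].
    reward = max(-100000.0, min(100000.0, reward))
    # Avoid returning an exact zero; 0.001 is an arbitrary small placeholder.
    if reward == 0.0:
        reward = 0.001
    return reward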

Input Parameters of AWS DeepRacer Reward Functions

Parameter | Type | Range | Description
on_track | boolean | True or False | The vehicle is off track (False) if the front of the vehicle is outside of the (white) track lines; otherwise, it's on track (True).
x | float | [0, inf] | Location of the vehicle along the x-axis.
y | float | [0, inf] | Location of the vehicle along the y-axis.
distance_from_center | float | [0, track_width/2] | Displacement from the center line of the track as defined by waypoints.
car_orientation | float | [-π, π] | Orientation of the vehicle around its z-axis with respect to the x-axis, in radians. Also known as the yaw of the vehicle.
progress | float | [0, 1] | Percentage of the track completed.
steps | int | [0, n] | Number of steps completed. One step corresponds to one inferred action taken by the vehicle.
throttle | float | [0, 1] | Vehicle's speed. 0 indicates stopped and 1 indicates maximum speed.
steering | float | [-1, 1] | Steering position. -1 means right and 1 means left.
track_width | float | [0, inf] | Width of the track.
waypoint | (float, float) | (x_w, y_w) | A coordinate (x_w, y_w) describing a point on the center line of the track.
waypoints | list | [(x_{w,1}, y_{w,1}), (x_{w,2}, y_{w,2}), ...] | An ordered list of waypoints following the vehicle's progress along the track.
closest_waypoint | int | [0, number of waypoints - 1] | The zero-based index of the waypoint closest to the vehicle's (x, y) position, as measured by the Euclidean distance argmin_i sqrt((x_{w,i} - x)^2 + (y_{w,i} - y)^2). The closest waypoint can be in front of the vehicle or behind it (a sketch of this computation follows the table).
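To make the closest_waypoint definition concrete, the following sketch shows how such an index could be computed from the vehicle's (x, y) position and the waypoints list using the Euclidean distance. The closest_waypoint_index helper is purely illustrative; in practice, the simulator passes closest_waypoint to your reward function for you.

import math

def closest_waypoint_index(x, y, waypoints):
    # Illustrative helper only; the simulator already supplies closest_waypoint.
    # Compute the Euclidean distance from the vehicle to each waypoint (x_w, y_w)
    # and return the zero-based index of the smallest one.
    distances = [math.sqrt((x_w - x) ** 2 + (y_w - y) ** 2) for (x_w, y_w) in waypoints]
    return distances.index(min(distances))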

We can start building the reward function by first considering the most basic situation: driving on a straight track from start to finish without going off the track. In this scenario, the reward function logic depends only on on_track and progress. As a trial, you could start with the following logic:

def reward_function(self, on_track, x, y, distance_from_center, car_orientation, progress, steps, throttle, steering, track_width, waypoints, closest_waypoint):
    if not on_track:
        reward = -1
    elif progress == 1:
        reward = 10
    else:
        # Small nonzero default so that a reward is always returned.
        reward = 0.001
    return reward

This logic penalizes the agent when it drives off the track and rewards it when it reaches the finish line. It's reasonable for achieving the stated goal. However, the agent can roam freely between the starting point and the finish line, including driving backwards on the track. This means that not only could the training take a long time to complete, but the trained model could also lead to less efficient driving when deployed to a running vehicle.

In practice, an agent learns more effectively if it can do so bit by bit throughout the course of training. This implies that a reward function should give out smaller rewards step by step along the track. For the agent to drive on the straight track, we can improve the reward function as follows:

def reward_function(self, on_track, x, y, distance_from_center, car_orientation, progress, steps, throttle, steering, track_width, waypoints, closest_waypoint):
    if not on_track:
        reward = -1
    else:
        reward = progress
    return reward

With this function, the agent gets more reward the closer it gets to the finish line. This should reduce or eliminate unproductive trials of driving backwards. In general, we want the reward function to distribute the reward more evenly over the action space. Creating an effective reward function can be a challenging undertaking. You should start with a simple one and, through systematic experimentation, progressively enhance it to make it more robust and efficient.
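As one possible next iteration, and only as an illustrative sketch rather than a prescribed solution, you could combine progress with distance_from_center and track_width from the table above so that the agent is also rewarded for staying near the center line:

def reward_function(self, on_track, x, y, distance_from_center, car_orientation, progress, steps, throttle, steering, track_width, waypoints, closest_waypoint):
    # Illustrative sketch: penalize driving off the track; otherwise reward
    # progress, scaled down as the vehicle drifts away from the center line.
    if not on_track:
        reward = -1
    else:
        # distance_from_center ranges from 0 (on the center line)
        # to track_width / 2 (at the track edge).
        center_factor = 1 - distance_from_center / (track_width / 2)
        # Keep a small positive factor so progress is never rewarded with zero.
        reward = progress * max(center_factor, 0.1)
    return reward

Whether such an additional term actually improves driving behavior is something to verify through the kind of systematic experimentation described above.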