AWS DeepRacer
Developer Guide

This is prerelease documentation for a service in preview release. It is subject to change.

Progressively Enhance Reward Functions for More Complex Situations

This section shows how to iteratively enhance and improve a reward function with a series of example functions that take the input parameters as described elsewhere:

Example 1: Follow the Track Center

In this example, we start with a reward function to keep the vehicle driving close to the center of a track.

The following reward function returns more reward when the vehicle is closer to the center of a track. The logic goes as follows:

  • If the vehicle's distance from the track center is less than 20% of the track width, the output reward is 10.

  • If the distance is greater than 20% and less than 50% of the track width, the reward is 3.

  • If the distance is greater than 50% and less than 80% of the track width, the reward is 1.

  • If the distance is greater than 80% of the track width, which is assumed to be crashed or off the track, the reward is 0.01.

In AWS DeepRacer, we can implement this function as follows:

def reward_function(on_track, x, y, distance_from_center, car_orientation, progress, steps, throttle, streering, track_width, waypoints, closest_waypoint): import math marker_1 = 0.2 * track_width marker_2 = 0.5 * track_width marker_3 = 0.8 * track_width if distance_from_center >= 0.0 and distance_from_center <= marker_1: reward = 10 elif distance_from_center <= marker_2: reward = 3 elif distance_from_center <= marker_3: reward = 1 else: reward = 1e-2 # likely crashed/ close to off track return float(reward)

The effect of this reward function is to keep the vehicle to drive as close to the track center as possible.

In addition to the discrete version, you could also employ a continuous one to return the reward of the following form:

reward = exp(-a * distance_from_center)

Here a is a constant and distance_from_center is between 0 and 1.

Example 2: Follow Center Line without Excessive Turns

In the following example function, we add rewards or penalties to steering to prevent the vehicle from turning away from the center line of the track.

def reward_function2(on_track, x, y, distance_from_center, car_orientation, progress, steps, throttle, streering, track_width, waypoints, closest_waypoint): import math marker_1 = 0.2 * track_width marker_2 = 0.5 * track_width marker_3 = 0.8 * track_width if distance_from_center >= 0.0 and distance_from_center <= marker_1: reward = 10 elif distance_from_center <= marker_2: reward = 3 elif distance_from_center <= marker_3: reward = 1 else: reward = 1e-2 # likely crashed/ close to off track # penalize reward if the vehicle is steering way too much ABS_STEERING_THRESHOLD = 0.3 if math.abs(steering) > ABS_STEERING_THRESHOLD: reward -= 0.5 * math.abs(steering) reward = max([0.0, reward]) return float(reward)

Example 3: Follow Center Line with Straight Orientation

In this example, we add rewards or penalties to vehicle's orientations to keep the vehicle's body straight while it drives along the track center.

def reward_function3(on_track, x, y, distance_from_center, car_orientation, progress, steps, throttle, streering, track_width, waypoints, closest_waypoint): import math marker_1 = 0.2 * track_width marker_2 = 0.5 * track_width marker_3 = 0.8 * track_width if distance_from_center >= 0.0 and distance_from_center <= marker_1: reward = 10 elif distance_from_center <= marker_2: reward = 3 elif distance_from_center <= marker_3: reward = 1 else: reward = 1e-2 # likely crashed/ close to off track waypoint_yaw = waypoints[closest_waypoint][-1] # penalize reward if orientation of the vehicle deviates way too much when compared to ideal orientation if math.abs(car_orientation - waypoint_yaw) >= math.radians(5): reward *= 0.5 return float(reward)

If you use an orientation to detect if a vehicle is to make turns, this reward function should help other vehicles recognize the intention more clearly when a vehicle in the front makes turns.