AWS DeepRacer reward function input parameters

The AWS DeepRacer reward function takes a dictionary object as its input.


def reward_function(params):
    
    reward = ...

    return float(reward)

The params dictionary object contains the following key-value pairs.


{
    "all_wheels_on_track": Boolean,        # flag to indicate if the agent is on the track
    "x": float,                            # agent's x-coordinate in meters
    "y": float,                            # agent's y-coordinate in meters
    "closest_objects": [int, int],         # zero-based indices of the two closest objects to the agent's current position of (x, y).
    "closest_waypoints": [int, int],       # indices of the two nearest waypoints.
    "distance_from_center": float,         # distance in meters from the track center 
    "is_crashed": Boolean,                 # Boolean flag to indicate whether the agent has crashed.
    "is_left_of_center": Boolean,          # Flag to indicate if the agent is on the left side to the track center or not. 
    "is_offtrack": Boolean,                # Boolean flag to indicate whether the agent has gone off track.
    "is_reversed": Boolean,                # flag to indicate if the agent is driving clockwise (True) or counter clockwise (False).
    "heading": float,                      # agent's yaw in degrees
    "objects_distance": [float, ],         # list of the objects' distances in meters between 0 and track_length in relation to the starting line.
    "objects_heading": [float, ],          # list of the objects' headings in degrees between -180 and 180.
    "objects_left_of_center": [Boolean, ], # list of Boolean flags indicating whether elements' objects are left of the center (True) or not (False).
    "objects_location": [(float, float),], # list of object locations [(x,y), ...].
    "objects_speed": [float, ],            # list of the objects' speeds in meters per second.
    "progress": float,                     # percentage of track completed
    "speed": float,                        # agent's speed in meters per second (m/s)
    "steering_angle": float,               # agent's steering angle in degrees
    "steps": int,                          # number of steps completed
    "track_length": float,                 # track length in meters.
    "track_width": float,                  # width of the track in meters
    "waypoints": [(float, float), ]        # list of (x,y) as milestones along the track center
}

A more detailed technical reference for the input parameters follows.

all_wheels_on_track

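Type: Boolean

Range: (True:False)

A Boolean flag that indicates whether the agent is on track or off track. The agent is off track (False) if any of its wheels is outside the track borders. It is on track (True) if all of its wheels are inside the two track borders. The following illustration shows an agent that is on track.

Image: AWS DeepRacer reward function input parameter all_wheels_on_track = True.

The following illustration shows an agent that is off track.

Image: AWS DeepRacer reward function input parameter all_wheels_on_track = False.

Example: A reward function using the all_wheels_on_track parameter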


def reward_function(params):
    #############################################################################
    '''
    Example of using all_wheels_on_track and speed
    '''

    # Read input variables
    all_wheels_on_track = params['all_wheels_on_track']
    speed = params['speed']

    # Set the speed threshold based on your action space
    SPEED_THRESHOLD = 1.0

    if not all_wheels_on_track:
        # Penalize if the car goes off track
        reward = 1e-3
    elif speed < SPEED_THRESHOLD:
        # Penalize if the car goes too slow
        reward = 0.5
    else:
        # High reward if the car stays on track and goes fast
        reward = 1.0

    return float(reward)

closest_waypoints

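Type: [int, int]

Range: [(0:Max-1),(1:Max-1)]

The zero-based indices of the two neighboring waypoints closest to the agent's current position of (x, y). The distance is measured as the Euclidean distance from the center of the agent. The first element refers to the closest waypoint behind the agent, and the second element refers to the closest waypoint in front of the agent. Max is the length of the waypoints list. In the illustration shown in waypoints, closest_waypoints would be [16, 17].

Example: A reward function using the closest_waypoints parameter

The following example reward function demonstrates how to use waypoints, closest_waypoints, and heading to calculate an immediate reward.

AWS DeepRacer supports the following libraries: math, random, NumPy, SciPy, and Shapely. To use one, add an import statement, import supported library, above your function definition, def function_name(parameters).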


# Place import statement outside of function (supported libraries: math, random, numpy, scipy, and shapely)
# Example imports of available libraries
#
# import math
# import random
# import numpy
# import scipy
# import shapely

import math

def reward_function(params):
    ###############################################################################
    '''
    Example of using waypoints and heading to make the car point in the right direction
    '''

    # Read input variables
    waypoints = params['waypoints']
    closest_waypoints = params['closest_waypoints']
    heading = params['heading']

    # Initialize the reward with typical value
    reward = 1.0

    # Calculate the direction of the center line based on the closest waypoints
    next_point = waypoints[closest_waypoints[1]]
    prev_point = waypoints[closest_waypoints[0]]

    # Calculate the direction in radians, arctan2(dy, dx), the result is (-pi, pi) in radians
    track_direction = math.atan2(next_point[1] - prev_point[1], next_point[0] - prev_point[0])
    # Convert to degree
    track_direction = math.degrees(track_direction)

    # Calculate the difference between the track direction and the heading direction of the car
    direction_diff = abs(track_direction - heading)
    if direction_diff > 180:
        direction_diff = 360 - direction_diff

    # Penalize the reward if the difference is too large
    DIRECTION_THRESHOLD = 10.0
    if direction_diff > DIRECTION_THRESHOLD:
        reward *= 0.5

    return float(reward)

closest_objects

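Type: [int, int]

Range: [(0:len(objects_location)-1), (0:len(objects_location)-1)]

The zero-based indices of the two objects closest to the agent's current position of (x, y). The first index refers to the closest object behind the agent, and the second index refers to the closest object in front of the agent. If there is only one object, both indices are 0.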
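
As a minimal sketch (not part of the original reference), the following reward function uses closest_objects together with objects_location, x, and y to measure the straight-line distance to the nearest object in front of the agent and lowers the reward as that distance shrinks. The names WARN_DISTANCE and DANGER_DISTANCE and their values are illustrative assumptions, as are the penalty factors.

```python
import math

def reward_function(params):
    '''
    Sketch: use closest_objects to find the nearest object ahead of the agent
    '''
    # Illustrative thresholds in meters (assumptions; tune them for your track)
    WARN_DISTANCE = 0.8
    DANGER_DISTANCE = 0.5

    # Read input variables
    x = params['x']
    y = params['y']
    objects_location = params['objects_location']

    # Initialize the reward with typical value
    reward = 1.0

    # Nothing to avoid if there are no objects on the track
    if not objects_location:
        return float(reward)

    # The second index refers to the closest object in front of the agent
    _, next_index = params['closest_objects']
    next_x, next_y = objects_location[next_index]

    # Euclidean distance between the agent and that object
    distance_ahead = math.hypot(next_x - x, next_y - y)

    # Penalize more strongly the closer the object is
    if distance_ahead < DANGER_DISTANCE:
        reward *= 0.2
    elif distance_ahead < WARN_DISTANCE:
        reward *= 0.5

    return float(reward)
```

The early return keeps the function usable on tracks without objects, where the object lists are expected to be empty.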

distance_from_center

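Type: float

Range: 0:~track_width/2

Displacement, in meters, between the center of the agent and the center of the track. The maximum observable displacement occurs when any of the agent's wheels is outside a track border and, depending on the width of the track border, can be slightly smaller or larger than half of track_width.

Image: AWS DeepRacer reward function input parameter distance_from_center.

Example: A reward function using the distance_from_center parameter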


def reward_function(params):
    #################################################################################
    '''
    Example of using distance from the center
    '''

    # Read input variable
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Penalize if the car is too far away from the center
    marker_1 = 0.1 * track_width
    marker_2 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    else:
        reward = 1e-3  # likely crashed/ close to off track

    return float(reward)

heading

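Type: float

Range: -180:+180

The heading direction of the agent, in degrees, with respect to the x-axis of the coordinate system.

Image: AWS DeepRacer reward function input parameter heading.

Example: A reward function using the heading parameter

For more information, see closest_waypoints.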

is_crashed

Type: Boolean

Range: (True:False)

A Boolean flag that indicates whether the agent has crashed into another object (True) or not (False) as a termination status.

is_left_of_center

Type: Boolean

Range: (True:False)

A Boolean flag that indicates whether the agent is to the left of the track center (True) or to the right of it (False).
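
One way to use this flag, sketched here under assumptions rather than taken from the original reference, is to combine it with distance_from_center to obtain a signed lateral offset and reward the agent for holding a chosen line. The target line (TARGET_OFFSET, the middle of the left half of the track) and the linear falloff are hypothetical choices.

```python
def reward_function(params):
    '''
    Sketch: derive a signed lateral offset from is_left_of_center
    '''
    # Read input variables
    distance_from_center = params['distance_from_center']
    is_left_of_center = params['is_left_of_center']
    track_width = params['track_width']

    # Positive offset means left of the center line, negative means right
    if is_left_of_center:
        signed_offset = distance_from_center
    else:
        signed_offset = -distance_from_center

    # Hypothetical target line: the middle of the left half of the track
    TARGET_OFFSET = 0.25 * track_width

    # Reward falls linearly from 1.0 at the target line to a floor of 1e-3
    error = abs(signed_offset - TARGET_OFFSET)
    reward = max(1e-3, 1.0 - error / (0.5 * track_width))

    return float(reward)
```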

is_offtrack

Type: Boolean

Range: (True:False)

A Boolean flag that indicates whether the agent has gone off track (True) or not (False) as a termination status.
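
The two termination flags, is_crashed and is_offtrack, can be checked together. The following is a minimal sketch, not an example from the original reference; using 1e-3 as the near-zero reward mirrors the other examples on this page but is still an illustrative choice.

```python
def reward_function(params):
    '''
    Sketch: give a near-zero reward when the episode ends in a crash or off track
    '''
    # Read input variables
    is_crashed = params['is_crashed']
    is_offtrack = params['is_offtrack']

    if is_crashed or is_offtrack:
        # Near-zero reward for either termination status (illustrative value)
        reward = 1e-3
    else:
        # Typical reward while the agent keeps driving
        reward = 1.0

    return float(reward)
```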

is_reversed

Type: Boolean

Range: (True:False)

A Boolean flag that indicates whether the agent is driving clockwise (True) or counter-clockwise (False).

It is used when you enable a direction change for each episode.
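
When the driving direction can change between episodes, any side-dependent logic has to account for is_reversed. The sketch below is an illustration under a strong assumption: the track is a simple loop, so the inside of the loop is on the agent's right when it drives clockwise and on its left when it drives counter-clockwise. It does not hold for tracks that cross over or mix turn directions heavily.

```python
def reward_function(params):
    '''
    Sketch: prefer the inside of a simple loop regardless of driving direction
    '''
    # Read input variables
    is_reversed = params['is_reversed']
    is_left_of_center = params['is_left_of_center']

    # Assumption: simple loop. Clockwise (is_reversed=True) puts the inside
    # on the right; counter-clockwise puts the inside on the left.
    if is_reversed:
        on_inside = not is_left_of_center
    else:
        on_inside = is_left_of_center

    # Illustrative reward values
    reward = 1.0 if on_inside else 0.5

    return float(reward)
```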

objects_distance

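Type: [float, … ]

Range: [(0:track_length), … ]

A list of the distances of the objects in the environment in relation to the starting line. The ith element is the distance in meters, measured along the track center line, between the ith object and the starting line.

Note

abs(var1 - var2) = how close the car is to an object, where var1 = params["objects_distance"][index] and var2 = params["progress"] / 100 * params["track_length"]. Because progress is a percentage (0 to 100), it is divided by 100 to obtain a fraction of the track length.

To get the indices of the closest object in front of the vehicle and the closest object behind the vehicle, use the closest_objects parameter.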
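
The calculation in the note can be sketched as a reward function. This is an illustration, not an example from the original reference. It assumes that progress is measured from the starting line, so that progress / 100 * track_length approximates the agent's distance along the center line, and it uses a modulo so that an object just past the starting line on a looped track still counts as ahead. SAFE_GAP is a hypothetical threshold.

```python
def reward_function(params):
    '''
    Sketch: estimate the gap along the center line to the next object ahead
    '''
    # Illustrative safe gap in meters (assumption)
    SAFE_GAP = 1.0

    # Read input variables
    track_length = params['track_length']
    objects_distance = params['objects_distance']

    # Initialize the reward with typical value
    reward = 1.0

    # Nothing to measure if there are no objects on the track
    if not objects_distance:
        return float(reward)

    # Assumption: progress (0 to 100) is measured from the starting line
    agent_distance = params['progress'] / 100.0 * track_length

    # The second index refers to the closest object in front of the agent
    _, next_index = params['closest_objects']

    # Modulo handles an object that sits just past the starting line on a loop
    gap_ahead = (objects_distance[next_index] - agent_distance) % track_length

    # Scale the reward down linearly once the gap is shorter than SAFE_GAP
    if gap_ahead < SAFE_GAP:
        reward = max(1e-3, gap_ahead / SAFE_GAP)

    return float(reward)
```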

objects_heading

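Type: [float, … ]

Range: [(-180:180), … ]

A list of the headings of the objects, in degrees. The ith element is the heading of the ith object. For stationary objects, the heading is 0. For a bot vehicle, the corresponding element is the vehicle's heading angle.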

objects_left_of_center

Type: [Boolean, … ]

Range: [True|False, … ]

A list of Boolean flags. The ith element indicates whether the ith object is to the left (True) or to the right (False) of the track center.
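
A common use of these flags is to check whether the next object is on the same side of the track as the agent. The following is a minimal sketch under that idea, not an example from the original reference; the 0.5 penalty factor is an illustrative assumption.

```python
def reward_function(params):
    '''
    Sketch: penalize sharing a side of the track with the next object ahead
    '''
    # Read input variables
    objects_left_of_center = params['objects_left_of_center']
    is_left_of_center = params['is_left_of_center']

    # Initialize the reward with typical value
    reward = 1.0

    # Nothing to avoid if there are no objects on the track
    if not objects_left_of_center:
        return float(reward)

    # The second index refers to the closest object in front of the agent
    _, next_index = params['closest_objects']

    # True when the agent and the next object are on the same side
    same_side = objects_left_of_center[next_index] == is_left_of_center

    if same_side:
        reward *= 0.5  # illustrative penalty factor

    return float(reward)
```

In practice this check is usually combined with a distance test so that far-away objects do not affect the reward.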

objects_location

Type: [(x,y), … ]

Range: [(0:N,0:N), … ]

A list of all object locations. Each location is an (x, y) tuple.

The size of the list equals the number of objects on the track. Note that an object can be a stationary obstacle or a moving bot vehicle.

objects_speed

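Type: [float, … ]

Range: [(0:12.0), … ]

A list of the speeds, in meters per second, of the objects on the track. For stationary objects, the speed is 0. For a bot vehicle, the value is the speed you set in training.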
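
As a rough sketch, not an example from the original reference, objects_speed can be combined with speed and objects_location to estimate how quickly the agent is closing in on the next object. The estimate assumes the agent and the object travel in roughly the same direction and uses the straight-line distance between them; MIN_TIME_TO_REACH and the penalty factor are hypothetical values.

```python
import math

def reward_function(params):
    '''
    Sketch: penalize closing in on the next object too quickly
    '''
    # Illustrative minimum time, in seconds, before reaching the object (assumption)
    MIN_TIME_TO_REACH = 1.0

    # Read input variables
    objects_speed = params['objects_speed']

    # Initialize the reward with typical value
    reward = 1.0

    # Nothing to avoid if there are no objects on the track
    if not objects_speed:
        return float(reward)

    # The second index refers to the closest object in front of the agent
    _, next_index = params['closest_objects']
    next_x, next_y = params['objects_location'][next_index]

    # Straight-line gap between the agent and the object
    gap = math.hypot(next_x - params['x'], next_y - params['y'])

    # Assumption: both move in roughly the same direction along the track
    closing_speed = params['speed'] - objects_speed[next_index]

    # Only penalize when the agent is actually catching up
    if closing_speed > 0 and gap / closing_speed < MIN_TIME_TO_REACH:
        reward *= 0.5

    return float(reward)
```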

progress

Type: float

Range: 0:100

The percentage of the track completed.

Example: A reward function using the progress parameter

For more information, see steps.

speed

Type: float

Range: 0.0:5.0

The observed speed of the agent, in meters per second (m/s).

Example: A reward function using the speed parameter

For more information, see all_wheels_on_track.

steering_angle

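Type: float

Range: -30:30

The steering angle, in degrees, of the front wheels from the center line of the agent. A negative sign (-) means steering to the right, and a positive sign (+) means steering to the left. As shown in the following illustration, the agent's center line is not necessarily parallel to the track center line.

Image: AWS DeepRacer reward function input parameter steering_angle.

Example: A reward function using the steering_angle parameter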


def reward_function(params):
    '''
    Example of using steering angle
    '''

    # Read input variable
    abs_steering = abs(params['steering_angle']) # We don't care whether it is left or right steering

    # Initialize the reward with typical value
    reward = 1.0

    # Penalize if the car steers too much, to prevent zigzagging
    ABS_STEERING_THRESHOLD = 20.0
    if abs_steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8

    return float(reward)

steps

Type: int

Range: 0:N_step

The number of steps completed. A step corresponds to an action taken by the agent following the current policy.

Example: A reward function using the steps parameter


def reward_function(params):
    #############################################################################
    '''
    Example of using steps and progress
    '''

    # Read input variable
    steps = params['steps']
    progress = params['progress']

    # Total number of steps we want the car to take to finish the lap; it varies with the track length
    TOTAL_NUM_STEPS = 300

    # Initialize the reward with typical value
    reward = 1.0

    # Every 100 steps, give an additional reward if the car has progressed further than expected
    if (steps % 100) == 0 and progress > (steps / TOTAL_NUM_STEPS) * 100:
        reward += 10.0

    return float(reward)

track_length

Type: float

Range: [0:L_max]

The track length in meters. L_max is track-dependent.

track_width

Type: float

Range: 0:D_track

The track width in meters.

Image: AWS DeepRacer reward function input parameter track_width.

Example: A reward function using the track_width parameter


def reward_function(params):
    #############################################################################
    '''
    Example of using track width
    '''

    # Read input variable
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Calculate the distance from each border
    distance_from_border = 0.5 * track_width - distance_from_center

    # Reward higher if the car stays inside the track borders
    if distance_from_border >= 0.05:
        reward = 1.0
    else:
        reward = 1e-3 # Low reward if too close to the border or goes off the track

    return float(reward)

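x, y

Type: float

Range: 0:N

The location, in meters, of the agent's center along the x and y axes of the simulated environment containing the track. The origin is at the lower-left corner of the simulated environment.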
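
To illustrate how x and y relate to the track geometry, the following sketch (not from the original reference) computes the agent's perpendicular distance from the line through its two closest waypoints and rewards staying close to that line. This approximates what distance_from_center already provides, so treat it as a demonstration of working with raw coordinates rather than a recommended replacement. The linear falloff is an illustrative choice.

```python
import math

def reward_function(params):
    '''
    Sketch: compute the lateral distance to the center line from x and y
    '''
    # Read input variables
    x = params['x']
    y = params['y']
    waypoints = params['waypoints']
    prev_index, next_index = params['closest_waypoints']
    prev_x, prev_y = waypoints[prev_index]
    next_x, next_y = waypoints[next_index]

    # Vector along the center line between the two closest waypoints
    seg_dx = next_x - prev_x
    seg_dy = next_y - prev_y
    seg_len = math.hypot(seg_dx, seg_dy)

    if seg_len == 0:
        # Degenerate segment: fall back to the distance to the waypoint itself
        lateral = math.hypot(x - prev_x, y - prev_y)
    else:
        # Perpendicular distance from (x, y) to the line through the waypoints
        lateral = abs(seg_dx * (y - prev_y) - seg_dy * (x - prev_x)) / seg_len

    # Reward falls linearly from 1.0 on the center line to a floor of 1e-3
    reward = max(1e-3, 1.0 - lateral / (0.5 * params['track_width']))

    return float(reward)
```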

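waypoints

Type: list of [float, float]

Range: [[x_{w,0}, y_{w,0}] … [x_{w,Max-1}, y_{w,Max-1}]]

An ordered list of track-dependent Max milestones along the track center. Each milestone is described by a coordinate of (x_{w,i}, y_{w,i}). For a looped track, the first and last waypoints are the same. For a non-looped track, such as a straight track, the first and last waypoints are different.

Image: AWS DeepRacer reward function input parameter waypoints.

Example: A reward function using the waypoints parameter

For more information, see closest_waypoints.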
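
Beyond the two closest waypoints, the ordered list can be used to look further down the track. The following sketch, which is an illustration and not part of the original reference, compares the current track direction with the direction toward a waypoint several indices ahead and penalizes high speed when a curve is coming. LOOK_AHEAD, CURVE_THRESHOLD, and SPEED_LIMIT_IN_CURVE are hypothetical values, and the index wraps around with a modulo on the assumption that the track is a loop.

```python
import math

def reward_function(params):
    '''
    Sketch: look ahead along the waypoints to slow down before curves
    '''
    # Illustrative settings (assumptions; tune them for your track)
    LOOK_AHEAD = 5               # number of waypoints to look ahead
    CURVE_THRESHOLD = 20.0       # degrees of upcoming direction change
    SPEED_LIMIT_IN_CURVE = 2.0   # meters per second

    # Read input variables
    waypoints = params['waypoints']
    prev_index, next_index = params['closest_waypoints']
    speed = params['speed']

    # Wrap around the end of the list (assumes a looped track)
    far_index = (next_index + LOOK_AHEAD) % len(waypoints)

    def direction(start, end):
        # Direction of travel from start to end, in degrees
        return math.degrees(math.atan2(end[1] - start[1], end[0] - start[0]))

    current_direction = direction(waypoints[prev_index], waypoints[next_index])
    upcoming_direction = direction(waypoints[next_index], waypoints[far_index])

    # Smallest absolute angle between the two directions
    turn = abs(upcoming_direction - current_direction)
    if turn > 180:
        turn = 360 - turn

    # Initialize the reward with typical value
    reward = 1.0

    # Penalize driving fast into an upcoming curve
    if turn > CURVE_THRESHOLD and speed > SPEED_LIMIT_IN_CURVE:
        reward *= 0.5

    return float(reward)
```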
