
AWS DeepRacer reward function examples

The following are some examples of AWS DeepRacer reward functions.

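Topics

Example 1: Follow the center line in time trials
Example 2: Stay inside the two borders in time trials
Example 3: Prevent zig-zag in time trials
Example 4: Stay in one lane without crashing into stationary obstacles or moving vehicles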

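Example 1: Follow the center line in time trials

This example determines how far the agent is from the center line and gives a higher reward the closer the agent is to the center of the track, which encourages the agent to follow the center line closely.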


def reward_function(params):
    '''
    Example of rewarding the agent to follow center line
    '''
    
    # Read input parameters
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Calculate 3 markers that are increasingly further away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed or close to going off track

    return reward
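
Before training, it can help to call a reward function directly with a hand-built params dictionary to confirm that each band returns the value you expect. The following is a minimal sketch, not part of the original example: it uses a condensed copy of the function above so that it runs on its own, and the 1.0 track width and the sample distances are assumed values chosen only for illustration.

```python
def reward_function(params):
    """Condensed copy of Example 1 so that this snippet runs on its own."""
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # (band upper edge as a fraction of track width, reward for that band)
    bands = [(0.1, 1.0), (0.25, 0.5), (0.5, 0.1)]
    for fraction, reward in bands:
        if distance_from_center <= fraction * track_width:
            return reward
    return 1e-3  # farther than half the track width from the center line


# Assumed track width; only the two keys read by the function are supplied.
TRACK_WIDTH = 1.0
for distance in (0.05, 0.2, 0.4, 0.6):
    params = {'track_width': TRACK_WIDTH, 'distance_from_center': distance}
    print(distance, reward_function(params))
```

With these inputs, the four sample distances fall into the three bands and the fallback case, in that order.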

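Example 2: Stay inside the two borders in time trials

This example gives a high reward whenever the agent stays inside the borders and lets the agent work out the best path to finish a lap. It is easy to program and understand, but likely takes longer to converge.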


def reward_function(params):
    '''
    Example of rewarding the agent to stay inside the two borders of the track
    '''
    
    # Read input parameters
    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    
    # Give a very low reward by default
    reward = 1e-3

    # Give a high reward if no wheels go off the track and 
    # the car is somewhere in between the track borders 
    if all_wheels_on_track and (0.5*track_width - distance_from_center) >= 0.05:
        reward = 1.0

    # Always return a float value
    return reward
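
The condition in the function above compares the remaining distance between the car's center and the nearest border with a fixed margin of 0.05. The following sketch isolates that calculation. The helper names border_margin and stays_inside are hypothetical and introduced only for illustration, and the track width and distances are assumed sample values.

```python
def border_margin(track_width, distance_from_center):
    """Distance from the car's center to the nearest track border."""
    return 0.5 * track_width - distance_from_center


def stays_inside(track_width, distance_from_center, all_wheels_on_track,
                 min_margin=0.05):
    """Mirror the condition used in Example 2 (helper name is hypothetical)."""
    margin = border_margin(track_width, distance_from_center)
    return all_wheels_on_track and margin >= min_margin


# Assumed sample values: a track 1.0 wide, car at three distances from center.
for distance in (0.1, 0.3, 0.48):
    print(distance, stays_inside(1.0, distance, all_wheels_on_track=True))
```

A larger min_margin keeps the agent farther from the borders, at the cost of leaving it less of the track to use.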

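Example 3: Prevent zig-zag in time trials

This example encourages the agent to follow the center line but penalizes it with a lower reward if it steers too much, which helps prevent zig-zag behavior. The agent learns to drive smoothly in the simulator and likely keeps the same behavior when deployed to the physical vehicle.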


def reward_function(params):
    '''
    Example of penalizing steering, which helps mitigate zig-zag behaviors
    '''
    
    # Read input parameters
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    abs_steering = abs(params['steering_angle']) # Only need the absolute steering angle

    # Calculate 3 markers that are increasingly farther away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed or close to going off track

    # Steering penalty threshold, change the number based on your action space setting
    ABS_STEERING_THRESHOLD = 15

    # Penalize reward if the car is steering too much
    if abs_steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8

    return float(reward)
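
The steering penalty can be looked at separately from the center-line bands. The following sketch applies the same rule as the function above: multiply the reward by 0.8 when the absolute steering angle exceeds the threshold. The helper name apply_steering_penalty is hypothetical, and the sample angles are assumed values.

```python
def apply_steering_penalty(reward, steering_angle, threshold=15.0, factor=0.8):
    """Scale the reward down when |steering_angle| exceeds the threshold.

    Helper name and default arguments are illustrative; the defaults match
    the numbers used in Example 3.
    """
    if abs(steering_angle) > threshold:
        return reward * factor
    return reward


# Assumed sample steering angles; the sign only indicates direction.
for angle in (-30, -10, 0, 15, 20):
    print(angle, apply_steering_penalty(1.0, angle))
```

Because the comparison is strict, an angle exactly at the threshold is not penalized. If the largest steering angle in your action space is at or below the threshold, the penalty never applies, which is why the threshold should be chosen with the action space in mind.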

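Example 4: Stay in one lane without crashing into stationary obstacles or moving vehicles

This reward function rewards the agent for staying inside the track's borders and penalizes it for getting too close to objects in front of it. The agent can move from lane to lane to avoid collisions. The total reward is a weighted sum of the reward and the penalty. This example gives more weight to the penalty in an effort to avoid collisions. Experiment with different weights to train for different behavior outcomes.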


import math


def reward_function(params):
    '''
    Example of rewarding the agent to stay inside two borders
    and penalizing getting too close to the objects in front
    '''
    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    objects_location = params['objects_location']
    agent_x = params['x']
    agent_y = params['y']
    _, next_object_index = params['closest_objects']
    objects_left_of_center = params['objects_left_of_center']
    is_left_of_center = params['is_left_of_center']
    # Initialize reward with a small number but not zero
    # because zero means off-track or crashed
    reward = 1e-3
    # Reward if the agent stays inside the two borders of the track
    if all_wheels_on_track and (0.5 * track_width - distance_from_center) >= 0.05:
        reward_lane = 1.0
    else:
        reward_lane = 1e-3
    # Penalize if the agent is too close to the next object
    reward_avoid = 1.0
    # Distance to the next object
    next_object_loc = objects_location[next_object_index]
    distance_closest_object = math.sqrt((agent_x - next_object_loc[0])**2 + (agent_y - next_object_loc[1])**2)
    # Decide if the agent and the next object are in the same lane
    is_same_lane = objects_left_of_center[next_object_index] == is_left_of_center
    if is_same_lane:
        if 0.5 <= distance_closest_object < 0.8:
            reward_avoid *= 0.5
        elif 0.3 <= distance_closest_object < 0.5:
            reward_avoid *= 0.2
        elif distance_closest_object < 0.3:
            reward_avoid = 1e-3  # Likely crashed
    # Calculate reward by putting different weights on
    # the two aspects above
    reward += 1.0 * reward_lane + 4.0 * reward_avoid
    return reward
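
To see how the weights shape the total, the two components can be computed separately and then combined. The following is a minimal sketch under stated assumptions: the helper names avoidance_reward and total_reward are hypothetical, the distance bands and default weights are copied from the function above, and the coordinates are made-up sample values.

```python
import math


def avoidance_reward(agent_xy, object_xy, same_lane):
    """Banded penalty from Example 4, keyed on distance to the next object."""
    distance = math.hypot(agent_xy[0] - object_xy[0],
                          agent_xy[1] - object_xy[1])
    if not same_lane or distance >= 0.8:
        return 1.0   # no penalty: different lane or far enough away
    if distance >= 0.5:
        return 0.5
    if distance >= 0.3:
        return 0.2
    return 1e-3      # likely crashed


def total_reward(reward_lane, reward_avoid, lane_weight=1.0, avoid_weight=4.0):
    """Weighted sum used in Example 4, including the small 1e-3 base value."""
    return 1e-3 + lane_weight * reward_lane + avoid_weight * reward_avoid


# Assumed positions: the agent at the origin, the object 0.6 ahead, same lane.
avoid = avoidance_reward((0.0, 0.0), (0.6, 0.0), same_lane=True)
print(total_reward(1.0, avoid))                    # default weights
print(total_reward(1.0, avoid, avoid_weight=1.0))  # equal weights
```

With the default weights, the avoidance term dominates the total, so getting close to an object costs far more than drifting toward a border. Lowering avoid_weight shifts that balance toward lane keeping.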
