Input parameters of the AWS DeepRacer reward function

The AWS DeepRacer reward function takes a dictionary object as its input.


def reward_function(params):
    
    reward = ...

    return float(reward)

The params dictionary object contains the following key-value pairs.


{
    "all_wheels_on_track": Boolean,        # flag to indicate if the agent is on the track
    "x": float,                            # agent's x-coordinate in meters
    "y": float,                            # agent's y-coordinate in meters
    "closest_objects": [int, int],         # zero-based indices of the two closest objects to the agent's current position of (x, y).
    "closest_waypoints": [int, int],       # indices of the two nearest waypoints.
    "distance_from_center": float,         # distance in meters from the track center
    "is_crashed": Boolean,                 # Boolean flag to indicate whether the agent has crashed.
    "is_left_of_center": Boolean,          # flag to indicate if the agent is on the left side of the track center or not.
    "is_offtrack": Boolean,                # Boolean flag to indicate whether the agent has gone off track.
    "is_reversed": Boolean,                # flag to indicate if the agent is driving clockwise (True) or counterclockwise (False).
    "heading": float,                      # agent's yaw in degrees
    "objects_distance": [float, ],         # list of the objects' distances in meters between 0 and track_length in relation to the starting line.
    "objects_heading": [float, ],          # list of the objects' headings in degrees between -180 and 180.
    "objects_left_of_center": [Boolean, ], # list of Boolean flags indicating whether the objects are left of the center (True) or not (False).
    "objects_location": [(float, float),], # list of object locations [(x,y), ...].
    "objects_speed": [float, ],            # list of the objects' speeds in meters per second.
    "progress": float,                     # percentage of track completed
    "speed": float,                        # agent's speed in meters per second (m/s)
    "steering_angle": float,               # agent's steering angle in degrees
    "steps": int,                          # number of steps completed
    "track_length": float,                 # track length in meters.
    "track_width": float,                  # width of the track in meters
    "waypoints": [(float, float), ]        # list of (x,y) as milestones along the track center
}
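
One way to check a reward function before using it in training is to call it locally with a hand-built dictionary. The following is a minimal sketch, not part of the AWS DeepRacer service: the sample values are invented for illustration, only a subset of the keys is filled in, and the toy reward simply favors positions near the center line.

```python
def reward_function(params):
    """Toy reward: higher when the agent is closer to the center line."""
    half_width = 0.5 * params['track_width']
    # Normalized offset: 0.0 on the center line, capped at 1.0 at the track edge
    offset = min(params['distance_from_center'] / half_width, 1.0)
    return float(max(1.0 - offset, 1e-3))


# Hypothetical sample input; every value here is made up for illustration
sample_params = {
    'all_wheels_on_track': True,
    'x': 2.5,
    'y': 0.7,
    'closest_waypoints': [16, 17],
    'distance_from_center': 0.1,
    'is_left_of_center': True,
    'is_offtrack': False,
    'heading': 12.0,
    'progress': 20.0,
    'speed': 1.5,
    'steering_angle': -5.0,
    'steps': 40,
    'track_length': 17.6,
    'track_width': 0.8,
}

reward = reward_function(sample_params)
print(reward)
```

Calling the function this way only confirms that it runs and returns a float for the inputs you supply; it says nothing about how the trained model will behave.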

A more detailed technical reference for the input parameters follows.

all_wheels_on_track

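Type: Boolean

Range: (True:False)

A Boolean flag that indicates whether the agent is on-track or off-track. The agent is off-track (False) if any of its wheels is outside the track borders. It is on-track (True) if all of its wheels are between the two track borders. The following illustration shows the agent on-track.

Image: AWS DeepRacer reward function input parameter all_wheels_on_track = True.

The following illustration shows the agent off-track.

Image: AWS DeepRacer reward function input parameter all_wheels_on_track = False.

Example: A reward function using the all_wheels_on_track parameter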


def reward_function(params):
    #############################################################################
    '''
    Example of using all_wheels_on_track and speed
    '''

    # Read input variables
    all_wheels_on_track = params['all_wheels_on_track']
    speed = params['speed']

    # Set the speed threshold based on your action space
    SPEED_THRESHOLD = 1.0

    if not all_wheels_on_track:
        # Penalize if the car goes off track
        reward = 1e-3
    elif speed < SPEED_THRESHOLD:
        # Penalize if the car goes too slow
        reward = 0.5
    else:
        # High reward if the car stays on track and goes fast
        reward = 1.0

    return float(reward)

closest_waypoints

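Type: [int, int]

Range: [(0:Max-1),(1:Max-1)]

The zero-based indices of the two neighboring waypoints closest to the agent's current position of (x, y). The distance is the Euclidean distance measured from the center of the agent. The first element refers to the closest waypoint behind the agent, and the second element refers to the closest waypoint in front of the agent. Max is the length of the waypoints list. In the illustration shown under waypoints, closest_waypoints is [16, 17].

Example: A reward function using the closest_waypoints parameter

The following example reward function shows how to use waypoints, closest_waypoints, and heading to calculate immediate rewards.

AWS DeepRacer supports the following libraries: math, random, NumPy, SciPy, and Shapely. To use one, add an import statement, import supported library, above your function definition, def function_name(parameters).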


# Place import statement outside of function (supported libraries: math, random, numpy, scipy, and shapely)
# Example imports of available libraries
#
# import math
# import random
# import numpy
# import scipy
# import shapely

import math

def reward_function(params):
    ###############################################################################
    '''
    Example of using waypoints and heading to make the car point in the right direction
    '''

    # Read input variables
    waypoints = params['waypoints']
    closest_waypoints = params['closest_waypoints']
    heading = params['heading']

    # Initialize the reward with typical value
    reward = 1.0

    # Calculate the direction of the center line based on the closest waypoints
    next_point = waypoints[closest_waypoints[1]]
    prev_point = waypoints[closest_waypoints[0]]

    # Calculate the direction in radians, arctan2(dy, dx), the result is (-pi, pi) in radians
    track_direction = math.atan2(next_point[1] - prev_point[1], next_point[0] - prev_point[0])
    # Convert to degree
    track_direction = math.degrees(track_direction)

    # Calculate the difference between the track direction and the heading direction of the car
    direction_diff = abs(track_direction - heading)
    if direction_diff > 180:
        direction_diff = 360 - direction_diff

    # Penalize the reward if the difference is too large
    DIRECTION_THRESHOLD = 10.0
    if direction_diff > DIRECTION_THRESHOLD:
        reward *= 0.5

    return float(reward)

closest_objects

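Type: [int, int]

Range: [(0:len(objects_location)-1), (0:len(objects_location)-1)]

The zero-based indices of the two objects closest to the agent's current position of (x, y). The first index refers to the closest object behind the agent, and the second index refers to the closest object in front of the agent. If there is only one object, both indices are 0.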
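
The description above can be illustrated with a short sketch that looks up the object in front of the agent through closest_objects and lowers the reward as the agent gets close to it. The distance thresholds (0.5 m and 1.0 m) and the reward values are assumptions chosen for illustration, not values recommended by AWS DeepRacer.

```python
import math


def reward_function(params):
    """Sketch: reduce the reward when the nearest object ahead is close."""
    objects_location = params['objects_location']
    agent_x = params['x']
    agent_y = params['y']

    reward = 1.0
    if objects_location:
        # closest_objects is [index behind the agent, index in front of the agent]
        next_index = params['closest_objects'][1]
        next_x, next_y = objects_location[next_index]

        # Straight-line distance from the agent to the object ahead
        distance_ahead = math.hypot(next_x - agent_x, next_y - agent_y)

        # Assumed thresholds in meters; tune them for your track and action space
        if distance_ahead < 0.5:
            reward = 1e-3
        elif distance_ahead < 1.0:
            reward = 0.5

    return float(reward)
```

The guard on an empty objects_location list is a defensive choice for local testing on tracks without objects.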

distance_from_center

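Type: float

Range: 0:~track_width/2

Displacement, in meters, between the agent center and the track center. The maximum observable displacement occurs when any of the agent's wheels is outside a track border. Depending on the width of the track border, this maximum can be slightly smaller or larger than half of track_width.

Image: AWS DeepRacer reward function input parameter distance_from_center.

Example: A reward function using the distance_from_center parameter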


def reward_function(params):
    #################################################################################
    '''
    Example of using distance from the center
    '''

    # Read input variable
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Penalize if the car is too far away from the center
    marker_1 = 0.1 * track_width
    marker_2 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    else:
        reward = 1e-3  # likely crashed/ close to off track

    return float(reward)

heading

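Type: float

Range: -180:+180

Heading direction, in degrees, of the agent with respect to the x-axis of the coordinate system.

Image: AWS DeepRacer reward function input parameter heading.

Example: A reward function using the heading parameter

For more information, see closest_waypoints.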

is_crashed

Type: Boolean

Range: (True:False)

A Boolean flag that indicates, as a termination status, whether the agent has crashed into another object (True) or not (False).

is_left_of_center

Type: Boolean

Range: [True : False]

A Boolean flag that indicates whether the agent is to the left of the track center (True) or to the right of it (False).
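
Because distance_from_center is unsigned, is_left_of_center is what tells the two sides of the track apart. The sketch below combines the two into a signed lateral offset and rewards staying near a target offset. The helper name signed_offset and the TARGET_OFFSET value are hypothetical and exist only for this illustration.

```python
# Assumed target: 0.1 m to the left of the center line (illustrative value)
TARGET_OFFSET = 0.1


def signed_offset(params):
    """Lateral offset in meters: positive left of center, negative right of it."""
    distance = params['distance_from_center']
    return distance if params['is_left_of_center'] else -distance


def reward_function(params):
    """Sketch: reward falls off linearly with distance from the target offset."""
    half_width = 0.5 * params['track_width']
    error = abs(signed_offset(params) - TARGET_OFFSET)
    reward = max(1.0 - error / half_width, 1e-3)
    return float(reward)
```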

is_offtrack

Type: Boolean

Range: (True:False)

A Boolean flag that indicates whether the agent has gone off the track (True) or not (False).
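
The is_crashed and is_offtrack flags can be used together to give a minimal reward in failure states. This is a minimal sketch under stated assumptions: the reward values are illustrative, and the use of dict.get with a False default is a defensive choice for local testing rather than documented service behavior.

```python
def reward_function(params):
    """Sketch: return a near-zero reward when the agent crashed or left the track."""
    # .get() with a default keeps the sketch usable if a key is missing locally
    is_crashed = params.get('is_crashed', False)
    is_offtrack = params.get('is_offtrack', False)

    if is_crashed or is_offtrack:
        reward = 1e-3
    else:
        reward = 1.0

    return float(reward)
```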

is_reversed

Type: Boolean

Range: [True:False]

A Boolean flag that indicates whether the agent is driving clockwise (True) or counterclockwise (False).

It is used when you enable direction change for each episode.

objects_distance

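Type: [float, … ]

Range: [(0:track_length), … ]

A list of the distances of the objects in the environment from the starting line. The ith element is the distance, in meters, between the ith object and the starting line, measured along the track center line.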

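Note

abs(var1 - var2) = how close the car is to an object, where var1 = params["objects_distance"][index] and var2 = params["progress"] / 100 * params["track_length"]. Because progress is a percentage, it is divided by 100 to obtain a fraction of the track length.

To get the indices of the closest object in front of the vehicle and the closest object behind it, use the closest_objects parameter.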
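
The calculation in the note can be sketched as a small helper. It assumes that the episode began at the starting line, so that progress converts directly to a distance along the center line, and it treats the track as a loop so that an object just past the starting line is still reported as being ahead. The helper name gap_to_object is hypothetical.

```python
def gap_to_object(params, index):
    """Along-track distance in meters from the agent forward to object `index`."""
    track_length = params['track_length']

    # progress is a percentage, so convert it to meters along the center line
    agent_distance = params['progress'] / 100.0 * track_length
    object_distance = params['objects_distance'][index]

    # The modulo wraps the gap around the starting line on a looped track
    return (object_distance - agent_distance) % track_length
```

For example, on a 20 m track with progress at 50, an object 12 m from the starting line is 2 m ahead, and an object 4 m from the starting line is 14 m ahead after wrapping around the loop.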

objects_heading

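Type: [float, … ]

Range: [(-180:180), … ]

A list of the headings of the objects, in degrees. The ith element is the heading of the ith object. For stationary objects, the heading is 0. For a bot vehicle, the corresponding element is the vehicle's heading angle.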

objects_left_of_center

Type: [Boolean, … ]

Range: [True|False, … ]

A list of Boolean flags. The ith element indicates whether the ith object is to the left (True) or to the right (False) of the track center.
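
One possible use of these flags is to compare the side of the object ahead with the agent's own side. The sketch below halves the reward when both are on the same side of the track center. The 0.5 penalty factor is an assumption for illustration, and the sketch ignores how far away the object is.

```python
def reward_function(params):
    """Sketch: penalize sharing a side of the track with the object ahead."""
    objects_left_of_center = params['objects_left_of_center']

    reward = 1.0
    if objects_left_of_center:
        # Second element of closest_objects is the closest object in front
        next_index = params['closest_objects'][1]
        same_side = objects_left_of_center[next_index] == params['is_left_of_center']
        if same_side:
            reward *= 0.5  # assumed penalty factor

    return float(reward)
```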

objects_location

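Type: [(x,y), … ]

Range: [(0:N,0:N), … ]

A list of all object locations. Each location is a tuple of (x, y).

The size of the list equals the number of objects on the track. An object can be a stationary obstacle or a moving bot vehicle.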

objects_speed

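Type: [float, … ]

Range: [(0:12.0), … ]

A list of the speeds, in meters per second (m/s), of the objects on the track. For stationary objects, the speed is 0. For a bot vehicle, the value is the speed you set in training.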

progress

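Type: float

Range: 0:100

Percentage of the track completed.

Example: A reward function using the progress parameter

For more information, see steps.

speed

Type: float

Range: 0.0:5.0

The observed speed of the agent, in meters per second (m/s).

Example: A reward function using the speed parameter

For more information, see all_wheels_on_track.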

steering_angle

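Type: float

Range: -30:30

Steering angle, in degrees, of the front wheels relative to the center line of the agent. A negative sign (-) means steering to the right, and a positive sign (+) means steering to the left. As the following illustration shows, the agent center line is not necessarily parallel to the track center line.

Image: AWS DeepRacer reward function input parameter steering_angle.

Example: A reward function using the steering_angle parameter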


def reward_function(params):
    '''
    Example of using steering angle
    '''

    # Read input variable
    abs_steering = abs(params['steering_angle']) # We don't care whether it is left or right steering

    # Initialize the reward with typical value
    reward = 1.0

    # Penalize if car steer too much to prevent zigzag
    ABS_STEERING_THRESHOLD = 20.0
    if abs_steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8

    return float(reward)

steps

Type: int

Range: 0:N_step

Number of steps completed. A step corresponds to one action taken by the agent under the current policy.

Example: A reward function using the steps parameter


def reward_function(params):
    #############################################################################
    '''
    Example of using steps and progress
    '''

    # Read input variable
    steps = params['steps']
    progress = params['progress']

    # Total number of steps we want the car to take to finish a lap; it varies with the track length
    TOTAL_NUM_STEPS = 300

    # Initialize the reward with typical value
    reward = 1.0

    # Give additional reward if, at every 100th step, the car is ahead of the expected pace
    if (steps % 100) == 0 and progress > (steps / TOTAL_NUM_STEPS) * 100:
        reward += 10.0

    return float(reward)

track_length

Type: float

Range: [0:L_max]

The track length in meters. L_max is track-dependent.

track_width

Type: float

Range: 0:D_track

Track width in meters.

Image: AWS DeepRacer reward function input parameter track_width.

Example: A reward function using the track_width parameter


def reward_function(params):
    #############################################################################
    '''
    Example of using track width
    '''

    # Read input variable
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Calculate the distance from each border
    distance_from_border = 0.5 * track_width - distance_from_center

    # Reward higher if the car stays inside the track borders
    if distance_from_border >= 0.05:
        reward = 1.0
    else:
        reward = 1e-3 # Low reward if too close to the border or goes off the track

    return float(reward)

x, y

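Type: float

Range: 0:N

Location, in meters, of the agent center along the x and y axes of the simulated environment that contains the track. The origin is at the lower-left corner of the simulated environment.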
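
Since x, y, and the waypoints share the same coordinate system, plain Euclidean geometry applies to them. As a small illustration, the hypothetical helper below measures the straight-line distance from the agent center to the closest waypoint in front of it.

```python
import math


def distance_to_next_waypoint(params):
    """Straight-line distance in meters from the agent to the waypoint ahead."""
    # Second element of closest_waypoints is the closest waypoint in front
    next_index = params['closest_waypoints'][1]
    next_waypoint = params['waypoints'][next_index]
    return math.hypot(next_waypoint[0] - params['x'],
                      next_waypoint[1] - params['y'])
```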

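waypoints

Type: list of [float, float]

Range: [[x_{w,0}, y_{w,0}] … [x_{w,Max-1}, y_{w,Max-1}]]

An ordered list of Max milestones along the track center, where Max depends on the track. Each milestone is described by its coordinates (x_{w,i}, y_{w,i}). For a looped track, the first and last waypoints are the same. For a straight or other non-looped track, the first and last waypoints are different.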
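
Because the waypoints are ordered along the track center, summing the lengths of the segments between consecutive waypoints approximates the length of the center line. The helper below is a hypothetical sketch. On a looped track the list already ends where it starts, so no extra closing segment is added.

```python
import math


def center_line_length(waypoints):
    """Approximate center-line length in meters by summing segment lengths."""
    total = 0.0
    # Pair each waypoint with its successor: (w0, w1), (w1, w2), ...
    for current, following in zip(waypoints, waypoints[1:]):
        total += math.hypot(following[0] - current[0],
                            following[1] - current[1])
    return total
```

The result is a polyline approximation, so it can differ slightly from track_length on curved sections.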

Image: AWS DeepRacer reward function input parameter waypoints.

Example: A reward function using the waypoints parameter

For more information, see closest_waypoints.
