Activation checkpointing

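Activation checkpointing is a technique that reduces memory usage by clearing the activations of certain layers and recomputing them during the backward pass. Effectively, this trades extra computation time for reduced memory usage. If a module is checkpointed, at the end of a forward pass only the initial inputs to the module and the final outputs from the module stay in memory. PyTorch releases any intermediate tensors that are part of the computation inside that module during the forward pass. During the backward pass of the checkpointed module, PyTorch recomputes these tensors. At this point, the layers beyond the checkpointed module have already finished their backward pass, so the peak memory usage with checkpointing is lower.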
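
The recomputation described above can be observed on a toy function. The following is a minimal sketch, not SMP-specific code: it assumes plain PyTorch on CPU, uses `torch.utils.checkpoint.checkpoint` directly, and the names `block`, `calls`, `w1`, and `w2` are hypothetical and chosen only for illustration. It counts how many times the checkpointed function runs and compares the gradients with a non-checkpointed run.

```python
import torch
from torch.utils.checkpoint import checkpoint

calls = {"forward": 0}  # counts how often the checkpointed function body runs


def block(x, w1, w2):
    # Hypothetical stand-in for a layer; the tanh output is an intermediate activation.
    calls["forward"] += 1
    return torch.tanh(x @ w1) @ w2


torch.manual_seed(0)
x = torch.randn(4, 8)
w1 = torch.randn(8, 8, requires_grad=True)
w2 = torch.randn(8, 2, requires_grad=True)

# Baseline: ordinary forward and backward pass, intermediates stay in memory.
block(x, w1, w2).sum().backward()
baseline_grads = (w1.grad.clone(), w2.grad.clone())
w1.grad = None
w2.grad = None

# Checkpointed: intermediates are dropped after the forward pass
# and recomputed when the backward pass needs them.
calls["forward"] = 0
out = checkpoint(block, x, w1, w2, use_reentrant=False)
calls_after_forward = calls["forward"]
out.sum().backward()
calls_after_backward = calls["forward"]

grads_match = (
    torch.allclose(w1.grad, baseline_grads[0])
    and torch.allclose(w2.grad, baseline_grads[1])
)
print(calls_after_forward, calls_after_backward, grads_match)
```

The second invocation of `block` during `backward()` is the extra computation that is traded for lower memory usage; the gradients themselves are unchanged.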

SMP v2 supports the PyTorch activation checkpointing function apply_activation_checkpointing. The following are examples of activation checkpointing for the Hugging Face GPT-NeoX model.

Checkpointing the Transformer layers of the Hugging Face GPT-NeoX model


from transformers.models.gpt_neox import GPTNeoXLayer
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    apply_activation_checkpointing
)

# `model` is assumed to be a GPT-NeoX model instance created earlier.

# check_fn receives a module as the argument
# and returns whether that module should be checkpointed.
def is_transformer_layer(module):
    return isinstance(module, GPTNeoXLayer)

apply_activation_checkpointing(model, check_fn=is_transformer_layer)

Checkpointing every other Transformer layer of the Hugging Face GPT-NeoX model


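# check_fn receives a module as the argument
# and returns whether that module should be checkpointed.
# Here the function relies on a global list (transformer_layers).
from transformers.models.gpt_neox import GPTNeoXLayer
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    apply_activation_checkpointing
)

# `model` is assumed to be a GPT-NeoX model instance created earlier.
transformer_layers = [
    m for m in model.modules() if isinstance(m, GPTNeoXLayer)
]

def is_odd_transformer_layer(module):
    # check_fn is called for every submodule, so skip anything that is not
    # a transformer layer before looking it up in the list.
    if not isinstance(module, GPTNeoXLayer):
        return False
    # Indices 0, 2, 4, ... are the 1st, 3rd, 5th, ... layers.
    return transformer_layers.index(module) % 2 == 0

apply_activation_checkpointing(model, check_fn=is_odd_transformer_layer)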
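
To check which layers were actually wrapped without loading a full GPT-NeoX model, the same pattern can be tried on a small stand-in model. This is a sketch under assumptions: `ToyLayer`, `toy_model`, and `is_every_other_layer` are hypothetical names, and it relies on `CheckpointWrapper` being importable from the same private PyTorch module as `apply_activation_checkpointing`, which may change between PyTorch versions.

```python
import torch
import torch.nn as nn
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    CheckpointWrapper,
    apply_activation_checkpointing,
)


class ToyLayer(nn.Module):
    """Hypothetical stand-in for a transformer layer such as GPTNeoXLayer."""

    def __init__(self, dim=8):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.linear(x))


toy_model = nn.Sequential(*[ToyLayer() for _ in range(4)])

# Collect the layers before wrapping so that list positions stay stable.
toy_layers = [m for m in toy_model.modules() if isinstance(m, ToyLayer)]


def is_every_other_layer(module):
    # Guard with isinstance because check_fn is called for all submodules.
    return isinstance(module, ToyLayer) and toy_layers.index(module) % 2 == 0


apply_activation_checkpointing(toy_model, check_fn=is_every_other_layer)

# Selected children are replaced in place by CheckpointWrapper instances.
wrapped = [isinstance(child, CheckpointWrapper) for child in toy_model.children()]
print(wrapped)
```

Inspecting the wrapped flags this way is a quick sanity check that the check_fn selects the intended layers before starting a training job.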

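PyTorch also provides the torch.utils.checkpoint module for checkpointing, which is used by a subset of Hugging Face Transformers models. This module also works with SMP v2. However, it requires access to the model definition to add the checkpoint wrapper. Therefore, we recommend that you use the apply_activation_checkpointing method.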
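
To illustrate why torch.utils.checkpoint needs access to the model definition, the following sketch places the checkpoint call inside a module's own forward method. `ToyBlock` and its `use_checkpoint` flag are hypothetical and are not part of SMP or Hugging Face Transformers; the example assumes plain PyTorch on CPU.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class ToyBlock(nn.Module):
    """Hypothetical residual feed-forward block, for illustration only."""

    def __init__(self, dim=8, use_checkpoint=False):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.use_checkpoint = use_checkpoint

    def forward(self, x):
        # The checkpoint call has to be written into forward(), which is
        # why this approach requires editing the model definition.
        if self.use_checkpoint and self.training:
            return x + checkpoint(self.ff, x, use_reentrant=False)
        return x + self.ff(x)


torch.manual_seed(0)
toy_block = ToyBlock(use_checkpoint=True)
inputs = torch.randn(2, 8)

# Backward pass with checkpointing enabled.
toy_block(inputs).sum().backward()
ckpt_grad = toy_block.ff[0].weight.grad.clone()

# Same pass without checkpointing, for comparison.
toy_block.zero_grad()
toy_block.use_checkpoint = False
toy_block(inputs).sum().backward()
same_gradients = torch.allclose(ckpt_grad, toy_block.ff[0].weight.grad)
print(same_gradients)
```

With apply_activation_checkpointing, by contrast, the wrapper is applied from outside the model, so no change to forward() is needed.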
