
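Migrate a repository incrementally

When you migrate to AWS CodeCommit, consider pushing your repository in increments, or chunks, to reduce the chance that an intermittent network issue or degraded network performance causes the entire push to fail. By pushing incrementally with a script such as the one included in this topic, you can restart the migration and push only the commits that did not succeed on an earlier attempt.

The procedures in this topic show you how to create and run a script that migrates your repository in increments and repushes only the increments that did not succeed, until the migration is complete.

These instructions assume that you have already completed the steps in "Setting up" and "Create a repository".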

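Topics

Step 0: Determine whether to migrate incrementally
Step 1: Install prerequisites and add the CodeCommit repository as a remote
Step 2: Create the script to use for migrating incrementally
Step 3: Run the script and migrate incrementally to CodeCommit
Appendix: Sample script incremental-repo-migration.py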

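Step 0: Determine whether to migrate incrementally

Several factors determine the overall size of your repository and whether you should migrate it incrementally. The most obvious is the overall size of the artifacts in the repository. The accumulated history of the repository also contributes to its size: a repository with years of history and many branches can be very large even though the individual assets are not. There are a number of strategies for making the migration of such repositories simpler and more efficient. For example, you can use a shallow clone when cloning a repository with a long development history, or you can turn off delta compression for large binary files. You can research these options in the Git documentation, or you can set up and configure incremental pushes and migrate your repository with the sample script included in this topic, incremental-repo-migration.py.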

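You might want to configure incremental pushes if one or more of the following conditions is true:

The repository you want to migrate has more than five years of history.
Your internet connection is subject to intermittent outages, dropped packets, slow responses, or other interruptions in service.
The overall size of the repository is larger than 2 GB and you intend to migrate the entire repository.
The repository contains large artifacts or binaries that do not compress well, such as large image files with more than five tracked versions.
You have previously attempted a migration to CodeCommit and received an "Internal Service Error" message.

Even if none of these conditions is true, you can still choose to push incrementally.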
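
The checklist above can be expressed as a small helper. The following is a minimal sketch, not part of the sample script: the `RepoProfile` class, its field names, and both function names are hypothetical. It assumes that the repository size is approximated by the `size-pack` value (reported in KiB) from `git count-objects -v`, and that "2 GB" is read as 2 GiB.

```python
from dataclasses import dataclass

TWO_GB_IN_KIB = 2 * 1024 * 1024  # assumption: "2 GB" is treated as 2 GiB, in KiB


@dataclass
class RepoProfile:
    """Hypothetical summary of a repository; every field name is illustrative."""

    history_years: float
    size_kib: int
    unreliable_network: bool = False
    poorly_compressed_binaries: bool = False
    saw_internal_service_error: bool = False


def parse_size_pack_kib(count_objects_output):
    """Return the size-pack value (KiB) from `git count-objects -v` output."""
    for line in count_objects_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "size-pack":
            return int(value.strip())
    raise ValueError("size-pack not found in output")


def reasons_to_push_incrementally(profile):
    """List which of the conditions in this step apply to the profile."""
    checks = [
        (profile.history_years > 5, "more than five years of history"),
        (profile.unreliable_network, "unreliable internet connection"),
        (profile.size_kib > TWO_GB_IN_KIB, "larger than 2 GB"),
        (profile.poorly_compressed_binaries, "large binaries that compress poorly"),
        (profile.saw_internal_service_error, "earlier Internal Service Error"),
    ]
    return [reason for applies, reason in checks if applies]
```

An empty list means that none of the conditions applies. As noted above, you can still choose to push incrementally in that case.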

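Step 1: Install prerequisites and add the CodeCommit repository as a remote

You can create your own custom script, which would have its own prerequisites. If you use the sample included in this topic, you must do the following:

Install its prerequisites.
Clone the repository to your local computer.
Add the CodeCommit repository as a remote for the repository you want to migrate.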

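Set up to run incremental-repo-migration.py

On your local computer, install Python 2.6 or later. Python 2 has reached end of life, so a current Python 3 release is the practical choice. For more information and the latest versions, see the Python website.
On the same computer, install GitPython, a Python library used to interact with Git repositories. For more information, see the GitPython documentation.
Use the git clone --mirror command to clone the repository you want to migrate to your local computer. From the terminal (Linux, macOS, or Unix) or the command prompt (Windows), run git clone --mirror to create a local repo for the repository, and include the directory where you want the local repo to be created. For example, to clone a Git repository named MyMigrationRepo with a URL of https://example.com/my-repo/ to a directory named my-repo: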
```
git clone --mirror https://example.com/my-repo/MyMigrationRepo.git my-repo
```
You should see output similar to the following, which indicates that the repository has been cloned into a bare local repo named my-repo:
```
Cloning into bare repository 'my-repo'...
remote: Counting objects: 20, done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 20 (delta 5), reused 15 (delta 3)
Unpacking objects: 100% (20/20), done.
Checking connectivity... done.        
```
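Change directories to the local repo for the repository you just cloned (for example, my-repo). From that directory, use the git remote add DefaultRemoteName RemoteRepositoryURL command to add the CodeCommit repository as a remote repository for the local repo.

Note
When you push large repositories, consider using SSH instead of HTTPS. When you push a large change, a large number of changes, or a large repository, long-running HTTPS connections are often terminated prematurely because of networking issues or firewall settings. For more information about setting up CodeCommit for SSH, see the SSH connection setup instructions for Linux, macOS, or Unix, or for Windows.

For example, use the following command to add the SSH endpoint of a CodeCommit repository named MyDestinationRepo as a remote named codecommit: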
```
git remote add codecommit ssh://git-codecommit.us-east-2.amazonaws.com/v1/repos/MyDestinationRepo
```
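Tip
Because this is a clone, the default remote name (origin) is already in use, so you must use another remote name. This example uses codecommit, but you can use any name you want. Use the git remote show command to review the list of remotes set for your local repo.
Use the git remote -v command to display the fetch and push settings for your local repo and confirm that they are set correctly. For example: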
```
codecommit  ssh://git-codecommit.us-east-2.amazonaws.com/v1/repos/MyDestinationRepo (fetch)
codecommit  ssh://git-codecommit.us-east-2.amazonaws.com/v1/repos/MyDestinationRepo (push)      
```
Tip
If you still see fetch and push entries for a different remote repository (for example, entries for origin), use the git remote set-url --delete command to remove them.
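
The check in the last step can also be automated. The following is a minimal sketch that parses text in the format printed by git remote -v and confirms that a remote has both a fetch and a push entry. The function names are hypothetical and are not part of the sample script.

```python
def parse_remotes(remote_v_output):
    """Map each remote name to {"fetch": url, "push": url} from `git remote -v` text."""
    remotes = {}
    for line in remote_v_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, url, kind = parts
        remotes.setdefault(name, {})[kind.strip("()")] = url
    return remotes


def remote_is_ready(remote_v_output, name="codecommit"):
    """Return True if the named remote has both fetch and push URLs."""
    entry = parse_remotes(remote_v_output).get(name, {})
    return "fetch" in entry and "push" in entry
```

The input text could come from a call such as `subprocess.check_output(["git", "remote", "-v"], text=True)` run inside the local repo.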

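Step 2: Create the script to use for migrating incrementally

These steps assume that you are using the incremental-repo-migration.py sample script.

Open a text editor and paste the contents of the sample script into an empty document.
Save the document in a documents directory (not the working directory of your local repo) and name it incremental-repo-migration.py. Make sure that the directory you choose is one configured in your local environment or path variables, so that you can run the Python script from a command line or terminal.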
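
Before you run the script, it can help to confirm that the prerequisites from Step 1 are in place. The following is a minimal sketch, assuming Python 3 (it relies on `importlib.util`) and assuming that GitPython is importable under the module name `git`, as the sample script's imports suggest. The function name is hypothetical.

```python
import importlib.util
import sys


def missing_prerequisites(min_python=(2, 6), modules=("git",)):
    """Return a list of unmet prerequisites; an empty list means all are met."""
    problems = []
    # Compare only the (major, minor) part of the running interpreter version.
    if sys.version_info[:2] < tuple(min_python):
        problems.append("Python %d.%d or later" % tuple(min_python))
    # find_spec returns None when a top-level module cannot be located.
    for module in modules:
        if importlib.util.find_spec(module) is None:
            problems.append("module %r" % module)
    return problems
```

Calling `missing_prerequisites()` with the defaults checks for the interpreter version stated in Step 1 and for GitPython.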

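Step 3: Run the script and migrate incrementally to CodeCommit

Now that you have created the incremental-repo-migration.py script, you can use it to incrementally migrate a local repo to a CodeCommit repository. By default, the script pushes commits in batches of 1,000 commits and uses the Git settings of the directory from which it is run as the settings for the local repo and remote repository. If necessary, you can use the options of incremental-repo-migration.py to configure other settings.

From the terminal or command prompt, change directories to the local repo you want to migrate.
From that directory, run the following command: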
```
python incremental-repo-migration.py
```
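The script runs and shows its progress at the terminal or command prompt. Some large repositories are slow to show progress. The script stops if a single push fails three times. You can then rerun the script, and it resumes from the batch that failed. You can rerun the script until all pushes succeed and the migration is complete.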
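
Rerunning after a failure can be wrapped in a small loop. The following is a minimal sketch, not part of the sample script. It relies on the sample script exiting with a nonzero status when a push fails after its retries. The function names and the `max_runs` limit are hypothetical.

```python
import subprocess
import sys


def migration_command(script="incremental-repo-migration.py", extra_args=()):
    """Build the command line that runs the script with the current interpreter."""
    return [sys.executable, script] + list(extra_args)


def run_until_success(command, max_runs=5):
    """Run the command until it exits with status 0.

    Returns the number of runs that were needed, or None if every run failed.
    """
    for attempt in range(1, max_runs + 1):
        status = subprocess.call(command)
        if status == 0:
            return attempt
        print("Run %d exited with status %d" % (attempt, status))
    return None
```

For example, `run_until_success(migration_command())` called from the local repo directory reruns the migration up to five times. Do not pass the -c option in such a loop, because it removes the tags that the script uses to resume.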

Tip

You can run incremental-repo-migration.py from any directory as long as you use the -l and -r options to specify the local and remote settings to use. For example, to run the script from any directory and migrate a local repo located at /tmp/my-repo to a remote named codecommit:


```
python incremental-repo-migration.py -l "/tmp/my-repo" -r "codecommit"
```

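You can also use the -b option to change the default batch size used when pushing incrementally. For example, if you regularly push a repository with very large binary files that change often, and you work from a location with restricted network bandwidth, you might use the -b option to change the batch size from 1,000 to 500. For example: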


```
python incremental-repo-migration.py -b 500
```

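This pushes the local repo incrementally in batches of 500 commits. If you decide to change the batch size again during the migration (for example, to decrease it after an unsuccessful attempt), use the -c option to remove the batch tags before you set the new batch size with -b: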


```
python incremental-repo-migration.py -c
python incremental-repo-migration.py -b 250
```

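Important

Do not use the -c option if you want to rerun the script after a failure. The -c option removes the tags that are used to batch the commits. Use the -c option only if you want to change the batch size and start again, or if you decide that you no longer want to use the script.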
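
The role of the batch tags when resuming can be illustrated with a simplified model. The following is a minimal sketch that mirrors the comparison made in the sample script's tag_commits method, using plain data instead of GitPython objects. The function name and the example tag data are hypothetical.

```python
def plan_resume(local_batches, remote_tags):
    """Return the names of the batch tags that still have to be pushed.

    local_batches: ordered list of (tag_name, commit_sha) for this run.
    remote_tags: dict mapping tag names already on the remote to commit SHAs.
    """
    pending = []
    for tag_name, sha in local_batches:
        if tag_name not in remote_tags:
            # This batch never reached the remote, so it is pushed again.
            pending.append(tag_name)
        elif remote_tags[tag_name] != sha:
            # The same tag points elsewhere: the batch size probably changed.
            raise ValueError("remote tag %s does not match; clean up first" % tag_name)
    return pending
```

In this model, an interrupted run that left `codecommit_migration_1` on the remote resumes with the second batch. If the remote tags are removed with -c, every batch counts as pending again.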

Appendix: Sample script incremental-repo-migration.py

For your convenience, we provide a sample Python script, incremental-repo-migration.py, that pushes a repository incrementally. This script is an open source code sample and is provided as is.


#!/usr/bin/env python

# Copyright 2015 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Amazon Software License (the "License").
# You may not use this file except in compliance with the License. A copy of the License is located at
#    http://aws.amazon.com/asl/
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied. See the License for
# the specific language governing permissions and limitations under the License.

import os
import sys
from optparse import OptionParser
from git import Repo, TagReference, RemoteProgress, GitCommandError


class PushProgressPrinter(RemoteProgress):
    def update(self, op_code, cur_count, max_count=None, message=""):
        op_id = op_code & self.OP_MASK
        stage_id = op_code & self.STAGE_MASK
        if op_id == self.WRITING and stage_id == self.BEGIN:
            print("\tObjects: %d" % max_count)


class RepositoryMigration:
    MAX_COMMITS_TOLERANCE_PERCENT = 0.05
    PUSH_RETRY_LIMIT = 3
    MIGRATION_TAG_PREFIX = "codecommit_migration_"

    def migrate_repository_in_parts(
        self, repo_dir, remote_name, commit_batch_size, clean
    ):
        self.next_tag_number = 0
        self.migration_tags = []
        self.walked_commits = set()
        self.local_repo = Repo(repo_dir)
        self.remote_name = remote_name
        self.max_commits_per_push = commit_batch_size
        self.max_commits_tolerance = (
            self.max_commits_per_push * self.MAX_COMMITS_TOLERANCE_PERCENT
        )

        try:
            self.remote_repo = self.local_repo.remote(remote_name)
            self.get_remote_migration_tags()
        except (ValueError, GitCommandError):
            print(
                "Could not contact the remote repository. The most common reasons for this error are that the name of the remote repository is incorrect, or that you do not have permissions to interact with that remote repository."
            )
            sys.exit(1)

        if clean:
            self.clean_up(clean_up_remote=True)
            return

        self.clean_up()

        print("Analyzing repository")
        head_commit = self.local_repo.head.commit
        sys.setrecursionlimit(max(sys.getrecursionlimit(), head_commit.count()))

        # tag commits on default branch
        leftover_commits = self.migrate_commit(head_commit)
        self.tag_commits([commit for (commit, commit_count) in leftover_commits])

        # tag commits on each branch
        for branch in self.local_repo.heads:
            leftover_commits = self.migrate_commit(branch.commit)
            self.tag_commits([commit for (commit, commit_count) in leftover_commits])

        # push the tags
        self.push_migration_tags()

        # push all branch references
        for branch in self.local_repo.heads:
            print("Pushing branch %s" % branch.name)
            self.do_push_with_retries(ref=branch.name)

        # push all tags
        print("Pushing tags")
        self.do_push_with_retries(push_tags=True)

        self.get_remote_migration_tags()
        self.clean_up(clean_up_remote=True)

        print("Migration to CodeCommit was successful")

    def migrate_commit(self, commit):
        if commit in self.walked_commits:
            return []

        pending_ancestor_pushes = []
        commit_count = 1

        if len(commit.parents) > 1:
            # This is a merge commit
            # Ensure that all parents are pushed first
            for parent_commit in commit.parents:
                pending_ancestor_pushes.extend(self.migrate_commit(parent_commit))
        elif len(commit.parents) == 1:
            # Split linear history into individual pushes
            next_ancestor, commits_to_next_ancestor = self.find_next_ancestor_for_push(
                commit.parents[0]
            )
            commit_count += commits_to_next_ancestor
            pending_ancestor_pushes.extend(self.migrate_commit(next_ancestor))

        self.walked_commits.add(commit)

        return self.stage_push(commit, commit_count, pending_ancestor_pushes)

    def find_next_ancestor_for_push(self, commit):
        commit_count = 0

        # Traverse linear history until we reach our commit limit, a merge commit, or an initial commit
        while (
            len(commit.parents) == 1
            and commit_count < self.max_commits_per_push
            and commit not in self.walked_commits
        ):
            commit_count += 1
            self.walked_commits.add(commit)
            commit = commit.parents[0]

        return commit, commit_count

    def stage_push(self, commit, commit_count, pending_ancestor_pushes):
        # Determine whether we can roll up pending ancestor pushes into this push
        combined_commit_count = commit_count + sum(
            ancestor_commit_count
            for (ancestor, ancestor_commit_count) in pending_ancestor_pushes
        )

        if combined_commit_count < self.max_commits_per_push:
            # don't push anything, roll up all pending ancestor pushes into this pending push
            return [(commit, combined_commit_count)]

        if combined_commit_count <= (
            self.max_commits_per_push + self.max_commits_tolerance
        ):
            # roll up everything into this commit and push
            self.tag_commits([commit])
            return []

        if commit_count >= self.max_commits_per_push:
            # need to push each pending ancestor and this commit
            self.tag_commits(
                [
                    ancestor
                    for (ancestor, ancestor_commit_count) in pending_ancestor_pushes
                ]
            )
            self.tag_commits([commit])
            return []

        # push each pending ancestor, but roll up this commit
        self.tag_commits(
            [ancestor for (ancestor, ancestor_commit_count) in pending_ancestor_pushes]
        )
        return [(commit, commit_count)]

    def tag_commits(self, commits):
        for commit in commits:
            self.next_tag_number += 1
            tag_name = self.MIGRATION_TAG_PREFIX + str(self.next_tag_number)

            if tag_name not in self.remote_migration_tags:
                tag = self.local_repo.create_tag(tag_name, ref=commit)
                self.migration_tags.append(tag)
            elif self.remote_migration_tags[tag_name] != str(commit):
                print(
                    "Migration tags on the remote do not match the local tags. Most likely your batch size has changed since the last time you ran this script. Please run this script with the --clean option, and try again."
                )
                sys.exit(1)

    def push_migration_tags(self):
        print("Will attempt to push %d tags" % len(self.migration_tags))
        self.migration_tags.sort(
            key=lambda tag: int(tag.name.replace(self.MIGRATION_TAG_PREFIX, ""))
        )
        for tag in self.migration_tags:
            print(
                "Pushing tag %s (out of %d tags), commit %s"
                % (tag.name, self.next_tag_number, str(tag.commit))
            )
            self.do_push_with_retries(ref=tag.name)

    def do_push_with_retries(self, ref=None, push_tags=False):
        for i in range(0, self.PUSH_RETRY_LIMIT):
            if i == 0:
                progress_printer = PushProgressPrinter()
            else:
                progress_printer = None

            try:
                if push_tags:
                    infos = self.remote_repo.push(tags=True, progress=progress_printer)
                elif ref is not None:
                    infos = self.remote_repo.push(
                        refspec=ref, progress=progress_printer
                    )
                else:
                    infos = self.remote_repo.push(progress=progress_printer)

                success = True
                if len(infos) == 0:
                    success = False
                else:
                    for info in infos:
                        if (
                            info.flags & info.UP_TO_DATE
                            or info.flags & info.NEW_TAG
                            or info.flags & info.NEW_HEAD
                        ):
                            continue
                        success = False
                        print(info.summary)

                if success:
                    return
            except GitCommandError as err:
                print(err)

        if push_tags:
            print("Pushing all tags failed after %d attempts" % (self.PUSH_RETRY_LIMIT))
        elif ref is not None:
            print("Pushing %s failed after %d attempts" % (ref, self.PUSH_RETRY_LIMIT))
            print(
                "For more information about the cause of this error, run the following command from the local repo: 'git push %s %s'"
                % (self.remote_name, ref)
            )
        else:
            print(
                "Pushing all branches failed after %d attempts"
                % (self.PUSH_RETRY_LIMIT)
            )
        sys.exit(1)

    def get_remote_migration_tags(self):
        remote_tags_output = self.local_repo.git.ls_remote(
            self.remote_name, tags=True
        ).split("\n")
        self.remote_migration_tags = dict(
            (tag.split()[1].replace("refs/tags/", ""), tag.split()[0])
            for tag in remote_tags_output
            if self.MIGRATION_TAG_PREFIX in tag
        )

    def clean_up(self, clean_up_remote=False):
        tags = [
            tag
            for tag in self.local_repo.tags
            if tag.name.startswith(self.MIGRATION_TAG_PREFIX)
        ]

        # delete the local tags
        TagReference.delete(self.local_repo, *tags)

        # delete the remote tags
        if clean_up_remote:
            tags_to_delete = [":" + tag_name for tag_name in self.remote_migration_tags]
            self.remote_repo.push(refspec=tags_to_delete)


parser = OptionParser()
parser.add_option(
    "-l",
    "--local",
    action="store",
    dest="localrepo",
    default=os.getcwd(),
    help="The path to the local repo. If this option is not specified, the script will attempt to use current directory by default. If it is not a local git repo, the script will fail.",
)
parser.add_option(
    "-r",
    "--remote",
    action="store",
    dest="remoterepo",
    default="codecommit",
    help="The name of the remote repository to be used as the push or migration destination. The remote must already be set in the local repo ('git remote add ...'). If this option is not specified, the script will use 'codecommit' by default.",
)
parser.add_option(
    "-b",
    "--batch",
    action="store",
    dest="batchsize",
    default="1000",
    help="Specifies the commit batch size for pushes. If not explicitly set, the default is 1,000 commits.",
)
parser.add_option(
    "-c",
    "--clean",
    action="store_true",
    dest="clean",
    default=False,
    help="Remove the temporary tags created by migration from both the local repo and the remote repository. This option will not do any migration work, just cleanup. Cleanup is done automatically at the end of a successful migration, but not after a failure so that when you re-run the script, the tags from the prior run can be used to identify commit batches that were not pushed successfully.",
)

(options, args) = parser.parse_args()

migration = RepositoryMigration()
migration.migrate_repository_in_parts(
    options.localrepo, options.remoterepo, int(options.batchsize), options.clean
)
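
The batching rule in stage_push can be isolated for illustration. The following is a minimal sketch that reproduces only the four-way decision from that method, using plain integers. The function name and the returned labels are hypothetical. The default values match the script's default batch size and its MAX_COMMITS_TOLERANCE_PERCENT constant.

```python
def stage_decision(commit_count, pending_counts, batch_size=1000, tolerance_percent=0.05):
    """Classify what stage_push would do for a commit and its pending ancestors.

    commit_count: commits represented by the current commit.
    pending_counts: commit counts of ancestor pushes that are still pending.
    """
    combined = commit_count + sum(pending_counts)
    tolerance = batch_size * tolerance_percent
    if combined < batch_size:
        return "defer"  # roll everything up into a later push
    if combined <= batch_size + tolerance:
        return "tag_commit"  # one tag covers the commit and its ancestors
    if commit_count >= batch_size:
        return "tag_ancestors_and_commit"  # tag each ancestor, then the commit
    return "tag_ancestors_defer_commit"  # tag the ancestors, keep the commit pending
```

With the defaults, a batch can grow to 1,050 commits before it is split, which avoids creating a separate small push for a few leftover commits.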
