Bring your own container (BYOC)
Amazon Braket Hybrid Jobs provides three pre-built containers for running code in different
environments. If one of these containers supports your use case, you only have to provide
your algorithm script when you create a hybrid job. Minor missing dependencies can be added
from your algorithm script or from a requirements.txt file using pip.
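As a minimal sketch of adding a dependency from inside the algorithm script, the snippet below shells out to pip using the interpreter that is already running. The helper names and the pinned package are hypothetical and are not part of the Braket SDK.

```python
import subprocess
import sys


def pip_install_command(packages):
    """Build the argv list that installs packages with the running interpreter's pip."""
    if not packages:
        raise ValueError("at least one package specifier is required")
    return [sys.executable, "-m", "pip", "install", *packages]


def install_missing(packages):
    """Install packages in a subprocess; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(packages))


# Hypothetical usage at the top of an algorithm script, before the import:
# install_missing(["networkx==3.1"])
```

Invoking pip through sys.executable, rather than a bare pip command, keeps the installation tied to the interpreter that runs the script.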
If none of these containers support your use case, or if you wish to expand on them, Braket Hybrid Jobs supports running hybrid jobs with your own custom Docker container image, or bring your own container (BYOC). But before we dive in, let’s make sure it’s actually the right feature for your use case.
When is bringing my own container the right decision?
Bringing your own container (BYOC) to Braket Hybrid Jobs offers the flexibility to use your own software by installing it in a packaged environment. Depending on your specific needs, there may be ways to achieve the same flexibility without having to go through the full BYOC Docker build - Amazon ECR upload - custom image URI cycle.
Note
BYOC may not be the right choice if you only want to add a small number of additional Python packages (generally fewer than 10) that are publicly available, for example on PyPI.
In this case, you can use one of the pre-built Braket images, and then include a requirements.txt file in your source directory at the job submission. The file is automatically read, and pip will install the packages with the specified versions as normal. If you're installing a large number of packages, the runtime of your jobs may be substantially increased. Check the Python and, if applicable, CUDA version of the prebuilt container you want to use to test if your software will work.
BYOC is necessary when you want to use a non-Python language (like C++ or Rust) for your job script, or if you want to use a Python version not available through the Braket pre-built containers. It’s also a good choice if:
- You're using software with a license key, and you need to authenticate that key against a licensing server to run the software. With BYOC, you can embed the license key in your Docker image and include code to authenticate it.
- You're using software that isn't publicly available. For example, the software is hosted on a private GitLab or GitHub repository that you need a particular SSH key to access.
- You need to install a large suite of software that isn't packaged in the Braket provided containers. BYOC will allow you to eliminate long startup times for your hybrid jobs containers due to software installation.
BYOC also enables you to make your custom SDK or algorithm available to customers by building a Docker container with your software and making it available to your users. You can do this by setting appropriate permissions in Amazon ECR.
Note
You must comply with all applicable software licenses.
Recipe for bringing your own container
In this section, we provide a step-by-step guide to what you'll need to bring your own container (BYOC) to Braket Hybrid Jobs: the scripts, the files, and the steps to combine them to get up and running with your custom Docker images. We provide recipes for two common cases:
1. Install additional software in a Docker image and use only Python algorithm scripts in your jobs.
2. Use algorithm scripts written in a non-Python language with Hybrid Jobs, or a CPU architecture besides x86.
Defining the container entry script is more complex for case 2.
When Braket runs your Hybrid Job, it launches the requested number and type of Amazon EC2 instances, then runs the Docker image specified by the image URI input to job creation on them. When using the BYOC feature, you specify an image URI hosted in a private Amazon ECR repository that you have Read access to. Braket Hybrid Jobs uses that custom image to run the job.
To build a Docker image that can be used with Hybrid Jobs, you need a few specific components. If you're unfamiliar with writing and building Dockerfiles, we suggest you refer to the Dockerfile documentation.
Here’s an overview of what you’ll need:
A base image for your Dockerfile
If you are using Python and want to install software on top of what's provided in the Braket provided containers, an option for a base image is one of the Braket container images, hosted in our GitHub repo. For example, the first line of your Dockerfile could be:
FROM [IMAGE_URI_HERE]
Next, fill out the rest of the Dockerfile to install and set up the software that you want to add to the container. The pre-built Braket images will already contain the appropriate container entry point script, so you don’t need to worry about including that.
If you want to use a non-Python language, such as C++, Rust, or Julia, or if you want to build an image for a non-x86 CPU architecture, like ARM, you may need to build on top of a barebones public image. You can find many such images at the Amazon Elastic Container Registry Public Gallery.
(Optional) A modified container entry point script
Note
If you're only adding additional software to a pre-built Braket image, you can skip this section.
To run non-Python code as part of your hybrid job, you'll need to modify the Python script which defines the container entry point. For example, see the braket_container.py Python script on the Amazon Braket GitHub and its kick_off_customer_script() function.
You can also choose to write a completely new braket_container.py. It should copy input data, source archives, and other necessary files from Amazon S3 into the container, and define the appropriate environment variables.
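One way a modified entry point could decide how to start a non-Python algorithm script is to choose the launcher from the file extension. The sketch below illustrates that idea only. It is not the logic of the published braket_container.py, and the INTERPRETERS mapping and function names are hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical mapping from file extension to the interpreter that runs it.
INTERPRETERS = {
    ".py": [sys.executable],
    ".jl": ["julia"],
    ".sh": ["bash"],
}


def build_launch_command(script_path):
    """Return the argv list used to start an algorithm script.

    Files with an unknown extension are assumed to be executables
    (for example a compiled C++ or Rust binary) and are run directly.
    """
    _, ext = os.path.splitext(script_path)
    return INTERPRETERS.get(ext.lower(), []) + [script_path]


def launch(script_path, extra_env=None):
    """Run the script as a child process and return its exit code."""
    env = dict(os.environ, **(extra_env or {}))
    return subprocess.run(build_launch_command(script_path), env=env).returncode
```

Keeping command construction separate from process launch makes the dispatch rule easy to check without starting a process.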
A Dockerfile that installs any necessary software and includes the container script
Note
If you use a pre-built Braket image as your Docker base image, the container script is already present.
If you created a modified container script in the previous step, you'll need to copy it into the container and set the environment variable SAGEMAKER_PROGRAM to braket_container.py, or to whatever you have named your new container entry point script.
The following is an example of a Dockerfile that allows you to use Julia on GPU-accelerated Jobs instances:
FROM nvidia/cuda:12.2.0-devel-ubuntu22.04

ARG DEBIAN_FRONTEND=noninteractive
ARG JULIA_RELEASE=1.8
ARG JULIA_VERSION=1.8.3
ARG PYTHON=python3.11
ARG PYTHON_PIP=python3-pip
ARG PIP=pip
ARG JULIA_URL=https://julialang-s3.julialang.org/bin/linux/x64/${JULIA_RELEASE}/
ARG TAR_NAME=julia-${JULIA_VERSION}-linux-x86_64.tar.gz
# List your Python packages and versions here
ARG PYTHON_PKGS=

# System packages, including curl, which the Julia download below relies on
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
    build-essential \
    tzdata \
    openssh-client \
    openssh-server \
    ca-certificates \
    curl \
    git \
    libtemplate-perl \
    libssl1.1 \
    openssl \
    unzip \
    wget \
    zlib1g-dev \
    ${PYTHON_PIP} \
    ${PYTHON}-dev

# Download Julia and unpack it into /usr/local
RUN curl -s -L ${JULIA_URL}/${TAR_NAME} | tar -C /usr/local -x -z --strip-components=1 -f -

RUN ${PIP} install --no-cache --upgrade ${PYTHON_PKGS}
RUN ${PIP} install --no-cache --upgrade sagemaker-training==4.1.3

# Add EFA and SMDDP to LD library path
ENV LD_LIBRARY_PATH="/opt/conda/lib/python${PYTHON_SHORT_VERSION}/site-packages/smdistributed/dataparallel/lib:$LD_LIBRARY_PATH"
ENV LD_LIBRARY_PATH=/opt/amazon/efa/lib/:$LD_LIBRARY_PATH

# Julia specific installation instructions
COPY Project.toml /usr/local/share/julia/environments/v${JULIA_RELEASE}/
RUN JULIA_DEPOT_PATH=/usr/local/share/julia \
    julia -e 'using Pkg; Pkg.instantiate(); Pkg.API.precompile()'

# Generate the device runtime library for all known and supported devices
RUN JULIA_DEPOT_PATH=/usr/local/share/julia \
    julia -e 'using CUDA; CUDA.precompile_runtime()'

# Open source compliance scripts
RUN HOME_DIR=/root \
    && curl -o ${HOME_DIR}/oss_compliance.zip https://aws-dlinfra-utilities.s3.amazonaws.com/oss_compliance.zip \
    && unzip ${HOME_DIR}/oss_compliance.zip -d ${HOME_DIR}/ \
    && cp ${HOME_DIR}/oss_compliance/test/testOSSCompliance /usr/local/bin/testOSSCompliance \
    && chmod +x /usr/local/bin/testOSSCompliance \
    && chmod +x ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh \
    && ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh ${HOME_DIR} ${PYTHON} \
    && rm -rf ${HOME_DIR}/oss_compliance*

# Copy the container entry point script
COPY braket_container.py /opt/ml/code/braket_container.py
ENV SAGEMAKER_PROGRAM=braket_container.py
This example downloads and runs scripts provided by AWS to ensure compliance with all relevant open-source licenses, for example by properly attributing any installed code governed by an MIT license.
If you need to include non-public code, for instance code that is hosted in a private GitHub or GitLab repository, do not embed SSH keys in the Docker image to access it. Instead, use Docker Compose when you build to allow Docker to access SSH on the host machine it is built on. For more information, see the Securely using SSH keys in Docker to access private Github repositories guide.
Building and uploading your Docker image
With a properly defined Dockerfile, you are now ready to follow the steps to create a private Amazon ECR repository, if one does not already exist, and then build, tag, and push your container image to that repository. See the Docker build documentation for the options to docker build and some examples.
For the sample file defined above, you could run:
aws ecr get-login-password --region ${your_region} | docker login --username AWS --password-stdin ${aws_account_id}.dkr.ecr.${your_region}.amazonaws.com
docker build -t braket-julia .
docker tag braket-julia:latest ${aws_account_id}.dkr.ecr.${your_region}.amazonaws.com/braket-julia:latest
docker push ${aws_account_id}.dkr.ecr.${your_region}.amazonaws.com/braket-julia:latest
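The registry host and image URI in the commands above follow a fixed pattern, so they can be assembled programmatically. This is a small sketch with hypothetical helper names. It assumes the standard amazonaws.com domain shown in the commands, and other AWS partitions may use a different suffix.

```python
def ecr_registry(account_id, region):
    """Return the private ECR registry host for an account and Region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Return the full image URI to pass to docker tag and docker push."""
    return f"{ecr_registry(account_id, region)}/{repository}:{tag}"


# Example with a placeholder account ID:
print(ecr_image_uri("111122223333", "us-west-2", "braket-julia"))
# prints 111122223333.dkr.ecr.us-west-2.amazonaws.com/braket-julia:latest
```

The same string is what you would later supply as the image URI when creating the hybrid job.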
Assigning appropriate Amazon ECR permissions
Braket Hybrid Jobs Docker images must be hosted in private Amazon ECR repositories. By default, a private Amazon ECR repo does not provide read access to the Braket Hybrid Jobs IAM role or to any other users that want to use your image, such as a collaborator or student. You must set a repository policy in order to grant the appropriate permissions. In general, only give permission to those specific users and IAM roles you want to access your images, rather than allowing anyone with the image URI to pull them.
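As an illustration of a narrowly scoped repository policy, the sketch below builds a policy document that lets only the listed principals pull images. The role ARN, the Sid, and the helper name are placeholders, and the three actions are an assumed pull-only set. Check the Amazon ECR documentation for the exact permissions your setup needs.

```python
import json

# ECR actions assumed here to be sufficient for pulling an image.
PULL_ACTIONS = [
    "ecr:BatchCheckLayerAvailability",
    "ecr:BatchGetImage",
    "ecr:GetDownloadUrlForLayer",
]


def pull_only_policy(principal_arns):
    """Build a repository policy granting pull access to specific principals only."""
    if not principal_arns:
        raise ValueError("list at least one IAM user or role ARN")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowPull",
                "Effect": "Allow",
                "Principal": {"AWS": list(principal_arns)},
                "Action": PULL_ACTIONS,
            }
        ],
    }


# Placeholder ARN; substitute the role your hybrid jobs actually run under.
policy_text = json.dumps(
    pull_only_policy(["arn:aws:iam::111122223333:role/ExampleBraketJobsRole"]),
    indent=2,
)
```

The resulting JSON text can be pasted into the repository's permissions in the Amazon ECR console or supplied to aws ecr set-repository-policy.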
Running Braket hybrid jobs in your own container
To create a hybrid job with your own container, call AwsQuantumJob.create() with the argument image_uri specified. You can use a QPU, an on-demand simulator, or run your code locally on the classical processor available with Braket Hybrid Jobs. We recommend testing your code on a simulator like SV1, DM1, or TN1 before running on a real QPU.
To run your code on the classical processor, specify the instanceType and the instanceCount you use by updating the InstanceConfig. Note that if you specify an instanceCount greater than 1, you need to make sure that your code can run across multiple hosts. The upper limit for the number of instances you can choose is 5. For example:
from braket.aws import AwsQuantumJob
from braket.jobs.config import InstanceConfig

job = AwsQuantumJob.create(
    source_module="source_dir",
    entry_point="source_dir.algorithm_script:start_here",
    image_uri="111122223333.dkr.ecr.us-west-2.amazonaws.com/my-byoc-container:latest",
    instance_config=InstanceConfig(instanceType="ml.p3.8xlarge", instanceCount=3),
    device="local:braket/braket.local.qubit",
    # ...
)
Note
Use the device ARN to track the simulator you used as hybrid job metadata. Acceptable values must follow the format device = "local:<provider>/<simulator_name>". Remember that <provider> and <simulator_name> must consist only of letters, numbers, _, -, and . (period). The string is limited to 256 characters.
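The format rule in this note can be checked before a job is submitted. The sketch below encodes it as a regular expression. It assumes that the 256-character limit applies to the whole device string and that "letters" means ASCII letters. The function name is hypothetical and is not part of the Braket SDK.

```python
import re

# local:<provider>/<simulator_name>, each part limited to letters, digits, _ - and .
_LOCAL_DEVICE = re.compile(r"local:[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")


def is_valid_local_device(device):
    """Return True if device matches the documented local device format."""
    return len(device) <= 256 and _LOCAL_DEVICE.fullmatch(device) is not None


# The value used in the example above passes the check.
assert is_valid_local_device("local:braket/braket.local.qubit")
```

Using fullmatch rather than match rejects strings that merely start with a valid prefix, such as a second slash-separated segment.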
If you plan to use BYOC and you're not using the Braket SDK to create quantum tasks, you should pass the value of the environment variable AMZN_BRAKET_JOB_TOKEN to the jobToken parameter in the CreateQuantumTask request. If you don't, the quantum tasks don't get priority and are billed as regular standalone quantum tasks.
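A minimal sketch of attaching the job token to a request built outside the Braket SDK follows. The helper name is hypothetical. The returned dictionary is meant to be passed as keyword arguments to whichever client issues CreateQuantumTask, for example a boto3 Braket client's create_quantum_task call. The other request fields are not shown here.

```python
import os


def with_job_token(request, environ=None):
    """Return a copy of a CreateQuantumTask request with jobToken added.

    The token is read from AMZN_BRAKET_JOB_TOKEN. Outside a hybrid job the
    variable is not set, and the request is returned without a jobToken.
    """
    environ = os.environ if environ is None else environ
    token = environ.get("AMZN_BRAKET_JOB_TOKEN")
    updated = dict(request)
    if token:
        updated["jobToken"] = token
    return updated


# Hypothetical usage inside the container:
# params = with_job_token({"shots": 100})
```

Copying the request instead of mutating it keeps the original dictionary reusable for tasks that should not be associated with the hybrid job.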