Prepare Input Data for Training In Amazon SageMaker
The original head pose image database (HeadPoseImageDatabase.tar.gz file) consists of a set of images (.jpeg files) and labels (.txt files) of head poses of 15 persons in a series of tilt and pan positions. They must be transformed to a single .pkl file as the input to an Amazon SageMaker training job.
To prepare the training data, you can use the preprocessing script (preprocessingDataset_py2.py) provided in the Head Pose Sample Project on Github. You can run this script on your computer locally or on the AWS Cloud through a Python Notebook instance. In this section, you'll learn how to prepare the training data locally. You'll learn how to do it in a Notebook instance in Train a Head Pose Detection Model in Amazon SageMaker.
-
Open a terminal and change directory to the directory where to you downloaded the head pose sample project.
-
Run the following Python (version 2.7) command
python preprocessingDataset_py2.py --num-data-aug 15 --aspect-ratio 1
If you get an error stating that the
cv2
module is not found, run the following command to install the missing module:python -m pip install opencv-python
Then, rerun the preprocessing script.
The process takes 15-20 hours to complete. The preprocessing script outputs the results as the HeadPoseData_trn_test_x15_py2.pkl file in the current working directory.
-
Upload the preprocessed training data (HeadPoseData_trn_test_x15_py2.pkl) file into the
deeplens-sagemaker-models-
folder in Amazon S3. You can do this using the Amazon S3 console or, as shown as follows, using the following AWS CLI command:<my-name>
/headpose/datasetsaws s3 cp HeadPoseData_trn_test_x15_py2.pkl s3://deeplens-sagemaker-models-
<my-name>
/headpose/datasets
Note
Instead of running locally, you can prepare the input data on the AWS Cloud by a
code cell in a Amazon SageMaker notebook instance, where cv2
and Python 2.7
(python2
) are readily available, and run the commands in
Step 2 and Step 3, respectively, as
follows:
!python2 preprocessingDataset_py2.py --num-data-aug 15 --aspect-ratio 1
and
!aws s3 cp HeadPoseData_trn_test_x15_py2.pkl s3://deeplens-sagemaker-models-
<my-name>
/headpose/datasets
You notebook instance needs to run on a sufficiently powerful EC2 instance of,
e.g., the ml.p2.xlarge
type.
Now that we have the input data prepared and stored in S3, we're ready to train the model.