Validation rules for manifest files - Rekognition

Validation rules for manifest files

When you import a manifest file, Amazon Rekognition Custom Labels applies validation rules for limits, syntax, and semantics. The SageMaker Ground Truth schema enforces syntax validation. For more information, see Output. The following are the validation rules for limits and semantics.

Note
  • The 20% invalidity rules apply cumulatively across all validation rules. If the import exceeds the 20% limit due to any combination, such as 15% invalid JSON and 15% invalid images, the import fails.

  • Each dataset object is a line in the manifest. Blank/invalid lines are also counted as dataset objects.

  • Overlaps are (common labels between test and train)/(train labels).

Limits

Validation Limit Error raised

Manifest file size

Maximum 1 GB

Error

Maximum line count for a manifest file

Maximum of 250,000 dataset objects as lines in a manifest.

Error

Lower boundary on total number of valid dataset objects per label

>= 1

Error

Lower boundary on labels

>=2

Error

Upper bound on labels

<= 250

Error

Minimum bounding boxes per image

0

None

Maximum bounding boxes per image

50

None

Semantics

Validation Limit Error raised

Empty manifest

Error

Missing/in-accessible source-ref object

Number of objects less than 20%

Warning

Missing/in-accessible source-ref object

Number of objects > 20%

Error

Test labels not present in training dataset

At least 50% overlap in the labels

Error

Mix of label vs. object examples for same label in a dataset. Classification and detection for the same class in a dataset object.

No error or warning

Overlapping assets between test and train

There should not be an overlap between test and training datasets.

Images in a dataset must be from same bucket

Error if the objects are in a different bucket

Error