You can customize the Amazon Nova models using the distillation method for Amazon Bedrock to transfer knowledge from a larger advanced model (known as teacher) to a smaller, faster, and cost-efficient model (known as student). This results in a student model that is as performant as the teacher for a specific use-case.
Model distillation allows you to fine-tune and improve the performance of more efficient models when sufficient high quality labeled training data is not available and therefore could benefit from generating such data from an advanced model. You can choose to do so by leveraging their prompts without labels or their prompts with low- to medium-quality labels for a use case that:
-
Has particularly tight latency, cost, and accuracy requirements. You can benefit from matching the performance on specific tasks of advanced models with smaller models that are optimized for cost and latency.
-
Needs a custom model that is tuned for a specific set of tasks, but sufficient quantity or quality of labeled training data is not available for fine-tuning.
The distillation method used with Amazon Nova can deliver a custom model that exceeds the performance of the teacher model for the specific use case when some labeled prompt-response pairs that demonstrate the customer’s expectation is provided to supplement the unlabeled prompts.
Available models
Model distillation is currently available for Amazon Nova Pro as a teacher to Amazon Nova Lite and Micro as students.
Note
Model distillation with Amazon Nova models is available in public preview and only for the text understanding models.
Guidelines for model distillation with
Amazon Nova
As a first step, follow the Text understanding prompting best practices and tune your input prompt with Amazon Nova Pro to ensure the prompt is optimized to get the best out of the teacher model.
When preparing your input dataset for a distillation job using your own prompts, follow the recommendations below:
-
When only unlabeled prompt data is available, supplement it with a small amount (~10) of curated high quality labeled prompt-response pair data to help the model learn better. If you submit a small number of high-quality, representative examples, you can create a custom model that exceeds the performance of the teacher model.
-
When labeled prompt-response pair data is available but has some room for improvement, include the responses in the submitted data.
-
When labeled prompt-response pair data is available but the labels are of poor quality and the training would be better suited to align with the teacher model directly, remove all responses before submitting the data.