# Tune a Sequence-to-Sequence Model

*Automatic model tuning*, also known as
hyperparameter tuning, finds the best version of a model by running many jobs that test
a range of hyperparameters on your dataset. You choose the tunable hyperparameters, a
range of values for each, and an objective metric from among the metrics that the
algorithm computes. Automatic model tuning searches the chosen hyperparameters to find
the combination of values that results in the model that optimizes the objective
metric.

For more information about model tuning, see Perform Automatic Model Tuning with SageMaker.

## Metrics Computed by the Sequence-to-Sequence Algorithm

The sequence-to-sequence algorithm reports three metrics that are computed during training. Choose one of them as the objective to optimize when tuning the hyperparameter values.

| Metric Name | Description | Optimization Direction |
|---|---|---|
| `validation:accuracy` | Accuracy computed on the validation dataset. | Maximize |
| `validation:bleu` | BLEU score computed on the validation dataset. | Maximize |
| `validation:perplexity` | Perplexity computed on the validation dataset. | Minimize |
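As an illustration, the sketch below shows how one of these metrics would be selected as the tuning objective in the `HyperParameterTuningJobObjective` structure of the low-level `CreateHyperParameterTuningJob` API. The choice of `validation:bleu` here is an example, not a recommendation; the rest of the tuning job definition is elided.

```python
# Sketch: selecting `validation:bleu` as the tuning objective.
# This dict mirrors the HyperParameterTuningJobObjective structure
# passed inside HyperParameterTuningJobConfig when calling the
# CreateHyperParameterTuningJob API (e.g. via boto3).
tuning_job_objective = {
    "Type": "Maximize",             # accuracy and BLEU are maximized
    "MetricName": "validation:bleu",
}

# For `validation:perplexity`, "Type" would instead be "Minimize".
```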

## Tunable Sequence-to-Sequence Hyperparameters

You can tune the following hyperparameters for the SageMaker sequence-to-sequence
algorithm. The hyperparameters that have the greatest impact on the objective metrics
are `batch_size`, `optimizer_type`, `learning_rate`, `num_layers_encoder`, and
`num_layers_decoder`.

| Parameter Name | Parameter Type | Recommended Ranges |
|---|---|---|
| `num_layers_encoder` | IntegerParameterRange | [1-10] |
| `num_layers_decoder` | IntegerParameterRange | [1-10] |
| `batch_size` | CategoricalParameterRange | [16, 32, 64, 128, 256, 512, 1024, 2048] |
| `optimizer_type` | CategoricalParameterRange | ['adam', 'sgd', 'rmsprop'] |
| `weight_init_type` | CategoricalParameterRange | ['xavier', 'uniform'] |
| `weight_init_scale` | ContinuousParameterRange | For the `xavier` type: MinValue: 2.0, MaxValue: 3.0. For the `uniform` type: MinValue: -1.0, MaxValue: 1.0 |
| `learning_rate` | ContinuousParameterRange | MinValue: 0.00005, MaxValue: 0.2 |
| `weight_decay` | ContinuousParameterRange | MinValue: 0.0, MaxValue: 0.1 |
| `momentum` | ContinuousParameterRange | MinValue: 0.5, MaxValue: 0.9 |
| `clip_gradient` | ContinuousParameterRange | MinValue: 1.0, MaxValue: 5.0 |
| `rnn_num_hidden` | CategoricalParameterRange | Applicable only to recurrent neural networks (RNNs). [128, 256, 512, 1024, 2048] |
| `cnn_num_hidden` | CategoricalParameterRange | Applicable only to convolutional neural networks (CNNs). [128, 256, 512, 1024, 2048] |
| `num_embed_source` | IntegerParameterRange | [256-512] |
| `num_embed_target` | IntegerParameterRange | [256-512] |
| `embed_dropout_source` | ContinuousParameterRange | MinValue: 0.0, MaxValue: 0.5 |
| `embed_dropout_target` | ContinuousParameterRange | MinValue: 0.0, MaxValue: 0.5 |
| `rnn_decoder_hidden_dropout` | ContinuousParameterRange | MinValue: 0.0, MaxValue: 0.5 |
| `cnn_hidden_dropout` | ContinuousParameterRange | MinValue: 0.0, MaxValue: 0.5 |
| `lr_scheduler_type` | CategoricalParameterRange | ['plateau_reduce', 'fixed_rate_inv_t', 'fixed_rate_inv_sqrt_t'] |
| `plateau_reduce_lr_factor` | ContinuousParameterRange | MinValue: 0.1, MaxValue: 0.5 |
| `plateau_reduce_lr_threshold` | IntegerParameterRange | [1-5] |
| `fixed_rate_lr_half_life` | IntegerParameterRange | [10-30] |
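The sketch below expresses a subset of the recommended ranges from the table in the `ParameterRanges` structure used by the `CreateHyperParameterTuningJob` API. Note that the API expects all range values as strings; which hyperparameters to include is your choice, and the selection shown here is only an example.

```python
# Sketch: a ParameterRanges structure (as passed inside
# HyperParameterTuningJobConfig to CreateHyperParameterTuningJob)
# built from some of the recommended ranges above.
parameter_ranges = {
    "IntegerParameterRanges": [
        {"Name": "num_layers_encoder", "MinValue": "1", "MaxValue": "10"},
        {"Name": "num_layers_decoder", "MinValue": "1", "MaxValue": "10"},
    ],
    "ContinuousParameterRanges": [
        {"Name": "learning_rate", "MinValue": "0.00005", "MaxValue": "0.2"},
    ],
    "CategoricalParameterRanges": [
        {"Name": "batch_size",
         "Values": ["16", "32", "64", "128", "256", "512", "1024", "2048"]},
        {"Name": "optimizer_type", "Values": ["adam", "sgd", "rmsprop"]},
    ],
}
```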