# Sequence-to-Sequence Hyperparameters

Parameter Name | Description |
---|---|
`batch_size` | Mini-batch size for gradient descent.<br>Valid values: positive integer<br>Default value: 64 |
`beam_size` | Length of the beam for beam search. Used during training for computing<br>Valid values: positive integer<br>Default value: 5 |
`bleu_sample_size` | Number of instances to pick from the validation dataset to decode and compute<br>Valid values: integer<br>Default value: 0 |
`bucket_width` | Returns (source, target) buckets up to (<br>Valid values: positive integer<br>Default value: 10 |
`bucketing_enabled` | Set to<br>Valid values:<br>Default value: |
`checkpoint_frequency_num_batches` | Checkpoint and evaluate every x batches. This checkpointing hyperparameter is passed to SageMaker's seq2seq algorithm for early stopping and for retrieving the best model. The algorithm's checkpointing runs locally in the algorithm's training container and is not compatible with SageMaker checkpointing. The algorithm temporarily saves checkpoints to a local path and stores the best model artifact in the model output path in S3 after the training job has stopped.<br>Valid values: positive integer<br>Default value: 1000 |
`checkpoint_threshold` | Maximum number of checkpoints the model is allowed to not improve in<br>Valid values: positive integer<br>Default value: 3 |
`clip_gradient` | Clip absolute gradient values greater than this. Set to a negative value to disable clipping.<br>Valid values: float<br>Default value: 1 |
`cnn_activation_type` | The<br>Valid values: string. One of<br>Default value: |
`cnn_hidden_dropout` | Dropout probability applied between convolutional layers.<br>Valid values: float in the range [0, 1]<br>Default value: 0 |
`cnn_kernel_width_decoder` | Kernel width for the<br>Valid values: positive integer<br>Default value: 5 |
`cnn_kernel_width_encoder` | Kernel width for the<br>Valid values: positive integer<br>Default value: 3 |
`cnn_num_hidden` | Number of<br>Valid values: positive integer<br>Default value: 512 |
`decoder_type` | Decoder type.<br>Valid values: string. Either<br>Default value: |
`embed_dropout_source` | Dropout probability for source-side embeddings.<br>Valid values: float in the range [0, 1]<br>Default value: 0 |
`embed_dropout_target` | Dropout probability for target-side embeddings.<br>Valid values: float in the range [0, 1]<br>Default value: 0 |
`encoder_type` | Encoder type. The<br>Valid values: string. Either<br>Default value: |
`fixed_rate_lr_half_life` | Half-life of the learning rate, in number of checkpoints, for<br>Valid values: positive integer<br>Default value: 10 |
`learning_rate` | Initial learning rate.<br>Valid values: float<br>Default value: 0.0003 |
`loss_type` | Loss function for training.<br>Valid values: string.<br>Default value: |
`lr_scheduler_type` | Learning rate scheduler type.<br>Valid values: string. One of<br>Default value: |
`max_num_batches` | Maximum number of updates (batches) to process. Set to -1 for no limit.<br>Valid values: integer<br>Default value: -1 |
`max_num_epochs` | Maximum number of epochs to pass through the training data before fitting is stopped. If this parameter is passed, training continues until this number of epochs is reached, even if validation accuracy is not improving. Ignored if not passed.<br>Valid values: positive integer<br>Default value: none |
`max_seq_len_source` | Maximum length of the source sequence. Longer sequences are truncated to this length.<br>Valid values: positive integer<br>Default value: 100 |
`max_seq_len_target` | Maximum length of the target sequence. Longer sequences are truncated to this length.<br>Valid values: positive integer<br>Default value: 100 |
`min_num_epochs` | Minimum number of epochs the training must run before it is stopped via<br>Valid values: non-negative integer<br>Default value: 0 |
`momentum` | Momentum constant used for<br>Valid values: float<br>Default value: none |
`num_embed_source` | Embedding size for source tokens.<br>Valid values: positive integer<br>Default value: 512 |
`num_embed_target` | Embedding size for target tokens.<br>Valid values: positive integer<br>Default value: 512 |
`num_layers_decoder` | Number of decoder layers.<br>Valid values: positive integer<br>Default value: 1 |
`num_layers_encoder` | Number of encoder layers.<br>Valid values: positive integer<br>Default value: 1 |
`optimized_metric` | Metric to optimize with early stopping.<br>Valid values: string. One of<br>Default value: |
`optimizer_type` | Optimizer to use.<br>Valid values: string. One of<br>Default value: |
`plateau_reduce_lr_factor` | Factor to multiply the learning rate by (for<br>Valid values: float<br>Default value: 0.5 |
`plateau_reduce_lr_threshold` | For<br>Valid values: positive integer<br>Default value: 3 |
`rnn_attention_in_upper_layers` | Pass the attention to upper layers of<br>Valid values: boolean (<br>Default value: |
`rnn_attention_num_hidden` | Number of hidden units for attention layers. Defaults to<br>Valid values: positive integer<br>Default value: |
`rnn_attention_type` | Attention model for encoders.<br>Valid values: string. One of<br>Default value: |
`rnn_cell_type` | Specific type of<br>Valid values: string. Either<br>Default value: |
`rnn_decoder_state_init` | How to initialize<br>Valid values: string. One of<br>Default value: |
`rnn_first_residual_layer` | First<br>Valid values: positive integer<br>Default value: 2 |
`rnn_num_hidden` | The number of<br>Valid values: positive even integer<br>Default value: 1024 |
`rnn_residual_connections` | Add residual connection to stacked<br>Valid values: boolean (<br>Default value: |
`rnn_decoder_hidden_dropout` | Dropout probability for the hidden state that combines the context with the<br>Valid values: float in the range [0, 1]<br>Default value: 0 |
`training_metric` | Metric to track during training on validation data.<br>Valid values: string. Either<br>Default value: |
`weight_decay` | Weight decay constant.<br>Valid values: float<br>Default value: 0 |
`weight_init_scale` | Weight initialization scale (for<br>Valid values: float<br>Default value: 2.34 |
`weight_init_type` | Type of weight initialization.<br>Valid values: string. Either<br>Default value: |
`xavier_factor_type` | Xavier factor type.<br>Valid values: string. One of<br>Default value: |
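
Several of the constraints in the table (positive integers, dropout probabilities in [0, 1], an even `rnn_num_hidden`) can be checked locally before a training job is submitted. The following is a minimal sketch, not part of SageMaker: the names `DEFAULTS`, `POSITIVE_INT`, `UNIT_INTERVAL`, and `build_hyperparameters` are hypothetical, and only a subset of rows whose defaults are stated in the table is covered. Values are converted to strings at the end on the assumption that the job is submitted through an interface that takes hyperparameters as a string-to-string map.

```python
from typing import Any, Dict

# Defaults copied from the table; rows with a missing default are left out.
DEFAULTS: Dict[str, Any] = {
    "batch_size": 64,
    "beam_size": 5,
    "bucket_width": 10,
    "checkpoint_frequency_num_batches": 1000,
    "checkpoint_threshold": 3,
    "clip_gradient": 1.0,
    "cnn_hidden_dropout": 0.0,
    "embed_dropout_source": 0.0,
    "embed_dropout_target": 0.0,
    "learning_rate": 0.0003,
    "max_seq_len_source": 100,
    "max_seq_len_target": 100,
    "num_layers_encoder": 1,
    "num_layers_decoder": 1,
    "rnn_num_hidden": 1024,
    "rnn_decoder_hidden_dropout": 0.0,
}

# Hypothetical groupings that mirror the "Valid values" entries in the table.
POSITIVE_INT = {
    "batch_size", "beam_size", "bucket_width",
    "checkpoint_frequency_num_batches", "checkpoint_threshold",
    "max_seq_len_source", "max_seq_len_target",
    "num_layers_encoder", "num_layers_decoder", "rnn_num_hidden",
}
UNIT_INTERVAL = {
    "cnn_hidden_dropout", "embed_dropout_source",
    "embed_dropout_target", "rnn_decoder_hidden_dropout",
}


def build_hyperparameters(overrides: Dict[str, Any]) -> Dict[str, str]:
    """Merge overrides into the defaults, validate, and stringify the result."""
    params = {**DEFAULTS, **overrides}

    for name in POSITIVE_INT:
        value = params[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")

    for name in UNIT_INTERVAL:
        value = params[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value!r}")

    # The table lists rnn_num_hidden as a positive even integer.
    if params["rnn_num_hidden"] % 2 != 0:
        raise ValueError("rnn_num_hidden must be even")

    # Keys not covered above (for example max_num_epochs) pass through unchecked.
    return {name: str(value) for name, value in params.items()}


hyperparameters = build_hyperparameters(
    {"batch_size": 128, "embed_dropout_source": 0.2, "max_num_epochs": 10}
)
```

Keeping the checks in one function means a bad value fails fast on the client side rather than after the training container has started.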