This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Docs update for PytorchTransformerWrapper (#5295)
* Updates the docs for PytorchTransformerWrapper

* Changelog

* Forgot one more parameter
dirkgr authored Jul 7, 2021
1 parent 3d92ac4 commit 436c52d
Showing 2 changed files with 13 additions and 5 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -29,6 +29,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Ensured `ensure_model_can_train_save_and_load` is consistently random.
 - Fixed weight tying logic in `T5` transformer module. Previously input/output embeddings were always tied. Now this is optional,
   and the default behavior is taken from the `config.tie_word_embeddings` value when instantiating `from_pretrained_module()`.
+- Fixed the docs for `PytorchTransformerWrapper`
 - Fixed recovering training jobs with models that expect `get_metrics()` to not be called until they have seen at least one batch.

 ### Changed
17 changes: 12 additions & 5 deletions allennlp/modules/seq2seq_encoders/pytorch_transformer_wrapper.py
@@ -23,20 +23,27 @@ class PytorchTransformer(Seq2SeqEncoder):
     input_dim : `int`, required.
         The input dimension of the encoder.
+    num_layers : `int`, required.
+        The number of stacked self attention -> feedforward -> layer normalisation blocks.
     feedforward_hidden_dim : `int`, required.
         The middle dimension of the FeedForward network. The input and output
         dimensions are fixed to ensure sizes match up for the self attention layers.
-    num_layers : `int`, required.
-        The number of stacked self attention -> feedforward -> layer normalisation blocks.
     num_attention_heads : `int`, required.
         The number of attention heads to use per layer.
-    use_positional_encoding : `bool`, optional, (default = `True`)
-        Whether to add sinusoidal frequencies to the input tensor. This is strongly recommended,
-        as without this feature, the self attention layers have no idea of absolute or relative
+    positional_encoding : `str`, optional, (default = `None`)
+        Specifies the type of positional encodings to use. Your options are
+         * `None` to have no positional encodings.
+         * `"sinusoidal"` to have sinusoidal encodings, as described in https://api.semanticscholar.org/CorpusID:13756489.
+         * `"embedding"` to treat positional encodings as learnable parameters
+        Without positional encoding, the self attention layers have no idea of absolute or relative
         position (as they are just computing pairwise similarity between vectors of elements),
         which can be important features for many tasks.
+    positional_embedding_size : `int`, optional, (default = `512`)
+        The number of positional embeddings.
     dropout_prob : `float`, optional, (default = `0.1`)
         The dropout probability for the feedforward network.
+    activation : `str`, (default = `"relu"`)
+        The activation function of intermediate layers. Must be either `"relu"` or `"gelu"`.
     """  # noqa

     def __init__(
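For reference, a minimal usage sketch of the wrapper whose docstring is updated above, not part of the commit. The constructor arguments mirror the documented parameters and the import path follows the file path shown in the diff; the forward() signature and the output dimension matching input_dim are assumptions based on the usual AllenNLP Seq2SeqEncoder contract.

import torch
from allennlp.modules.seq2seq_encoders.pytorch_transformer_wrapper import PytorchTransformer

encoder = PytorchTransformer(
    input_dim=64,                      # dimension of each input vector
    num_layers=2,                      # stacked self attention -> feedforward -> layer norm blocks
    feedforward_hidden_dim=128,        # middle dimension of the feedforward sublayer
    num_attention_heads=4,             # attention heads per layer
    positional_encoding="sinusoidal",  # None, "sinusoidal", or "embedding"
    dropout_prob=0.1,
    activation="relu",                 # "relu" or "gelu"
)

batch_size, seq_len = 3, 10
inputs = torch.randn(batch_size, seq_len, 64)              # (batch, seq_len, input_dim)
mask = torch.ones(batch_size, seq_len, dtype=torch.bool)   # True for non-padding positions

# Assumed Seq2SeqEncoder behavior: (batch, seq_len, input_dim) in, same shape out.
outputs = encoder(inputs, mask)
assert outputs.shape == (batch_size, seq_len, 64)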
