Releases: stickeritis/sticker2
Mixed-precision training
The most important new feature of this release is mixed-precision training 🎉. This speeds up training and lowers memory use on GPUs with Tensor Cores. Mixed-precision training can be enabled using the `--mixed-precision` option of `sticker2 finetune` and `sticker2 distill`.
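For example, a finetuning run with mixed precision enabled might look as follows. This is a minimal sketch: the file names and the positional arguments of `finetune` are placeholders, not taken from these notes.

```shell
# Enable mixed-precision training during finetuning; `sticker2 distill`
# accepts the same flag. Configuration and data file names are placeholders.
sticker2 finetune --mixed-precision sticker.conf train.conllu validation.conllu
```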
Other notable changes:
- Use fast AVX2 kernels on AMD Zen CPUs, without setting any special environment variables.
- Update the `sentencepiece` crate dependency to 0.4. This version compiles the sentencepiece library statically if it is not available, removing the dependency on an external sentencepiece build.
- The TensorBoard summary writer support that was added in 0.4.2 is now feature-gated (`tensorboard`). This makes it possible to compile sticker2 without TensorBoard support for quicker compiles and smaller binaries (see the build sketch after this list).
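The feature gate only affects how the binaries are built. A sketch of the two build variants for a from-source build; whether the `tensorboard` feature is enabled by default, and whether it has to be passed to a specific workspace member, is not stated in these notes:

```shell
# Build without the TensorBoard summary writer (assuming the feature is off
# by default): smaller binaries and a quicker compile.
cargo build --release

# Opt into TensorBoard support via the `tensorboard` Cargo feature. In a
# workspace build the feature may need to be passed to the crate that
# provides the command-line tools (e.g. with `-p`).
cargo build --release --features tensorboard
```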
TensorBoard summaries
The most important new feature of this release is the addition of support for writing TensorBoard summaries to `sticker2 annotate` and `sticker2 distill`. The `--log-prefix` option has been added to both subcommands; it enables writing TensorBoard summaries under the given log prefix. Losses and accuracies are logged for each layer, as well as the average loss.
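A sketch of a distillation run with summaries enabled; the prefix value, file names, and positional arguments are placeholders, not taken from these notes:

```shell
# Write TensorBoard event files under the runs/distill prefix while
# distilling. Everything except --log-prefix is a placeholder here.
sticker2 distill --log-prefix runs/distill \
    teacher.conf student.conf train.conllu validation.conllu

# Inspect the per-layer losses and accuracies with the TensorBoard tool.
tensorboard --logdir runs
```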
This release also contains a fix for a bug where variables had an additional spurious `encoder` prefix in finetuned models.
Support for ALBERT models and update to PyTorch 1.6
- Add support for the ALBERT model. This provides two additional features over BERT:
  - The embedding size can be different from the hidden size. A linear transformation is applied to the embeddings to project them to the hidden state size. The embedding size is set through the `embedding_size` option of the model configuration.
  - Multiple layers can share the same weights. The number of hidden layers is specified through `num_hidden_layers` as before. The additional `num_hidden_groups` option determines the number of weight groups. E.g. if `num_hidden_layers` is set to `12` and `num_hidden_groups` to `3`, then each group of 4 consecutive layers shares the same weights.
- The ALBERT model can be used by setting `pretrain_type = "albert"` in the sticker2 configuration file (see the sketch after this list).
- Tokenizer types are separated from model types. Before this change, picking a particular model would select a tokenizer. Now the tokenizer type can be selected separately from the model. The tokenizer is selected through the `tokenizer` option, which replaces `vocab`. The possible values are:
  - ALBERT: `tokenizer = { albert = { vocab = "vocab.model" } }`
  - BERT: `tokenizer = { bert = { vocab = "vocab.txt" } }`
  - XLM-R: `tokenizer = { xlm_roberta = { vocab = "vocab.model" } }`
- Update to sticker-transformers 0.8, tch 0.2, and libtorch 1.6.0.
- Update to sentencepiece 0.3. This version is compatible with sentencepiece 0.1.9x.
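To put the configuration changes above together: switching an existing setup to ALBERT only touches the configuration file, and training is invoked as before. A minimal sketch, assuming a configuration named albert.conf and assuming `finetune` takes the configuration plus training and validation data as positional arguments (the arguments and file names are placeholders, not taken from these notes):

```shell
# albert.conf is assumed to set, among the usual options:
#   pretrain_type = "albert"
#   tokenizer = { albert = { vocab = "vocab.model" } }
# as well as the embedding_size / num_hidden_groups model options.
sticker2 finetune albert.conf train.conllu validation.conllu
```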
Switch to CoNLL-U format
The most visible change is that from version 0.3.0 onwards, sticker2 uses the CoNLL-U format. Besides that there were many other improvements:
- Switch from CoNLL-X to CoNLL-U as the file format.
- Much-improved error messages.
- Add `TdzLemmaEncoder`. This encoder uses the edit tree encoder, but performs the necessary pre- and postprocessing to produce TüBa-D/Z style lemmas.
- Add an option to ℓ2-normalize sinusoidal embeddings and make it the default. This improves model convergence (suggested by @twuebi).
- Support encoding of the full features column as a string (rather than individual attributes/values).
- Permit setting a default value for features. This is useful for features that are not annotated on every token.
- Add the `filter-len` subcommand. This filters a corpus by sentence length in word pieces or sentence pieces.
- Improvements to the serialization of encoders: remove phantom data and stop storing the feature <-> number bijection twice.
- Update to libtorch 1.5.0.
Models trained with versions prior to 0.3.0 are not compatible with this version. At the moment we only provide model compatibility within a given 0.y.z release series.
Support for XLM-RoBERTa & distillation improvements
- Add support for finetuning XLM-RoBERTa models.
- Support for distillation with separate teacher/student vocabularies.
- Make it possible to set the number of PyTorch threads in `sticker2 annotate` and `sticker2 server`.
- Remove the word pieces vectorizer (this is now handled by the `wordpieces` crate).
- sticker 0.1.0 models are not fully compatible with 0.2.0, but can be patched to work.