Is your feature request related to a problem? Please describe.
A basic version of model distillation was implemented in #1758, but there is still room for improvement. The TinyBERT paper (https://arxiv.org/pdf/1909.10351.pdf) describes an approach for finetuning an already pretrained small language model.
Describe the solution you'd like
The distillation loss functions in the TinyBERT paper should be usable when distilling a model in Haystack using the distil_from method (a rough sketch of these losses is included at the end of this issue).
Describe alternatives you've considered
https://arxiv.org/pdf/1910.08381.pdf: Seems to depend too heavily on expensive retraining and to be too task-specific.
https://arxiv.org/pdf/2002.10957.pdf, https://arxiv.org/pdf/1910.01108.pdf: Seem to focus only on pretraining.
Additional context
This is the first of two issues for implementing finetuning as described in the TinyBERT paper. This issue focuses on the loss functions; the second issue focuses on data augmentation.
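For reference, below is a minimal sketch (plain PyTorch, not Haystack's actual implementation) of the three TinyBERT-style loss terms: MSE on attention score matrices, MSE on linearly projected hidden states, and soft cross-entropy on the prediction layer. The function names, tensor shapes, and the projection argument are illustrative assumptions; how these would be wired into distil_from is left open.

```python
# Sketch of TinyBERT-style distillation losses (assumptions, not Haystack's API).
import torch
import torch.nn.functional as F


def attention_loss(student_attn: torch.Tensor, teacher_attn: torch.Tensor) -> torch.Tensor:
    # MSE between student and teacher attention score matrices
    # (assumed shape: batch x heads x seq_len x seq_len).
    return F.mse_loss(student_attn, teacher_attn)


def hidden_state_loss(student_hidden: torch.Tensor, teacher_hidden: torch.Tensor,
                      projection: torch.nn.Linear) -> torch.Tensor:
    # MSE between projected student hidden states and teacher hidden states.
    # The learnable linear projection maps the (smaller) student hidden size
    # to the teacher hidden size, as described in the TinyBERT paper.
    return F.mse_loss(projection(student_hidden), teacher_hidden)


def prediction_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor,
                    temperature: float = 1.0) -> torch.Tensor:
    # Soft cross-entropy between the teacher's and student's output distributions.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()
```

In the TinyBERT setup, the attention and hidden-state losses are applied per mapped student/teacher layer pair and summed together with the prediction-layer loss; the layer mapping between the smaller student and the teacher would presumably need to be configurable in distil_from.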