
Implementing distillation loss functions from TinyBERT #1873

Closed
Tracked by #1551
MichelBartels opened this issue Dec 13, 2021 · 1 comment

Is your feature request related to a problem? Please describe.
A basic version of model distillation was implemented in #1758. However, there is still room for improvement. The TinyBERT paper (https://arxiv.org/pdf/1909.10351.pdf) details an approach for fine-tuning an already pre-trained small language model.

Describe the solution you'd like
The distillation loss functions from the TinyBERT paper should be usable when distilling a model in Haystack using the distil_from method.
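
For reference, the TinyBERT objective decomposes into attention-map, hidden-state/embedding, and prediction-layer terms. The sketch below shows what these terms could look like in plain PyTorch; the function names, the learnable `projection` layer, and the assumption that student layers are already mapped onto teacher layers are illustrative only and not part of the existing distil_from API.

```python
# Minimal sketch of the TinyBERT loss terms in plain PyTorch (not the
# existing Haystack API; names and layer mapping are illustrative).
import torch.nn.functional as F


def attention_loss(student_attentions, teacher_attentions):
    # MSE between student and teacher attention matrices, summed over the
    # student layers (assumed to be already mapped onto teacher layers).
    return sum(F.mse_loss(a_s, a_t)
               for a_s, a_t in zip(student_attentions, teacher_attentions))


def hidden_state_loss(student_hidden, teacher_hidden, projection):
    # MSE between projected student hidden states (and embeddings) and the
    # teacher's. `projection` is a learnable linear layer mapping the student
    # hidden size to the teacher hidden size, as described in the paper.
    return sum(F.mse_loss(projection(h_s), h_t)
               for h_s, h_t in zip(student_hidden, teacher_hidden))


def prediction_loss(student_logits, teacher_logits, temperature=1.0):
    # Soft cross-entropy between teacher and student predictions. KL
    # divergence is used here because its gradient w.r.t. the student matches
    # that of the soft cross-entropy used in the paper.
    t = temperature
    return F.kl_div(F.log_softmax(student_logits / t, dim=-1),
                    F.softmax(teacher_logits / t, dim=-1),
                    reduction="batchmean") * (t ** 2)
```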

Describe alternatives you've considered
https://arxiv.org/pdf/1910.08381.pdf: Appears to depend too heavily on expensive retraining and to be too task-specific.
https://arxiv.org/pdf/2002.10957.pdf, https://arxiv.org/pdf/1910.01108.pdf: These appear to focus only on pre-training.

Additional context
This is the first of two issues for implementing fine-tuning as described in the TinyBERT paper. This issue focuses on the loss functions; the second issue focuses on data augmentation.
