Is your feature request related to a problem? Please describe.
A basic version of model distillation was implemented in #1758, but there is still room for improvement. The TinyBERT paper (https://arxiv.org/pdf/1909.10351.pdf) describes an approach for finetuning an already pretrained small language model.
Describe the solution you'd like
The distillation loss functions in the TinyBERT paper should be usable when distilling a model in Haystack using the distil_from method (a rough sketch of these losses is included at the end of this issue).
Describe alternatives you've considered
https://arxiv.org/pdf/1910.08381.pdf: Seems to depend too heavily on expensive retraining and to be too task-specific.
https://arxiv.org/pdf/2002.10957.pdf, https://arxiv.org/pdf/1910.01108.pdf: Seem to focus only on pretraining.
Additional context
This is the first of two issues for implementing finetuning as described in the TinyBERT paper. This issue focuses on the loss functions; the second issue focuses on data augmentation.
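For reference, below is a minimal sketch (plain PyTorch, not Haystack's actual implementation) of the three TinyBERT-style loss terms: MSE on attention score matrices, MSE on linearly projected hidden states, and soft cross-entropy on the prediction layer. The function names, tensor shapes, and the projection argument are illustrative assumptions; how these would be wired into distil_from is left open.

```python
# Sketch of TinyBERT-style distillation losses (assumptions, not Haystack's API).
import torch
import torch.nn.functional as F


def attention_loss(student_attn: torch.Tensor, teacher_attn: torch.Tensor) -> torch.Tensor:
    # MSE between student and teacher attention score matrices
    # (assumed shape: batch x heads x seq_len x seq_len).
    return F.mse_loss(student_attn, teacher_attn)


def hidden_state_loss(student_hidden: torch.Tensor, teacher_hidden: torch.Tensor,
                      projection: torch.nn.Linear) -> torch.Tensor:
    # MSE between projected student hidden states and teacher hidden states.
    # The learnable linear projection maps the (smaller) student hidden size
    # to the teacher hidden size, as described in the TinyBERT paper.
    return F.mse_loss(projection(student_hidden), teacher_hidden)


def prediction_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor,
                    temperature: float = 1.0) -> torch.Tensor:
    # Soft cross-entropy between the teacher's and student's output distributions.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()
```

In the TinyBERT setup, the attention and hidden-state losses are applied per mapped student/teacher layer pair and summed together with the prediction-layer loss; the layer mapping between the smaller student and the teacher would presumably need to be configurable in distil_from.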