Is your feature request related to a problem? Please describe.
A basic version of model distillation was implemented with #1758. However, there is still room for improvement. The TinyBERT paper (https://arxiv.org/pdf/1909.10351.pdf) details an approach for finetuning an already pretrained small language model.
Describe the solution you'd like
Add the functionality to generate more data samples using the data augmentation approach outlined in the TinyBERT paper. This could be implemented as an additional DataSilo.
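For reference, here is a minimal sketch of the core idea, assuming the Hugging Face transformers library: words that map to a single WordPiece token are masked with some probability and replaced by one of BERT's top-k masked-LM predictions. The paper additionally uses GloVe nearest neighbours for multi-piece words, which is omitted here; the function name `augment` and the hyperparameters `p_replace` and `k` are illustrative and do not reflect any existing FARM API.

```python
# Sketch of TinyBERT-style data augmentation via masked-LM word replacement.
# Hypothetical helper, not part of FARM; hyperparameters are illustrative.
import random

import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()


def augment(sentence: str, p_replace: float = 0.4, k: int = 5) -> str:
    """Return one augmented variant of `sentence`.

    Each single-piece word is masked with probability `p_replace` and
    replaced by one of BERT's top-`k` predictions for that position.
    """
    words = sentence.split()
    new_words = list(words)
    for i, word in enumerate(words):
        # Only single-piece words are candidates for MLM replacement here;
        # the paper handles multi-piece words with GloVe nearest neighbours.
        if len(tokenizer.tokenize(word)) != 1 or random.random() > p_replace:
            continue
        masked = words[:i] + [tokenizer.mask_token] + words[i + 1:]
        inputs = tokenizer(" ".join(masked), return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        # Locate the [MASK] position and sample from its top-k candidates.
        mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0]
        top_ids = logits[0, mask_pos].topk(k).indices.squeeze(0).tolist()
        new_words[i] = tokenizer.convert_ids_to_tokens(random.choice(top_ids))
    return " ".join(new_words)


if __name__ == "__main__":
    # Calling augment repeatedly yields multiple variants per training sample.
    print(augment("the quick brown fox jumps over the lazy dog"))
```

A DataSilo wrapper could call such a function N times per input sample to enlarge the task-specific training set before distillation.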
Describe alternatives you've considered
https://arxiv.org/pdf/1910.08381.pdf: seems to depend too heavily on expensive retraining and appears too task-specific.
https://arxiv.org/pdf/2002.10957.pdf, https://arxiv.org/pdf/1910.01108.pdf: these seem to focus only on pretraining.
Additional context
This is the second of two issues for implementing finetuning as described in the TinyBERT paper. This issue focuses on data augmentation; the first issue focused on the loss functions.