distinguish intermediate layer & prediction layer distillation phases with different parameters #2001

Merged: 8 commits into master on Jan 14, 2022

Conversation

MichelBartels (Contributor)

Currently, you have to use the same learning rate and training data for both stages of TinyBERT distillation. This behaviour does not match the original TinyBERT paper, which uses separate hyperparameters for each stage.

Proposed changes:
Add tinybert_learning_rate and tinybert_train_filename parameters to the distil_from method of FARMReader.
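A minimal sketch of what the proposed signature could look like. The parameter names come from the PR description; the defaulting behaviour (falling back to the shared hyperparameters when the TinyBERT-specific ones are omitted) is an assumption for illustration, not the actual Haystack implementation, and the function body is a stub that only returns the resolved configuration:

```python
# Hypothetical sketch of the proposed distil_from signature.
# tinybert_* parameter names are from the PR; everything else is illustrative.

def distil_from(teacher_model, train_filename, learning_rate=3e-5,
                tinybert_train_filename=None, tinybert_learning_rate=None):
    """Resolve per-stage hyperparameters for TinyBERT distillation.

    Prediction-layer distillation uses (learning_rate, train_filename);
    intermediate-layer distillation uses the tinybert_* values if given,
    otherwise falls back to the shared ones (the old behaviour).
    """
    tb_lr = tinybert_learning_rate if tinybert_learning_rate is not None else learning_rate
    tb_file = tinybert_train_filename if tinybert_train_filename is not None else train_filename
    return {
        "prediction": (learning_rate, train_filename),
        "intermediate": (tb_lr, tb_file),
    }

# With the new parameters, each stage gets its own settings:
cfg = distil_from("bert-large-teacher", "squad.json",
                  tinybert_learning_rate=5e-5,
                  tinybert_train_filename="augmented.json")
```

Omitting the new parameters would keep the current single-set behaviour, so existing callers would be unaffected.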

Status (please check what you already did):

  • First draft (up for discussions & feedback)
  • Final code
  • Added tests
  • Updated documentation


@julian-risch (Member) left a comment:


I wonder if we can distinguish the two distillation techniques we now have a bit better. Rather than calling one technique tinybert and passing additional parameters to the same distil_from method, we should think about other options. That includes a draft of how to describe the two options in a documentation guide. Maybe we can have two separate distil_from()-style methods, where one calls the other, or one for 1st-stage and one for 2nd-stage distillation. Let's discuss in a call.


@julian-risch (Member) left a comment:


With the two separate methods distil_intermediate_layers_from and distil_prediction_layer_from, it's much better structured! LGTM 👍
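The split into two methods can be illustrated with a toy numeric sketch. The method names match the ones agreed on above, but the internals here are a single scalar parameter fitted by plain gradient descent, standing in for the real transformer distillation; teacher_hidden, teacher_logit, and the default learning rates are all illustrative:

```python
# Toy stand-in for the split distillation API: each phase runs with its
# own learning rate instead of sharing one set of hyperparameters.

class ToyStudent:
    """Scalar 'model' distilled in two phases, mirroring the split API."""

    def __init__(self):
        self.w = 0.0  # the single student parameter

    def _fit(self, target, learning_rate, steps):
        # Gradient descent on the squared gap to the teacher signal:
        # d/dw (w - target)^2 = 2 * (w - target)
        for _ in range(steps):
            self.w -= learning_rate * 2 * (self.w - target)

    def distil_intermediate_layers_from(self, teacher_hidden, learning_rate=0.1, steps=200):
        # Phase 1: match the teacher's intermediate representation.
        self._fit(teacher_hidden, learning_rate, steps)

    def distil_prediction_layer_from(self, teacher_logit, learning_rate=0.05, steps=200):
        # Phase 2: match the teacher's prediction, with a different lr.
        self._fit(teacher_logit, learning_rate, steps)

student = ToyStudent()
student.distil_intermediate_layers_from(teacher_hidden=0.5)
student.distil_prediction_layer_from(teacher_logit=1.0)
```

Keeping the phases as separate methods means each carries exactly the hyperparameters it needs, and callers can run either phase on its own.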

@julian-risch changed the title from "Add parameters to allow for different hyperparameters in stage 1 and 2 of tinybert distillation" to "distinguish intermediate layer & prediction layer distillation phases with different parameters" on Jan 14, 2022
@julian-risch merged commit 0cca2b9 into master on Jan 14, 2022
@julian-risch deleted the tinybert_arguments branch on January 14, 2022 at 19:40
2 participants