Adding distillation loss functions from TinyBERT #1879
Conversation
Looks very good already. Just some smaller changes requested. Most interesting for you is the missing `return` keyword in `haystack/nodes/reader/farm.py`, I guess. Happy to jump on a quick call in the afternoon if you want to discuss something.
def test_tinybert_distillation():
    student = FARMReader(model_name_or_path="huawei-noah/TinyBERT_General_4L_312D")
    teacher = FARMReader(model_name_or_path="bert-base-uncased")
Could we use a smaller teacher model here to speed up the test?
This would be theoretically possible, but the teacher model would need to have exactly the same dimensions except for the number of layers, which would need to be a multiple of the number of student layers. This means it is quite hard to find a matching model. If it is a big performance issue, we could perhaps create our own "mock model" with the right parameters.
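As an illustration of the layer constraint mentioned above (a hypothetical helper, not code from this PR): with a 4-layer student and a 12-layer teacher, every third teacher layer is compared against a student layer.

def map_student_to_teacher_layers(student_layers: int, teacher_layers: int) -> list:
    """Hypothetical helper: map each student layer to a teacher layer for the TinyBERT loss."""
    if teacher_layers % student_layers != 0:
        raise ValueError("The number of teacher layers must be a multiple of the number of student layers.")
    k = teacher_layers // student_layers
    # Student layer i is distilled against teacher layer i * k.
    return [i * k for i in range(1, student_layers + 1)]

print(map_student_to_teacher_layers(4, 12))  # [3, 6, 9, 12]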
Alright, let's keep it as it is for now. 👍
:return: None
"""
if tinybert_loss:
    self._training_procedure(data_dir=data_dir, train_filename=train_filename,
Is a `return` missing here in front of `self._training_procedure(...)`?
No, task specific distillation for TinyBERT has two stages and the second stage is the same as what we have already implemented. So calling _training_procedure with tinybert=True only executes the first stage. I have added a short comment explaining that.
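A rough sketch of the two-stage flow described here (simplified; apart from `tinybert_loss`, `tinybert_epochs`, `data_dir`, `train_filename` and `tinybert=True`, the parameter names are assumptions and do not mirror the exact PR code):

def distil_from(self, teacher, data_dir, train_filename, tinybert_loss=False, tinybert_epochs=1, **kwargs):
    # Sketch only; further parameters and the teacher handling are omitted.
    if tinybert_loss:
        # Stage 1: TinyBERT intermediate-layer distillation (hidden states and attentions).
        # No return here on purpose, because stage 2 still has to run afterwards.
        self._training_procedure(data_dir=data_dir, train_filename=train_filename,
                                 n_epochs=tinybert_epochs, tinybert=True, **kwargs)
    # Stage 2: the prediction-layer distillation that was already implemented.
    return self._training_procedure(data_dir=data_dir, train_filename=train_filename, **kwargs)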
@@ -1,9 +1,6 @@
from typing import Optional, Union, Tuple, List, Callable

from typing import TYPE_CHECKING
Could you please explain why we got rid of these lines and how `_LRScheduler` is handled now, so that I understand it a bit better? A response to this comment would be fine. :)
The type hint for `DistillationTrainer` turned out to be wrong. Because of that, I don't need to import `TYPE_CHECKING` anymore, as it was only necessary for preventing the circular import of `FARMReader`. There was never really a reason to also use that for `_LRScheduler`, so `_LRScheduler` can just be imported normally.
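To make the import discussion concrete, roughly what the pattern looks like (illustrative, not the exact file contents):

# _LRScheduler lives in PyTorch and causes no import cycle, so a plain import is enough:
from torch.optim.lr_scheduler import _LRScheduler

# A TYPE_CHECKING guard is only needed for names that would be circular at runtime,
# e.g. FARMReader, which itself imports from this module:
#
#     from typing import TYPE_CHECKING
#     if TYPE_CHECKING:
#         from haystack.nodes.reader.farm import FARMReader
#
# With the type hint changed from "FARMReader" to "AdaptiveModel", that guard
# (and the TYPE_CHECKING import) is no longer needed at all.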
@@ -630,7 +627,7 @@ class DistillationTrainer(Trainer):
    """
    def __init__(
        self,
-        model: "FARMReader",
+        model: "AdaptiveModel",
Is the code ready to use models other than FARMReader in its current form?
It can basically train any `AdaptiveModel` with a QA `prediction_head`. I changed this line because I realised that `_training_procedure` only passes the `AdaptiveModel`. This behavior is exactly the same for the normal `Trainer` class.
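A small sketch of that relationship (the `inferencer.model` attribute path is an assumption and may differ from the actual code):

from haystack.nodes import FARMReader

# The FARMReader is only a wrapper; the trainer constructed inside _training_procedure
# receives the wrapped AdaptiveModel (with its QA prediction head), not the reader itself.
student = FARMReader(model_name_or_path="huawei-noah/TinyBERT_General_4L_312D")
adaptive_model = student.inferencer.model  # assumed attribute path to the AdaptiveModel
print(type(adaptive_model).__name__)       # expected: the underlying AdaptiveModel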
@@ -484,6 +484,8 @@ def forward(
        input_ids: torch.Tensor,
        segment_ids: torch.Tensor,
        padding_mask: torch.Tensor,
+        output_hidden_states: bool = False,
+        output_attentions: bool = False,
Please add docstrings for these new parameters.
Ok, I have added the docstrings.
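Presumably the added docstrings follow the pattern suggested further down in this review; a sketch of the method signature from the diff above (the exact wording in the PR may differ):

# Method sketch only; the surrounding class and the existing docstring text are omitted.
def forward(
    self,
    input_ids: torch.Tensor,
    segment_ids: torch.Tensor,
    padding_mask: torch.Tensor,
    output_hidden_states: bool = False,
    output_attentions: bool = False,
):
    """
    ... existing parameter documentation ...

    :param output_hidden_states: Whether to output hidden states
    :param output_attentions: Whether to output attentions
    """
    ...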
sequence_output, pooled_output = output_tuple[0], output_tuple[1]
return sequence_output, pooled_output
return output_tuple
# if self.model.encoder.config.output_hidden_states == True:
Please check the commented code. :)
I have now deleted the commented code. It is unnecessary, as the output tuple is now handled by HuggingFace transformers.
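A short, self-contained illustration of that point, using the transformers library directly (not the PR code):

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("TinyBERT distillation example", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True, output_attentions=True)

# The extra tensors are already part of the model output, so no manual
# unpacking of the output tuple is needed on the Haystack/FARM side.
print(len(outputs.hidden_states))  # embedding output + one tensor per layer (13 for BERT base)
print(len(outputs.attentions))     # one attention tensor per layer (12 for BERT base)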
@@ -356,7 +356,7 @@ def prepare_labels(self, **kwargs):
        all_labels.append(labels)
        return all_labels

-    def forward(self, **kwargs):
+    def forward(self, output_hidden_states: bool = False, output_attentions: bool = False, **kwargs):
Please add the doc strings for the new parameters here as well, e.g.:
:param output_hidden_states: Whether to output hidden states
:param output_attentions: Whether to output attentions
I have added the doc strings.
Proposed changes:
This adds the distillation loss functions from TinyBERT as explained in #1873.
This adds two parameters to the `distil_from` method. Enabling the parameter `tinybert_loss` adds an additional distillation stage before the original one; `tinybert_epochs` specifies the number of epochs in this stage. The stage is realised using a new `TinyBERTDistillationTrainer` that computes the teacher hidden states and attentions on the fly. Caching of the teacher is not used, as this would take up too much memory (100s to 1000s of gigabytes). This means that the standard `DistillationTrainer` can be used.
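For reference, a hedged usage sketch of the new parameters (model names taken from the test above; the data paths and the positional teacher argument are placeholders/assumptions):

from haystack.nodes import FARMReader

student = FARMReader(model_name_or_path="huawei-noah/TinyBERT_General_4L_312D")
teacher = FARMReader(model_name_or_path="bert-base-uncased")

student.distil_from(
    teacher,                        # assumed to be passed as the teacher model
    data_dir="data/squad",          # placeholder path
    train_filename="train.json",    # placeholder filename
    tinybert_loss=True,             # enable the additional TinyBERT distillation stage
    tinybert_epochs=1,              # number of epochs for that stage
)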