
Add Diverse Paraphrase Generation. #94

Merged: 12 commits merged into GEM-benchmark:main on Aug 26, 2021

Conversation

ashutoshml
Contributor

With support for 4 candidate selection mechanisms: DiPS, Diverse Beam, Beam Search, and Random Selection.
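For context, a minimal usage sketch (not taken verbatim from the PR): the class path and generate() call are inferred from the test traceback later in this thread, the augmenter constructor argument is inferred from the test args, and the example sentence is made up.

from transformations.diverse_paraphrase.transformation import DiverseParaphrase

# Choose one of the four candidate selection mechanisms:
# "dips", "diverse_beam", "beam", or "random".
t = DiverseParaphrase(augmenter="dips")

# generate() returns multiple diverse paraphrases of the input sentence.
paraphrases = t.generate(sentence="Alice in Wonderland is a 2010 American fantasy film.")
print(paraphrases)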

@kaustubhdhole
Collaborator

I think these changes seem like a wonderful addition to NL-Augmenter. I haven't gone through the entire code yet, but you'll need to shift all the dependencies into a separate requirements.txt in the "diverse_paraphrase" folder to avoid conflicts.

Also, unfortunately gensim cannot be added here due to a non-permissive license. Can you use any other substitute?

@kaustubhdhole kaustubhdhole requested a review from Nickeilf July 7, 2021 13:40
@ashutoshml
Contributor Author

Made the necessary changes.

  1. Added requirements.txt in the diverse_paraphrase directory
  2. Replaced gensim word2vec with torchtext GloVe

Collaborator

@Nickeilf Nickeilf left a comment


Hi @ashutoshml, thanks for contributing this interesting transformation!

Here's my general review:
Correctness: Everything seems correct and good to me.
Interface: I think you've chosen the correct interface.
Specificity: The transformation is not restricted to specific types of text.
Novelty: This transformation is novel and has not been implemented in NL-Augmenter before. It generates multiple semantically equivalent paraphrases via round-trip translation.
Adding New Libraries: New libraries are added in the folder's requirements.txt.
Test Cases: 5 test cases have been added.
Evaluating Robustness: Robustness evaluation has not been conducted.

The only suggestion I'd make is to add some more comments in the code, as this will improve the readability of the transformation and make it more friendly to end users.

@ashutoshml ashutoshml requested a review from Nickeilf July 10, 2021 03:19
@ashutoshml
Contributor Author

Hi @Nickeilf

Thanks for the review.
Based on the comments provided, I have updated the code.

@ashutoshml ashutoshml requested a review from kaustubhdhole July 11, 2021 11:37
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

assert augmenter in ["dips", "random", "diverse_beam", "beam"]
Collaborator


I think it can be useful for people to see the difference in the outputs of these: you can show 1-2 sentences with different "augmenters" by passing them in the args field in test.json.
Here is an example: /~https://github.com/GEM-benchmark/NL-Augmenter/blob/main/filters/keywords/test.json
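For instance, a hypothetical pair of entries: the per-case class/args/inputs/outputs keys mirror the existing test case shown later in this thread, the top-level type/test_cases layout is assumed to match the linked example, and the sentences and expected outputs are placeholders only.

{
    "type": "diverse_paraphrase",
    "test_cases": [
        {
            "class": "DiverseParaphrase",
            "args": {"augmenter": "dips"},
            "inputs": {"sentence": "An example input sentence."},
            "outputs": [{"sentence": "A DiPS-selected paraphrase."}]
        },
        {
            "class": "DiverseParaphrase",
            "args": {"augmenter": "diverse_beam"},
            "inputs": {"sentence": "An example input sentence."},
            "outputs": [{"sentence": "A diverse-beam-selected paraphrase."}]
        }
    ]
}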

Contributor Author

@ashutoshml ashutoshml Jul 16, 2021


Sure. The build seems to be failing because of the test accuracy evaluation (test.json).
Any suggestions? Note: the outputs are not deterministic.

Contributor Author


Reminder. Any suggestions on this?

I had earlier set the variable heavy=True, as in the back_translation code.
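For reference, a rough sketch of what setting that flag looks like. Only heavy=True comes from this thread; the interfaces.SentenceOperation base class and import path are my assumption about the repo layout, and the rest of the class body is omitted.

from interfaces.SentenceOperation import SentenceOperation

class DiverseParaphrase(SentenceOperation):
    # Marks the transformation as resource-heavy, as done in the
    # back_translation transformation.
    heavy = True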

Collaborator


@aadesh11 @AbinayaM02 any idea about this?

Collaborator


@kaustubhdhole : I checked the build failure log. The pytest run crashed after it downloaded the necessary libraries (all heavy ones). This looks like an OOM issue on the GitHub runner; that's my best guess.

The following is the hardware specification of the GitHub runner (Windows and Linux virtual machines):

  • 2-core CPU
  • 7 GB of RAM
  • 14 GB of SSD disk space

Collaborator


@ashutoshml can you please try the following in your branch (on your machine):
pytest -s --t=diverse_paraphrase
and share the output.

Contributor Author

@ashutoshml ashutoshml Jul 25, 2021


test/test_main.py:105:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/test_main.py:83: in execute_test_case_for_transformation
    execute_sentence_operation_test_case(transformation, test)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

transformation = <transformations.diverse_paraphrase.transformation.DiverseParaphrase object at 0x7fdf0499c110>
test = {'args': {'augmenter': 'dips'}, 'class': 'DiverseParaphrase', 'inputs': {'sentence': 'Alice in Wonderland is a 2010 Am...0'}, {'sentence': 'Alice in Wonderland is an American live action / animated dark fantasy adventure movie from 2010'}]}

    def execute_sentence_operation_test_case(transformation, test):
        filter_args = test["inputs"]
        outputs = test["outputs"]
        perturbs = transformation.generate(**filter_args)
        for pred_output, output in zip(perturbs, outputs):
>           assert pred_output == output["sentence"], get_assert_message(
                transformation, output["sentence"], pred_output
            )
E           AssertionError: Mis-match in expected and predicted output for DiverseParaphrase transformation:
E              Expected Output: Alice in Wonderland is an American live-action / animated dark fantasy adventure film from the year 2010
E              Predicted Output: Alice in Wonderland is an American live action / animated dark fantasy adventure movie from 2010
E           assert 'Alice in Won...vie from 2010' == 'Alice in Won...the year 2010'
E             - Alice in Wonderland is an American live-action / animated dark fantasy adventure film from the year 2010
E             ?                                        ^                                         ^ ^^      ---------
E             + Alice in Wonderland is an American live action / animated dark fantasy adventure movie from 2010
E             ?                                        ^                                         ^^^ ^

test/test_main.py:26: AssertionError
=============================================================================================================================== warnings summary ===============================================================================================================================
test/test_main.py::test_operation[diverse_paraphrase-length]
  /scratche/prod/ashutosh/miniconda3/envs/nlgenv/lib/python3.7/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
  To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
    return torch.floor_divide(self, other)

-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================================================================================================================== short test summary info ============================================================================================================================
FAILED test/test_main.py::test_operation[diverse_paraphrase-length] - AssertionError: Mis-match in expected and predicted output for DiverseParaphrase transformation:
=================================================================================================================== 1 failed, 1 warning in 288.53s (0:04:48) ===================================================================================================================

As mentioned earlier, the test cases will fail because of the probabilistic nature of the outputs; the test is looking for exact matches.

Collaborator

@kaustubhdhole kaustubhdhole Jul 25, 2021


Okay, which part is the probabilistic part? (I mean which line)

Contributor Author

@ashutoshml ashutoshml Jul 25, 2021


There are three ways in which probabilistic outputs may happen (a rough illustrative sketch follows the list):

  1. When the candidates are generated using beam search, the argmax operation is resolved non-deterministically when two candidates have the same maximum score.

  2. For DiPS, a functional maximization step introduces similar non-determinism: the first selected candidate governs how the rest of the candidates are selected, since the objective is to maximize the lexico-syntactic diversity of the output candidates.

  3. In random selection, as the name suggests, random candidates are selected from the beam search, which may or may not produce diverse candidates.
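A minimal, illustrative sketch of cases 1 and 3. The candidates and scores below are invented for illustration; this is not the PR's code.

import random
import torch

# Hypothetical beam-search candidates with a tie for the maximum score.
candidates = ["paraphrase A", "paraphrase B", "paraphrase C", "paraphrase D"]
scores = torch.tensor([0.90, 0.90, 0.75, 0.60])

# Case 1: with tied maximum scores, which candidate gets picked depends on
# the ordering the generator happens to produce, so it can vary across runs.
best = candidates[int(torch.argmax(scores))]

# Case 3: random selection samples candidates from the beam, so the selected
# set changes across runs unless the RNG is seeded.
chosen = random.sample(candidates, k=2)

print(best, chosen)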

@ashutoshml ashutoshml requested a review from kaustubhdhole July 17, 2021 07:34
@ashutoshml ashutoshml requested a review from AbinayaM02 July 31, 2021 14:02
global model
def unk_init(x):
    return torch.randn_like(x)
model = GloVe('6B', unk_init=unk_init)
Collaborator


A lot of things seem to be consuming memory here - this can be reduced to 50 dimensions instead of the default 300.
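For reference, a minimal sketch of the suggested change, using torchtext's dim argument for the 6B vectors (50, 100, 200, or 300 dimensions are available); variable names follow the snippet above.

import torch
from torchtext.vocab import GloVe

def unk_init(x):
    return torch.randn_like(x)

# Load the 50-dimensional 6B vectors instead of the default 300-dim ones.
model = GloVe("6B", dim=50, unk_init=unk_init)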

Contributor Author


Updated!

@AbinayaM02
Collaborator

Hi @ashutoshml: Since the PR uses some memory-heavy libraries, the build is failing. I'm looking at options to resolve this smoothly. It might take a while!

@kaustubhdhole
Collaborator

Okay, I am merging this while disabling the tests for this PR.

@kaustubhdhole
Collaborator

kaustubhdhole commented Aug 25, 2021

Hi @ashutoshml, I would suggest creating a separate PR so that your transformation can be widely used. :) Merging this now after @Nickeilf approves.

@ashutoshml
Contributor Author

Hi @ashutoshml, I would suggest creating a separate PR so that your transformation can be widely used. :) Merging this now after @Nickeilf approves.

Hi @kaustubhdhole. Thanks for the merge.
Can you please elaborate on the separate PR part? I didn't quite understand.

@kaustubhdhole
Collaborator

I mean it is important to look at how the memory issue can be solved, so that the test cases do not fail. A separate PR should address that and add back the test cases (which I removed temporarily).

@kaustubhdhole kaustubhdhole merged commit a8e4e49 into GEM-benchmark:main Aug 26, 2021
@ashutoshml ashutoshml deleted the diverse_paraphrase branch August 27, 2021 04:17