
Add Diverse Paraphrase Generation. #94

Merged: 12 commits merged into GEM-benchmark:main on Aug 26, 2021

Conversation

ashutoshml
Contributor

With support for 4 candidate selection mechanisms: DiPS, Diverse Beam, Beam Search, and Random Selection.
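For context, a minimal usage sketch (not taken verbatim from the PR): the class path and generate() call are inferred from the test traceback later in this thread, the augmenter constructor argument is inferred from the test args, and the example sentence is made up.

from transformations.diverse_paraphrase.transformation import DiverseParaphrase

# Choose one of the four candidate selection mechanisms:
# "dips", "diverse_beam", "beam", or "random".
t = DiverseParaphrase(augmenter="dips")

# generate() returns multiple diverse paraphrases of the input sentence.
paraphrases = t.generate(sentence="Alice in Wonderland is a 2010 American fantasy film.")
print(paraphrases)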

@kaustubhdhole
Collaborator

I think these changes seem like a wonderful addition to NL-Augmenter. I haven't gone through the entire code yet, but you'll need to shift all the dependencies into a separate requirements.txt in the "diverse_paraphrase" folder to avoid conflicts.

Also, unfortunately gensim cannot be added here due to a non-permissive license. Can you use any other substitute?

@kaustubhdhole kaustubhdhole requested a review from Nickeilf July 7, 2021 13:40
@ashutoshml
Contributor Author

Made the necessary changes.

  1. Added requirements.txt in the diverse_paraphrase directory
  2. Replaced gensim word2vec with torchtext GloVe

Collaborator

@Nickeilf Nickeilf left a comment


Hi @ashutoshml, thanks for contributing this interesting transformation!

Here's my general review:
Correctness: Everything seems correct and good to me.
Interface: I think you've chosen the correct interface.
Specificity: The transformation is not restricted to specific types of text.
Novelty: This transformation is novel and has not been implemented in NL-Augmenter before. It generates multiple semantically equivalent paraphrases via round-trip translation.
Adding New Libraries: New libraries are added in the folder's requirements.txt.
Test Cases: 5 test cases have been added.
Evaluating Robustness: Robustness evaluation has not been conducted.

The only suggestion I'd make is to add some more comments in the code, as this will improve the readability of the transformation and make it more friendly to end users.

@ashutoshml ashutoshml requested a review from Nickeilf July 10, 2021 03:19
@ashutoshml
Contributor Author

Hi @Nickeilf

Thanks for the review.
Based on the comments provided, I have updated the code.

@ashutoshml ashutoshml requested a review from kaustubhdhole July 11, 2021 11:37
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

assert augmenter in ["dips", "random", "diverse_beam", "beam"]
Collaborator


I think it can be useful for people to see the difference in the outputs of these: you can show 1-2 sentences with different "augmenters" by passing them in the args field in test.json.
Here is an example: /~https://github.com/GEM-benchmark/NL-Augmenter/blob/main/filters/keywords/test.json
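For instance, a hypothetical pair of entries: the per-case class/args/inputs/outputs keys mirror the existing test case shown later in this thread, the top-level type/test_cases layout is assumed to match the linked example, and the sentences and expected outputs are placeholders only.

{
    "type": "diverse_paraphrase",
    "test_cases": [
        {
            "class": "DiverseParaphrase",
            "args": {"augmenter": "dips"},
            "inputs": {"sentence": "An example input sentence."},
            "outputs": [{"sentence": "A DiPS-selected paraphrase."}]
        },
        {
            "class": "DiverseParaphrase",
            "args": {"augmenter": "diverse_beam"},
            "inputs": {"sentence": "An example input sentence."},
            "outputs": [{"sentence": "A diverse-beam-selected paraphrase."}]
        }
    ]
}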

Contributor Author

@ashutoshml ashutoshml Jul 16, 2021


Sure. The build seems to be failing because of the test accuracy evaluation (test.json).
Any suggestions? Note: the outputs are not deterministic.

Contributor Author


Reminder. Any suggestions on this?

I had earlier set the variable heavy=True, as in the back_translation code.
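For reference, a rough sketch of what setting that flag looks like. Only heavy=True comes from this thread; the interfaces.SentenceOperation base class and import path are my assumption about the repo layout, and the rest of the class body is omitted.

from interfaces.SentenceOperation import SentenceOperation

class DiverseParaphrase(SentenceOperation):
    # Marks the transformation as resource-heavy, as done in the
    # back_translation transformation.
    heavy = True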

Collaborator


@aadesh11 @AbinayaM02 any idea about this?

Collaborator


@kaustubhdhole : I checked the build failure log. The pytest run crashed after it downloaded the necessary libraries (all heavy ones). This looks like an OOM issue on the GitHub runner; that's my best guess.

The following is the hardware specification of the GitHub runner (Windows and Linux virtual machines):

  • 2-core CPU
  • 7 GB of RAM
  • 14 GB of SSD disk space

Collaborator


@ashutoshml can you please try the following in your branch (on your machine):
pytest -s --t=diverse_paraphrase
and share the output.

Contributor Author

@ashutoshml ashutoshml Jul 25, 2021


test/test_main.py:105:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/test_main.py:83: in execute_test_case_for_transformation
    execute_sentence_operation_test_case(transformation, test)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

transformation = <transformations.diverse_paraphrase.transformation.DiverseParaphrase object at 0x7fdf0499c110>
test = {'args': {'augmenter': 'dips'}, 'class': 'DiverseParaphrase', 'inputs': {'sentence': 'Alice in Wonderland is a 2010 Am...0'}, {'sentence': 'Alice in Wonderland is an American live action / animated dark fantasy adventure movie from 2010'}]}

    def execute_sentence_operation_test_case(transformation, test):
        filter_args = test["inputs"]
        outputs = test["outputs"]
        perturbs = transformation.generate(**filter_args)
        for pred_output, output in zip(perturbs, outputs):
>           assert pred_output == output["sentence"], get_assert_message(
                transformation, output["sentence"], pred_output
            )
E           AssertionError: Mis-match in expected and predicted output for DiverseParaphrase transformation:
E              Expected Output: Alice in Wonderland is an American live-action / animated dark fantasy adventure film from the year 2010
E              Predicted Output: Alice in Wonderland is an American live action / animated dark fantasy adventure movie from 2010
E           assert 'Alice in Won...vie from 2010' == 'Alice in Won...the year 2010'
E             - Alice in Wonderland is an American live-action / animated dark fantasy adventure film from the year 2010
E             ?                                        ^                                         ^ ^^      ---------
E             + Alice in Wonderland is an American live action / animated dark fantasy adventure movie from 2010
E             ?                                        ^                                         ^^^ ^

test/test_main.py:26: AssertionError
=============================================================================================================================== warnings summary ===============================================================================================================================
test/test_main.py::test_operation[diverse_paraphrase-length]
  /scratche/prod/ashutosh/miniconda3/envs/nlgenv/lib/python3.7/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
  To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
    return torch.floor_divide(self, other)

-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================================================================================================================== short test summary info ============================================================================================================================
FAILED test/test_main.py::test_operation[diverse_paraphrase-length] - AssertionError: Mis-match in expected and predicted output for DiverseParaphrase transformation:
=================================================================================================================== 1 failed, 1 warning in 288.53s (0:04:48) ===================================================================================================================

As mentioned earlier, the test cases will fail because of the probabilistic nature of the outputs; the test is looking for exact matches.

Collaborator

@kaustubhdhole kaustubhdhole Jul 25, 2021


Okay, which part is the probabilistic part? (I mean which line)

Contributor Author

@ashutoshml ashutoshml Jul 25, 2021


There are three ways in which probabilistic outputs may happen (a rough illustrative sketch follows the list):

  1. When the candidates are generated using beam search, the argmax operation is resolved non-deterministically when two candidates have the same maximum score.

  2. For DiPS, a functional maximization step introduces similar non-determinism: the first selected candidate governs how the rest of the candidates are selected, since the objective is to maximize the lexico-syntactic diversity of the output candidates.

  3. In random selection, as the name suggests, random candidates are selected from the beam search, which may or may not produce diverse candidates.
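A minimal, illustrative sketch of cases 1 and 3. The candidates and scores below are invented for illustration; this is not the PR's code.

import random
import torch

# Hypothetical beam-search candidates with a tie for the maximum score.
candidates = ["paraphrase A", "paraphrase B", "paraphrase C", "paraphrase D"]
scores = torch.tensor([0.90, 0.90, 0.75, 0.60])

# Case 1: with tied maximum scores, which candidate gets picked depends on
# the ordering the generator happens to produce, so it can vary across runs.
best = candidates[int(torch.argmax(scores))]

# Case 3: random selection samples candidates from the beam, so the selected
# set changes across runs unless the RNG is seeded.
chosen = random.sample(candidates, k=2)

print(best, chosen)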

@ashutoshml ashutoshml requested a review from kaustubhdhole July 17, 2021 07:34
@ashutoshml ashutoshml requested a review from AbinayaM02 July 31, 2021 14:02
global model
def unk_init(x):
    return torch.randn_like(x)
model = GloVe('6B', unk_init=unk_init)
Collaborator


A lot of things seem to be consuming memory here - this can be reduced to 50 dimensions instead of the default 300.
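For reference, a minimal sketch of the suggested change, using torchtext's dim argument for the 6B vectors (50, 100, 200, or 300 dimensions are available); variable names follow the snippet above.

import torch
from torchtext.vocab import GloVe

def unk_init(x):
    return torch.randn_like(x)

# Load the 50-dimensional 6B vectors instead of the default 300-dim ones.
model = GloVe("6B", dim=50, unk_init=unk_init)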

Contributor Author


Updated!

@AbinayaM02
Collaborator

Hi @ashutoshml: Since the PR uses some memory-heavy libraries, the build is failing. I'm looking at options to resolve this smoothly. It might take a while!

@kaustubhdhole
Collaborator

Okay, I am merging this while disabling the tests for this PR.

@kaustubhdhole
Collaborator

kaustubhdhole commented Aug 25, 2021

Hi @ashutoshml, I would suggest creating a separate PR so that your transformation can be widely used. :) Merging this now after @Nickeilf approves.

@ashutoshml
Contributor Author

Hi @ashutoshml, I would suggest creating a separate PR so that your transformation can be widely used. :) Merging this now after @Nickeilf approves.

Hi @kaustubhdhole. Thanks for the merge.
Can you please elaborate on the separate PR part? I didn't quite understand.

@kaustubhdhole
Collaborator

I mean it is important to look at how the memory issue can be solved, so that the test cases do not fail. A separate PR should address that and add back the test cases (which I removed temporarily).

@kaustubhdhole kaustubhdhole merged commit a8e4e49 into GEM-benchmark:main Aug 26, 2021
@ashutoshml ashutoshml deleted the diverse_paraphrase branch August 27, 2021 04:17