Add Diverse Paraphrase Generation. #94
Conversation
…ion mechanism: DiPS, Diverse Beam, Beam Search, Random Selection
I think these changes seem like a wonderful addition to NL-Augmenter - I haven't gone through the entire code yet. But you'll need to shift all the dependencies into a separate requirements.txt in the folder "diverse_paraphrase" to avoid conflicts. Also, unfortunately gensim cannot be added here due to a non-permissive license. Can you use any other substitute?
…n folder, use torchtext instead of gensim
Made the necessary changes.
Hi @ashutoshml, thanks for contributing this interesting transformation!
Here's my general review:
Correctness: Everything seems correct and good to me.
Interface: I think you've chosen the correct interface.
Specificity: The transformation is not restricted to specific types of text.
Novelty: This transformation is novel and has not been implemented in NL-Augmenter before. It generates multiple semantically equivalent paraphrases via round-trip translation (a brief usage sketch follows this review).
Adding New Libraries: New libraries are listed in a separate requirements.txt inside the transformation folder.
Test Cases: 5 test cases have been added.
Evaluating Robustness: Robustness evaluation has not been conducted.
The only suggestion I'd make is to add some more comments in the code, as this will improve the readability of the transformation and make it more friendly to end users.
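For readers of this thread, here is a minimal usage sketch. The module path, the generate() signature, and the allowed augmenter values are taken from the diff and the pytest output further down in this thread; passing augmenter to the constructor is an assumption for illustration and may not match the actual signature.

from transformations.diverse_paraphrase.transformation import DiverseParaphrase

# "augmenter" picks the candidate-selection mechanism; the allowed values
# ("dips", "random", "diverse_beam", "beam") come from the assertion in the diff below.
# NOTE: passing it to the constructor is an assumption, not the confirmed API.
t = DiverseParaphrase(augmenter="dips")

# generate() takes a sentence and returns a list of paraphrase candidates,
# as exercised by the test harness (transformation.generate(**inputs)).
paraphrases = t.generate(sentence="Alice in Wonderland is a 2010 American dark fantasy film")
for p in paraphrases:
    print(p)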
Hi @Nickeilf, thanks for the review.
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

assert augmenter in ["dips", "random", "diverse_beam", "beam"]
I think it can be useful for people to see the difference in the outputs of these: you can show 1-2 sentences with different "augmenters" by passing them as args in test.json.
Here is an example: /~https://github.com/GEM-benchmark/NL-Augmenter/blob/main/filters/keywords/test.json
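For illustration, additional entries along these lines could be added to test.json, one per augmenter. They are shown here as Python dicts mirroring the per-test structure visible in the pytest output further down in this thread; the top-level layout of test.json is not shown, and the expected outputs are placeholders rather than real model outputs.

# Hypothetical test entries, one per candidate-selection mechanism.
test_cases = [
    {
        "class": "DiverseParaphrase",
        "args": {"augmenter": augmenter},
        "inputs": {"sentence": "Alice in Wonderland is a 2010 American dark fantasy film"},
        "outputs": [{"sentence": "<expected paraphrase for this augmenter>"}],
    }
    for augmenter in ["dips", "random", "diverse_beam", "beam"]
]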
Sure. The build seems to be failing because of the test accuracy evaluation (test.json).
Any suggestions? Note: the outputs are not deterministic.
Reminder: any suggestions on this? I had earlier set the variable heavy=True, as in the back_translation code.
@aadesh11 @AbinayaM02 any idea about this?
@kaustubhdhole : I checked the build failure log. The pytest run crashed after it downloaded the necessary libraries (all heavy ones). This looks like an OOM issue on the GitHub runner; that's my best guess.
The following is the specification of the GitHub runner.
Hardware specification for Windows and Linux virtual machines:
- 2-core CPU
- 7 GB of RAM
- 14 GB of SSD disk space
@ashutoshml can you please try the following in your branch (on your machine):
pytest -s --t=diverse_paraphrase
and share the output.
test/test_main.py:105:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/test_main.py:83: in execute_test_case_for_transformation
execute_sentence_operation_test_case(transformation, test)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
transformation = <transformations.diverse_paraphrase.transformation.DiverseParaphrase object at 0x7fdf0499c110>
test = {'args': {'augmenter': 'dips'}, 'class': 'DiverseParaphrase', 'inputs': {'sentence': 'Alice in Wonderland is a 2010 Am...0'}, {'sentence': 'Alice in Wonderland is an American live action / animated dark fantasy adventure movie from 2010'}]}
def execute_sentence_operation_test_case(transformation, test):
filter_args = test["inputs"]
outputs = test["outputs"]
perturbs = transformation.generate(**filter_args)
for pred_output, output in zip(perturbs, outputs):
> assert pred_output == output["sentence"], get_assert_message(
transformation, output["sentence"], pred_output
)
E AssertionError: Mis-match in expected and predicted output for DiverseParaphrase transformation:
E Expected Output: Alice in Wonderland is an American live-action / animated dark fantasy adventure film from the year 2010
E Predicted Output: Alice in Wonderland is an American live action / animated dark fantasy adventure movie from 2010
E assert 'Alice in Won...vie from 2010' == 'Alice in Won...the year 2010'
E - Alice in Wonderland is an American live-action / animated dark fantasy adventure film from the year 2010
E ? ^ ^ ^^ ---------
E + Alice in Wonderland is an American live action / animated dark fantasy adventure movie from 2010
E ? ^ ^^^ ^
test/test_main.py:26: AssertionError
=============================================================================================================================== warnings summary ===============================================================================================================================
test/test_main.py::test_operation[diverse_paraphrase-length]
/scratche/prod/ashutosh/miniconda3/envs/nlgenv/lib/python3.7/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
return torch.floor_divide(self, other)
-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================================================================================================================== short test summary info ============================================================================================================================
FAILED test/test_main.py::test_operation[diverse_paraphrase-length] - AssertionError: Mis-match in expected and predicted output for DiverseParaphrase transformation:
=================================================================================================================== 1 failed, 1 warning in 288.53s (0:04:48) ===================================================================================================================
As mentioned earlier, the test cases will fail because of the probabilistic nature of the outputs; the test is looking for exact matches.
Okay, which part is the probabilistic part? (I mean which line)
There are three ways in which probabilistic outputs can arise (an illustrative sketch of the last one follows this list):
- When the candidates are generated using beam search, the argmax operation is resolved non-deterministically if two candidates have the same maximum score.
- For dips, there is a functional maximization operation that results in similar non-determinism: the first selected candidate governs how the rest of the candidates are selected, since the objective is to maximize the lexical-syntactic diversity of the output candidates.
- In random selection, as the name suggests, random candidates are selected from the beam-search output, which may or may not produce diverse candidates.
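To make the last point concrete, here is a hypothetical, illustrative sketch of random candidate selection (not the PR's actual code): without a fixed seed, two runs can easily return different subsets, which is why exact-match assertions in test.json fail.

import random

def select_random_candidates(beam_candidates, k, seed=None):
    # Illustrative only: pick k candidates uniformly at random from the
    # beam-search output. A fixed seed makes the choice reproducible.
    rng = random.Random(seed)
    return rng.sample(beam_candidates, min(k, len(beam_candidates)))

candidates = [
    "Alice in Wonderland is a 2010 American dark fantasy film",
    "Alice in Wonderland is an American live action / animated dark fantasy adventure movie from 2010",
    "Alice in Wonderland is an American dark fantasy adventure film released in 2010",
]

# Without a seed, repeated calls may return different subsets:
print(select_random_candidates(candidates, k=2))
print(select_random_candidates(candidates, k=2))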
global model
def unk_init(x):
    return torch.randn_like(x)
model = GloVe('6B', unk_init=unk_init)
A lot of things seem to be consuming memory here - this can be reduced to 50 dimensions instead of the default 300.
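A minimal sketch of that suggestion, assuming the torchtext GloVe loader used in the snippet above (the dim argument selects the 50-dimensional 6B vectors; everything else is unchanged):

import torch
from torchtext.vocab import GloVe

def unk_init(x):
    # Out-of-vocabulary tokens get random vectors of the same shape.
    return torch.randn_like(x)

# 50-dimensional vectors instead of the default 300 to reduce memory usage.
model = GloVe(name='6B', dim=50, unk_init=unk_init)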
Updated!
Hi @ashutoshml: Since the PR uses some memory-heavy libraries, the build is failing. I'm looking at options to resolve this smoothly. It might take a while!
Okay, I am merging this while disabling the tests for this PR.
Hi @ashutoshml, I would suggest creating a separate PR so that your transformation can be widely used. :) Merging this now after @Nickeilf approves.
Hi @kaustubhdhole. Thanks for the merge.
I mean it is important to look at how the memory issue is solved, so that the test cases do not fail. A separate PR should address that and add back the test cases (which I removed temporarily).
With support for 4 candidate selection mechanisms: DiPS, Diverse Beam, Beam Search, Random Selection