Run checklist suites in AllenNLP #5065
Conversation
I have not looked at the tests yet, but here is a first review. It seems like there is a lot of copy-and-paste code that's not following the guidelines (as opposed to enforced rules) of AllenNLP code quality, like missing type annotations, etc.?
if capabilities:
    self._print_summary_args["capabilities"] = capabilities
Do you want this to print when capabilities == []?
_print_summary_args are just passed to the summary function. When capabilities == [], the summaries for tests of all capabilities are printed.
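A minimal sketch of the forwarding behavior described here, assuming a summary function in the style of the checklist library's suite summary (the summary body below is a simplified stand-in, not the real implementation):

    from typing import Dict, List, Optional

    def summary(capabilities: Optional[List[str]] = None) -> None:
        # Simplified stand-in: an empty or missing filter means "all capabilities".
        all_capabilities = ["Vocabulary", "Robustness", "Fairness"]
        for cap in capabilities or all_capabilities:
            print(f"Summary for capability: {cap}")

    _print_summary_args: Dict[str, List[str]] = {}
    capabilities: List[str] = []
    if capabilities:
        _print_summary_args["capabilities"] = capabilities

    # With capabilities == [], no filter is stored, so summaries for
    # tests of all capabilities are printed.
    summary(**_print_summary_args)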
ret = []
if s != data:
    ret.append(s)
if s + "." != data:
    ret.append(s + ".")
return ret
Why not just return [s] or return [s + "."]?
Here's what it's doing:

data = "This was great!"
Returns ["This was great", "This was great."]

data = "The movie was good"
Returns ["The movie was good."]
I'm having trouble getting this to work with a QA model. I tried running

allennlp checklist \
    https://storage.googleapis.com/allennlp-public-models/transformer-qa-2020-10-03.tar.gz \
    question-answering

but got no output (no errors, either). I also tried the above with the --checklist-suite parameter pointed to a downloaded and extracted version of /~https://github.com/marcotcr/checklist/blob/master/release_suites/squad_suite.tar.gz, but I got errors from dill:
allennlp/commands/checklist.py

subparser.add_argument("task", type=str, help="the name of the task suite")
subparser.add_argument("--checklist-suite", type=str, help="the checklist suite path")
Where do we find these? Is this /~https://github.com/marcotcr/checklist/tree/master/release_suites?
We will add our default suites to the cloud, and people can also write their own suites.
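For example, pointing the flag at a local suite would look something like this (the local path below is a hypothetical placeholder; the flag and model archive are the ones shown above):

    allennlp checklist \
        https://storage.googleapis.com/allennlp-public-models/transformer-qa-2020-10-03.tar.gz \
        question-answering \
        --checklist-suite /path/to/extracted/squad_suite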
@epwalsh That was because the tests were not being added by default. I've now switched the flag, so you should be able to run the command now. (It will only run 2 tests right now; I didn't want this PR to have even more lines of code.)
Just a few "cosmetic" comments. I'm working my way through some examples next, so I'll follow up in a bit with another review.
This is really cool!
The only other comment I have is about the format of the summary. It's just a little difficult to tell "capabilities" apart from specific test names. I guess if you know what you're looking at, it may be obvious; I just had a little trouble the first time I looked at it.
If it's difficult to override the default summary, I wouldn't worry about it.
@epwalsh Thanks for reviewing a mountain of code! I've added custom formatting functions now. Here's a preview of what that looks like (task-specific):
Wow, that looks really great! Much easier to read!
I'm just curious: is there a built-in way to get a structured version of the output, like in JSON format, for example? That would be a cool feature to have, especially for people doing continuous deployment of models, because they could programmatically run the checklist suites and get a pass/fail result.
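A rough sketch of what a structured export could look like; nothing below is an existing checklist or AllenNLP API, and the `results` mapping and field names are hypothetical stand-ins for whatever per-test statistics the suite collects:

    import json

    def results_to_json(results: dict, output_file: str) -> None:
        # `results` is assumed to map test name -> number of failing cases.
        report = {
            name: {"failures": failures, "passed": failures == 0}
            for name, failures in results.items()
        }
        with open(output_file, "w") as f:
            json.dump(report, f, indent=2)

    # A CI job could then gate deployment on every entry having "passed": true.
    results_to_json({"add typos": 3, "negation": 0}, "checklist_report.json")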
Squashed commit message:

* run checklist suites from command line
* specify output file
* separate task from checklist suite
* qa task
* adding describe, misc updates
* fix docs, TE suite
* update changelog
* bug fix
* adding default tests
* qa defaults
* typing, docs, minor updates
* more updates
* set add_default_tests to True
* remove commented lines
* capitalizing help strings
* does this work
* adding start_method to test
* skipping test
* oops, actually fix
* temp fix to check memory issues
* Skip more memory hungry tests
* fix
* fixing professions
* Update setup.py
* Update CHANGELOG.md
* Update allennlp/sanity_checks/task_checklists/task_suite.py
* formatting functions

Co-authored-by: Pete <petew@allenai.org>
Co-authored-by: Evan Pete Walsh <petew@allenai.org>
* Suites for 3 tasks (pkl files and scripts)
* Run suites for models
Note: At first, it may seem like a lot of this functionality could simply be part of the Predictor class. But predictors can be more generic: e.g., a text-classification predictor can be used for sentiment analysis as well as textual entailment. The tests in the task suites, however, are meant to be task-oriented and cannot be used interchangeably.
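A hypothetical illustration of that distinction (the names below are made up, not the actual AllenNLP registry): the same generic predictor can back several tasks, while each task gets its own suite of tests.

    # One predictor type serves two tasks...
    PREDICTOR_FOR_TASK = {
        "sentiment-analysis": "text-classification-predictor",
        "textual-entailment": "text-classification-predictor",
    }

    # ...but the behavioral tests are task-specific and not interchangeable.
    SUITE_FOR_TASK = {
        "sentiment-analysis": "SentimentAnalysisSuite",
        "textual-entailment": "TextualEntailmentSuite",
    }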