quoref metric and evaluator #3153

pdasigi · 2019-08-13T21:07:35Z

No description provided.

pdasigi · 2019-08-19T22:23:45Z

@matt-gardner this is ready to be reviewed. Should be pretty straightforward as the quoref metric depends heavily on the drop metric, with just the data formats being different. I think this is the last needed piece for the leaderboard.

allennlp/training/metrics/quoref_em_and_f1.py

matt-gardner · 2019-08-20T18:36:51Z

allennlp/tools/quoref_eval.py

+                answers_dict[query_id] = candidate_answers
+    return answers_dict
+
+def evaluate_json(annotations: Dict[str, Any], predicted_answers: Dict[str, Any]) -> Tuple[float, float]:


Why can't you use the drop script for this function (and the one above) also?

pdasigi · 2019-08-20

Quoref's JSON format is the same as SQuAD's, while DROP's is different. So evaluate_json has to be different, and because evaluate_prediction_file calls that function, it had to be rewritten as well. I mentioned this in the docstring.
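To make the format difference concrete, here is a minimal sketch of how gold answers would be collected from each layout. It is not the code from this PR: the function names and sample dicts are hypothetical, and the field names (`data`, `paragraphs`, `qas`, `answers` for the SQuAD-style layout; `qa_pairs`, `query_id`, `answer.spans` for the DROP-style layout) are assumptions based on the publicly documented dataset formats.

```python
from typing import Any, Dict, List


def answers_from_squad_style(annotations: Dict[str, Any]) -> Dict[str, List[str]]:
    """Collect gold answer strings per question id from SQuAD-style JSON (assumed layout)."""
    answers: Dict[str, List[str]] = {}
    for article in annotations["data"]:
        for paragraph in article["paragraphs"]:
            for qa_pair in paragraph["qas"]:
                # Each question carries a list of {"text", "answer_start"} dicts.
                answers[qa_pair["id"]] = [answer["text"] for answer in qa_pair["answers"]]
    return answers


def answers_from_drop_style(annotations: Dict[str, Any]) -> Dict[str, List[str]]:
    """Collect gold span answers per query id from DROP-style JSON (assumed layout, spans only)."""
    answers: Dict[str, List[str]] = {}
    # The top level is keyed by passage id rather than wrapped in a "data" list.
    for passage_info in annotations.values():
        for qa_pair in passage_info["qa_pairs"]:
            answers[qa_pair["query_id"]] = list(qa_pair["answer"].get("spans", []))
    return answers


# Tiny hand-made samples for illustration; not taken from either dataset.
squad_style = {
    "data": [
        {
            "paragraphs": [
                {
                    "context": "Alice met Bob.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "Who met Bob?",
                            "answers": [{"text": "Alice", "answer_start": 0}],
                        }
                    ],
                }
            ]
        }
    ]
}
drop_style = {
    "passage_1": {
        "passage": "Alice met Bob.",
        "qa_pairs": [
            {
                "query_id": "q1",
                "question": "Who met Bob?",
                "answer": {"number": "", "date": {}, "spans": ["Alice"]},
            }
        ],
    }
}

squad_answers = answers_from_squad_style(squad_style)
drop_answers = answers_from_drop_style(drop_style)
```

Both helpers end up with the same mapping from question id to answer strings, which is why only the JSON traversal needs to differ while the scoring logic can be shared with the DROP metric.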

allennlp/tests/training/metrics/quoref_em_and_f1_test.py

matt-gardner

LGTM.

matt-gardner · 2019-08-21T04:24:32Z

allennlp/training/metrics/__init__.py

@@ -26,3 +26,4 @@
 from allennlp.training.metrics.srl_eval_scorer import SrlEvalScorer, DEFAULT_SRL_EVAL_PATH
 from allennlp.training.metrics.unigram_recall import UnigramRecall
 from allennlp.training.metrics.auc import Auc
+from allennlp.training.metrics.quoref_em_and_f1 import QuorefEmAndF1


Remove this line.

* quoref metric and evaluator
* added tests and sample data files
* missing prediction
* take predictions in simple format too
* added a test and fixed docs
* test running as a script
* removed metric file for quore and added more comments
* removed old import

pdasigi requested a review from matt-gardner August 13, 2019 21:07

pdasigi force-pushed the quoref_metric branch 2 times, most recently from 4e6236f to 0e8dde0 Compare August 19, 2019 20:41

pdasigi requested a review from nelson-liu August 19, 2019 22:45

matt-gardner reviewed Aug 20, 2019


pdasigi added 7 commits August 20, 2019 16:25

quoref metric and evaluator

18c2f7b

added tests and sample data files

e1c560b

missing prediction

f0d992d

take predictions in simple format too

5240991

added a test and fixed docs

f3b841d

test running as a script

ca20d49

removed metric file for quore and added more comments

50ab44f

pdasigi force-pushed the quoref_metric branch from bab2893 to 50ab44f Compare August 20, 2019 23:25

matt-gardner approved these changes Aug 21, 2019


removed old import

445f226

pdasigi merged commit 9d8d36a into allenai:master Aug 21, 2019

pdasigi deleted the quoref_metric branch August 21, 2019 16:52
