Skip to content
This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

quoref metric and evaluator #3153

Merged
merged 8 commits into from
Aug 21, 2019
Merged

quoref metric and evaluator #3153

merged 8 commits into from
Aug 21, 2019

Conversation

pdasigi
Copy link
Member

@pdasigi pdasigi commented Aug 13, 2019

No description provided.

@pdasigi pdasigi requested a review from matt-gardner August 13, 2019 21:07
@pdasigi pdasigi force-pushed the quoref_metric branch 2 times, most recently from 4e6236f to 0e8dde0 Compare August 19, 2019 20:41
@pdasigi
Copy link
Member Author

pdasigi commented Aug 19, 2019

@matt-gardner this is ready to be reviewed. Should be pretty straightforward as the quoref metric depends heavily on the drop metric, with just the data formats being different. I think this is the last needed piece for the leaderboard.

@pdasigi pdasigi requested a review from nelson-liu August 19, 2019 22:45
allennlp/training/metrics/quoref_em_and_f1.py Outdated Show resolved Hide resolved
answers_dict[query_id] = candidate_answers
return answers_dict

def evaluate_json(annotations: Dict[str, Any], predicted_answers: Dict[str, Any]) -> Tuple[float, float]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't you use the drop script for this function (and the one above) also?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quoref's json format is the same as that of SQuAD, and DROP's is different. So evaluate_json needs to be different, and evaluate_prediction_file calls that function, so that needed to be rewritten as well. Mentioned this in the docstring.

allennlp/tests/training/metrics/quoref_em_and_f1_test.py Outdated Show resolved Hide resolved
Copy link
Contributor

@matt-gardner matt-gardner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -26,3 +26,4 @@
from allennlp.training.metrics.srl_eval_scorer import SrlEvalScorer, DEFAULT_SRL_EVAL_PATH
from allennlp.training.metrics.unigram_recall import UnigramRecall
from allennlp.training.metrics.auc import Auc
from allennlp.training.metrics.quoref_em_and_f1 import QuorefEmAndF1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this line.

@pdasigi pdasigi merged commit 9d8d36a into allenai:master Aug 21, 2019
@pdasigi pdasigi deleted the quoref_metric branch August 21, 2019 16:52
reiyw pushed a commit to reiyw/allennlp that referenced this pull request Nov 12, 2019
* quoref metric and evaluator

* added tests and sample data files

* missing prediction

* take predictions in simple format too

* added a test and fixed docs

* test running as a script

* removed metric file for quore and added more comments

* removed old import
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants