4e6236f to 0e8dde0
@matt-gardner this is ready for review. It should be pretty straightforward, since the Quoref metric depends heavily on the DROP metric and only the data formats differ. I think this is the last piece needed for the leaderboard.
answers_dict[query_id] = candidate_answers
return answers_dict

def evaluate_json(annotations: Dict[str, Any], predicted_answers: Dict[str, Any]) -> Tuple[float, float]:
Why can't you use the DROP script for this function (and the one above) as well?
Quoref's JSON format is the same as SQuAD's, while DROP's is different. So evaluate_json needs to be different, and since evaluate_prediction_file calls that function, it needed to be rewritten as well. I mentioned this in the docstring.
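To illustrate the format difference described above, here is a minimal sketch of reading gold answers from SQuAD-style JSON (`data` → `paragraphs` → `qas` → `answers`). The function names `squad_style_answers` and `exact_match_score` are hypothetical, and the set-based exact match is a simplified stand-in for the DROP scoring that the actual script relies on; it also assumes a missing prediction scores zero and that a prediction may be a single string or a list of strings.

```python
from typing import Any, Dict, List, Union


def squad_style_answers(annotations: Dict[str, Any]) -> Dict[str, List[str]]:
    """Collect the gold answer strings for each question id (hypothetical helper)."""
    answers_dict: Dict[str, List[str]] = {}
    for article in annotations["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                answers_dict[qa["id"]] = [answer["text"] for answer in qa["answers"]]
    return answers_dict


def exact_match_score(
    annotations: Dict[str, Any],
    predicted_answers: Dict[str, Union[str, List[str]]],
) -> float:
    """Simplified exact match: the predicted set of spans must equal the gold set."""
    gold = squad_style_answers(annotations)
    if not gold:
        return 0.0
    hits = 0
    for query_id, gold_spans in gold.items():
        prediction = predicted_answers.get(query_id)
        if prediction is None:
            # Assumption: a missing prediction counts as a miss.
            continue
        if isinstance(prediction, str):
            prediction = [prediction]
        # Assumption: case- and whitespace-insensitive comparison only.
        predicted_set = {span.strip().lower() for span in prediction}
        gold_set = {span.strip().lower() for span in gold_spans}
        if predicted_set == gold_set:
            hits += 1
    return hits / len(gold)
```

A DROP-format annotation file is keyed and nested differently, so a reader like this cannot be shared between the two scripts even when the scoring itself is.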
LGTM.
@@ -26,3 +26,4 @@
 from allennlp.training.metrics.srl_eval_scorer import SrlEvalScorer, DEFAULT_SRL_EVAL_PATH
 from allennlp.training.metrics.unigram_recall import UnigramRecall
 from allennlp.training.metrics.auc import Auc
+from allennlp.training.metrics.quoref_em_and_f1 import QuorefEmAndF1
Remove this line.
* quoref metric and evaluator
* added tests and sample data files
* missing prediction
* take predictions in simple format too
* added a test and fixed docs
* test running as a script
* removed metric file for quore and added more comments
* removed old import