Further update task domains & categories; Add Utility Scripts #436
Changes from 8 commits
Commits: 698925b, 6f7487a, 5c5c55c, 098e26c, 79c63e1, 1c1bdf3, 855c0f1, 80c4765, 851302c, 2cd51cd, 13f9967, a5b2da5, b8920a0, 3cb5ab5, 33fac4b, 7b295aa, 38da451
@@ -0,0 +1,48 @@

```python
"""
Script for adding the "Domains" field for all the tasks of a given dataset.
"""
import json
from os import listdir
from os.path import isfile, join

# the hierarchy doc lists every valid domain; read it once for the checks below
with open("doc/task-hierarchy.md", 'r', encoding='utf-8') as readmef:
    hierarchy_content = readmef.read()

tasks_path = 'tasks/'
# modify these variables as needed
dataset_name = "mctaco"
domains = [
    "News",
    "Wikipedia",
    "Law",
    "Justice",
    "History",
    "History -> 9/11 Reports",
    "Anthropology",
    "School Science Textbooks",
    "Fiction"
]


def add_domain(tasks_path, dataset_name, domains):
    # every domain must appear in doc/task-hierarchy.md (plain substring check)
    for d in domains:
        assert d in hierarchy_content, f"domain {d} not in task-hierarchy"
    # find the task files whose file name contains the dataset name
    files = [join(tasks_path, f) for f in listdir(tasks_path)
             if isfile(join(tasks_path, f)) and dataset_name in f]
    files.sort()
    # set the "Domains" field of each task file
    for file in files:
        with open(file, 'r', encoding='utf-8') as f:
            data = json.load(f)
        data['Domains'] = domains
        # serialize before reopening; mode 'w' truncates the file, so no os.remove() is needed
        modified_json = json.dumps(data, indent=4, ensure_ascii=False)
        with open(file, 'w', encoding='utf-8') as f:
            print(modified_json, file=f)


if __name__ == "__main__":
    add_domain(tasks_path, dataset_name, domains)
```

Review thread on the `import json` line:

Since this is an ad hoc function (used only for this annotation), I would prefer that we don't check this in.

I added it because it could be useful for other people working on #4, who would likely need it.
Large diffs are not rendered by default.
Since "extractive" and "abstractive" are only defined in the context of "contextual QA", it might make sense to nest them:
- Answer Generation -> Contextual Question Answering
- Answer Generation -> Contextual Question Answering -> Extractive
- Answer Generation -> Contextual Question Answering -> Abstractive
What do you all think? @ghlai9665 @swarooprm
Yes, makes sense to me.
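The nesting proposal relies on the ` -> ` separator already used for nested entries (e.g. `History -> 9/11 Reports` in the script's domain list). Below is a minimal sketch of a consistency check, assuming each entry is written as a full path with ` -> ` between levels: it reports any parent path that a nested entry refers to but that is not itself declared. `parent` and `undeclared_parents` are hypothetical helpers, not part of this PR or the repository.

```python
def parent(path, sep=" -> "):
    """Return the parent of a nested category path, or None for a top-level one."""
    parts = path.split(sep)
    return sep.join(parts[:-1]) if len(parts) > 1 else None


def undeclared_parents(paths, sep=" -> "):
    """List parent paths referenced by a nested entry but not declared themselves."""
    declared = set(paths)
    missing = []
    for p in paths:
        par = parent(p, sep)
        # walk up until we hit a declared ancestor or one already reported
        while par is not None and par not in declared and par not in missing:
            missing.append(par)
            par = parent(par, sep)
    return missing


proposal = [
    "Answer Generation -> Contextual Question Answering",
    "Answer Generation -> Contextual Question Answering -> Extractive",
    "Answer Generation -> Contextual Question Answering -> Abstractive",
]
print(undeclared_parents(proposal))
```

With only the three proposed entries, the check reports `Answer Generation`, which is expected as long as that top-level category is declared elsewhere in the hierarchy; `Extractive` and `Abstractive` pass because their parent, `Answer Generation -> Contextual Question Answering`, is listed.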