Crowdworker evaluation of the tasks [Work in Progress] #276

danyaljj · 2021-09-17T19:50:50Z

Addressing #206

…aluation template.

danyaljj · 2021-09-18T08:36:54Z

Here is how the template looks like now:

When filled in with the content, here is how it would look like:

danyaljj · 2021-09-18T08:51:10Z

Here is a pilot on task 156. The reviews for the most part indicate positive feedback about the task.

https://docs.google.com/spreadsheets/d/1wXStPurP6AamxvglOw0aOJ7V1DiSKczJNyFcyfDnk4w/edit?usp=sharing

Two changes are needed for the next experiment:

increase no. of positive examples (3 -> 5)
emphasize that they don't need to write an "explanation"; only the "output"

eshaanpathak · 2021-09-19T18:47:39Z

@danyaljj Solid work! This is better than what I could have done for sure. Some advice after a quick glance:

For all of the questions, instead of "How did you assess the quality...?", I would change them to "What did you think about the quality...?".
I'm not sure if there is a better way for them to answer the questions besides saying "pretty good", "weak", "bad", or "confusing". I feel like there is a better way to express this, but I'm not sure. Maybe ask them to rate it on a scale of 1 to 5 or 1 to 10? I will follow up if I am able to think of anything else.
Can you also explain what positive examples are too? The layperson may initially get confused and think that these are examples with positive sentiment as opposed to what they actually are (examples that are correct) and may not realize this until later into the assessment or even at all.
For the layperson, it may also be best to elaborate a bit more on negative examples being "undesirable outputs".
This recent paper may also help with crowdsourcing and how we report our AMT results in the paper: https://arxiv.org/abs/2109.06835

danyaljj · 2021-09-19T22:18:11Z

@eshaanpathak these are good suggestions and I will incorporate them for the forthcoming experiments!

Maybe ask them to rate it on a scale of 1 to 5 or 1 to 10?

Quantitative labels (like "1 to 5") can be interpreted differently and are more subjective, I think (do I interpret 2/5 the same as you?) In my experience, categorical labels (with relatively clear definitions) tend to result in more calibrated outcomes.

eshaanpathak · 2021-09-19T22:35:59Z

Yeah, that's what I was thinking too, but I wasn't sure. I think it was the word choice for me.

Instead of "pretty good", I would instead just put "good" then.

…nsion

danyaljj · 2021-10-13T20:57:44Z

@Palipoor Here is a batch of human judgments collected for tasks60-100 (see the 2nd tab):
https://docs.google.com/spreadsheets/d/1wXStPurP6AamxvglOw0aOJ7V1DiSKczJNyFcyfDnk4w/edit?usp=sharing

I went through the replies and here is what I extracted:

task066_timetravel_binary_consistency_classification.json
- Feedback regarding positive examples: surely people don't panic and screen just because two men yell?
task067_abductivenli_answer_generation.json
- Feedback re. definition: It would be good if, when you say, Use names, to list an actual name in the same way you list pronouns, like say, use names like Barbara, etc.
task069_abductivenli_classification.json:
- is it clear if the outputs should be 1/2? Some workers wrote string responses.
task072_abductivenli_answer_generation.json
- Feedback regarding negative examples: Not sure what was wrong with the "her" in that sentence as the example also included her actual name. "Anna went to Anna's bedroom" would surely be awkward?
task076_splash_correcting_sql_mistake.json:
- Feedback regarding the definition: The instructions are clear, but the variable names are very hard to track. Perhaps that is intentional though.
- Feedback regarding positive examples: Having more positive examples would be better, but overall, they illustrated the point. The one with swapping average for a value of average was a big confusing, since it would confuse people with a function vs. a variable.
- Feedback regarding negative examples: "== 'null' is syntactically incorrect, at least in SQL variants I know. It may be good to clarify even more things are wrong, using null as a string, double equal instead of equal. Stressing NULL is a special value."
- I noticed the word satisfying and satisfy misspelled and English should be capitalized.
task077_splash_explanation_to_sql.json
- You can more explain step by step also you can add more realistic example
task078_splash_sql_to_explanation.json
- In response to the following instance: "SELECT T1.Main_Industry FROM company AS T1 WHERE T1.Company_ID NOT IN ( SELECT T2.Company_ID FROM station_company AS T2 )" someone wrote "i don't have any idea." Anything we can improve here?
- I noticed the words satisfy and satisfying were misspelled in one space. I think giving examples of how all of these things work within the main text would have helped with the clarity.
task079_conala_concat_strings.json
- Maybe a few more examples? These look great, but I feel like once the task starts, there are probably more confusing ones to complete.
- a few more examples would be helpful, such as, what if yummycookies is yummy_cookies?
task081_piqa_wrong_answer_generation.json:
- I think I understand this, but more examples would be nice.
- It is mostly clear but only after I looked at examples then read the instructions again.
- Feedback regarding negative examples: I think one more example of clearly not related at all would be good for anyone questioning it
task084_babi_t1_single_supporting_fact_identify_relevant_fact.json
- Feedback regarding negative examples: Only one example but it is clear why it's wrong
task085_unnatural_addsub_arithmetic.json:
- Feedback regarding negative examples: There wasn't a lot of them, but the one given was explained perfectly.
task086_translated_symbol_arithmetic.json:
- Feedback regarding the instructions: I think the instructions are clear enough.There is a missing word in this sentence that could help with clarity. I have indicated with brackets:" Here, 'sottrazione' represents [the] subtraction operation. So, the answer is 3 (10-7=3)."
- Feedback regarding positive examples: I think these are clear, but for higher clarity, maybe replace input and output with question/answer? It really depends on the context of what's being tested, but the latter is more recognizable for a larger range of people I think.
- Feedback regarding negative examples: I think for the negative example an explanation of how someone might have arrived at the wrong answer would be helpful. This allows people to see common mistakes to watch out for.
task087_new_operator_addsub_arithmetic.json
- Feedback regarding negative examples: Could be more examples
- Feedback regarding negative examples: The one given didn't have the minus sign replaced. (Daniel: this was the example: "15 @ 20 - 3")
task088_identify_typo_verification.json:
- Feedback: Many of the examples are blank. Examples help me to make sure I've understood the instructions properly, so I would want more.
task089_swap_words_verification.json:
- Feedback regarding positive examples: The following sentence is used as an example: "Seattle has nice a weather" But the proposed solution (swapping "a" and "nice") is grammatically incorrect and the output is incorrect as well (it gives it as 2,3).Also, examples 2 through 4 (positive) are blank.
- Feedback regarding negative examples: One or two more negative examples would be a better reference.
task090_equation_learner_algebra.json:
- Feedback: The word "weight" was used in a non-standard (to a non math person at least) without any text explanation as to why that word was used. I figured out how the process worked from the examples, but the unfamiliar terminology made the text part of the explanation unhelpful. More words orverall would have helped, and a step by step explanation of the example would have helped as well (e.g. "the rightmost value is a constant and unaffected by the variable, the next value is multiplied by the variable, etc.)
- Feedback for positive examples: A couple of more examples would have been useful, specifically of the ones with 3 weight values.
- Feedback for negative examples: It was pretty good as is, but another example wouldn't hurt.
task094_conala_calculate_mean.json:
- Feedback: "There are 2 examples of the correct way to do it, but there are still 3 blank areas left, leading to confusion and impreciseness."
- Emphasize that we want only the final answer? Someone output "271.633/ 5 = 54.327" instead of "54.327"
task095_conala_max_absolute_value.json
- Feedback: "I think it would be better to explain what an absolute value is/how you get it, with examples, before explaining the task. Otherwise just reading the task explanation itself is a little difficult."
task096_conala_list_index_subtraction.json
- feedback: "You might want to consider clarifying what an index is for people who may not be that familiar with the term."
- some people used parenthesis instead of brackets: "{11, 11, 3, 7, -10, -1, -27, -23, -15}" instead of "[11, 11, 3, 7, -10, -1, -27, -23, -15]"
task097_conala_remove_duplicates.json:
- is it clear that there should be brackets in the output. Someone wrote "6,0,3,1,7" instead of "[6, 0, 3, 1, 7]"
task098_conala_list_intersection.json:
- feedback re. description: "I understand it but some may not get it until looking at examples."
- Is it clear that (1) the output should have brackets? (2) how the output should be ordered? Some people wrote "9,5,2,4" instead of "[2, 4, 5, 9]"

Palipoor · 2021-10-14T03:26:22Z

How long should I wait for task creators to respond before doing these?

danyaljj · 2021-10-14T17:33:45Z

@Palipoor I think we should go ahead and address the ones that we can.

danyaljj · 2021-10-14T18:39:06Z

Here is feedback on task1-60. @swarooprm these are all yous! :)

task003_mctaco_question_generation_event_duration.json
- Definition: I think that the instructions could provide more information about the type of events that are being discussed (in addition to talking about brushing teeth).
- positive examples: I think that the inputs and outputs were good, but it would be useful to see more of them. It may also be useful to see outputs that use different wording than "how long."
task006_mctaco_question_generation_transient_stationary.json
- Definition: These are well written but I definitely needed the examples to be sure of what you are looking for
- Positive examples: Maybe have a few more complicated examples.
task006_mctaco_question_generation_transient_stationary.json
- I had a hard time understanding the "stationary v. transient" vocab.
task007_mctaco_answer_generation_transient_stationary.json:
- It might help to have one more negative example
task010_mctaco_answer_generation_event_ordering.json
- Defintion: I think maybe it could be worded simpler such as, What happens next?
task011_mctaco_wrong_answer_generation_event_ordering.json
- You should give more example.
task012_mctaco_question_generation_absolute_timepoint.json
- They are good, but I would like clarity on whether the questions can require specialized knowledge. For example, referring to the NBA basketball game, and knowing what time they usually play NBA games at.
- I didn't know how something could be an absolute timepoint if you were just making time estimates.
task013_mctaco_answer_generation_absolute_timepoint.json
- It would be helpful to have even more examples so I could have a better understanding of the task. Examples 2 to 4 are blank.
task014_mctaco_wrong_answer_generation_absolute_timepoint.json
- Not enough examples
task018_mctaco_temporal_reasoning_presence.json
- Negative examples: Pretty sure No is the correct answer there? (Daniel: the example was this: "Sentence: Jack played basketball after school, after which he was very tired.
  Question: Who played basketball after school?")
task024_cosmosqa_answer_generation.json
- Definition: It's confusing whether you want a response to the speaker or you want a summary from the speaker.
- Examples: Example natural instructions-v1 #1 made no sense to me and seemed to have grammar errors at the end. (Daniel: here is the example: "Context: you see , at my age relationship is kind of important and i thought i got the one after all these years . I noticed that once again i was wrong . i was good simply because i was good , i was caring , helping , supportive , bla bla blaaa .
  Question: What may happen to me?")
task029_winogrande_full_object.json;
- Definition: I was confused by the massive list of contrastive words.
- P examples: It was tough to parse out the differences in the examples.
- N examples: It was hard to see what made these different from the positive examples.
task030_winogrande_full_person.json
- Def: The instructions felt very wordy.
- N examples: Some of these examples I thought could be positive ones.
task031_winogrande_question_generation_object.json
- I didn't understand the contrapositives.
- Examples felt totally random based on the inputs.
- I didn't see how these differed from the positive examples.
task034_winogrande_question_modification_object.json
- Definition: I had no idea what you meant by "contrastive" words.
- N examples: i don't have any negative
- N examples: It's confusing there needs to be at least 70% overlapping words when we don't have a word counter.
- P examples: I was confused which words were being flipped.
task035_winogrande_question_modification_person.json:
- Def: The formatting was hard to read; numbered lists typically have line breaks between list items.There were also grammatical errors ("should be an well-agreed answer", "atleast").The clarity could have been improved in some areas. For instance, "void repeating the same style or phrase in generating your modified question e.g. this task can be always solved using a simple negation i.e. by adding not, never, etc. Instead, try to increase the word diversity." is a confusing sentence, particularly because of the use of both "i.e." and "e.g."Also, the using real names examples are kind of unnecessary ("Donald Trump or Putin").
- P examples: The grammar is weird at times ("replacing the trigger word 'sympathetic' by 'stern' "). I would think you might use the word "with" instead of "by."
- N examples: The first example didn't really make sense because of (presumably) a typo: "PersonY was always ahead of PersonY, as _ walked with a quick step ."
task036_qasc_topic_word_to_generate_related_fact.json
- Def: Virations sound should be "vibrations."
- P examples: I wasn't sure how you were coming up with the topic words that you were.
- N examples: Again, I didn't know how you were coming up with the topic words you came up with.
task044_essential_terms_identifying_essential_words.json
- Def: The inclusion or exclusion of nouns in the question can be better explained.
- N examples: An example showing more words that necessary being given would have been helpful.
task045_miscellaneous_sentence_paraphrasing.json
- P examples: if they were not all the same example
task046_miscellaenous_question_typing.json:
- Listed 4 examples, but last 2 examples were actually blank. A few more examples might be nice. Would have liked to see an answer that was "other". Examples listed were pretty obvious, would be nice to see some more complicated ones.
task049_multirc_questions_needed_to_answer.json
- The questions weren't made clear
- The instructions were understandable, but the example questions weren't clear.
- The same as above. The example reasoning made sense even without the questions.
task053_multirc_correct_bad_question.json
- Instructions were a bit circular.
task053_multirc_correct_bad_question.json
- Do all questions need to be corrected or are some correct the way they are written?
task057_multirc_classify_incorrect_answer.json
- Well, when it said, "an incorrect answer should be of the same semantic type of the given correct answer (e.g., both can be names of locations)", I thought it meant that, for example, "not very hot" would not be a good incorrect answer for a question like, "What was the temperature outside?" because a correct answer would be something like, "well over 100 F" and so I thought that a good incorrect answer should also mention a specific temperature, but the examples cleared that up for me, and actually "not very hot" would be a good incorrect answer, so it's a good thing the examples were there or I would have been confused.
- It is a tad odd that the examples are numbered starting from "0", and I actually read three examples, went back to check something in the task definition, and then tried to return to the third question and got confused because that wasn't the question I was on, and obviously it turned out that the numbering started at "0", so I should have gone to the one numbered "2".
- These are numbered starting at "0" too, though in this case I personally was not confused. Also, I believe that if the answer I'm evaluating is correct, that does not count as a good incorrect answer, and it would have been nice if there was just a simple example showing an output of "Yes" when the input showed an actually correct answer, unless of course I'm missing something and I actually am supposed to say, "Yes" in that situation.

Palipoor · 2021-10-19T00:45:23Z

Here is feedback on task1-60. @swarooprm these are all yous! :)

Does this mean I shouldn't pick tasks 1-60 to fix?

swarooprm · 2021-10-19T00:50:00Z

Here is feedback on task1-60. @swarooprm these are all yous! :)

Does this mean I shouldn't pick tasks 1-60 to fix?

@Palipoor Feel free to fix those. I am focusing on other priorities of this project.

danyaljj · 2021-10-19T00:50:34Z

Here is feedback on task1-60. @swarooprm these are all yous! :)

Does this mean I shouldn't pick tasks 1-60 to fix?

It looks like @swarooprm 's gonna focus on merging the open PRs and also re-assigning hierarchy tags, discussed in #458.
So, if you have the bandwidth to help address tasks 1-60, that would be super helpful! 🙏

danyaljj · 2021-10-19T18:47:36Z

Tasks 200-300:

danyaljj · 2021-10-20T21:32:12Z

Tasks 300-399:

task301_record_question_generation.json
- regarding p examples: Far too much reading required in the examples
- regarding p examples: all examples have way too much reading involved
task303_record_incorrect_answer_generation.json
- regarding p examples: The write up was a bit too complex in its writing.
- regarding p examples: Overly long examples for both positive and negative
- Daniel: surprisingly, most people answer the instances incorrectly (probably not noticing "incorrect"). Wondering if we can bring attention to "incorrect" by changing it to "incorrect".
task305_jeopardy_answer_generation_normal.json
- regarding instructions: Changing "all answers should be lowercased" to "answers should be in all lower cased letters" could improve clarity.
- regarding p examples: The examples were good and easily understood but examples 3 and 4 were blank. Just a personal opinion but perhaps examples that use abbreviations (e.g. "Dec. of Indep.") should be avoided as they could be ambiguous to participants. In terms of presentation, it would be useful to start off with "Example 1" and not "Example 0." Additionally, setting "Category:" and "Clue:" as subheadings could also help with readability.
- regarding n examples: As mentioned previously in the positive examples field, starting with "Example 0" is a bit odd.
task308_jeopardy_answer_generation_all.json
- regarding instructions: Instructions could be a bit more thorough.
- regarding p examples: Could use more examples
- regarding n examples: Could use more examples
task309_race_answer_generation.json
- Daniel: somehow our annotators answer the instances incorrectly. Wondering if there is a systemic bug in the instance labels.
task311_race_question_generation.json
- regarding p examples: way too much text for all examples
- regarding instructions: I think it would be easier to just ask for a sentence that basically sums up what the article was about.
- regarding p examples: I got a pretty good understanding of what kind of information you are looking for.
- regarding n examples: I think the negative examples were good, but they all just had one small problem with them like saying British instead of American, I wonder if there would be an example that didn't just get a small fact wrong, but an example of a bad summary for the article.
task316_crows-pairs_classification_stereotype.json
- regarding instructions: The instructions make it clear that the acceptable output is either stereotype or anti-stereotype yet the questions should sometimes not be answered without any explanation. For example, the statement "Anna's mother was proud of her for behaving in such a strong masculine way"; I would lable it stereotypical if I was talking about a strong way being considered masculine, but I might lable it anti-stereotypical if I was talking about Anna's mother being proud of her daughter for acting manly because I don't think mothers want to see their daughters acting manly (even though they want them to be strong). I don't know how you would know if I thought the group being stereotyped was mothers or men (without an explantion). I am probably over thinking the task but I wanted to provide some feedback.<newline>
- regarding instructions: I think it could use some examples of common stereotypes or what stereotypes usually are making statements on. For example: race, ethnicity, gender, ability, etc.
- regarding p examples: The examples are clear, but only rely on one form of stereotyping (race or ethnicity).
task319_stereoset_classification_profession.json
- regarding n examples: Definitely needs a few more examples
task320_stereoset_classification_race.json
- regarding n examples: Example 1 is listed as a negative example, yet the provided answer seems to agree with the answer given in the explanation (anti-stereotype).
task322_jigsaw_classification_threat.json
- regarding instructions: I think anyone can tell what to do here given the examples too and the explanation for it all. I didn't have any issues with it.
- regarding p examples: I can tell how and why the options were chosen as they were, it is a no brainer.
- regarding n examples: Easily understood by me without much reading.
task324_jigsaw_classification_disagree.json
- regarding instructions: wording on what agree means could be improved. (rather than lack of disagreement, should explain you're seeking agreement. maybe add a neither option if it appears neutral?)
- regarding n examples: explanations on examples make no sense....?
- regarding instructions: The comments are lacking context, which makes this a bit confusing, but still understandable. Negative examples don't entirely make sense to me.
- regarding p examples: Just lacking some context, as I mentioned above.
task326_jigsaw_classification_obscene.json
- regarding instructions: not sure if obscenity refers just to sex, or if negative language too
- regarding p examples: Example 1 lists a comment about the weather as obscene
- regarding instructions: " materials and acts of pornography" not sure what this means by materials.
- regarding p examples: One of the examples says damn in it, and it says to label it obscene because it refers to sexual content. it doesn't though.
- regarding p examples: i want to improved that quality
- regarding n examples: i don't have any negative example
task328_jigsaw_classification_insult.json
- regarding instructions: Instead of listing examples one by one, it's better to list first the examples of either insult or non-insult followed by the other.
- regarding p examples: Some of the output and definitions do not match. For example, the example says "non-insult" but in the explain, it says it is an "insult".
- regarding n examples: There was only one negative insult that I could see. There could be more.
task329_gap_classification.json
- regarding n examples: The word "refers" was written as "efers" twice.
task330_gap_answer_generation.json
- regarding n examples: typos in the instructions (efers instead of refers)
task331_gap_incorrect_answer_generation.json
- regarding instructions: Doesn't really explain what an ambiguous pronoun is
task332_tellmewhy_answer_generation.json
- regarding instructions: Could use a better definition of what exactly is considered a "complete: answer
- regarding p examples: explanations need to be expanded upon
- regarding n examples: could use more examples
task333_hateeval_classification_hate_en.json
- regarding instructions: It might be helpful to have some sort of content warning in the instructions for the task. Also to ask about demographics since that might influence what people identify as hateful or not. Also, since there are only two options a radio button or dropdown would be clearer and indicate that only those two outputs should be used.
- regarding p examples: They are clear, but more examples could be given.
- regarding instructions: Change "sex" to "gender"<newline><newline>Make it clear that we are supposed to type "Non-hateful" or "Hateful"<newline><newline>Should we click on links? That can completely change the answer (like the Trump tweet on this page) and also ups the work time. It isn't mentioned anywhere.
- regarding p examples: Include an example with a link that changes the context.
- regarding n examples: The negatives are showing wrong word choices not just bad examples. The tweet below this about Trump would be a good type of example to use because if I click the link it is definitely hateful, but if I don't click the link, it isn't
task334_hateeval_classification_hate_es.json
- regarding instructions: Doesn't state how we should translate the spanish posts
- regarding instructions: doesn't mention whether we're supposed to infer if the spanish is offensive or not or use a translation service
- regarding p examples: both positive and negative examples provide translations but the actual task does not
- regarding instructions: Feel like hatefulness could be a wide spectrum that depends on each individual's personal attitudes.
- Daniel: if this task involves Spanish, is this mentioned in the input_language tags?
task335_hateeval_classification_aggresive_en.json
- regarding instructions: Bit of a grey area separating hate, aggression, and abuse
- regarding instructions: aggressiveness and abusiveness seem kind of like grey areas
- ```
- Daniel: if this task involves Spanish, is this mentioned in the `input_language` tags? 
```
task336_hateeval_classification_aggresive_es.json
- regarding instructions: Doesn't state how/if we should translate the spanish
- regarding p examples: pos and neg examples translate the spanish for me but the actual task does not
- regarding instructions: This question says goodbye and heard menu<newline>
- regarding p examples: The liquid word is one of the most important for everyone the person.
- regarding n examples: Heard speech should not be spoken to in public or to anyone.
- Daniel: if this task involves Spanish, is this mentioned in the input_language tags?
task338_hateeval_classification_individual_es.json
- regarding instructions: I honestly don't understand any of it given the examples.
- regarding p examples: I don't understand why one is positive and the other is negative when they're both hateful so I can't even say if it's good or not
- regarding n examples: I don't understand why one is positive and the other is negative when they're both hateful so I can't even say if it's good or not
- regarding instructions: Straightforward but doesn't mention if/how we should translate spanish to english
task339_record_answer_generation.json
- regarding p examples: All examples are giant walls of text
task340_winomt_classification_gender_pro.json
- regarding p examples: could use more examples (altho not many needed for something so straightforward)
- regarding n examples: could use more examples (altho not many needed for something so straightforward)
task341_winomt_classification_gender_anti.json
- regarding instructions: "The coreference link with gendered pronouns", sounds like you have a chart or something. Please, either define what coreference link is clearly or just omit it, it's easy enough to figure out from the examples. You're wasting time of those taking the survey because they will be looking for a "chart". Mention clearly that you only really want Male/female though.
- regarding p examples: Examples make clear what's ok.
- regarding n examples: You say woman is wrong but, you hadn't really clearly defined this before. it's good you're doing it here, but what about "Man", Trans, etc?
task343_winomt_classification_profession_anti.json
- regarding instructions: No idea what a coreference link is
task344_hybridqa_answer_generation.json
- regarding p examples: Please be more clarify like if there is a positive correlation between question and answer.
- regarding n examples: Again more clarify to be the negative correlation between question and answer
- regarding instructions: Doesn't say what we should do if we have no knowledge of the question being asked
task345_hybridqa_answer_generation.json
- regarding instructions: Gigantic list of speech tags that no one would be able to memorize
- regarding instructions: Gigantic wall of text. Over 20 different tags I'm supposed to memorize, and I have to tag every single word
- regarding p examples: ``
- regarding n examples: ``
task346_hybridqa_classification.json
- regarding instructions: Gigantic wall of tags I'm supposed to memorize.
- regarding instructions: The instruction may be quite easier
- regarding p examples: Lack of clarities
- regarding n examples: It may be clear more
task347_hybridqa_incorrect_answer_generation.json
- regarding instructions: I think it would be better to put the tags in a listing going from the top to the bottom. Having them bunched together in paragraph format makes it a bit hard to read.
- regarding p examples: Examples 2, 3, and 4 are completely blank. They should either be filled in or removed.
- regarding n examples: I think giving more examples would make the instructions more clear.
- regarding instructions: giant wall of tags to remember, but since we have to give an implausible one, maybe it's easier.
task348_squad2.0_unanswerable_question_generation.json
- regarding instructions: very intresting one.
- regarding p examples: it also a positive one.
- regarding p examples: for both pro and con, the examples are way too long to read
task350_winomt_classification_gender_identifiability_pro.json
- regarding p examples: Why is there always some missing?
task351_winomt_classification_gender_identifiability_anti.json
- regarding p examples: could use more examples
- regarding n examples: could use more examples
- regarding instructions: The examples were simple and I could understand the good examples but the negative examples confused me.
- regarding p examples: These were simple and fairly easy to understand.
- regarding n examples: These were a little confusing.
task352_coda-19_classification.json
- regarding instructions: Doesn't explain what each category means
- regarding instructions: Too many categories for paragraphs that are too long. Doesn't explain what each category is.
task354_casino_classification_negotiation_no_need.json
- regarding instructions: There were not many good examples and non-examples.
- regarding p examples: I really needed more examples.
- regarding n examples: I needed more examples.
task357_casino_classification_negotiation_small_talk.json
- regarding instructions: it's too wordy. don't need to know what they're negotiating about.
task359_casino_classification_negotiation_vouch_fair.json
- regarding instructions: pretty vague explanation for what vouch-fair is.
task360_spolin_yesand_response_generation.json
- regarding instructions: The examples and negative examples were good.
- regarding p examples: The explanations were good and clear.
- regarding n examples: They helped me understand what would be a bad answer.
- regarding instructions: If these HITs keep running I would really like the instructions to show up again after the feedback boxes so I don't have to scroll all the way back up
task362_spolin_yesand_prompt_response_sub_classification.json
- regarding instructions: it's a bit too wordy but it's mostly clear
task365_synthetic_remove_vowels.json
- regarding instructions: Examples 3 & 4 are missing from the good examples. I would also bold the actual instructions of removing the vowels, not everybody reads the directions clearly.
- regarding p examples: They were fine, just reiterating that they were missing 3 & 4.
task367_synthetic_remove_floats.json
- regarding instructions: Because I am dumb, I had to google to remember that an integer just means a whole number. Should probably include that integers be separated by a comma
- regarding instructions: I would suggest defining what an integer is for those that don't recall.
- regarding p examples: Examples 2, 3, and 4 are blank. Examples 0 and 1 are fine.
- regarding n examples: The negative examples are very good. I like that they covered all sorts of different types of incorrect output.
task369_synthetic_remove_odds.json
- regarding instructions: It made it very clear that odd numbers are not included in the output. I do think it could clarify what to do with the number 0 though.
- regarding p examples: It clearly showed examples of only even numbers being included in the output and the odd numbers remaining in the input. I liked how it clarified that 0 would be included with even numbers. Examples 2,3, and 4 could have been filled in because it was a bit awkward that they were blank. They could include examples with negative numbers.
- regarding n examples: I think a couple more negative examples could have been shown just to let people know for sure what isn't right.
task371_synthetic_product_of_list.json
- regarding instructions: Suggest changing positive and negative examples to correct and incorrect examples respectively to make them more clear.
- regarding n examples: the explanation on negative example 2 is unclear because the output does not look like a list of lists even though they are in brackets.
- regarding instructions: I don't think there should be a comma after "of lists" in this sentence: In this task you will be given a list, of lists, of integers.<newline><newline>It makes it confusing and I had to reread it a few times.
- regarding p examples: They seemed okay but there were not enough of the examples.
- regarding n examples: There are enough examples. I am not really grasping the tasks though.
task373_synthetic_round_tens_place.json
- regarding p examples: In the sentence "The output correctly round each integer in the input list to the nearest ten.", "round" should be "rounds".
task374_synthetic_pos_or_neg_calculation.json
- regarding p examples: In the sentence "After the rule is applied 6 and -4 are equal, because that are multiplied by different numbers.", "that" should be "they".
task375_classify_type_of_sentence_in_debate.json
- regarding instructions: this should be more informative
- regarding p examples: the example should be more easy to understand
- regarding n examples: there is no examples in this instructions
task376_reverse_order_of_words.json
- regarding instructions: Explanation of the answer can be shortened such that the last sentence can read : So, the output is the correct answer.
- regarding p examples: Article is missing such that "A" should be added before "luggage" in example 0.
- regarding n examples: Cut off the "instead of...." part in the answer to shorten the answer.
- regarding p examples: In the sentence "The order of words in the given sentence is 'lugguage', 'surrounds', 'a', 'vehicle', 'in', 'an', 'underground', 'parking', 'area'.", "lugguage" should be "luggage".
- regarding instructions: doesn't say if we should include punctuation
task377_remove_words_of_given_length.json
- regarding instructions: Noting that we could be critical is VERY helpful. I don't want to always nitpick on typos or phrasing because we're often asked to ignore that.
- regarding p examples: Just add a couple more examples.
- regarding n examples: It seems straightforward and the 2 examples are good, but once I do a few HIT's I may notice something different.
task378_reverse_words_of_given_length.json
- regarding instructions: It was easy to understand especially with the examples.
- regarding p examples: Simpler sentence examples.
- regarding n examples: Simpler sentence examples.
task379_agnews_topic_classification.json
- regarding instructions: Doesn't mention what we should enter if the input doesn't match any of the 4 categories
- regarding p examples: Should probably contain an example of each category
task381_boolq_question_generation.json
- regarding n examples: In the sentence "There's no information in the passage to help answer this question, this is not a valid output.", maybe there should be a "therefore" before "this is not a valid output".
- regarding p examples: could use more examples
task383_matres_classification.json
- regarding instructions: The word "anchored" didn't click with me. The phrasing on the instructions sounds much more confusing than the task. Also, fix the line break issue after "verb separated with a"<newline><newline>Most importantly, do I just type "yes" or "no"?<newline><newline>Switching from one HIT that was "remove words of 7 letters" to this task is very confusing and overwhelming. They should be different batches.<newline><newline>The time it takes to read the instructions and do the task is much longer when they are intermixed like this and end up being underpaid or make me want to return. I'm already over 13 minutes on this one!<newline>
- regarding p examples: I definitely need more examples. Why isn't "invite" anchorable? It is a verb and it happened in the past because they decided to invite at last years (in the past) convention. Or is it only anchorable if it was "invited". This is really confusing!
- regarding n examples: I liked that the first example was longer and the second shorter. But, I think we need a few more long examples where the word doesn't come at the end, which makes me want to skim.
task384_socialiqa_question_classification.json
- regarding p examples: In the sentence "Because Through context, we show that Sydney Sydney walks past homeless women and the answer is 'Sympathetic' which is an emotion.", Through shouldn't be capitalized.
- regarding n examples: Example 3 has a confusing 'answer' - is the incorrect answer all three?
task386_semeval_2018_task3_irony_detection.json
- regarding instructions: These instructions were better than some others. There was bold font and I knew I had to label as 'IRONIC' or 'NOT' and not "NOT IRONIC" - also, do they need to be in all caps?<newline><newline>But, so ambiguous! I'd hate to be judged for my quality of work because I didn't think something was ironic that other people did think was ironic.<newline><newline>Do we have to follow a link to see what the tweet is referring too as is the second test on this hit? I can't tell if they are being ironic unless I follow the link.
- regarding p examples: Example 2 seems Ironic to me! They aren't dying as their lifeblood leaks from a stab wound. They are ironically 'dying' because there are only 3 episodes left.
- regarding n examples: You used the same bad example for a negative as a positive and I still feel like it IS ironic. And this is why i'd greatly worry about the task. It's too ambiguous and
- regarding p examples: could use more examples
- regarding n examples: Negative examples definitely benefit from covering a wide variety of potential screwups. Hard to do that with just 1 example
task389_torque_generate_temporal_question.json
- regarding instructions: More information on what is considered a temporal relation. It seems too similar to the bad examples.
- regarding n examples: I could take the response and still make it temporal so I am confused.
- regarding instructions: Please mention in the instructions that a event must be mentioned in the output as a temporal comparison.
- regarding p examples: The instruction says a temporal relation describes the relation between two things with respect to time, but the output in example 2 does not mention any event.
task390_torque_text_span_selection.json
- regarding instructions: Does not explain what temporal relation is
- regarding instructions: Written in a confusing manner but made clearer by the examples. Sort of.
task391_causal_relationship.json
- regarding p examples: There is space for 5 examples, but the last 3 have no text. Either add actual examples there or remove the blank spaces. There is also no period or punctuation at the end of the Example 1 input sentence.
task392_inverse_causal_relationship.json
- regarding p examples: not good enough examples to understand
- regarding n examples: there is no examples there in some questions
- regarding instructions: 'possible causation' sounds more technical than needed. The phrasing makes me feel like I need to pull out a dictionary.
- regarding n examples: definitely need a couple more examples
task397_semeval_2018_task1_tweet_anger_detection.json
- regarding instructions: Differentiation of positive and negative examples can be made even clear.
- regarding p examples: The instruction quality is good enough to understand and to make a clear response.
- regarding n examples: It should have been given in a better quality.
task399_semeval_2018_task1_tweet_sadness_detection.json
- regarding instructions: Make the instructions bulleted and larger font. "You must judge whether the author of the tweet is sad or not." <newline>"Label the instances as 'Sad' or 'Not sad' based on your judgment."<newline><newline>That's really all I need to see to know that I need to change my focus from verbs to sadness. But really, this is TOUGH when you are going through so many different styles of tasks
- regarding p examples: Knowing that we can go off of a hashtag makes a BIG difference, because so many hashtags are purposely sarcastic and the person isn't sad at all.
- regarding n examples: Need more examples. This is so different from one persons view to the other. And tweets can be so ambiguous.
task400_paws_paraphrase_classification.json
- regarding n examples: Needs more than one example

danyaljj · 2021-10-28T18:23:46Z

Tasks 401-600 human feedback:

task404_grailqa_paraphrase_validation.json
- regarding p examples: The output field is confusing to me. I don't understand what is supposed to go there, so this should be clarified.<newline>Also, some of the positive examples seem to be negative, in that they need improvement. Why is this so? Wouldn't these be negative examples? I'm confused by this too.
- regarding n examples: I still don't understand the purpose of the Output field.
task406_mickey_fr_sentence_perturbation_generation.json
- regarding instructions: Does not even attempt to explain perturbations
- regarding instructions: Does not explain what perturbations are
task407_mickey_hi_sentence_perturbation_generation.json
- regarding instructions: The quality of the instruction is average need more explanation and clarity according to which one can proceed doing with the Hit.
- regarding p examples: Many of the provided positive examples are empty and not so clarity.
- regarding n examples: Input and output sessions are empty how do we proceed further? extremely confusing.
task408_mickey_it_sentence_perturbation_generation.json
- regarding instructions: doesn't properly explain what perturbations are in this context.
task410_mickey_ru_sentence_perturbation_generation.json
- regarding instructions: Does not explain what perturbations are.
- regarding p examples: Both + and - examples are either blank or have random punctuation
- regarding instructions: the is no example shown.
task411_mickey_vi_sentence_perturbation_generation.json
- regarding instructions: Does not explain the concept of perturbations.
- regarding instructions: Does not explain what a perturbation is.
task412_mickey_zh_sentence_perturbation_generation.json
- regarding instructions: It sort of makes sense but then once again it included the word perturbations which makes no sense to me and I can't figure out what the word you're going for is.
- regarding p examples: both + and - examples are missing entirely. I guess your AI can't transcribe foreign script.
- regarding instructions: Doesn't explain what a perturbance is
- regarding p examples: No examples shown
- regarding n examples: Negative examples section completely missing.
task413_mickey_en_sentence_perturbation_generation.json
- regarding instructions: Had to google what perturbations means, but it doesn't fit at all into what the instructions are generally asking.
- regarding p examples: Both positive and negative examples are mostly straightforward but they don't really follow the instructions for whatever perturbations are
- regarding instructions: Again with the perturbations. Impossible to understand the instructions.
task414_mickey_ar_sentence_perturbation_generation.json
- regarding p examples: No examples displayed
- regarding n examples: No examples displayed
- regarding p examples: No examples shown
- regarding n examples: No examples shown
task415_mickey_bg_sentence_perturbation_generation.json
- regarding p examples: There are no inputs or outputs.
- regarding n examples: There are no inputs or outputs.
- regarding instructions: does not explain perturbations clearly enough
- regarding p examples: both + and - examples are either missing or replaced with random punctuation
task416_mickey_de_sentence_perturbation_generation.json
- regarding instructions: Does not explain the concept of perturbating a sentence.
task419_persent_answer_generation.json
- regarding p examples: Example 0 should probably include is full name and not just his surname
task420_persent_document_sentiment_classification.json
- regarding instructions: In the sentence "Given a document and an entity the task is to select the authors sentiment towards the enity.", "authors" should be "author's" and "enity" should be "entity".
task428_senteval_inversion.json
- regarding instructions: Might be tough to judge whether its a genuine inversion or just an unclear sentence
- regarding p examples: Most of them were missing which confused me.
- regarding n examples: There was only one, so having multiple may help but it was overall simple enough to follow.
task429_senteval_tense.json
- regarding n examples: slightly confusing the task.
- regarding p examples: There was no example for #3 and #4.
- regarding n examples: Provide more examples.
task430_senteval_subject_count.json
- regarding n examples: There should be a period at the end of the sentence "'Her bridesmaids and groomsmen' is the subject of the sentence which is plural so the output is incorrect".
- regarding instructions: The instructions are clear and grammatically sound. No improvement is needed.
- regarding p examples: The input sentences have some minor typos in the form of extra spaces:<newline>Example 0: yellow , and <newline>Example 1: summer .<newline>Example 2: to , the
- regarding n examples: The input sentences have some minor typos in the form of extra spaces as well:<newline>Example 0: demon 's / woods , and / me .<newline>Example 1: ease , even / friend , Tamara / Kincaid , who / self-assured .
- regarding p examples: There were a few blank examples, but the task is straightforward enough to not need them.
task431_senteval_object_count.json
- regarding instructions: I feel like you could include more in the instructions to specify the difference between singular and plural
- regarding n examples: The example given is fine, however I think adding an additional one or two negative examples would be pretty beneficial.
- regarding instructions: It should say "the object of the main clause" instead of just "object of the main clause".
task442_com_qa_paraphrase_question_generation.json
- regarding instructions: Usually paraphrase means to give a shorter variation of the sentence. Some of the examples aren't paraphrases but a rewording of the question.
- regarding p examples: From the examples, I can tell that a variation of the question is desired.
- regarding n examples: I can understand them that the output provides a question that would get a different answer.
- regarding p examples: Could have had more positive examples.
task451_opus_paracrawl_tl_en_translation.json
- regarding instructions: The instructions were clear.
- regarding p examples: In example 0, output should read "LED" rather than "led" because LED is an acronym.<newline>In example 1, the explanation should read "its meaning" rather than "it's meaning."<newline>Examples 1 and 2 start the input with "Tagalog sentence:" but example 0 omits this.<newline>Examples 3 and 4 are blank.
- regarding n examples: In Example 2, "tagalog" should be capitalized.
- regarding instructions: Doesn't explain how to translate it.
- regarding n examples: Isn't completely clear.
task452_opus_paracrawl_en_ig_translation.json
- regarding instructions: It could be more explanatory or descriptive.
- regarding p examples: There are some parts that are missing examples and need to be filled.
- regarding p examples: I guess you could give more examples if you have spaces for them. Maybe something that isn't a perfect translation (according to Google Translate) but still acceptable.
- regarding n examples: I think you used a good variety of examples.
task453_swag_answer_generation.json
- regarding instructions: They are not bad, I think you could be a little more detailed about what input/output even is here.
- regarding n examples: These are good I think you could use one or two more examples that define different situations, one that is close to being good but something about it makes it negative things like that.
- regarding p examples: Every example after 1 is left blank. Example 0 is good, but I'd change "upside - down" to "upside-down".
task455_swag_context_generation.json
- regarding instructions: requirement on number of words, and can you use same words in the context or not.
- regarding n examples: negative examples are easy to understand
task456_matres_intention_classification.json
- regarding instructions: You should emphasize that the verb in question is the one in parenthesis- not any other potential intentions within the sentence. Also define "unconditional" or provide an example. The HITs should have a yes or no bubble, and an extra fill in the blank if an explanation is needed or desired.
- regarding p examples: I understood these easily. I'd have liked to see something "conditional" to know what that meant.
- regarding n examples: I didn't understand these and would have answered the opposite of what was provided. I'd need a few more examples to make sense of them.
task457_matres_conditional_classification.json
- regarding instructions: Sentence should be (possibly) fixed from "a hypothetical, or condition or not" to hypothetical, conditional, or none.
task461_qasper_question_generation.json
- regarding instructions: More definite instruction on the conciseness of the answer should be given. Does it need to be concise or not?
- regarding p examples: For example, the answer to example 2 in the positive examples seems to be very long and abstract. Can we write an output that entails a subjective answer?<newline>There is a typo in Explanation of the example in example 1, "grammaticallycorrect" should be "grammatically correct".
task462_qasper_classification.json
- regarding instructions: Directions are quite confusing and terms are used which the average Adult, even college educated, will be hard pressed to parse out.
- regarding p examples: Overall the positive examples make it possible to SORT OF eke out this hit.
- regarding n examples: Negative examples are harder to figure out. Clarity should be more of a goal here.
- regarding instructions: Some terms are undefined, novice linguists won't be able to figure this out easily. "concatenating extracts", is not a term I have ever heard, I have taught High school speech. If you are not using Graduate English students as your target group, you'll be quite disappointed.
- regarding p examples: The examples save us slightly here.
- regarding n examples: The examples continue to help define things slightly here.
task470_mrqa_question_generation.json
- regarding instructions: I think wording could be a little more clear. For example, you could say "generate a question that can be answered from the passage."
task471_haspart_answer_generation.json
- regarding instructions: A few more examples would be nice, and the definitions along the side of the page, not at the top.
- regarding p examples: A few examples that are harder to clarify would help.
- regarding n examples: A more in depth explanation of why.
task472_haspart_classification.json
- regarding instructions: I thought the wording of the description of the task was needlessly complex. People do not need to know the specific terminology of meronym or holonym to understand or perform the task acceptably.
task475_yelp_polarity_classification.json
- regarding instructions: Keep them simple and short but give me examples for people to look over and learn from if they don't understand. Make sure the examples have explanations that go in depth to make sure people can get the concept.
- regarding p examples: Give a range of examples. Give some examples of obvious positive and some borderline cases that might confuse people.
- regarding n examples: Give a range of examples. Make sure to provide examples of cases that could confuse people and explain why they should be negative.
task476_cls_english_books_classification.json
- regarding instructions: You state" " In this task, you are given books product reviews in English language.". This is not a grammatically correct sentence.
task477_cls_english_dvd_classification.json
- regarding instructions: It was pretty good overall, i feel like the instructions could be a bit longer. The formatting is also a bit hard to read fluidly.
- regarding n examples: I'd like more negative examples, there's not too many of them.
task490_mwsc_options_generation.json
- regarding instructions: I would remove the examples from the instructions, since better examples are to follow below.
task493_review_polarity_classification.json
- regarding instructions: I would add a comma where I've indicated:<newline>"Given reviews from Amazon, classify those review based on their content into two classes: Negative or Positive."
- regarding p examples: I think the examples are clear and sufficient. There is a type where I've indicated in caps:<newline>"There ARE positive words in the review like 'love it', 'very touch', therefore it is classified as positive review."<newline>Also, "very touch" doesn't suggest a positive aspect to me.
- regarding n examples: I think the examples are clear and sufficient. Typos:<newline><newline>This is a negative review. because THE print quality of book is very bad and not readable.<newline><newline>No period needed in this sentence:<newline> This is a positive review. because it says 'I love it' and 'It's wonderful'.
- regarding instructions: Should spell out that positive and negative refer to the tone and opinion, not the quality.
task494_review_polarity_answer_generation.json
- regarding instructions: Instructions are brief and clear, if, by chance a worker doesn't know what you mean by polarity they can easily look at the examples and they'll quickly understand it.
- regarding p examples: Examples are clear and make sense here.
- regarding n examples: Examples are ok, Example one is a tiny bit murky, but overall they are quite good.
task500_scruples_anecdotes_title_generation.json
- regarding instructions: Would only like to know what constitutes a good title.
- regarding instructions: Easy enough to understand but I'd want more instructions for the title. Should it be less than 5 words? A complete sentence? etc.
task501_scruples_anecdotes_post_type_verification.json
- regarding instructions: My only question is about the associated claim. Are we only dealing with hypothetical and historical posts or are there more types of claims?
- regarding p examples: Both + and - negative examples make sense but I will always complain about giant walls of text to read.
task502_scruples_anecdotes_whoiswrong_verification.json
- regarding instructions: Seeing as how it's a anecdote of a complex ethical situation, might be hard to definitively answer who is right and wrong.
task503_scruples_anecdotes_isanswerable.json
- regarding instructions: Yes I have improve the study.
- regarding p examples: This study for the almost positively.
- regarding n examples: This is task not negatively.
- regarding instructions: the task is to offer an analysis of the speakers role in a situation. There is no mention of this at the outset. What is the goal of my opinion here? besides the general one of improvement, what sort. Peacekeeping, self awareness, etc?
task508_scruples_dilemmas_more_ethical_isidentifiable.json
- regarding p examples: I would prefer one or two more positive examples.
task510_reddit_tifu_title_summarization.json
- regarding instructions: There is a typo in the instructions: Should be title. "The tile should start with 'TIFU by"
- regarding p examples: There should be examples with less text that are more straightforward.
task513_argument_stance_classification.json
- regarding instructions: What if the stance is ambiguous or neutral?
task514_argument_consequence_classification.json
- regarding instructions: Theres barely any instructions for this, i feel like they should be fleshed out more.
- regarding p examples: Make them longer and give more information.
- regarding n examples: Theres not a lot to them, its pretty hard to understand how theyre negative.
task515_senteval_odd_word_out.json
- regarding instructions: Not really sure how I should know that a word has been changed. Is every unnatural sentence considered changed?
- regarding instructions: What if there's a spelling mistake that makes it appear changed?
- regarding p examples: Well, I don't know what resonated means. What do I do in that situation?
task517_emo_classify_emotion_of_dialogue.json
- regarding n examples: doenst know the emotions quickly
task518_emo_different_dialogue_emotions.json
- regarding p examples: I think you should space out the dialogues so they're easier to read.
task520_aquamuse_answer_given_in_passage.json
- regarding p examples: In the sentence "The passage does not mention anything about thanksgiving turkeys so the passage does not answer the question.", Thanksgiving should be capitalized.
task522_news_editorial_summary.json
- regarding instructions: Would want clearer instructions as to forming a thesis from the sentences. Can i use parts of sentences to create a whole new sentence? Can I just use one sentence?
- regarding p examples: Both + and - examples are giant walls of text
- regarding instructions: It doesn't say how many sentences we should extract or whether we can combine bits of different sentences.
- regarding p examples: Giant wall of text to read for both + and - examples
task539_spl_translation_ma_en.json
- regarding p examples: The output in example 2 in the positive examples is grammatically incorrect. "are agreed" should be "agreed".
- regarding p examples: some peoples doesnt know the exact translation.so focus on that point
task540_spl_translation_la_en.json
- regarding instructions: The instructions themselves make sense
- regarding p examples: There isnt any inputs, its blank.
- regarding n examples: There are no inputs for any of them
- regarding instructions: Ok, there is no Laotion sentence, it's blank, there is simply an English sentence, which we MIGHT infer was somehow translated. These are the very very worst of directions. I have never seen worse, to be honest. One includes only the number 100, but a complete sentence is generated from this number, strangely enough.s
- regarding p examples: The examples don't include Lao sentences so we would have no idea if it is or isn't... horrible.
- regarding n examples: NO examples offered when they are sorely needed. horrible, this is the only thing worse than leaving out part of the examples, omitting all together.
task541_spl_translation_kh_en.json
- regarding p examples: both + and - examples have either blank inputs and outputs or random numbers
task543_spl_translation_bh_en.json
- regarding instructions: I would reword the instructions to the following: In this task you will be given a sentence in the Bahasa language. Your job is to convert it into the English language.
- regarding p examples: The examples are just about as good as it gets. There could be more examples for clarity, but it is not highly necessary.
task545_spl_translation_fi_en.json
- regarding instructions: I feel like its very solid, I feel like I'd like if you specified if we have to speak Filipino or if we can use google translate.
- regarding instructions: The grammar can be improved. The sentence should be:<newline>"In this task, given a sentence in the Filipino language, your task is to convert it into the English language."
- regarding p examples: Examples 3 and 4 are blank. There should be examples here. Also examples should start with "Example 1" instead of "Example 0."
- regarding n examples: Examples should start with "Example 1" instead of "Example 0."
task546_spl_translation_bg_en.json
- regarding instructions: Well, I don't know bengali, but that doesn't matter, I can tell there are serious problems. The directions indicate another language but in task I see NO language in top section at all.
task547_spl_translation_entk_en.json
- regarding instructions: Very well organized with wording that simple and not over the top. I just don't understand what tokens refers to really. But after reading the examples I can work with it.
- regarding p examples: Very good as it is very easy to see why they are examples and why they make sense to me.
- regarding n examples: Very good, even a monkey could understand these. I feel like it could not have been shown to me any clearer than these.
task548_spl_translation_en_ch.json
- regarding p examples: both + and - examples outputs are either blank or only show a random string of numbers
- regarding p examples: example 2 has an output of some digits and its labeled as a good chinese sentence
task549_spl_translation_en_vi.json
- regarding instructions: the way they say about translation in the second part of examples it could be even simpler.
- regarding p examples: it was easy to understand, so i find it good.
- regarding n examples: the way they say about translation in the second part of examples it could be even simpler.
task550_discofuse_sentence_generation.json
- regarding instructions: The word "incoherent" isn't clearly defined. Off-topic? Contradictory?
task551_spl_translation_en_th.json
- regarding p examples: There is no output yet it says it's correctly translated.
task552_spl_translation_en_bu.json
- regarding instructions: While I think the insrtuctions are pretty good - they give a clear directive - it seems very minimal. Of course, you don't want the instructions to drag on, and there might not be anything to even add, really... I suppose, perhaps you could give tips or pointers in the instructions? You could give quick DOs and DONTs just to make sure everything is crystal clear.
- regarding p examples: I do understand that there's only so much you can say in the explanation examples that are correct, but I feel copy/paste text isn't the way to go about it. It could actually go into more depth to describe WHY the translation is correct. Also, Example 2 seems to have an error, as it starts with "Numbersixvalverde".
- regarding n examples: I think the quality on this one is pretty good, but it has the same issue as the Positives: the explanations copy/paste. It's hard to want to put effort in when someone didn't go through effort to provide reasonable explanations in the examples. Why didn't words match? What could be done differently? Things like that.
task554_spl_translation_en_la.json
- regarding p examples: No output us shown yet it says its a good example.
- regarding n examples: No output us shown yet it says its a bad example.
- regarding p examples: There are no outputs in the positive examples at all.
- regarding n examples: The negative examples are also missing outputs.
- regarding p examples: No output was shown yet it says it was a good example.
- regarding n examples: No output was shown yet it says it was a bad example.
task561_spl_translation_en_bg.json
- regarding p examples: There is nothing in the output section but it is saying they are good.
- regarding n examples: There is nothing in the output section but it is saying they are bad.
- regarding p examples: There was no output to comment on, just examples of feedback.
- regarding n examples: There was nothing to comment on in the output
task562_alt_language_identification.json
- regarding instructions: How can I be expected to make this work. I have no basis with Bengali, Filipino, Hindi, Bahasa Indonesia, Japanese, Khmer, Lao, Malay, Myanmar, Thai, Vietnamese or Chinese languages. Are you searching for workers that automatically recognize all of these languages?
- regarding p examples: The instructions are not complete. Many examples are not filled in. For instance positive example 2, 3 and 4 have no text.<newline><newline>Positive example 0 input is gibberish (Input: '' ! , $2.99 , - 99) How can the system provide an output with gibberish input? The instructions might be acceptable but the example is claiming a function that can not be.
- regarding n examples: Example 1: has a blank for input, but identifies a language. This means the system function is not clear.
- regarding instructions: Doesn't say how we should figure out which language is being shown.
- regarding p examples: example 0 is just numbers but it claims it's hindi, example 2 is just a dash but it says bengali.
- regarding p examples: example 0 shows a string of numbers and the output is labeled as hindi
task563_discofuse_answer_generation.json
- regarding instructions: Nothing I can think of.
- regarding p examples: I thought they were fine the way they are.
task564_discofuse_classification.json
- regarding instructions: Doesn't fully explain what each of the 13 discourse types are
task569_recipe_nlg_text_generation.json
- regarding instructions: Instructions are straightforward but would seem to be impossible to do unless you are a God of cooking
- regarding instructions: The task is simple, the directions should also be simple and they are, quite so.
- regarding p examples: Excellent examples are clear.
- regarding n examples: Watermelon smoothie is obviously wrong, this helps illustrate the point well.
task571_recipe_nlg_ner_generation.json
- regarding instructions: It was good explaining
- regarding p examples: it can be more clear
- regarding n examples: Its absolutely not clear
task573_air_dialogue_classification.json
- regarding instructions: It's fairly clear but the hit is complex, there are so many options, keeping them straight can only be done with a table so when you form the "hit" you must place the table near (on the page) where we are expected to put the answer.
task574_air_dialogue_sentence_generation.json
- regarding instructions: It's a bit confusing using the terms "input" and "output". I think it would be better if you just asked people to fill in the gap in dialogue.
- regarding p examples: The instructions were a little foggy, but the answers with the explanation helped clear things up quite a bit.
- regarding n examples: It explanation made sense as to why the answer was wrong.
task576_curiosity_dialogs_answer_generation.json
- regarding instructions: It's a bit long-winded. Could probably be simplified to "copy the answer to the first question in the dialog"
- regarding p examples: It's sort of hard to follow the examples as they are bots/non-english speakers
task577_curiosity_dialogs_classification.json
- regarding instructions: The instructions seemed pretty clear. Possibly a typo when describing "an user and an assistant". Should probably be "a user and an assistant". I could be wrong, but that sounds better.
- regarding p examples: I understood right away what was expected.
- regarding n examples: The negative part of the instructions were a bit confusing at first, I had to re-read them. After going over the examples a couple of times, it finally made since. Maybe try and make that part a little less complicated.
- regarding p examples: In the sentence "It is pretty straight forward, It is knowledge sharing, the assistant explains the question asked by the user", the word "It" in the middle of the sentence shouldn't be capitalized.
task578_curiosity_dialogs_answer_generation.json
- regarding instructions: I would remove " (focus_entity : Entity in focus in the dialogs)." as it is explained more clearly in the following sentence.
- regarding p examples: Huge block of texts to read for both the + and - examples.
task579_socialiqa_classification.json
- regarding instructions: Pretty good though I guess commonsense could mean different things to different people.
- regarding instructions: Commonsense might mean different things to different people.
task580_socialiqa_answer_generation.json
- regarding instructions: What if the context cannot be understood, or none of the answers are right?
- regarding instructions: What if none of the answers make sense?
- regarding n examples: Would "(A)" be a wrong answer (as opposed to "A")
task582_naturalquestion_answer_generation.json
- regarding instructions: Am I to Google answers where I really don't know? "Ronald Reagan Era" sounds like a time frame to me, since I don't know that musician's work.
task583_udeps_eng_coarse_pos_tagging.json
- regarding instructions: In the sentence starting with "The list of part-of-speech tags i.e", "i.e" should be "i.e.".
- regarding instructions: I feel the instructions are poorly formatted and long.
task584_udeps_eng_fine_pos_tagging.json
- regarding instructions: This task almost improved in knowledge.
- regarding p examples: this is most of them positively.
- regarding n examples: this is most of them positively.
- regarding instructions: Everything makes sense it's just a huge list of tags to remember.
task585_preposition_classification.json
- regarding instructions: It seems to have an easy explanation after seeing the examples, and not before it very much at first
- regarding p examples: They helped me determine what I should do and think
- regarding n examples: Self explanatory and loved how you need to know some knowledge too to get it right. But that can also be bad too.
task587_amazonfood_polarity_correction_classification.json
- regarding instructions: explanation of examples should state whether referring to polarity or output
- regarding p examples: Again, refer to polarity or output when explaining examples
- regarding n examples: Most important on these to explain if deficiency is in output or polarity
- regarding instructions: Instructions are illustrative here, and clear.
- regarding n examples: Detailed explanations are especially helpful here.
task588_amazonfood_rating_classification.json
- regarding instructions: Typo: given "aa" review; provide some words that are descriptive of the numbers. for example 1: horrible, 5: amazing/excellent, 4: good, 3: okay or mixed reactions, 2: bad. Encourage positives and negatives to be a 3 (mixed) or whichever emotion is stronger.
- regarding n examples: Mention that these negative examples are BAD, INCORRECT and not just a negative. In other words, "don't do this! These are wrong because..."
task589_amazonfood_summary_text_generation.json
- regarding instructions: I would only want a more detailed explanation of just what you want included in the short summary.
- regarding instructions: Would only want more instructions for the short summary. 5 words or less? A complete sentence? etc
task592_sciq_incorrect_answer_generation.json
- regarding instructions: Complete the empty examples with very simple things. Such as give an example as if a child could answer it, this is help to eliminate any confusion.
- regarding p examples: They might be too complex (especially the explanations parts) and some may find them confusing.
- regarding n examples: Same issue as the positive examples. Too complex for an example.
- regarding instructions: Overall explanation is clear. examples truly bear it out though.
- regarding p examples: examples are good., it offers clear ways to illustrate the task.
- regarding n examples: negative examples are easy because it's just a correct answer in this case, you explain it fully though.

Palipoor · 2021-10-28T19:58:16Z

@yeganehkordi I will pick up tasks 400 - 500.

yeganehkordi · 2021-10-28T21:32:28Z

@yeganehkordi I will pick up tasks 400 - 500.

Sounds good! I'll start from 501.

danyaljj · 2021-11-02T23:44:11Z

Feedback regarding tasks 600-850:

task607_sbic_intentional_offense_binary_classification.json
- regarding instructions: I'm not really sure what "non-offensive statements that are underhandedly offensive" means, an example would be nice.<newline>
task608_sbic_sexual_offense_binary_classification.json
- regarding n examples: Some of the language in the negative examples are REALLY offensive to read.
task609_sbic_potentially_offense_binary_classification.json
- regarding instructions: "Offensive language" seems a bit like a grey area that would mean different things for different people.
- regarding instructions: It saying input and output was confusing, why not just have it say what they are like "example" and "judgement" or something.
- regarding n examples: Examples were good, maybe a few more that aren't quite as simple
task611_mutual_multi_turn_dialogue.json
- regarding instructions: Its fine as is. The letters f and m are not capitalized in that part though unlike the other sentences, so that slightly confused me at first only.
- regarding p examples: Very good and easy to read. Could have been somewhat of a shorter story.
- regarding n examples: Shorter story would be enough.
task613_politifact_text_generation.json
- regarding n examples: The output for example 1 in the negative examples should be taxes because the input didn't mention small businesses.
- regarding instructions: Instead of "the sub-string" it would be more acceptable to use "a sub-string" in the last sentence. It would be helpful if the instructions included possible strings and explained whether an output could contain more than one word.
- regarding p examples: The explanations have some questionable grammar and are a bit unreadable. It would help if they didn't simply restate the input and output and instead offered more context. Also, it's not clear why the broader category of "military" was chosen for example 2 instead of "weapons" or "nuclear".
- regarding n examples: The explanations are very unreadable, especially example 3. It's not clear if the entire phrase "legal issues, supreme court" is the desired output.
task614_glucose_cause_event_detection.json
- regarding p examples: I definitely think you should have more examples. I don't think that 2 are enough.
- regarding n examples: As with the positive examples, I think you need to provide more than 2.
task615_moviesqa_answer_generation.json
- regarding instructions: A little bit more than a couple sentences would help so I know I'm doing it properly. You want me giving a rating for a movie, okay what website am I using, IMDB, Rotten Tomatoes, something else? Am I just looking at user reviews or critics or both? Do I just provide 1 rating and that's it or put it into a sentence?
- regarding p examples: Do I just provide the answer or do I need a bit more? Like if it says it's a war movie, should I just say "war" or write "X movie is a war movie about such and such"
task616_cola_classification.json
- regarding instructions: Way too much leeway in determining whether a sentence is acceptable.
task617_amazonreview_category_text_generation.json
- regarding instructions: These should have example categories; also they should explain the syntax (why is an underscore used? Is it required for all categories using two words?)
- regarding p examples: The examples are clear, but it's not clear why an underscore is being used in 'home_improvement'.
- regarding instructions: Very good at pointing out what to do and encouraging criticalness too.
- regarding p examples: I saw some typos and that hyphen in between two words that really doesn't belong there but overall otherwise it makes good sense.
- regarding n examples: I didn't see anything to critique about there.
- regarding instructions: The instructions should list all of the possible categories.
task618_amazonreview_summary_text_generation.json
- regarding instructions: Instructions specify Amazon food products though neither task of reviews to be summarized are for food.
- regarding n examples: Layout design error, sentences hanging outside the tinted area. (Perhaps it's my screen?)
task625_xlwic_true_or_false_answer_generation.json
- regarding instructions: Directions make sense. Just reading quickly I get the entire gist of it. No problem. That's the key to good instructions NOT being overly wordy but being clear. You did this.
- regarding p examples: examples make sense.
- regarding n examples: Examples work in each case.
task628_xlwic_word_with_different_meaning_sentence_generation.json
- regarding instructions: Don't need to have examples in instructions. Doesn't say what to do if selected word has only 1 meaning
- regarding instructions: Doesn't say what to do if a word doesn't have multiple meanings.
task630_dbpedia_14_classification.json
- regarding instructions: It's not that clear what to do when there's additional text in the title, other than in brackets, but the main portion of the title is present in the text.
- regarding instructions: It's clear that text in brackets can be ignored, but what about other text, like what if there's a semi colon, or what if the entity in the text is part of a longer title?
task632_dbpedia_14_classification.json
- regarding instructions: These could use some clarification about tasks that concern a person, but the person isn't the main subject of the text (like a text about a book, that mentions the name of the author)
task633_dbpedia_14_answer_generation.json
- regarding instructions: Simplify 'You are given a question and options. You are expected to choose from the options and output the number of that option' to 'pick the correct number'
task641_esnli_classification.json
- regarding p examples: There are some typos but they don't affect understanding of the task. More importantly, there are no examples of using the 'N' option.
- regarding n examples: There are no examples using the 'N' option.
- regarding p examples: Could use a couple more examples, especially for a natural case.
- regarding n examples: Could use a couple more examples, especially for a natural case.
task642_esnli_classification.json
- regarding p examples: Examples"<newline>Ex 1 is understandable given both refer to a boy wearing a blue jacket. We assume it's the same boy agree/disagree = yes. Ditto with the logic on Ex 2, makes sense. However, I couldn't determine in the tasks below: Is it the same man/forearms being described? The same woman bowling/sleeping? So I answered No to both for I couldn't clearly determine if the actors were the same. (note typo, 'collared' shirt, in first task below.) Do forearms in both sentences indicate a yes? Very little to hang it on in the second sentence. {{My head is spinning, lol}}
task649_race_blank_question_generation.json
- regarding instructions: The instructions are very brief. It would help to know if there are guidelines that must be followed when completing the task.
- regarding p examples: The examples are fine but fill in the blank type questions aren't usually about filling in a phrase or sentence but more about just filling in a word. Your examples want people to fill in more than just one word, if that's the case you have to explain that.
- regarding n examples: All the examples are way too long. The entire article or passage doesn't need to be so huge.
task664_mmmlu_answer_generation_abstract_algebra.json
- regarding p examples: Unnecessary line break in example 2
- regarding instructions: It's good. Already your instruction was so good .
- regarding p examples: Your example is very perfect.
- regarding n examples: You are clear every doubt
task665_mmmlu_answer_generation_anatomy.json
- regarding instructions: it was easy to understand.
- regarding p examples: it was easy to understand.
- regarding n examples: it was easy to understand.
- regarding p examples: If all you're looking for is just a multiple choice type answer then the examples are good
task666_mmmlu_answer_generation_astronomy.json
- regarding instructions: Am I only supposed to just choose a letter even if it's not the right answer? There didn't seem to be a right answer for the questions provided, the second one just listed evidence for the Giant Impact Hypothesis.
- regarding p examples: The examples were fine but they were nothing like the questions I had to do
- regarding n examples: The examples were fine but they were nothing like the questions I had to do
- regarding instructions: Doesn't say what to do if multiple answers are plausible.
task667_mmmlu_answer_generation_business_ethics.json
- regarding instructions: The instructions could explain that there will be multiple fields per answer.
- regarding p examples: I felt confident that I understood the task following the examples, but I'd recommend highlighting or otherwise emphasising the answers in the explanation text.
- regarding instructions: it could have been simplified even more.
- regarding p examples: it could have been simplified even more
- regarding n examples: it could have been simplified even more
task668_extreme_abstract_summarization.json
- regarding instructions: Should have specifics on how long the summary should be and what it must include
- regarding instructions: Needs better instructions on generating a summary. Can it be a short phrase, multiple sentences, etc
task669_ambigqa_answer_generation.json
- regarding instructions: It might be a good idea to clarify that the response should be brief and not in the form of a sentence.
- regarding p examples: England, Washington, and DC should be capitalized.
- regarding n examples: Typos:<newline>ad (should be and)<newline>Washigton<newline>conntrol
task670_ambigqa_question_generation.json
- regarding p examples: The outputs in both examples 1 and 2 have grammatical errors. "elaborate" should be "elaborates". The output in example 2 is incoherent or grammatically incorrect.
- regarding n examples: "ask" in the output of example 1 should spell "asks".
- regarding instructions: Some parameters on what we can or cannot include would improve clarity.
task671_ambigqa_text_generation.json
- regarding instructions: The last sentence in the instructions is incomplete:"....should be separated by", by what?
- regarding n examples: "elaborate" in the explanation in examples 2 and 3 should spell " elaborates".
task672_nummersense.json
- regarding instructions: doesn't make sense at all. based on the examples I think the instructions should be changed to 'answer the word problem with a word'
task673_google_wellformed_query_classification.json
- regarding instructions: The instructions could delve a little deeper into what exactly you're looking for. It's a big vague.
- regarding n examples: The examples are just the flipped versions of the positive ones.
- regarding instructions: If the query is well formatted and spelled correctly, but is wrong ("the sky is purple"), is it still "Good"?
task675_google_wellformed_query_sentence_generation.json
- regarding instructions: The description of what you're doing in the task could be written in a better way with less words. I would tell people they need to find the query from the list that is the most well structured and formed. The way the instructions were written made you reread them a few times to really understand what was being asked.
- regarding p examples: The examples are pretty good if you understand the instructions well. The explanation of the examples is simple which keeps it easy to understand.
- regarding n examples: The examples make sense if you understand the instructions. Once the instructions are written a little better, you will understand exactly what the examples mean the first time. The explanations are simple but they should be because the instructions are simple.
- regarding instructions: It says I'm given a set of queries separated by ''. I don't know what that is referring to, it looks unnecessary. Other than that the instructions are clear.
- regarding p examples: Your very first example you chose said it was free of spelling and grammar errors but it wasn't, Haiti wasn't capitalized. I also don't know why there's a space before the question marks, that shouldn't be there.
- regarding n examples: ``
task686_mmmlu_answer_generation_college_biology.json
- regarding p examples: examples based upon people knowing things that few turkers would know offhand
- regarding n examples: examples based upon people knowing things that few turkers would know offhand
- regarding instructions: maybe the need to be instructions clearer. Am I supposed to know the correct answer. Are we supposed to just answer (A,B,C,D) or our we supposed to type out the answer too?
- regarding instructions: A lot of these are now similar and perfect and I hope that doesn't flag me for not typing much feedback
task688_mmmlu_answer_generation_college_computer_science.json
- regarding p examples: Example 1 says coulb instead of could.
- regarding instructions: It would be helpful to include how we will know which option is correct: whether purely through common knowledge or through a specific reference source. Also, it would be good to know what one should do if there is unclear language or are typos.
- regarding p examples: There's a typo in the Explanation of Example 1. It would also be helpful if the explanations were less vague. In both cases, the example explanations seemed to just restate the question and answer.
- regarding n examples: Both negative examples were undesirable due to rejected output formatting. It would be helpful to have a negative example where the answer/content makes it incorrect if that is something that can happen.
- regarding instructions: It says "You need to answer the question by selecting the correct option." This is vague, what is the correct option? Choosing the actual answer or just choosing a letter or is it something else?
- regarding p examples: ``
- regarding n examples: If the examples are meant to be answered correctly and not just pick a letter, then there should be examples of choosing a letter but it being the wrong one.
task690_mmmlu_answer_generation_college_medicine.json
- regarding p examples: Both + and - examples have sports questions embedded despite the instructions having to do with college medicine
- regarding p examples: examples based upon people knowing things that few turkers would know offhand
- regarding n examples: examples based upon people knowing things that few turkers would know offhand
task692_mmmlu_answer_generation_computer_security.json
- regarding instructions: Doesn't say what to do on the off chance that two of the answers are plausibly correct.
task693_mmmlu_answer_generation_conceptual_physics.json
- regarding n examples: "Muliple" in the output of example 2 in the negative example is misspelled. It should be "Multiple".
- regarding instructions: maybe the need to be instructions clearer. Am I supposed to know the correct answer. Are we supposed to just answer (A,B,C,D) or our we supposed to type out the answer too?
task694_mmmlu_answer_generation_econometrics.json
- regarding instructions: At first I thought I was grading the answer itself. How do I know what the correct option is
task695_mmmlu_answer_generation_electrical_engineering.json
- regarding instructions: the explanation is easy to understand.
- regarding p examples: the explanation is easy to understand
- regarding n examples: the explanation is easy to understand
task697_mmmlu_answer_generation_formal_logic.json
- regarding instructions: Would need a brief explainer on what exactly formal logic is.
task698_mmmlu_answer_generation_global_facts.json
- regarding instructions: The instructions could be more brief and written in a more readable way. I.e. your task to select the correct multiple choice answer from the question shown below.
- regarding p examples: The example was good but it would be nice to reiterate why the positive example is positive. If you read the positive example first, you won't really understand the task, until you read the negative example.<newline><newline>
- regarding n examples: The examples are clear. However, there is a typo. "Muliple options can not be correct." Cannot is one word.
task699_mmmlu_answer_generation_high_school_biology.json
- regarding instructions: It can be more clear
- regarding p examples: Its more clear
- regarding n examples: It can be better
task700_mmmlu_answer_generation_high_school_chemistry.json
- regarding instructions: It would be helpful if the instructions included how we determine which answer is correct whether it's via common knowledge, verification from a specific source, or something else.
- regarding p examples: Example 2 is missing a question. The answer given seems to conflict with the information present in the input.
- regarding n examples: If it's possible for an answer to be incorrect based on content instead of just formatting, it would be helpful to include this as an example. If the output was B for Example 1, would it be incorrect and why?
- regarding instructions: None. Easy to understand.
- regarding p examples: None. Easy to understand.
- regarding n examples: None. Easy to understand.
task702_mmmlu_answer_generation_high_school_european_history.json
- regarding instructions: It's not entirely clear if answering these questions depends on existing knowledge, or just information contained in the texts.
- regarding instructions: It's not clear if generating these answers depends just on knowledge in the passages, or if they require existing knowledge.
task703_mmmlu_answer_generation_high_school_geography.json
- regarding instructions: Instructions are clear. The questions are straightforward enough.
- regarding p examples: The examples help a lot, they give good scope, because I was worried when you mentioned geography.
- regarding n examples: Examples take advantage of using positive sometimes and reversing, which is good.
task705_mmmlu_answer_generation_high_school_macroeconomics.json
- regarding instructions: It would be helpful if the instructions clarified whether there is one specific right answer or many possible right answers. Also, it would be helpful to explain if we should know the answers as common knowledge or if we should be consulting a specific source to verify our answers are correct.
- regarding p examples: Although we are shown the desirable output, it would be helpful to have a further explanation why it is the desirable output. It's not completely clear if the desirable output is acceptable only because it fits the desired format or because the answer is inherently correct.
- regarding n examples: These examples were undesirable due to the format of the output. If it is likely to occur, it would be helpful to provide a negative example where the format of the output is correct but the content is not.
- regarding instructions: Instructions imply I have high school level knowledge of macroeconomics.
task706_mmmlu_answer_generation_high_school_mathematics.json
- regarding instructions: The instructions are okay but if you really want more than just a one letter response to a mathematics questions then it needs more than that.
task709_mmmlu_answer_generation_high_school_psychology.json
- regarding instructions: Instructions are basic, but fine.
- regarding p examples: poor examples, one is psychologic but the second is regarding eyes, an actual Medical issue.
- regarding n examples: poor examples, one is psychologic but the second is regarding eyes, an actual Medical issue.
task711_mmmlu_answer_generation_high_school_us_history.json
- regarding instructions: Well, the examples are good, but they also seem a little like common sense. I mean, I'm assuming we're supposed to also use the writing, but it's a little difficult to find that exact information in the writing.
- regarding p examples: I wish the directions were a little more specific. Like, only take your answer from the writing, or use the implied meaning/assumptions from it.
- regarding n examples: Well, they also seemed kind of like common sense. I'm not sure I know how they can be improved. Maybe
task713_mmmlu_answer_generation_human_aging.json
- regarding instructions: It's okay but what if someone doesn't have a clue what the topic ensues, is, etc.?
- regarding p examples: Didn't see any issues.
- regarding n examples: Multiple and Do is spelled wrong or in the wrong way.
task714_mmmlu_answer_generation_human_sexuality.json
- regarding p examples: Is breast cancer really related to human sexuality?
- regarding p examples: Not sure if breast cancer falls under the umbrella of human sexuality.
task716_mmmlu_answer_generation_jurisprudence.json
- regarding instructions: Instructions are fine enough. I like that they are straightforward and simple.
- regarding p examples: These are tough. I guess they are ok, but I don't go to law school so can't really be sure.
- regarding n examples: Difficult and hard, I'm not certain my answers are right, I did my best though.
- regarding instructions: It can be explained a little simplier
- regarding n examples: These made me really understand the instructions
task718_mmmlu_answer_generation_machine_learning.json
- regarding instructions: Instructions presume knowledge that most people (especially on mturk) won't have
- regarding p examples: examples presume knowledge that most people (especially on mturk) won't have
- regarding n examples: examples presume knowledge that most people (especially on mturk) won't have
task720_mmmlu_answer_generation_marketing.json
- regarding instructions: It's not clear if there is only one correct answer or if any one of several answers might be correct. Also, the instructions don't clarify which sources can be used to verify or research correct answers.
- regarding p examples: The explanation for Example 1 is a sentence fragment and seems to be missing something.
- regarding n examples: If it's possible for an answer to be negative based on content instead of just formatting, then it would be helpful to include an example of this!
task722_mmmlu_answer_generation_miscellaneous.json
- regarding instructions: Maybe replace 'miscellaneous' with 'random topic'
task723_mmmlu_answer_generation_moral_disputes.json
- regarding instructions: The instructions are good, but they could inform us where we're supposed to reference for the answers.
- regarding p examples: Fine, but could be improved by providing a source for the correct answer.
- regarding n examples: Good, because it demonstrates the proper outputs via negative examples.
- regarding instructions: Instructions presume knowledge that most people (especially on mturk) won't have
- regarding p examples: examples presume knowledge that most people (especially on mturk) won't have
- regarding n examples: examples presume knowledge that most people (especially on mturk) won't have
task724_mmmlu_answer_generation_moral_scenarios.json
- regarding instructions: Simple and easy to get.
- regarding p examples: I saw a typo with does/doesn't in mind
- regarding n examples: I saw a typo with Do/does related phrasing
- regarding n examples: They should have provided examples with an incorrect selection, rather than just instances that showed answers that didn't accord with the instructions.
task726_mmmlu_answer_generation_philosophy.json
- regarding n examples: You should probably say what to actually do if you don't know
task727_mmmlu_answer_generation_prehistory.json
- regarding instructions: I was able to understand it without any problems.
- regarding p examples: Give a better understanding of why the output was chosen.
- regarding n examples: It is understandable why they examples were negative and does not really much else explaination.
task728_mmmlu_answer_generation_professional_accounting.json
- regarding p examples: Example 1 explanation is not a complete sentence.
task729_mmmlu_answer_generation_professional_law.json
- regarding p examples: Examples are very long, unformatted blocks of text. Improve readability. First example doesn't give good explanation; too vague compared to the second.
- regarding n examples: Examples are very long, unformatted blocks of text. Improve readability.
task730_mmmlu_answer_generation_professional_medicine.json
- regarding p examples: Examples are difficult to understand without a knowledge of medicine.
task731_mmmlu_answer_generation_professional_psychology.json
- regarding instructions: No issues were identified
- regarding p examples: Examples were easy to understand and straightforward.
- regarding n examples: Examples were easy to understand and straightforward.
task732_mmmlu_answer_generation_public_relations.json
- regarding instructions: The only thing I would suggest is specifying that you can only choose one option.
- regarding p examples: In Example 1, the explanation of the correct answer does not explain why it is the correct answer. This would probably be beneficial for some to know. Also, there is a comma after "York" before market that doesn't need to be there, I don't think.
- regarding n examples: Again, maybe a better explanation of why Example 1's answer is B.
task734_mmmlu_answer_generation_sociology.json
- regarding instructions: The instructions say to "select the correct option", which would be more appropriate in a bubble selection format. Since this tasks requires one to actually type the letter corresponding to the correct answer perhaps this could be reworded.
- regarding p examples: The positive examples are okay though they only specifically address the information in the text, not why the format of the answers is correct.
- regarding n examples: The negative examples to a good job of showing what type of inputs are incorrect.
task735_mmmlu_answer_generation_us_foreign_policy.json
- regarding p examples: The word "correct" is spelled as "currect".
task736_mmmlu_answer_generation_virology.json
- regarding p examples: I'm not sure if the average person is expected to know these correct answers without doing any additional research, but if they are expected to know these answers then the questions are pretty tough.
- regarding n examples: ``
task738_perspectrum_classification.json
- regarding p examples: claim/perspective formatting is a bit sloppy, could be improved
- regarding n examples: claim/perspective formatting is a bit sloppy, could be improved
task743_eurlex_summarization.json
- regarding instructions: The instructions seem simple but what you have to do is not simple at all
- regarding p examples: These are just bad examples, why do you need to provide the whole article and not just examples of what is being talked about in it? If there's a summary then you can write a headline.
- regarding n examples: Same goes for this as well, writing articles for the headline requires knowledge about anything legal related and I don't have that knowledge, I'm trying to improve the structure and grammar and instructions of your HIT and these examples are just too much, keep it simpler when it needs to be and in this case it needs to be.
- regarding instructions: good
- regarding p examples: words can be extra
- regarding n examples: it is good
task744_eurlex_classification.json
- regarding instructions: Generally speaking, the instructions are difficult to understand without a background in law, but it is good that the 3 categories were elaborated on. I'd recommend formatting and color to make the text more readable.
- regarding p examples: Improved formatting would be helpful.
- regarding n examples: Improved formatting would be helpful.
- regarding p examples: huge walls of text to read through for both + and - examples
task745_ai2_arithmetic_questions_arithmetic.json
- regarding instructions: The task definition was not clear.
- regarding n examples: I can't understand negative examples very easily.
- regarding p examples: Example 2 has a couple typos or grammatical errors ( 'she gave' with she not capitalized; '27 seashell'). They didn't affect understanding the question though.
task747_glucose_cause_emotion_detection.json
- regarding instructions: Doesn't state what to do if a nonsensical story is presented
task748_glucose_reverse_cause_event_detection.json
- regarding instructions: Doesn't state what to do if a story is illogical, unrelated, difficult to understand etc
- regarding instructions: Not told what to do in the event of the selected story being nonsensical or difficult to understand.
- regarding instructions: Does not state what to do if the selected sentence doesn't follow the rest of the story.
task750_aqua_multiple_choice_answering.json
- regarding instructions: This is your best one yet I'd say, the instructions are simple but clear and easily understood, and it shines through into the examples.
- regarding instructions: Use instructions that a layperson who's not a math genius can understand.
- regarding p examples: I'm terrible at math and had a difficult time following the variables.
- regarding n examples: I did not understand what made the incorrect examples wrong.
task751_svamp_subtraction_question_answering.json
- regarding instructions: The instructions are phrased a bit too formally, and could use pauses or sentence breaks to help make them understandable. The phrase "apply subtraction mathematical operator on the numbers embedded in the text" is missing a "the" after "apply", but instead probably needs to be rephrased to something simpler such as "use subtraction".
- regarding p examples: There's a typo in example 1, Sam is, not has, 6 feet tall. <newline>In example 2, usually, the book series title would be capitalized unless it is styled with lowercase letters. <newline>In example 3, there is a small typo, a space in "don't".
task752_svamp_multiplication_question_answering.json
- regarding instructions: This is kind of confusing, "multiplicative mathematical operator on the numbers" , You could give some quick examples.
- regarding p examples: You could give a general idea what you want such as "keep the sample exactly like this format" or "keep the explanation as short as possible".
- regarding n examples: You could add a reason at the very bottom why specifically the example is bad.
- regarding instructions: The instructions make it clear that the product will involve multiplication, but not that there might be other steps involved (like division).
- regarding p examples: ``
- regarding n examples: Example 1 uses division, yet the instructions only reference multiplication
- regarding instructions: The instructions only mention multiplication; they don't mention that some of the problems might involve other mathematical operations (like division)
- regarding n examples: Q1 partially relies on division
task753_svamp_addition_question_answering.json
- regarding instructions: "Mathematical operator" could probably be simplified
task754_svamp_common-division_question_answering.json
- regarding instructions: The instructions themselves did not provide much information on the task expectation, apart from critiquing the instructions.
- regarding n examples: The negative example explanations could have provided a bit more detail.
task827_copa_commonsense_reasoning.json
- regarding n examples: Could use a few more examples.
task828_copa_commonsense_cause_effect.json
- regarding n examples: Example 2 is a little less than emphatic, it only says that the provided answer is wrong, it doesn't say that the other answer (cause) would be correct in this case.
- regarding instructions: Had to google what newline character is. Simplify it to linebreak.
task835_mathdataset_answer_generation.json
- regarding instructions: I am not very strong in math abilities so I had a hard time following the instructions.
- regarding p examples: I couldn't follow the variables since I found the math confusing.
- regarding n examples: It was hard for me to see what made the negative examples wrong.
task843_financial_phrasebank_classification.json
- regarding instructions: I didn't see any errors or typos, wierd phrasing, etc.
- regarding p examples: I find this to be self explanatory, etc. so it's good
- regarding n examples: I love the way this was all written
- regarding n examples: The term 'operating profit' is spelled 'operatinng profit'.
task844_financial_phrasebank_classification.json
- regarding instructions: Doesn't explain what polarity is in the context of the instructions
task846_pubmedqa_classification.json
- regarding p examples: both + and - examples imply I am a doctor or scientist. difficult to understand.
task849_pubmedqa_answer_generation.json
- regarding instructions: Instructions are too vague and provide no clear information on how to complete the task.
- regarding p examples: Virtually impossible to understand, I would have returned the hit if not for the initial debriefing explaining that this was an instruction rating, moreso than output generation.
- regarding n examples: Same as previous field, the examples cannot be understood without prior understanding of the subject matter or better instructions.
task850_synthetic_longest_palindrome.json
- regarding instructions: "If the shortest possible palindrome is length 1 you should return the first character." This could be reworded to something like "If the shortest palindrome is only the length of one character, then write just that character. As this part was strange to understand and I only understood by reading through the examples.

Palipoor · 2021-11-04T17:09:00Z

@yeganehkordi I pick 600 - 700.

…nsion � Conflicts: � .gitignore

danyaljj · 2021-11-07T21:55:31Z

Feedback re. tasks 850-1200:

task1087_two_number_sum.json
- regarding instructions: pretty simple and straightforward
task1088_array_of_products.json
- regarding instructions: Absolutely does not make sense in any way until you see the examples and not even then<newline><newline>Too much calculator work - I wouldn't do this HIT unless the pay is stellar and if there are only 2-3 numbers to multiply. The examples below are TOO LONG and I'm only doing them out of spite because I've already wasted enough time typing all of this.
- regarding p examples: answer[i] is eqaul to the product of all the elements of nums except nums[i] where i is the position of the element in the array, otherwise it is undefined at this point<newline><newline>no... you have to treat examples like we're stupid. You have to go through the actual mathematical steps to show how 840, etc., was reached. These are the ONLY EXPLANATIONS WE WILL GET so you HAVE to be comprehensive.
- regarding n examples: see above.
- regarding instructions: math is hard
task1135_xcsr_en_commonsense_mc_classification.json
- regarding n examples: "finishing was whaat"
- regarding p examples: the explanations/output were not in line with common sense answers and need to be improved
task1148_maximum_ascii_value.json
- regarding instructions: there needs to be a chart of what ascii values are
- regarding p examples: how do we know with nothing to go by
- regarding n examples: no way to know with a list of the values
task1150_delete_max_min.json
- regarding instructions: no clue what the max & min elements are
- regarding p examples: it explains/shows what the max & min are
- regarding n examples: it explains the max/min and order well
task1152_bard_analogical_reasoning_causation.json
- regarding instructions: I'm assuming everyone knows what analogies are. The instructions are fine.
- regarding p examples: Is there only one word you're looking for? This seems like an ambiguous task - these are more synonyms than anything. Do we have to write an explanation?
- regarding n examples: Is there only one word you're looking for? Do we have to write an explanation? These are more synonyms than analogies.
- regarding instructions: instructions make sense, no issues
- regarding p examples: examples give good explanation on what to do
- regarding n examples: examples are good at showing what is not expected, complements positive examples
task1153_bard_analogical_reasoning_affordance.json
- regarding instructions: examples make sense to show what is wanted
- regarding n examples: bad examples are obvious on what doesn't work
- regarding instructions: doesn't explain what an affordance is
- regarding p examples: some of the words could have many things, so not sure what all is acceptable
- regarding n examples: could have more and diverse examples
task1154_bard_analogical_reasoning_travel.json
- regarding p examples: Explanation in all the three positive examples should use the word "output" instead of "answer" in the last sentence to make it more clear.
task1155_bard_analogical_reasoning_trash_or_treasure.json
- regarding instructions: a bit convoluted
- regarding p examples: seems like some things could be subjective and that's not covered
- regarding n examples: subjectivity not discussed
task1158_bard_analogical_reasoning_manipulating_items.json
- regarding instructions: not clearly defined some guesswork involved
task1159_bard_analogical_reasoning_containers.json
- regarding instructions: IT WAS GOOD.
- regarding p examples: IT WAS VERY CLEAR.
- regarding n examples: GIVE THE CORRECT WORD ALSO WITH THE EXAMPLE.
task1186_nne_hrngo_classification.json
- regarding instructions: It's missing an ending punctuation mark.
- regarding p examples: Your HTML/Markdown formatting is messed up.<newline><newline>All of these numbers and explanations are totally opinion. Like you can justify any number. <newline><newline>Are we also expected to provide an example?
- regarding n examples: Your HTML/Markdown formatting is messed up.<newline><newline>See above
- regarding p examples: the example do not detail what to do on odd typos, like the "-s" used in input task 1. do we ignore issues like that or make it part of the evaluation
task1189_check_char_in_string.json
- regarding n examples: Explanation for negative example 1 is wrong because the output is correct as I is present in the string.
- regarding instructions: The comma MUST be offset by a space, otherwise the desired character is VERY HARD TO SEE<newline><newline>In fact why can't you put the desired character on a separate line? Like, why are you making this deliberately difficult for workers?
- regarding p examples: n/a
- regarding n examples: n/a
task1190_add_integer_to_list.json
- regarding instructions: explained well, no issues
task1191_food_veg_nonveg.json
- regarding instructions: unless you're well versed in Indian food or Hindi, the instructions are useless
- regarding p examples: pork is obvious, but the other dish is impossible without google unless you're Indian
- regarding n examples: not possible to do without googling dishes or words
- regarding instructions: This requires too much outside research. At the very least you should pre-program a google search with the term of the dish.
- regarding p examples: I mean it's pretty straight forward, although it's not clear whether we need to also provide a written explanation.
- regarding n examples: I mean it's pretty straight forward, although it's not clear whether we need to also provide a written explanation.
task1192_food_flavor_profile.json
- regarding p examples: include the ingredients of the flavor profile
task1195_disflqa_disfluent_to_fluent_conversion.json
- regarding instructions: It is hard but doable to define the objective answer.
task856_conv_ai_2_classification.json
- regarding p examples: Markdown formatting isn't working.
- regarding n examples: Markdown formatting isn't working.
- regarding instructions: instructions clear, makes sense
task862_asdiv_multidiv_question_answering.json
- regarding p examples: Example 1 is kind of weird, how is someone going to use a quarter of a paper bag? Doesn't make sense.
- regarding p examples: explains what is correct well
task863_asdiv_multiop_question_answering.json
- regarding n examples: Example 2 has some grammatical errors in the explanation.
task866_mawps_multidiv_question_answering.json
- regarding instructions: Nothing, it's very clear
- regarding p examples: They're great
- regarding n examples: They're great
task868_mawps_singleop_question_answering.json
- regarding instructions: I don't understand the difference between a good and a bad answer. Some kind of text explanation would be helpful, not just examples. You're just making us guess if you don't spell it out.
- regarding p examples: There should not be a space before punctuation marks.
- regarding n examples: There should not be a space before punctuation marks.<newline>You are missing a "the" before "correct answer".
task892_gap_reverse_coreference_resolution.json
- regarding instructions: Explanations in Example 1 of positive examples and both examples in negative examples are confusing and incorrect.
- regarding p examples: Output in example 1 should be "he" because the pronoun is to replace "Thomas" after a comma.
- regarding n examples: Negative example 1 should be correct as "he" appears in the last sentence, which should refer to "Fritzsch".<newline>Negative example 2 should be correct as "he" referring to Elias appears in the first sentence.
task927_yelp_negative_to_positive_style_transfer.json
- regarding p examples: Could have explained a little more about changes. For example in ex. 2, there were two changes, not just the one.
- regarding n examples: Generally fine but could have another couple of examples to really drive the point home.
- regarding instructions: I think overall it these are solid. I think that it is easy to read.
- regarding p examples: The examples are basically perfect. Shows exactly what they want.
task928_yelp_positive_to_negative_style_transfer.json
- regarding n examples: The negative examples are a little more jumbled but still good.
task933_wiki_auto_style_transfer.json
- regarding instructions: I think instructions about parentheses should be added because some people might just omit these clauses automatically.
- regarding p examples: I think that for Example 1's output, there were a few spacing typos before the commas which should be removed.
- regarding n examples: I think providing an example with a longer sentence would be helpful.
task936_defeasible_nli_snli_classification.json
- regarding instructions: no issues, they detail what is wanted
- regarding p examples: no issues, details what was wanted well
- regarding n examples: they explain what isn't wanted well, no issues with understanding what is negative
- regarding instructions: So... you literally just want us to say 'strengthener' or 'weakener'? That's the entire HIT? I'm confused.
- regarding p examples: The HTML/Markdown formatting appears to be broken here. Missing periods at the ends of sentences.
- regarding n examples: The real HITs are going to be more ambiguous if past work from this requester is any indication. Are you ABSOLUTELY SURE this is the only way we can mess up?<newline><newline>Okay I've just done the examples. This makes NO SENSE. How can eating out of a takeout container either strengthen or weaken the scenario? Couldn't you say either? What does that even mean to weaken/strengthen the situation????
task938_copa_hi_commonsense_reasoning.json
- regarding instructions: The instructions were clear
- regarding p examples: I do not think that these are good examples
- regarding n examples: No, I cannot understand this
task939_copa_hi_commonsense_cause_effect.json
- regarding p examples: There seems to be no relation between the input and the output which renders the examples useless
- regarding n examples: There seems to be no relation between the input and the output which renders the examples useless
task941_copa_gu_commonsense_cause_effect.json
- regarding instructions: I do not know the language that they are asking about but the instructions themselves are pretty clear.
- regarding p examples: No idea what these examples are referring to
- regarding n examples: These are not clear at all.
task942_copa_mr_commonsense_reasoning.json
- regarding instructions: Makes connections between mathematical concepts, between concepts and procedures, or between concepts, procedures, and application. Prompts cognitive effort. Is problem-based, authentic, or interesting.
- regarding p examples: Being happy even when you have little. Having a good time even when you are losing.
- regarding n examples: While employers may not love the idea of having an employee who is preoccupied with the finer points, a candidate who assures quality and strives for balance can be a great asset.
task945_wiki_cloze_bn_multiple_choice_question_answering.json
- regarding instructions: pretty simple and straightforward
- regarding p examples: There seems to be no relation between the input and the output which renders the examples useless
- regarding n examples: There seems to be no relation between the input and the output which renders the examples useless
task946_wiki_cloze_gu_multiple_choice_question_answering.json
- regarding instructions: Makes connections between mathematical concepts, between concepts and procedures, or between concepts, procedures, and application. Prompts cognitive effort. Is problem-based, authentic, or interesting.
- regarding p examples: Being happy even when you have little. Having a good time even when you are losing.
- regarding n examples: Common strengths include independence, persistence, creativity, and ingenuity. Common weaknesses include procrastination, impatience, impulsiveness, and forgetfulness.
task947_wiki_cloze_hi_multiple_choice_question_answering.json
- regarding p examples: the relation between the input and the output is not clearly explained (in fact the output is blank) and thus the explanation is worthless
- regarding n examples: the relation between the input and the output is not clearly explained (in fact the output is blank) and thus the explanation is worthless
- regarding instructions: Makes connections between mathematical concepts, between concepts and procedures, or between concepts, procedures, and application. Prompts cognitive effort. Is problem-based, authentic, or interesting.
- regarding p examples: Being happy even when you have little. Having a good time even when you are losing.
- regarding n examples: While employers may not love the idea of having an employee who is preoccupied with the finer points, a candidate who assures quality and strives for balance can be a great asset.
task948_wiki_cloze_kn_multiple_choice_question_answering.json
- regarding instructions: The Instructions said that I am given a statement written in Kannada, but none of the inputs in the examples is in Kannada. I have no idea what is happening here.
- regarding p examples: None of the inputs in the positive examples is written in Kannada, which is contradictory to the instructions given at the beginning.
- regarding n examples: None of the inputs in the negative examples is written in Kannada, which is contradictory to the instructions given at the beginning.
task951_wiki_cloze_or_multiple_choice_question_answering.json
- regarding p examples: there is no output. therefore the explanation is meaningless and the example is worthless.
- regarding n examples: there is no output. therefore the explanation is meaningless and the example is worthless.
task954_wiki_cloze_te_multiple_choice_question_answering.json
- regarding p examples: there is no output. therefore the explanation is meaningless and the example is worthless.
- regarding n examples: there is no output. therefore the explanation is meaningless and the example is worthless.
task958_e2e_nlg_text_generation_parse.json
- regarding instructions: the formatting for the instructions is a mess. The explanation on the ordering of items needs to be separated from the rest of the paragraph for ease of reading
task959_e2e_nlg_text_generation_identify.json
- regarding instructions: simple and straightforward
- regarding p examples: covers bases adequately
- regarding n examples: covers bases adequately
- regarding instructions: The instructions are a little wordy, but it still tells me everything that I need to knw

danyaljj · 2021-11-07T21:56:09Z

Feedback re. tasks 850-1200:

task1087_two_number_sum.json
- regarding instructions: pretty simple and straightforward
task1088_array_of_products.json
- regarding instructions: Absolutely does not make sense in any way until you see the examples and not even then<newline><newline>Too much calculator work - I wouldn't do this HIT unless the pay is stellar and if there are only 2-3 numbers to multiply. The examples below are TOO LONG and I'm only doing them out of spite because I've already wasted enough time typing all of this.
- regarding p examples: answer[i] is eqaul to the product of all the elements of nums except nums[i] where i is the position of the element in the array, otherwise it is undefined at this point<newline><newline>no... you have to treat examples like we're stupid. You have to go through the actual mathematical steps to show how 840, etc., was reached. These are the ONLY EXPLANATIONS WE WILL GET so you HAVE to be comprehensive.
- regarding n examples: see above.
- regarding instructions: math is hard
task1135_xcsr_en_commonsense_mc_classification.json
- regarding n examples: "finishing was whaat"
- regarding p examples: the explanations/output were not in line with common sense answers and need to be improved
task1148_maximum_ascii_value.json
- regarding instructions: there needs to be a chart of what ascii values are
- regarding p examples: how do we know with nothing to go by
- regarding n examples: no way to know with a list of the values
task1150_delete_max_min.json
- regarding instructions: no clue what the max & min elements are
- regarding p examples: it explains/shows what the max & min are
- regarding n examples: it explains the max/min and order well
task1152_bard_analogical_reasoning_causation.json
- regarding instructions: I'm assuming everyone knows what analogies are. The instructions are fine.
- regarding p examples: Is there only one word you're looking for? This seems like an ambiguous task - these are more synonyms than anything. Do we have to write an explanation?
- regarding n examples: Is there only one word you're looking for? Do we have to write an explanation? These are more synonyms than analogies.
- regarding instructions: instructions make sense, no issues
- regarding p examples: examples give good explanation on what to do
- regarding n examples: examples are good at showing what is not expected, complements positive examples
task1153_bard_analogical_reasoning_affordance.json
- regarding instructions: examples make sense to show what is wanted
- regarding n examples: bad examples are obvious on what doesn't work
- regarding instructions: doesn't explain what an affordance is
- regarding p examples: some of the words could have many things, so not sure what all is acceptable
- regarding n examples: could have more and diverse examples
task1154_bard_analogical_reasoning_travel.json
- regarding p examples: Explanation in all the three positive examples should use the word "output" instead of "answer" in the last sentence to make it more clear.
task1155_bard_analogical_reasoning_trash_or_treasure.json
- regarding instructions: a bit convoluted
- regarding p examples: seems like some things could be subjective and that's not covered
- regarding n examples: subjectivity not discussed
task1158_bard_analogical_reasoning_manipulating_items.json
- regarding instructions: not clearly defined some guesswork involved
task1159_bard_analogical_reasoning_containers.json
- regarding instructions: IT WAS GOOD.
- regarding p examples: IT WAS VERY CLEAR.
- regarding n examples: GIVE THE CORRECT WORD ALSO WITH THE EXAMPLE.
task1186_nne_hrngo_classification.json
- regarding instructions: It's missing an ending punctuation mark.
- regarding p examples: Your HTML/Markdown formatting is messed up.<newline><newline>All of these numbers and explanations are totally opinion. Like you can justify any number. <newline><newline>Are we also expected to provide an example?
- regarding n examples: Your HTML/Markdown formatting is messed up.<newline><newline>See above
- regarding p examples: the example do not detail what to do on odd typos, like the "-s" used in input task 1. do we ignore issues like that or make it part of the evaluation
task1189_check_char_in_string.json
- regarding n examples: Explanation for negative example 1 is wrong because the output is correct as I is present in the string.
- regarding instructions: The comma MUST be offset by a space, otherwise the desired character is VERY HARD TO SEE<newline><newline>In fact why can't you put the desired character on a separate line? Like, why are you making this deliberately difficult for workers?
- regarding p examples: n/a
- regarding n examples: n/a
task1190_add_integer_to_list.json
- regarding instructions: explained well, no issues
task1191_food_veg_nonveg.json
- regarding instructions: unless you're well versed in Indian food or Hindi, the instructions are useless
- regarding p examples: pork is obvious, but the other dish is impossible without google unless you're Indian
- regarding n examples: not possible to do without googling dishes or words
- regarding instructions: This requires too much outside research. At the very least you should pre-program a google search with the term of the dish.
- regarding p examples: I mean it's pretty straight forward, although it's not clear whether we need to also provide a written explanation.
- regarding n examples: I mean it's pretty straight forward, although it's not clear whether we need to also provide a written explanation.
task1192_food_flavor_profile.json
- regarding p examples: include the ingredients of the flavor profile
task1195_disflqa_disfluent_to_fluent_conversion.json
- regarding instructions: It is hard but doable to define the objective answer.
task856_conv_ai_2_classification.json
- regarding p examples: Markdown formatting isn't working.
- regarding n examples: Markdown formatting isn't working.
- regarding instructions: instructions clear, makes sense
task862_asdiv_multidiv_question_answering.json
- regarding p examples: Example 1 is kind of weird, how is someone going to use a quarter of a paper bag? Doesn't make sense.
- regarding p examples: explains what is correct well
task863_asdiv_multiop_question_answering.json
- regarding n examples: Example 2 has some grammatical errors in the explanation.
task866_mawps_multidiv_question_answering.json
- regarding instructions: Nothing, it's very clear
- regarding p examples: They're great
- regarding n examples: They're great
task868_mawps_singleop_question_answering.json
- regarding instructions: I don't understand the difference between a good and a bad answer. Some kind of text explanation would be helpful, not just examples. You're just making us guess if you don't spell it out.
- regarding p examples: There should not be a space before punctuation marks.
- regarding n examples: There should not be a space before punctuation marks.<newline>You are missing a "the" before "correct answer".
task892_gap_reverse_coreference_resolution.json
- regarding instructions: Explanations in Example 1 of positive examples and both examples in negative examples are confusing and incorrect.
- regarding p examples: Output in example 1 should be "he" because the pronoun is to replace "Thomas" after a comma.
- regarding n examples: Negative example 1 should be correct as "he" appears in the last sentence, which should refer to "Fritzsch".<newline>Negative example 2 should be correct as "he" referring to Elias appears in the first sentence.
task927_yelp_negative_to_positive_style_transfer.json
- regarding p examples: Could have explained a little more about changes. For example in ex. 2, there were two changes, not just the one.
- regarding n examples: Generally fine but could have another couple of examples to really drive the point home.
- regarding instructions: I think overall it these are solid. I think that it is easy to read.
- regarding p examples: The examples are basically perfect. Shows exactly what they want.
task928_yelp_positive_to_negative_style_transfer.json
- regarding n examples: The negative examples are a little more jumbled but still good.
task933_wiki_auto_style_transfer.json
- regarding instructions: I think instructions about parentheses should be added because some people might just omit these clauses automatically.
- regarding p examples: I think that for Example 1's output, there were a few spacing typos before the commas which should be removed.
- regarding n examples: I think providing an example with a longer sentence would be helpful.
task936_defeasible_nli_snli_classification.json
- regarding instructions: no issues, they detail what is wanted
- regarding p examples: no issues, details what was wanted well
- regarding n examples: they explain what isn't wanted well, no issues with understanding what is negative
- regarding instructions: So... you literally just want us to say 'strengthener' or 'weakener'? That's the entire HIT? I'm confused.
- regarding p examples: The HTML/Markdown formatting appears to be broken here. Missing periods at the ends of sentences.
- regarding n examples: The real HITs are going to be more ambiguous if past work from this requester is any indication. Are you ABSOLUTELY SURE this is the only way we can mess up?<newline><newline>Okay I've just done the examples. This makes NO SENSE. How can eating out of a takeout container either strengthen or weaken the scenario? Couldn't you say either? What does that even mean to weaken/strengthen the situation????
task938_copa_hi_commonsense_reasoning.json
- regarding instructions: The instructions were clear
- regarding p examples: I do not think that these are good examples
- regarding n examples: No, I cannot understand this
task939_copa_hi_commonsense_cause_effect.json
- regarding p examples: There seems to be no relation between the input and the output which renders the examples useless
- regarding n examples: There seems to be no relation between the input and the output which renders the examples useless
task941_copa_gu_commonsense_cause_effect.json
- regarding instructions: I do not know the language that they are asking about but the instructions themselves are pretty clear.
- regarding p examples: No idea what these examples are referring to
- regarding n examples: These are not clear at all.
task942_copa_mr_commonsense_reasoning.json
- regarding instructions: Makes connections between mathematical concepts, between concepts and procedures, or between concepts, procedures, and application. Prompts cognitive effort. Is problem-based, authentic, or interesting.
- regarding p examples: Being happy even when you have little. Having a good time even when you are losing.
- regarding n examples: While employers may not love the idea of having an employee who is preoccupied with the finer points, a candidate who assures quality and strives for balance can be a great asset.
task945_wiki_cloze_bn_multiple_choice_question_answering.json
- regarding instructions: pretty simple and straightforward
- regarding p examples: There seems to be no relation between the input and the output which renders the examples useless
- regarding n examples: There seems to be no relation between the input and the output which renders the examples useless
task946_wiki_cloze_gu_multiple_choice_question_answering.json
- regarding instructions: Makes connections between mathematical concepts, between concepts and procedures, or between concepts, procedures, and application. Prompts cognitive effort. Is problem-based, authentic, or interesting.
- regarding p examples: Being happy even when you have little. Having a good time even when you are losing.
- regarding n examples: Common strengths include independence, persistence, creativity, and ingenuity. Common weaknesses include procrastination, impatience, impulsiveness, and forgetfulness.
task947_wiki_cloze_hi_multiple_choice_question_answering.json
- regarding p examples: the relation between the input and the output is not clearly explained (in fact the output is blank) and thus the explanation is worthless
- regarding n examples: the relation between the input and the output is not clearly explained (in fact the output is blank) and thus the explanation is worthless
- regarding instructions: Makes connections between mathematical concepts, between concepts and procedures, or between concepts, procedures, and application. Prompts cognitive effort. Is problem-based, authentic, or interesting.
- regarding p examples: Being happy even when you have little. Having a good time even when you are losing.
- regarding n examples: While employers may not love the idea of having an employee who is preoccupied with the finer points, a candidate who assures quality and strives for balance can be a great asset.
task948_wiki_cloze_kn_multiple_choice_question_answering.json
- regarding instructions: The Instructions said that I am given a statement written in Kannada, but none of the inputs in the examples is in Kannada. I have no idea what is happening here.
- regarding p examples: None of the inputs in the positive examples is written in Kannada, which is contradictory to the instructions given at the beginning.
- regarding n examples: None of the inputs in the negative examples is written in Kannada, which is contradictory to the instructions given at the beginning.
task951_wiki_cloze_or_multiple_choice_question_answering.json
- regarding p examples: there is no output. therefore the explanation is meaningless and the example is worthless.
- regarding n examples: there is no output. therefore the explanation is meaningless and the example is worthless.
task954_wiki_cloze_te_multiple_choice_question_answering.json
- regarding p examples: there is no output. therefore the explanation is meaningless and the example is worthless.
- regarding n examples: there is no output. therefore the explanation is meaningless and the example is worthless.
task958_e2e_nlg_text_generation_parse.json
- regarding instructions: the formatting for the instructions is a mess. The explanation on the ordering of items needs to be separated from the rest of the paragraph for ease of reading
task959_e2e_nlg_text_generation_identify.json
- regarding instructions: simple and straightforward
- regarding p examples: covers bases adequately
- regarding n examples: covers bases adequately
- regarding instructions: The instructions are a little wordy, but it still tells me everything that I need to knw

danyaljj · 2021-11-09T21:18:38Z

Feedback regarding tasks 1200-1540:

task1200_atomic_classification_xeffect.json
- regarding instructions: There are a lot of ways to interpret the outcomes. The heads/tails reference is off putting too.
task1201_atomic_classification_xintent.json
- regarding instructions: Does not explain what a tuple is.
task1202_atomic_classification_xneed.json
- regarding p examples: Someone could find an opportunity that doesn't involve stocks
task1203_atomic_classification_xreact.json
- regarding instructions: awfully convoluted and hard to follow
- regarding n examples: the second example is confusing as to why X would be happy about projecting Y
task1205_atomic_classification_isafter.json
- regarding instructions: I wouldn't call it "heads" and "tails". That's just confusing because in English, this usually refers to a coin toss, not a sequence of events.
- regarding p examples: Markdown formatting doesn't work.
- regarding n examples: Markdown formatting doesn't work.
- regarding instructions: again, please stop calling the tuples 'heads and tails'.
- regarding p examples: These are ambiguous creative writing exercises. Too subjective.
- regarding n examples: See above.
task1207_atomic_classification_atlocation.json
- regarding p examples: Calculators is spelled incorrectly in the explanation of example 1.
task1208_atomic_classification_xreason.json
- regarding p examples: I'm not sure I would agree that "spending money" is the reason for having a haircut.
- regarding instructions: Frankly the instructions bit confusing.
task1209_atomic_classification_objectuse.json
- regarding instructions: I understood how head and tails should relate to each other.
task1212_atomic_classification_hasproperty.json
- regarding instructions: don't know what a tuple is.
- regarding p examples: examples were pretty simple, and could use explanation of a more complicated one
- regarding instructions: offer examples of ambiguous circumstances
- regarding n examples: a little more explanation about why they are wrong could be helpful
task1213_atomic_classification_desires.json
- regarding instructions: Please... stop calling it Heads/Tails, this is very confusing to native English speakers
- regarding p examples: Confusing because a person could just as easily NOT want X. I guess you're trying to say that we should say yes if it's at all plausible in any situation that a person might want X?
task1215_atomic_classification_capableof.json
- regarding instructions: pretty good
- regarding p examples: okay
- regarding n examples: okay
task1216_atomic_classification_causes.json
- regarding instructions: Does not explain what a "tuple" is. It's apparently a math term so it feels out of place
task1283_hrngo_quality_classification.json
- regarding n examples: No, 1 and 0 are the only valid outputs. Not 6.
- regarding instructions: I may completely be misunderstanding something, but it says quality must be 0 or 1 but then in the Negative examples when decimal places are selected, it says it has to be an integer between 1 and 6.
- regarding p examples: You could always explain why in the examples something is (or isn't) correct to make it even that much clearer.
- regarding n examples: See the part about the rating 0/1 and 1-6 I mentioned earlier.
task1284_hrngo_informativeness_classification.json
- regarding instructions: Should be "provides" not "provide" (subject/verb agreement)
- regarding p examples: Typos<newline>Markdown formatting doesn't work.
- regarding n examples: Typos<newline>Markdown formatting doesn't work.
- regarding instructions: provide --> provides
- regarding p examples: Markdown formatting doesn't work.
- regarding n examples: Markdown formatting doesn't work.
task1285_kpa_keypoint_matching_assisted_suicide_topic.json
- regarding instructions: explain what makes them match exactly
- regarding p examples: give more details about what is related and what actually matches
task1286_kpa_keypoint_matching_homeschooling_topic.json
- regarding instructions: Does not explain what a keypoint is and should probably be changed simply to summary
task1287_kpa_keypoint_matching_marriage_topic.json
- regarding instructions: The instructions in general are extremely confusing. I'm not exactly sure what I'm supposed to be clarifying or fixing or what the outcome of the directions would be if I weren't just fixing the instructions.
- regarding p examples: Beyond just being weak example anyway they're badly written with poor to no punctuation. The explanation of the examples is just a rewording of the example, it doesn't actually explain anything. Also Output: True of False, I don't understand what this means or is for (it's purpose) <newline>The positive show reasons not to be married
- regarding n examples: The negative examples seem to show positive reasons to get married, and the positive shows reasons not to get married, it's really confusing.
- regarding instructions: I wish the directions gave more clarity about how definitive the second part of the sentence was meant to be.
- regarding p examples: examples help here.
task1288_kpa_keypoint_matching_capital_punishment_topic.json
- regarding p examples: i didn't like that it just repeats the the argument in the explanation
- regarding n examples: It just says it does or does not summarize, without giving much detail as to why.
task1289_kpa_keypoint_matching_intellectual_property_rights.json
- regarding p examples: Should have spaces between the [sep] so it's easier to read.
- regarding n examples: Should have spaces between the [sep] so it's easier to read.
- regarding instructions: Keypoint is an unexplained word and should be replaced with summary
task1290_kpa_keypoint_matching_atheism_topic.json
task1292_kpa_keypoint_matching_human_cloning_topic.json
- regarding n examples: Ex 1 seems contradictory in it's reasoning and explanation
- regarding instructions: Just a few more words explaining the examples would help a lot.
- regarding p examples: A little more explanation about what really makes them "positive".
- regarding n examples: They seemed ambiguous and hard to really judge.
task1293_kpa_keypoint_matching_military_companies_topic.json
- regarding p examples: don't really say why
- regarding n examples: doesn't explain the reasoning
task1295_kpa_keypoint_matching_guantanamo_bay_detection_camp_topic.json
- regarding p examples: Seems subjective to some extent.
- regarding n examples: Seems subjective to some extent.
task1296_kpa_keypoint_matching_mandatory_retirement_topic.json
- regarding instructions: typos
- regarding p examples: lack of clarities
- regarding n examples: typos
task1297_kpa_keypoint_matching_nuclear_weapons_topic.json
- regarding instructions: I misunderstood on the previous tasks that I should only answer with true or false, so that should be better clarified.
- regarding instructions: pretty good
- regarding p examples: pretty good
- regarding n examples: okay
task1299_kpa_keypoint_matching_compulsory_voting.json
- regarding p examples: they don't really explain why
- regarding n examples: they don't really explain any reasoning
task1301_kpa_keypoint_matching_prostitution_topic.json
- regarding p examples: reasoning isn't explained<newline>ex 3 assumes all prostitutes are women, which to me makes that false as are were many men and trans who are prostitutes
- regarding n examples: not really explained, just repeated
task1305_kpa_keypoint_matching_journalism_topic.json
- regarding instructions: I don't think keypoint is a real word and should be replaced with 'summary'
task1306_kpa_keypoint_matching_space_exploration_topic.json
- regarding instructions: The subjects are pretty specific and can be a little difficult
task1307_kpa_keypoint_matching_vocational_education_topic.json
- regarding p examples: they don't really explain why
- regarding n examples: they don't really explain their reasoning
task1309_amazonreview_summary_classification.json
- regarding instructions: 'True' if given review <newline>should be "if the given review"
- regarding n examples: it was all pretty clear, I like the example with the wrong input
task1310_amazonreview_rating_classification.json
- regarding instructions: rating evaluation explained well
task1311_amazonreview_rating_classification.json
- regarding instructions: Return true if sentence belongs to that section else 'false' doesn't make sense.
- regarding p examples: The examples explains what needs to be done in the task.
- regarding n examples: The negative examples needs a bit more explaining.
task1315_find_range_array.json
- regarding instructions: So it seems like you take the highest number and subtract the lowest?
task1316_remove_duplicates_string.json
- regarding instructions: define what a string is -> string might be known for computer scientists, but it's not common among average speakers of English
task1318_country_national_dish.json
- regarding instructions: The instructions are very clear. No problems.
- regarding p examples: These examples are clear, but it would have been useful had other possible national dishes been listed to show what would be other acceptable outputs (or add another example with multiple possible outputs).
- regarding n examples: These seem straightforward and clear.
task1319_country_by_barcode_prefix.json
- regarding instructions: There should be an explanation for what is meant by barcode prefix.
- regarding p examples: It should explain how you find the answer.
- regarding n examples: It should explain how you properly find the correct answer.
- regarding p examples: These are good because it reinforces exactly what information you are looking for
- regarding n examples: I feel the bad example pictures are unnecessary. You could simply add a sentence that says something like "incorrect answers will lead to a rejected HIT"
task1320_country_domain_tld.json
- regarding n examples: first example is talking about a different country
task1321_country_continent.json
- regarding p examples: yeah you're expecting people to either have an atlas memorized or to go outside of Mturk and do research.... this is a bad task.
- regarding instructions: A little short but gives me everything that I need to know
- regarding instructions: A map would help
task1322_country_government_type.json
- regarding instructions: The wording could be improved. Use "name" or "list" instead of return.
task1325_qa_zre_question_generation_on_subject_relation.json
- regarding instructions: "Try to use minimal number of words" grammar.<newline><newline>The instructions don't make sense without the examples.
- regarding p examples: Example #1 makes absolutely no sense. Where in the input is there anything about nobility titles?<newline>"reffered" come on, at least run spell check on your own work.<newline>Markdown formatting, as usual, doesn't work.
- regarding n examples: Markdown formatting doesn't work.<newline>Bad grammar.
task1327_qa_zre_answer_generation_from_question.json
- regarding instructions: You should mention the need to generate a correct answer, not just a concise answer.
- regarding p examples: Again, it is important to stress that an output is correct because it is the factually correct answer, not simply because it is a concise answer.
- regarding n examples: These are good at illustrating the need to have a correct answer but the word "correct" should be mentioned.
task1333_check_validity_date_ddmmyyyy.json
- regarding instructions: I could understand the dating format well.
- regarding p examples: They were clear and easy to read.
- regarding n examples: They were easy to follow along with.
task1335_sqac_question_generation.json
- regarding instructions: why are the instructions in English but the task is to do something in Spanish? shouldn't the whole thing be in Spanish?
- regarding p examples: in a language I don't speak
- regarding n examples: in a language I don't speak
- regarding p examples: useless unless you speak Spanish
- regarding n examples: useless unless you understand Spanish
task1340_msr_text_compression_compression.json
- regarding n examples: The negative examples are confusing versus the instructions.
task1341_msr_text_classification.json
- regarding instructions: Instructions should be more in-depth to help differentiate between good and bad quality sentences
task1346_glue_cola_grammatical_correctness_classification.json
- regarding instructions: In the sentence "Check the sentecne is grammatical and meaningful. If the statment is grammatically correct then output '1', and '0' otherwise.", "sentecne" should be changed to "sentence" and "statment" should be changed to "statement".
task1347_glue_sts-b_similarity_classification.json
- regarding p examples: Put sentence #2 underneath sentence #1, not together- Use language such as "correct" or "satisfactory"
- regarding instructions: 2 and 3 seem almost identical classifications
task1354_sent_comp_classification.json
- regarding instructions: It was easy to understand what I should and would be doing for the task.
task1355_sent_comp_summarization.json
- regarding instructions: "a text of article"<newline>you are given the text in an article
- regarding p examples: Way too subjective.
- regarding n examples: Way too subjective.
task1357_xlsum_summary_generation.json
- regarding instructions: could use a bit more explanation on what makes a good summary
- regarding p examples: way too long
- regarding n examples: that's a lot to summarize in sentence
task1358_xlsum_title_generation.json
- regarding instructions: Instructions are easy to understand. Maybe you put in some restrictions like length. Also mention the purpose (newspaper, website etc).
- regarding p examples: The examples are easy to understand. Maybe a bit to perfect.
- regarding n examples: The negative examples are not good as the titles are obviously not fitting. The negative examples should be more 'grey' area like. A title that could fit for other reasons is not the best.
task1359_numer_sense_answer_generation.json
- regarding p examples: the explanations assume a wide base of knowledge to know what number goes in the blank
- regarding n examples: explanations assume people know a lot to fill in the blanks
- regarding instructions: Are we supposed to look up these answers?
task1360_numer_sense_multiple_choice_qa_generation.json
- regarding p examples: This is a fact and requires commonsense to answer.<newline><newline>LMAO. No it isn't. Do you think everyone knows Latin? <newline><newline>The snake answer is also ridiculous. I'm not doing extra research unless it's a very well paying HIT and this answer would depend on the KIND of snake. Not all snakes give birth to live young so this doesn't even make sense.
- regarding n examples: Wow great guess we need to be chemistry majors to answer these.
task1361_movierationales_classification.json
- regarding instructions: barely any instructions with no criteria
- regarding p examples: HUGE amount of reading for examples, with almost no explanation of the answer
- regarding n examples: same as the positives. There isn't enough explanation to determine what is positive or negative.
task1364_hans_answer_generation.json
- regarding instructions: The wording is confusing--there's got to be a better word to describe the goal other than "entailed"--and there's a typo in the last sentence: sentence is misspelled.
- regarding p examples: The positive examples should mention the fact that the same objects/subjects are used in the output sentence, which is also necessary per the instructions.
- regarding n examples: Either Example 2 is wrong or I don't understand it. Also, there should be examples where the same objects/subjects aren't used and that makes it a bad output.
task1368_healthfact_sentence_generation.json
- regarding instructions: Bullet point 4 has the word "atmost" which should read "at most"
task1369_healthfact_sentence_generation.json
- regarding instructions: It says the explanation should be at most 12 sentences, but doesn't say how short the explanation can be.
- regarding p examples: Massive walls of text to read in both + and - examples
task1378_quarel_correct_answer_generation.json
- regarding instructions: It is concise but explain everything needs to be done.
- regarding p examples: Examples are plenty and well explained.
- regarding n examples: The examples explains why it is negative correctly.
task1380_quarel_correct_option_generation.json
- regarding instructions: I could understand how the right choice would be made based on the given information.
- regarding p examples: I was in agreement with the logic behind the positive examples.
- regarding n examples: I understood why the negative examples were invalid based on which information was given in each example.
task1382_quarel_write_correct_answer.json
- regarding instructions: Maybe use a different word than <newline>'ambiguous' to describe something we should be looking out for. I had to look up what that meant. Others might not know what it means either.
- regarding n examples: The question about the sun slowed me down with too much thinking. I know that sounds bad but a more obvious answer would've been clearer for me to understand how it was an incorrect question.
task1383_quarel_write_incorrect_answer.json
- regarding instructions: Overall ok, it would be better to add a sentence to directions stating goal of hit or noting that it should include detail given in question.
- regarding p examples: Pretty good over all, more examples would be better, in light of sparse directions.
- regarding n examples: examples are helpful here.
- regarding instructions: more explanation about what makes it relevant and incorrect
- regarding p examples: maybe offer a couple more
task1394_meta_woz_task_classification.json
- regarding instructions: It's clear but having to remember 47 different domains might be overwhelming
task1398_obqa_question_generation.json
- regarding p examples: Are these really good questions? They might be related, but don't make much sense contextually. Example 1 is just plain wrong.
- regarding instructions: I would say "Construct the question so that" not such that
- regarding n examples: They were way too obvious. A question that is near the subject but not answered by the fact would be much better.
task1399_obqa_answer_generation.json
- regarding p examples: Should not be spaces before punctuation marks
- regarding n examples: Should not be spaces before punctuation marks
- regarding p examples: It was clear until these examples that you could come up with an answer that isn't specifically stated in the fact
task1406_kth_smallest_element.json
- regarding p examples: There are too many numbers and this is a horrible task. I used an online tool to do this but if you do not directly provide a link then this is a huge waste of time for workers.
task1408_dart_similarity_classification.json
- regarding instructions: I could understand why what was ranked as similar and dissimilar were ranked that way.
- regarding p examples: The examples seemed to be in agreement with my own judgement and were easy to understand.
- regarding n examples: They were easy to agree with and understand.
- regarding instructions: directions are brief but clear.
- regarding n examples: Dissimilar
task1409_dart_text_generation.json
- regarding instructions: special tokens useful that can be replaced <newline>... what does "useful" mean here? This makes no grammatical sense.
- regarding p examples: I have no idea what you're supposed to do with the word salad of the input.
- regarding n examples: I could just as easily use the Negative examples as Positive ones... makes no sense.
- regarding instructions: Honestly, I don't think you can trust that most workers are going to know what a predicate is. I find this confusing especially since you don't specify where we need to cut off the phrase.
task1415_youtube_caption_corrections_grammar_correction.json
- regarding instructions: Doesn't explain how it's going to have commas
- regarding p examples: The comma's aren't explained
task1418_bless_semantic_relation_classification.json
- regarding p examples: The outputs show words shortened for ex. mero instead of meronym but this is not explained in the instructions
task1419_mathqa_gain.json
- regarding instructions: Doesn't explain what gain is and is worded oddly. Would probably change it to 'Answer the word problem with the correct option.'
- regarding p examples: Typos, spacing issues, this is WAY too much math, Markdown formatting doesn't render correctly
- regarding n examples: Typos, spacing issues, this is WAY too much math, Markdown formatting doesn't render correctly
task1420_mathqa_general.json
- regarding p examples: In the sentence 'explanation : 2 / 3 = . 66 , 3 / 4 = . 75 , 4 / 5 = . 8 and 5 / 3 = 1.66 so biggest is 5 / 3 and smallest is 2 / 3 their difference is 5 / 3 - 2 / 3 = 3 / 3 = 1 option d', the word "the" should be added before "biggest" and before "smallest".
- regarding instructions: I understood that it was multiple choice and to essentially solve the math problem.
task1422_mathqa_physics.json
- regarding p examples: "I think" isn't really a great explanation
- regarding n examples: assumes pretty advanced maths knowledge
task1423_mathqa_geometry.json
- regarding p examples: There are spacing issues with punctuation. With example 1, I think there should be an explanation as to how it is known the triangle is a right triangle.
- regarding n examples: Example 2 especially should be formatted better. It looks like a long confusing run-on sentence.
task1425_country_iso_numeric.json
- regarding instructions: explains the code system well
- regarding p examples: Any hit that requires you to go outside the MTurk website (wasting time) needs to be very well compensated. The average person is not going to know this off the top of their heads.
task1426_country_independence_year.json
- regarding instructions: You could define the term "independence". You could replace "the" with "that" in the " in which the country became independent." part.
- regarding p examples: "1975 is the year of independence of the country called Angola." Could be simplified to "1975 is the year of independence of Angola."
- regarding n examples: "1919 is not the year of independence of the country called Bhutan. However, 1910 is year of independence of the country called Bhutan. " could simplified to "1919 was not the year of independence of the country called Bhutan.1910 was."
task1427_country_region_in_world.json
- regarding instructions: All possible regions should be stated, instead of some of them. Central Africa is there twice.
task1428_country_surface_area.json
- regarding instructions: If all questions are only about the area covered by a country then it could be improved by mentioning this at the start.
- regarding n examples: I think these are easy to understand.
task1429_evalution_semantic_relation_classification.json
- regarding instructions: Extremely convoluted, multi-step instructions with too many explanations
task1431_head_qa_answer_generation.json
- regarding instructions: The instruction is very vague.
- regarding p examples: The examples explains the positive pretty well.
- regarding n examples: The examples explains why they are negative.
task1434_head_qa_classification.json
- regarding p examples: Ex. 1 says stomach bloating is related to nursery, which isn't even an option, and not medicine or biology, so that's pretty confusing..others seem reasonable
- regarding instructions: you should say "Each instance contains a question and options for that question." it doesn't have the "and"
- regarding n examples: I don't really understand why the "negative" examples are bad
task1438_doqa_cooking_answer_generation.json
- regarding p examples: Example 1's output did not actually answer the follow-up Q
task1439_doqa_cooking_isanswerable.json
- regarding p examples: Text is too long, at least break it up with paragraph breaks to make it easier to read
- regarding n examples: text is too long
task1442_doqa_movies_isanswerable.json
- regarding instructions: There should ideally be clarification as to the correct choice if the question can be only partially answered.
- regarding n examples: seemed fine
task1444_round_power_of_two.json
- regarding instructions: I think the bulletpoint examples in the first paragraph should be listed in a row instead of included in the paragraph.
- regarding p examples: I think the numbers make it a little confusing at least for me
- regarding n examples: It's well formatted but I think the background color should be yellow so it stands out more
task1448_disease_entity_extraction_ncbi_dataset.json
- regarding instructions: Ok, it's fairly good. A table of diseases might be helpful if this sort of work is exhaustive and there are rare or odd disorders mentioned though.
- regarding n examples: Doesnt give good examples
task1452_location_entity_extraction_btc_corpus.json
- regarding instructions: Explains pretty good what needs to be done.
- regarding p examples: It gives a good idea what are the positive entries are.
- regarding n examples: The negative explanation is a bit confusing.
task1453_person_entity_extraction_btc_corpus.json
- regarding p examples: should twitter names be considered people names?
- regarding instructions: The word tokens is confusing
task1479_organization_entity_extraction_btc_corpus.json
- regarding n examples: I scrolled up to remind myself to put not found in the first example. I think that should be mentioned again
task1480_gene_extraction_jnlpba_dataset.json
- regarding instructions: assumes people know the names of all the genes/proteins which isn't common knowledge
- regarding p examples: assumes people know the names of all the genes/proteins which isn't common knowledge
- regarding n examples: assumes people know the names of all the genes/proteins which isn't common knowledge
- regarding instructions: Does not explain what the JNLPBA Corpus is. Does not state how to separate tokens, whether by commas or line breaks, etc
- regarding instructions: The medical terms are a little hard to understand.
- regarding p examples: Should show an example when there's multiple proteins.
task1481_gene_extraction_bc2gm_dataset.json
- regarding p examples: How should I know which of these names are proteins and which aren't? Do you expect people to do extra work to look them up?
task1482_gene_extraction_chemprot_dataset.json
- regarding instructions: On one negative example, it does not tell you the correct answer.
- regarding p examples: there needs to be more examples
- regarding n examples: There needs to be more examples.
task1483_chemical_extraction_chemprot_dataset.json
- regarding instructions: It's hard to know what's a chemical or other medical term.
task1484_gene_extraction_linnaeus_dataset.json
- regarding instructions: Does not explain what the linnaeus Corpus is. Does not explain how to list multiple tokens, whether by commas, spaces, line breaks etc
task1485_organ_extraction_anem_dataset.json
- regarding instructions: ' in which ' = use 'where' instead
- regarding p examples: An ear isn't an organ, though....
task1487_organism_substance_extraction_anem_dataset.json
- regarding instructions: needs to better explain what the target words are
- regarding p examples: poor grammar, assumes a level a biology familiarity some may not have<newline>serum or semen? not clear
- regarding n examples: again, assumes a certain level of knowledge that isn't common
task1488_sarcasmdetection_headline_classification.json
- regarding p examples: You do realize this is a subjective task?
task1489_sarcasmdetection_tweet_classification.json
- regarding n examples: In the sentence "This should be classified as non-sarcastic because nothing has been specified in such a wat that it means to say exactly the opposite.", "wat" should be changed to "way".
- regarding instructions: I could see why one thing would be considered sarcastic or not, no changes need to be made.
task1495_adverse_drug_event_classification.json
- regarding instructions: It didn't explain what is suppose to be written.
- regarding p examples: It didn't show what's suppose to be wrote if their is no drug interaction in the sentence
- regarding n examples: It just compounded the issue of what are you suppose to write if their is nothing about drug interactions.
- regarding instructions: Simplified/laymens terms would be helpful for general instructions.
- regarding instructions: What if there is no drug reaction
task1498_24hour_to_12hour_clock.json
- regarding p examples: The example is fine but the wording used in the explanation is confusing. Turning this into some kind of math equation with brackets and whatnot makes it difficult to follow. A more simple explanation, like what is used in the negative examples, is much better and all that is needed here.
- regarding p examples: It's a little wordy, don't you think? I'm not sure it's even correct, it's pretty hard to understand.
task1500_dstc3_classification.json
- regarding instructions: Can probably get rid of the following sentence since it doesn't help in answering questions: "In the dialogue, the user may provide some criteria for the type of place they want such as price range, cuisine, etc. Similarly, the user may ask details of the place suggested by the system such as phone number, address, etc."
task1501_dstc3_answer_generation.json
- regarding instructions: The instructions are very thorough and good. My only complaint is a typo in the penultimate sentence which says, "The answer of..." That should read "answer to..."
- regarding p examples: These examples are understandable and have an adequate amount of explanation.
- regarding n examples: I think example 1 should explicitly mention the line of dialogue (by quoting it) that makes this a bad example (i.e. wrong).
task1502_hatexplain_classification.json
- regarding n examples: Understanding what is or isn't hate speech vs just offensive is a little bit hard, but I mostly get it.
task1503_hatexplain_classification.json
- regarding instructions: should be an article in the first sentence "a hate speech/offensive/etc. tweet"
task1505_root09_semantic_relation_classification.json
- regarding p examples: HYPER is a terrible one.
- regarding instructions: This is absolutely not going to make sense without examples.<newline>Also don't understand the category names. They don't make sense.
- regarding p examples: VERY ambiguous and confusing.<newline>Input: X: bowl, Y: game -- what if I thought 'bowl' was a noun (like a bowl to put liquid in)?<newline>Input: X: whale, Y: salmon -- I could easily say there is NO relation because a whale is a mammal and a salmon is a fish. This is just too ambiguous to understand what you want.
- regarding n examples: Input: X: bomber, Y: truck -- ok, what if I think 'a bomber could use a truck to bomb people'? Again this task is much too ambiguous.
task1506_celebrity_minimal_dob_span.json
- regarding instructions: Seems pretty straight forward.
task1507_boolean_temporal_reasoning.json
- regarding instructions: The instructions are difficult to follow. I needed examples to figure it out
task1508_wordnet_antonyms.json
- regarding instructions: Thorough explanation of what you want
- regarding p examples: Add more examples
- regarding n examples: Use more common words and examples
task1509_evalution_antonyms.json
- regarding instructions: It's only one word.
- regarding p examples: It doesn't tell me much.
- regarding n examples: Nothing aside from one word.
task1516_imppres_naturallanguageinference.json
- regarding instructions: I would personally change negated to negative to make it easier to understand
task1519_qa_srl_question_generation.json
- regarding instructions: There are a lot of ways to word the questions.
task1535_daily_dialog_uniqueness_classification.json
- regarding instructions: Vague as to how one should classify emotions.
- regarding p examples: Example 1 clearly shows only 2 emotions but is still given the output 1

danyaljj · 2021-11-10T22:02:01Z

This is the last one! Feedback regarding tasks 1540-1740:

task1540_parsed_pdfs_summarization.json
- regarding instructions: Need to know how short/long the headline should be
task1542_every_ith_element_from_starting.json
- regarding instructions: Would either explain what an array is or simplify it to string of numbers
- regarding instructions: Explain the concept of array at the beginning.
- regarding p examples: The overall information is good. May add some position to point of easy understanding.
- regarding n examples: The overall information is good. May add some position to point of easy understanding.
- regarding instructions: It wasn't that clear if we were finding every ith element including the first, or counting from the first - for example with 4, should it be 1, 4, 8, 12, or should it be 1, 5, 9, 13?
task1549_wiqa_answer_generation_missing_step.json
- regarding p examples: Ex 2: dividing deviding
- regarding n examples: Ex 1 (3) The rock should be The water
task1553_cnn_dailymail_summarization.json
- regarding instructions: Instructions make sense, though the 10 lines part is a bit confusing. Does it mean a line straight across, left to right? Cause if so then resolution of the computer will change that amount. And if it means sentences then it should say sentences.
- regarding p examples: The examples are way too long, could be much shorter. Also makes it confusing given that you ask for 10 lines but there's neither 10 lines or sentences in the summary.
- regarding n examples: Same for the negatives.
task1554_scitail_classification.json
- regarding instructions: "The data consists of 6494 entries with equal amount of entails and neutral classes." -don't need to know this information to answer questions
- regarding instructions: "The data consists of 6494 entries with equal amount of entails and neutral classes." - I don't need to know this information for this task
task1556_scitail_passage_generation.json
- regarding instructions: easy to understand
- regarding p examples: was good
- regarding n examples: i liked it
task1557_jfleg_answer_generation.json
- regarding instructions: Doesn't say what to do if the sentence appears correct.
task1558_jfleg_incorrect_answer_generation.json
- regarding p examples: ex 1 has poor grammar<newline> <newline>ex 3 continious is spelled wrong
task1559_blimp_binary_classification.json
- regarding instructions: Possibly need to explain what an adjunct island is; I had to google it.
task1560_blimp_binary_classification.json
- regarding p examples: I think the positive examples can be improved by coloring the targeted words from the input that you're comparing against each other, so it's immediately obvious to people what two words should be compared. Either in bold, italic, or underlined.
- regarding n examples: Like the positive examples, I think that the negative examples can be improved by coloring the targeted words that you're comparing against each other. For instance, marking "Bob" and "themselves" in bold or italic black, so people immediately know what two words should be compared.
task1565_triviaqa_classification.json
- regarding instructions: It explains pretty well on how to answer the task correctly
- regarding p examples: Explains how to choose positive answer pretty well.
- regarding n examples: It is a little confusing on how to choose the wrong answer.
- regarding instructions: It doesn't really say what to do if the link is wrong or doesn't have the info.
task1566_propara_structured_text_generation.json
- regarding instructions: If possible define more clearly "entities", many workers are going to assume that entities are living. Examples though, belie this. Do you require the entities (if more than one) to be in specific order? It's intimated SORT OF, but not sure. How many should there normally be?
- regarding p examples: (a tad too long in some examples).
- regarding n examples: some of the examples are overlong, the actual part of the work is a fraction in length, meaning people might just skip reading them at all - it's too onerous, unless pay is quite high (why would examples though, NEED to be overlong and confusing - particularly a negative one?
task1568_propara_classification.json
- regarding p examples: it just says event is present,not why it's right
- regarding instructions: kind of convoluted
- regarding p examples: none of the explanations really say why
task1578_gigaword_summarization.json
- regarding instructions: Need to know how short the summary can be.
- regarding instructions: Need to know how short summary can be.
task1579_gigaword_incorrect_summarization.json
- regarding instructions: How short can the incorrect summary be?
- regarding n examples: ex 2 adds context not in the paragraph
- regarding instructions: Need to know just how short a summary I can provide
task1584_evalution_meronym_classification.json
- regarding p examples: Positive examples were fine, but can be ambiguous. It said books have pictures and you put 1, indicating yes. But not all books have pictures. So it should be rephrased that things "can" have those parts and not that they "always" have those parts.
- regarding n examples: Both of your example answers explanations were wrong, they said different answers than were actually given
- regarding n examples: I found the explanation of the examples to be confusing because the number in the explanation doesn't correspond with the answer given.
task1585_root09_hypernym_generation.json
- regarding instructions: That's a lot of $10 words for something as simple as broad/narrow in a category.
task1587_scifact_classification.json
- regarding p examples: that's a lot to unpack if you don't have a biology degree
- regarding n examples: ex1 "title talks about myelodysplasia." not really<newline>also pretty complex w/o biology background
- regarding instructions: Concise but explain the task pretty good.
- regarding p examples: It gives a good idea on what positive examples are
- regarding n examples: It gives a good idea on what negative examples are
task1588_tecla_classification.json
- regarding p examples: IDK how a comic fair is related to government, other than both might make you laugh these days...<newline>also not English<newline>lable is wrong
- regarding n examples: ex 1 says senators aren't related to government<newline>not English
- regarding p examples: "The text talks about Barcelona Comic Fair which best matches with the assigned lable Government"<newline>I don't think that's accurate.<newline>Label misspelled in both examples<newline>examples not in English<newline>
- regarding n examples: "The text talks about suspension of former senator which does not match with the assigned label Government"<newline><newline>I think it's a good match, although I guess you could make the case for politics?<newline>not in English<newline>
task1592_yahoo_answers_topics_classfication.json
- regarding instructions: Need to specify that the output should be a number and not the topic text
task1593_yahoo_answers_topics_classification.json
- regarding instructions: Should state that the output should be a number, and not a topic word
task1594_yahoo_answers_topics_question_generation.json
- regarding instructions: I was given a single word task which I am not sure how I could have answer it. The instruction only refers to passage. So maybe it could have a part where if there is no passage, this is how you should answer it and etc.
task1595_event2mind_text_generation_1.json
- regarding instructions: a little more context on how long the answer should be and how descriptive it should be
- regarding instructions: Concise instruction but explains the task very well. It should be person not persons.
- regarding p examples: It explains what positive examples are.
- regarding n examples: I would like it if there are more negative examples to make the task clearer.
- regarding n examples: It would be more helpful after showing a bad example if you told what they should have answered.
task1596_event2mind_text_generation_2.json
- regarding instructions: Work quality is the value of work delivered by an individual, team or organization.
- regarding p examples: Time management is crucial to your business's success.
- regarding n examples: Ask clarifying questions.
task1597_nyc_slot_filling.json
- regarding instructions: I read the instructions and it didn't make sense so I figured I'd read the examples and that might give me a better idea so I can come back to the instructions. The instructions were even more confusing that it just left me scratching my head and rereading to try and understand and I still can't.
- regarding p examples: I don't know what these examples are talking about, having words like xnear and xlocation and what not didn't make sense because there's nothing explaining what those mean
- regarding n examples: Basically the same as the positive examples but kind of worse because I don't understand what makes a good example so understanding the bad examples is even harder.
- regarding instructions: It's a tad bit unclear, mainly because it is unusual. When setting up the hit or job please have a sort of grid that categorizes things instead of having to copy the categories. This will clarify and streamline things, probably both for the worker AND for the researcher, making things less prone to error (misspelling, etc).
- regarding instructions: Needs to say that each output category should be listed between brackets, as per examples
task1598_nyc_long_text_generation.json
- regarding instructions: Should say that each input will include name, recommend, cuisine, qual, price, etc
task1601_webquestions_answer_generation.json
- regarding instructions: freebase.com returns a 404 error
- regarding p examples: freebase.com returns a 404 error
- regarding n examples: freebase.com returns a 404 error
- regarding instructions: The problem here wasn't with the directions, it was with the links, all of them produce 404 errors. In fact, www.freebase.com produces a 404 error, so it's impossible to generate accurate answers based on them.
- regarding p examples: none of the links work
- regarding n examples: none of the links work
task1602_webquestion_question_genreation.json
- regarding instructions: Apparently freebase.com got shut down years ago, so none of these urls work, and its impossible to generate questions based on them without guessing.
- regarding p examples: none of the links work
- regarding n examples: none of the links work
- regarding instructions: Although it's mostly answered by the instructions I was initially unsure if I was supposed to click the url
- regarding p examples: Example 1 is only answerable if you are familiar with harry potter
task1604_ethos_text_classification.json
- regarding n examples: ex 2 has poor grammar in the input and explanation
task1605_ethos_text_classification.json
- regarding instructions: could explain what is considered violent language more
- regarding p examples: ex 3 doesn't seem to have any violent words or actions suggested
task1607_ethos_text_classification.json
- regarding instructions: doesn't specific if religious hate includes religion hating others ie 'god hates homosexuals'
task1608_xquad_en_answer_generation.json
- regarding instructions: Should add whether one should simply copy/paste the answer, rephrase it into a sentence, etc
task1609_xquad_en_question_generation.json
- regarding instructions: The instruction is too vague.
- regarding p examples: It gives pretty good idea on what the positive answers are.
- regarding n examples: It explains well what are the negative examples are.
task1614_sick_text_modify.json
- regarding n examples: potatoe misspelled
- regarding n examples: the word "potatoe" in negative example 2 input is misspelled and should be "potato".
task1622_disfl_qa_text_modication.json
- regarding p examples: Positive example 1 is not explained clearly enough.
- regarding n examples: Negative example 2 is hard to follow. The structure of the input statement is so random.
task1625_disfl_qa_asnwer_generation.json
- regarding instructions: Confusing only because I don't understand why I'd read the disfluent question, when i'd much rather read the proper question and answer it
task1626_copa_hr_question_answering.json
- regarding p examples: the word "closed" in explanation in positive example 3 should be "close".
- regarding n examples: "but her" in explanation in negative example 1 should be"but here".
- regarding instructions: they could include what to do when the sentences are in a language that you don't speak, or both answers might be correct
- regarding n examples: Nothing in the instructions say anything about both being correct or what to do. Also, both don't seem to be correct in the examples, so not sure what the point of them are.
task1628_copa_hr_question_answering.json
- regarding instructions: very confusing and needs to be simplified, especially because it more or less boils down to "choose the correct answer based on cause or effect given"
task1630_openpi_classification.json
- regarding instructions: Change print to type or paste
task1631_openpi_answer_generation.json
- regarding instructions: Per examples, the attribute is listed last so should be mentioned last in the instructions.
- regarding instructions: The instructions list the attribute first but in the actual tasks the attribute is listed last
- regarding p examples: formatting errors with the [br] tags in both + and - examples. not sure if I should even mention these errors but there ya go
- regarding instructions: The instructions aren't very clear. For example, if there were no examples I would not be able to successfully complete the task using just the instructions.
task1640_aqa1.0_answerable_unanswerable_question_classification.json
- regarding p examples: ex 1 explanation is bordering on nonsense
task1657_gooaq_question_generation.json
- regarding instructions: The instructions seem a little vague, and it would be nice if there was a little more information.
- regarding n examples: It would help if the explanations weren't as long because they get a bit confusing and hard to follow.
task1658_billsum_summarization.json
- regarding instructions: The instruction is pretty short and didn't explain the task well. Make it a little clear what how long should the summary be.
- regarding p examples: It provide a good example of what a positive summary suppose to be.
- regarding n examples: It is confusing because if the explanation says the summary is correct then why is it negative.
task1659_title_generation.json
- regarding instructions: Need to know how short/long the title should be
task1660_super_glue_question_generation.json
- regarding p examples: ex 1 output has grammar problems
- regarding n examples: ex 2 output has grammar problems
- regarding instructions: random [br] tag in the instructions
- regarding p examples: ex 1 grammar
- regarding n examples: ex 2 grammar
task1661_super_glue_classification.json
- regarding instructions: Spelling error in second sentence "Ee ask you" should be we ask you. Should probably give instructions on what to enter when the question is not answered by the text
task1664_winobias_text_generation.json
- regarding instructions: It could explain to do one set instead of all sets in a sentence.<newline>Also the format the answer should come in, such as "separated by a comma" or "in a list"
- regarding instructions: "strictly present" should be "strictly present tense" I think. Should include that coreference words should be separated by a comma
task1670_md_gender_bias_text_modification.json
- regarding instructions: Should include instructions on what to do if the input already has a female pronoun. Leave it blank? Copy/paste the sentence as is?
- regarding instructions: What if the sentence is already in female-gendered pronoun as in the task below. Also, If I change "I" to female-gender pronoun like "she" it would cause grammatical issues. Therefore, for this example, do I fix the grammar or leave it alone?
task1678_mathqa_answer_selection.json
- regarding n examples: Tardiness.
- regarding instructions: It could be worded a little more concise, there's some filler words that don't need to be there but the point gets across.
task1705_ljspeech_classification.json
- regarding instructions: The instructions are clear and well-written. "Atleast" should be "at least"
- regarding p examples: I think the positive examples are strong and do not need any improvement. No typos.
- regarding n examples: The negative examples were also clear and useful. No typos.
task1711_poki_text_generation.json
- regarding instructions: "Long string paragraph" is sort of vague and the examples all show run on sentences
- regarding instructions: "long string paragraph" should be replaced with "run on sentence" for clarity
task1713_convai3_sentence_generation.json
- regarding instructions: Need to know how short/long the sentence should be.
task1714_convai3_sentence_generation.json
- regarding instructions: it doesn't really fill in all the steps
task1720_civil_comments_toxicity_classification.json
- regarding instructions: could explain what constitutes "toxic" here
- regarding n examples: ex 1 says using a person's name is toxic? I don't understand that at all<newline>ex 2 has poor grammar<newline>The word 'damn' negative tone
task1723_civil_comments_sexuallyexplicit_classification.json
- regarding n examples: abbriviation<newline><newline>says the terms are explicit, but then says it a negative example which contradicts the directions

- first commit for evaluation: adding the first version a working ev…

470cede

…aluation template.

danyaljj marked this pull request as draft September 17, 2021 19:50

- added a script for result aggregation.

a6c0b65

Daniel Khashabi added 8 commits September 27, 2021 14:03

Merge branch 'master' of github.com:allenai/natural-instructions-expa…

7478463

…nsion

- update the preparation script.

d9ec8aa

Merge remote-tracking branch 'origin/master'

40bad03

- update.

3982ae9

- update.

628ded0

- update.

11dd5b2

Merge branch 'master' of github.com:allenai/natural-instructions-expa…

671bd98

…nsion

- update.

e93b625

Daniel Khashabi added 3 commits October 13, 2021 14:09

- add the evaluation script.

060605a

- drop a mistakenly-added directory.

477bdb3

- add more description to the instructions.

86425f5

danyaljj mentioned this pull request Oct 21, 2021

Second PR to Address human feedback 34, 35, 44, 45, 48, 49, 52-58, 167, 201-205 #464

Merged

-

8486c9e

Daniel Khashabi added 5 commits October 28, 2021 14:43

-

673ba05

add quals.

0e29907

fix example display.

87d1287

fix example display.

4b2b2ba

fix example display.

bc1094f

Daniel Khashabi added 2 commits November 6, 2021 18:32

fix example display.

ed0351f

Merge branch 'master' of github.com:allenai/natural-instructions-expa…

72ab9f5

…nsion � Conflicts: � .gitignore

Daniel Khashabi added 3 commits November 7, 2021 13:58

fix example display.

9f40238

Merge remote-tracking branch 'origin/master'

7157650

fix example display.

00fbf2b

fix example display.

8961df0

danyaljj marked this pull request as ready for review January 26, 2022 03:54

update.

b5f5b92

danyaljj merged commit ff125a6 into master Jan 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Crowdworker evaluation of the tasks [Work in Progress] #276

Crowdworker evaluation of the tasks [Work in Progress] #276

danyaljj commented Sep 17, 2021

danyaljj commented Sep 18, 2021

danyaljj commented Sep 18, 2021

eshaanpathak commented Sep 19, 2021 •

edited

Loading

danyaljj commented Sep 19, 2021

eshaanpathak commented Sep 19, 2021 •

edited

Loading

danyaljj commented Oct 13, 2021 •

edited by Palipoor

Loading

Palipoor commented Oct 14, 2021

danyaljj commented Oct 14, 2021

danyaljj commented Oct 14, 2021 •

edited

Loading

Palipoor commented Oct 19, 2021 •

edited

Loading

swarooprm commented Oct 19, 2021

danyaljj commented Oct 19, 2021

danyaljj commented Oct 19, 2021 •

edited by Palipoor

Loading

danyaljj commented Oct 20, 2021 •

edited by Palipoor

Loading

danyaljj commented Oct 28, 2021 •

edited by Palipoor

Loading

Palipoor commented Oct 28, 2021

yeganehkordi commented Oct 28, 2021

danyaljj commented Nov 2, 2021

Palipoor commented Nov 4, 2021

danyaljj commented Nov 7, 2021

danyaljj commented Nov 7, 2021

danyaljj commented Nov 9, 2021

danyaljj commented Nov 10, 2021

Crowdworker evaluation of the tasks [Work in Progress] #276

Crowdworker evaluation of the tasks [Work in Progress] #276

Conversation

danyaljj commented Sep 17, 2021

danyaljj commented Sep 18, 2021

When filled in with the content, here is how it would look like:

danyaljj commented Sep 18, 2021

eshaanpathak commented Sep 19, 2021 • edited Loading

danyaljj commented Sep 19, 2021

eshaanpathak commented Sep 19, 2021 • edited Loading

danyaljj commented Oct 13, 2021 • edited by Palipoor Loading

Palipoor commented Oct 14, 2021

danyaljj commented Oct 14, 2021

danyaljj commented Oct 14, 2021 • edited Loading

Palipoor commented Oct 19, 2021 • edited Loading

swarooprm commented Oct 19, 2021

danyaljj commented Oct 19, 2021

danyaljj commented Oct 19, 2021 • edited by Palipoor Loading

danyaljj commented Oct 20, 2021 • edited by Palipoor Loading

danyaljj commented Oct 28, 2021 • edited by Palipoor Loading

Palipoor commented Oct 28, 2021

yeganehkordi commented Oct 28, 2021

danyaljj commented Nov 2, 2021

Palipoor commented Nov 4, 2021

danyaljj commented Nov 7, 2021

danyaljj commented Nov 7, 2021

danyaljj commented Nov 9, 2021

danyaljj commented Nov 10, 2021

eshaanpathak commented Sep 19, 2021 •

edited

Loading

eshaanpathak commented Sep 19, 2021 •

edited

Loading

danyaljj commented Oct 13, 2021 •

edited by Palipoor

Loading

danyaljj commented Oct 14, 2021 •

edited

Loading

Palipoor commented Oct 19, 2021 •

edited

Loading

danyaljj commented Oct 19, 2021 •

edited by Palipoor

Loading

danyaljj commented Oct 20, 2021 •

edited by Palipoor

Loading

danyaljj commented Oct 28, 2021 •

edited by Palipoor

Loading