The name in parentheses () is the task name used in lmms_eval; the same task name specifies the dataset in the configuration file. This list is updated manually, so run
lmms_eval task --list
to list all supported tasks and their task names.
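Every entry that carries a task name follows the same "- Display Name (task_name)" convention, so the task names can be pulled out of this list mechanically, for example to assemble a comma-separated selection. The sketch below assumes the task name is the last parenthesised token on the line and contains no spaces or parentheses. `parse_task_list` is a hypothetical helper for illustration, not part of lmms_eval.

```python
import re

# An entry such as "- MMBench English (mmbench_en)". Assumption: the task name
# is the last parenthesised token on the line and has no spaces or parentheses.
TASK_LINE = re.compile(r"^\s*-\s*(?P<display>.*?)\s*\((?P<task>[^()\s]+)\)\s*$")


def parse_task_list(text: str) -> dict[str, str]:
    """Map each task name to its display name.

    Lines without a parenthesised task name (for example bare sub-task
    names or group headers) are skipped, and a repeated task name keeps
    the display name from its first occurrence.
    """
    tasks: dict[str, str] = {}
    for line in text.splitlines():
        match = TASK_LINE.match(line)
        if match:
            tasks.setdefault(match.group("task"), match.group("display"))
    return tasks


sample = """- MME (mme)
- MMBench English (mmbench_en)
- refcoco_seg_test
"""
print(",".join(parse_task_list(sample)))  # mme,mmbench_en
```

Entries such as refcoco_seg_test, which are listed without parentheses, are skipped by this sketch; the output of the listing command above remains the authoritative source.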
- AI2D (ai2d)
- ChartQA (chartqa)
- COCO Caption (coco_cap)
- COCO 2014 Caption (coco2014_cap)
- COCO 2014 Caption Validation (coco2014_cap_val)
- COCO 2014 Caption Test (coco2014_cap_test)
- COCO 2017 Caption (coco2017_cap)
- COCO 2017 Caption MiniVal (coco2017_cap_val)
- COCO 2017 Caption MiniTest (coco2017_cap_test)
- ConBench (conbench)
- DetailCaps-4870 (detailcaps)
- DocVQA (docvqa)
- DocVQA Validation (docvqa_val)
- DocVQA Test (docvqa_test)
- Ferret (ferret)
- Flickr30K (flickr30k)
- Flickr30K Test (flickr30k_test)
- GQA (gqa)
- GQA-ru (gqa_ru)
- II-Bench (ii_bench)
- IllusionVQA (illusionvqa)
- Infographic VQA (infovqa)
- Infographic VQA Validation (infovqa_val)
- Infographic VQA Test (infovqa_test)
- LiveBench (live_bench)
- LiveBench 06/2024 (live_bench_2406)
- LiveBench 07/2024 (live_bench_2407)
- LLaVA-Bench-Wilder (llava_wilder_small)
- LLaVA-Bench-COCO (llava_bench_coco)
- LLaVA-Bench (llava_in_the_wild)
- MathVerse (mathverse)
- MathVerse Text Dominant (mathverse_testmini_text_dominant)
- MathVerse Text Only (mathverse_testmini_text_only)
- MathVerse Text Lite (mathverse_testmini_text_lite)
- MathVerse Vision Dominant (mathverse_testmini_vision_dominant)
- MathVerse Vision Intensive (mathverse_testmini_vision_intensive)
- MathVerse Vision Only (mathverse_testmini_vision_only)
- MathVista (mathvista)
- MathVista Validation (mathvista_testmini)
- MathVista Test (mathvista_test)
- MMBench (mmbench)
- MMBench English (mmbench_en)
- MMBench English Dev (mmbench_en_dev)
- MMBench English Test (mmbench_en_test)
- MMBench Chinese (mmbench_cn)
- MMBench Chinese Dev (mmbench_cn_dev)
- MMBench Chinese Test (mmbench_cn_test)
- MME (mme)
- MME-RealWorld (mmerealworld)
- MME-RealWorld English (mmerealworld)
- MME-RealWorld Mini (mmerealworld_lite)
- MME-RealWorld Chinese (mmerealworld_cn)
- MMStar (mmstar)
- MMUPD (mmupd)
- MMUPD Base (mmupd_base)
- MMAAD Base (mmaad_base)
- MMIASD Base (mmiasd_base)
- MMIVQD Base (mmivqd_base)
- MMUPD Option (mmupd_option)
- MMAAD Option (mmaad_option)
- MMIASD Option (mmiasd_option)
- MMIVQD Option (mmivqd_option)
- MMUPD Instruction (mmupd_instruction)
- MMAAD Instruction (mmaad_instruction)
- MMIASD Instruction (mmiasd_instruction)
- MMIVQD Instruction (mmivqd_instruction)
- MMVet (mmvet)
- Multilingual LLaVA Bench
- llava_in_the_wild_arabic
- llava_in_the_wild_bengali
- llava_in_the_wild_chinese
- llava_in_the_wild_french
- llava_in_the_wild_hindi
- llava_in_the_wild_japanese
- llava_in_the_wild_russian
- llava_in_the_wild_spanish
- llava_in_the_wild_urdu
- NaturalBench
- NoCaps (nocaps)
- NoCaps Validation (nocaps_val)
- NoCaps Test (nocaps_test)
- OCRBench (ocrbench)
- OKVQA (ok_vqa)
- OKVQA Validation 2014 (ok_vqa_val2014)
- POPE (pope)
- RefCOCO (refcoco)
- refcoco_seg_test
- refcoco_seg_val
- refcoco_seg_testA
- refcoco_seg_testB
- refcoco_bbox_test
- refcoco_bbox_val
- refcoco_bbox_testA
- refcoco_bbox_testB
- RefCOCO+ (refcoco+)
- refcoco+_seg
- refcoco+_seg_val
- refcoco+_seg_testA
- refcoco+_seg_testB
- refcoco+_bbox
- refcoco+_bbox_val
- refcoco+_bbox_testA
- refcoco+_bbox_testB
- RefCOCOg (refcocog)
- refcocog_seg_test
- refcocog_seg_val
- refcocog_bbox_test
- refcocog_bbox_val
- ScienceQA (scienceqa_full)
- ScienceQA Full (scienceqa)
- ScienceQA IMG (scienceqa_img)
- ScreenSpot (screenspot)
- ScreenSpot REC / Grounding (screenspot_rec)
- ScreenSpot REG / Instruction Generation (screenspot_reg)
- ST-VQA (stvqa)
- SynthDoG (synthdog)
- SynthDoG English (synthdog_en)
- SynthDoG Chinese (synthdog_zh)
- TextCaps (textcaps)
- TextCaps Validation (textcaps_val)
- TextCaps Test (textcaps_test)
- TextVQA (textvqa)
- TextVQA Validation (textvqa_val)
- TextVQA Test (textvqa_test)
- VCR-Wiki
- VCR-Wiki English
- VCR-Wiki English easy 100 (vcr_wiki_en_easy_100)
- VCR-Wiki English easy 500 (vcr_wiki_en_easy_500)
- VCR-Wiki English easy (vcr_wiki_en_easy)
- VCR-Wiki English hard 100 (vcr_wiki_en_hard_100)
- VCR-Wiki English hard 500 (vcr_wiki_en_hard_500)
- VCR-Wiki English hard (vcr_wiki_en_hard)
- VCR-Wiki Chinese
- VCR-Wiki Chinese easy 100 (vcr_wiki_zh_easy_100)
- VCR-Wiki Chinese easy 500 (vcr_wiki_zh_easy_500)
- VCR-Wiki Chinese easy (vcr_wiki_zh_easy)
- VCR-Wiki Chinese hard 100 (vcr_wiki_zh_hard_100)
- VCR-Wiki Chinese hard 500 (vcr_wiki_zh_hard_500)
- VCR-Wiki Chinese hard (vcr_wiki_zh_hard)
- VibeEval (vibe_eval)
- VizWizVQA (vizwiz_vqa)
- VizWizVQA Validation (vizwiz_vqa_val)
- VizWizVQA Test (vizwiz_vqa_test)
- VL-RewardBench (vl_rewardbench)
- VQAv2 (vqav2)
- VQAv2 Validation (vqav2_val)
- VQAv2 Test (vqav2_test)
- WebSRC (websrc)
- WebSRC Validation (websrc_val)
- WebSRC Test (websrc_test)
- WildVision-Bench (wildvision)
- WildVision 0617 (wildvision_0617)
- WildVision 0630 (wildvision_0630)
- SeedBench 2 Plus (seedbench_2_plus)
- CMMMU (cmmmu)
- CMMMU Validation (cmmmu_val)
- CMMMU Test (cmmmu_test)
- HallusionBench (hallusion_bench_image)
- ICON-QA (iconqa)
- ICON-QA Validation (iconqa_val)
- ICON-QA Test (iconqa_test)
- JMMMU (jmmmu)
- LLaVA-NeXT-Interleave-Bench (llava_interleave_bench)
- llava_interleave_bench_in_domain
- llava_interleave_bench_out_domain
- llava_interleave_bench_multi_view
- MIRB (mirb)
- MMMU (mmmu)
- MMMU Validation (mmmu_val)
- MMMU Test (mmmu_test)
- MMMU-Pro
- MMMU Pro (mmmu_pro)
- MMMU Pro Original (mmmu_pro_original)
- MMMU Pro Vision (mmmu_pro_vision)
- MMMU Pro COT (mmmu_pro_cot)
- MMMU Pro Original COT (mmmu_pro_original_cot)
- MMMU Pro Vision COT (mmmu_pro_vision_cot)
- MMMU Pro Composite COT (mmmu_pro_composite_cot)
- MMT Multiple Image (mmt_mi)
- MMT Multiple Image Validation (mmt_mi_val)
- MMT Multiple Image Test (mmt_mi_test)
- MuirBench (muirbench)
- MP-DocVQA (multidocvqa)
- MP-DocVQA Validation (multidocvqa_val)
- MP-DocVQA Test (multidocvqa_test)
- OlympiadBench (olympiadbench)
- OlympiadBench Test English (olympiadbench_test_en)
- OlympiadBench Test Chinese (olympiadbench_test_cn)
- Q-Bench (qbenchs_dev)
- Q-Bench2-HF (qbench2_dev)
- Q-Bench-HF (qbench_dev)
- A-Bench-HF (abench_dev)
- ActivityNet-QA (activitynetqa_generation)
- SeedBench (seedbench)
- SeedBench 2 (seedbench_2)
- CVRR-ES (cvrr)
- cvrr_continuity_and_object_instance_count
- cvrr_fine_grained_action_understanding
- cvrr_interpretation_of_social_context
- cvrr_interpretation_of_visual_context
- cvrr_multiple_actions_in_a_single_video
- cvrr_non_existent_actions_with_existent_scene_depictions
- cvrr_non_existent_actions_with_non_existent_scene_depictions
- cvrr_partial_actions
- cvrr_time_order_understanding
- cvrr_understanding_emotional_context
- cvrr_unusual_and_physically_anomalous_activities
- EgoSchema (egoschema)
- egoschema_mcppl
- egoschema_subset_mcppl
- egoschema_subset
- MovieChat (moviechat)
- Global Mode for entire video (moviechat_global)
- Breakpoint Mode for specific moments (moviechat_breakpoint)
- MLVU (mlvu)
- MMT-Bench (mmt)
- MMT Validation (mmt_val)
- MMT Test (mmt_test)
- MVBench (mvbench)
- mvbench_action_sequence
- mvbench_moving_count
- mvbench_action_prediction
- mvbench_episodic_reasoning
- mvbench_action_antonym
- mvbench_action_count
- mvbench_scene_transition
- mvbench_object_shuffle
- mvbench_object_existence
- mvbench_fine_grained_pose
- mvbench_unexpected_action
- mvbench_moving_direction
- mvbench_state_change
- mvbench_object_interaction
- mvbench_character_order
- mvbench_action_localization
- mvbench_counterfactual_inference
- mvbench_fine_grained_action
- mvbench_moving_attribute
- mvbench_egocentric_navigation
- NExT-QA (nextqa)
- NExT-QA Multiple Choice Test (nextqa_mc_test)
- NExT-QA Open Ended Validation (nextqa_oe_val)
- NExT-QA Open Ended Test (nextqa_oe_test)
- PerceptionTest
- PerceptionTest Test
- perceptiontest_test_mc
- perceptiontest_test_mcppl
- PerceptionTest Validation
- perceptiontest_val_mc
- perceptiontest_val_mcppl
- TempCompass (tempcompass)
- tempcompass_multi_choice
- tempcompass_yes_no
- tempcompass_caption_matching
- tempcompass_captioning
- TemporalBench (temporalbench)
- temporalbench_short_qa
- temporalbench_long_qa
- temporalbench_short_caption
- Vatex (vatex)
- Vatex Chinese (vatex_val_zh)
- Vatex Test (vatex_test)
- VideoDetailDescription (video_dc499)
- Video-ChatGPT (videochatgpt)
- Video-ChatGPT Generic (videochatgpt_gen)
- Video-ChatGPT Temporal (videochatgpt_temporal)
- Video-ChatGPT Consistency (videochatgpt_consistency)
- Video-MME (videomme)
- Vinoground (vinoground)
- VITATECS (vitatecs)
- VITATECS Direction (vitatecs_direction)
- VITATECS Intensity (vitatecs_intensity)
- VITATECS Sequence (vitatecs_sequence)
- VITATECS Compositionality (vitatecs_compositionality)
- VITATECS Localization (vitatecs_localization)
- VITATECS Type (vitatecs_type)
- WorldQA (worldqa)
- WorldQA Generation (worldqa_gen)
- WorldQA Multiple Choice (worldqa_mc)
- YouCook2 (youcook2_val)
- VDC (vdc)
- VDC Detailed Caption (detailed_test)
- VDC Camera Caption (camera_test)
- VDC Short Caption (short_test)
- VDC Background Caption (background_test)
- VDC Main Object Caption (main_object_test)
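Several families in the list follow a regular naming pattern. The MMUPD entries combine a benchmark (mmupd, mmaad, mmiasd, mmivqd) with a setting (base, option, instruction), and the VCR-Wiki entries combine a language (en, zh), a difficulty (easy, hard) and an optional subset size (100, 500). The sketch below regenerates those names from the pattern. The pattern is inferred from this list rather than taken from the lmms_eval source, and `expand` is a hypothetical helper, so compare its output with `lmms_eval task --list` before relying on it.

```python
from itertools import product


def expand(template: str, **axes: tuple[str, ...]) -> list[str]:
    """Fill `template` with every combination of the given axis values.

    The first keyword varies slowest and the last varies fastest,
    mirroring itertools.product.
    """
    keys = list(axes)
    return [
        template.format(**dict(zip(keys, combo)))
        for combo in product(*axes.values())
    ]


# MMUPD family: {benchmark}_{setting}, grouped by setting as in the list.
MMUPD_TASKS = expand(
    "{bench}_{setting}",
    setting=("base", "option", "instruction"),
    bench=("mmupd", "mmaad", "mmiasd", "mmivqd"),
)

# VCR-Wiki family: vcr_wiki_{lang}_{level}, with an optional _100/_500 subset
# suffix; the empty suffix yields the full split.
VCR_WIKI_TASKS = expand(
    "vcr_wiki_{lang}_{level}{size}",
    lang=("en", "zh"),
    level=("easy", "hard"),
    size=("_100", "_500", ""),
)

print(",".join(MMUPD_TASKS[:4]))  # mmupd_base,mmaad_base,mmiasd_base,mmivqd_base
```

Ordering the keyword arguments from slowest to fastest varying reproduces the order in which these entries appear in the list, which makes a side-by-side check straightforward.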