FERMAT: Can Vision-Language Models Evaluate Handwritten Math?

We present FERMAT, a benchmark designed to assess VLMs’ ability to detect, localize and correct errors in handwritten mathematical content. Please refer to our paper for more details.

VLMs

Loading Data

Steps to download data and store the images in benchmark_images, and csv in benchmark_csv. Steps to dowload data for the oikantik format

Setup

To run evaluation of VLMs against the FEMRAT dataset, you need to install the required packages by running the following command:

pip install -r requirements.txt

We self-hosted Pixtral-12B-2409 (https://huggingface.co/mistralai/Pixtral-12B-2409), Pixtral-Large-Instruct-2411, LLaMa-3.2-11B-Vision-Instruct, LLaMa-3.2-90B-Vision-Instruct, Phi-3.5-Vision-Instruct using VLLM (/~https://github.com/vllm-project/vllm)

We used hosted services for GPT-Family, Gemini-Family

For self-hosted models,

Set up environment variables:
```
export OPENAI_API_BASE=[ADD_THE_ENDPOINT_URL_OF_HOSTED_MODEL]
```
Example: "http://localhost:8004/v1"
Start Evaluations:
```
python main.py --model [MODEL_NAME] --dir_name [DATA_DIR]
```
- MODEL_NAME: Name of the model to be evaluated. Choices: ['pixtral', 'pixtral_large', 'phi', 'llama_large', 'llama']
- DATA_DIR: Path to the directory where the Benchmark Images are stored
Fill-in CSV

Once the evaluation is done, the results will be stored in a JSON File with the format state_<MODEL_NAME>.json. You can convert this JSON file to a CSV file using the following command:
```
python fill_in_csv.py --model [MODEL_NAME] --csv-file [CSV_FILE] --json-file [JSON_FILE]
```
- MODEL_NAME: Name of the model to be evaluated. Choices: ['pixtral', 'pixtral_large', 'phi', 'llama_large', 'llama']
- CSV_FILE: Path to the CSV file where the results need to be filled in.
- JSON_FILE: Path to the JSON file where the results are stored.

Evaluation

Error Detection

Error Localization

Error Correction

Citation

If you used this repository or our models, please cite our work:

@misc{nath2025visionlanguagemodelsevaluatehandwritten,
      title={Can Vision-Language Models Evaluate Handwritten Math?},
      author={Oikantik Nath and Hanani Bathina and Mohammed Safi Ur Rahman Khan and Mitesh M. Khapra},
      year={2025},
      eprint={2501.07244},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2501.07244},
}

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
base		base
steps		steps
utils		utils
.gitignore		.gitignore
FERMAT.png		FERMAT.png
LICENSE		LICENSE
README.md		README.md
annotation_tool.db		annotation_tool.db
client.py		client.py
config.yaml		config.yaml
config_pqa.yaml		config_pqa.yaml
experiment_info.md		experiment_info.md
experiments.txt		experiments.txt
fill_in_csv.py		fill_in_csv.py
main.py		main.py
not_parsed_llama.txt		not_parsed_llama.txt
step_decorator.py		step_decorator.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FERMAT: Can Vision-Language Models Evaluate Handwritten Math?

VLMs

Loading Data

Setup

Evaluation

Error Detection

Error Localization

Error Correction

Citation

About

Releases

Packages

Languages

License

AI4Bharat/FERMAT

Folders and files

Latest commit

History

Repository files navigation

FERMAT: Can Vision-Language Models Evaluate Handwritten Math?

VLMs

Loading Data

Setup

Evaluation

Error Detection

Error Localization

Error Correction

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages