LLM fixes; new benchmarks
New benchmarks
Overall
Benchmarked against LlamaParse, Docling, and Mathpix (see the README for how to run the benchmarks). Marker performs favorably against these alternatives in speed, LLM-as-judge scoring, and heuristic scoring.
Table
Table conversion is benchmarked against Gemini Flash.
Update Gemini model
- Use the new genai library
- Update to Gemini 2.0 Flash
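
For readers calling Gemini directly, here is a minimal sketch of what a request through the new library can look like. It assumes "the new genai library" refers to Google's `google-genai` SDK and that the model id is `gemini-2.0-flash`. The helper names `build_request` and `generate` and the `GOOGLE_API_KEY` environment variable are illustrative and are not taken from Marker's code.

```python
import os
from typing import Optional

# Assumed model id for Gemini 2.0 Flash; adjust if your account uses another id.
MODEL_NAME = "gemini-2.0-flash"


def build_request(prompt: str, model: str = MODEL_NAME) -> dict:
    """Assemble keyword arguments for client.models.generate_content (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {"model": model, "contents": prompt}


def generate(prompt: str, api_key: Optional[str] = None) -> str:
    """Send a prompt to Gemini and return the response text (hypothetical helper)."""
    # Imported lazily so build_request works without the SDK installed.
    from google import genai  # pip install google-genai

    # GOOGLE_API_KEY is an assumed variable name used only in this sketch.
    client = genai.Client(api_key=api_key or os.environ["GOOGLE_API_KEY"])
    response = client.models.generate_content(**build_request(prompt))
    return response.text or ""
```

Calling `generate("Convert this table to markdown")` needs a valid API key and network access. `build_request` can be inspected offline to check the model id being sent.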
Misc bugfixes
- Fix bug with OCR heuristics not being aggressive enough
- Fix bug with empty tables
- Ensure references get passed through in LLM processors
What's Changed
- Add llm text support for references, superscripts etc by @iammosespaulr in #523
- Update overall benchmark by @VikParuchuri in #515
- Benchmarks by @VikParuchuri in #531
Full Changelog: v1.3.5...v1.4.0