LLM mode, better OCR heuristics, faster
Overview
This release brings significant improvements to quality and speed. A new LLM mode can optionally use LLMs to boost output quality. OCR heuristics are significantly improved, and marker now decides automatically when a document needs to be re-OCRed. The layout model is faster and more accurate.
Quality
- Optionally pass the --use_llm flag to improve tables, inline math, forms, complex pages, and general quality.
- Bad OCR text is now detected automatically, and the document is re-OCRed when needed. Detection combines PDF-level heuristics with a new OCR quality model.
- Pass the --strip_existing_ocr flag to always ignore the existing OCR text and redo OCR instead.
- Layout blocks are now detected more accurately when passing --use_llm.
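To illustrate the kind of PDF-level check mentioned above, here is a minimal sketch of a bad-text heuristic that decides whether to re-OCR a document. It is hypothetical: the function names, the "garbage character" rule, and both thresholds are assumptions chosen for illustration, not marker's actual heuristics, and the sketch does not model the new OCR quality model at all. The strip_existing_ocr parameter simply mirrors the behavior of the --strip_existing_ocr flag.

```python
from __future__ import annotations


def page_is_bad(text: str, max_garbage_ratio: float = 0.1) -> bool:
    """Hypothetical heuristic: flag a page whose text layer is empty or mostly debris."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return True  # no usable text layer, e.g. a scanned image-only page
    # U+FFFD (replacement character) and non-printable control characters are
    # common signs of a broken font encoding in a PDF text layer.
    garbage = sum(1 for c in chars if c == "\ufffd" or not c.isprintable())
    return garbage / len(chars) > max_garbage_ratio


def should_reocr(
    pages: list[str],
    strip_existing_ocr: bool = False,
    max_bad_page_fraction: float = 0.3,  # assumed threshold, not marker's
) -> bool:
    """Decide whether to discard the existing text layer and OCR the document."""
    if strip_existing_ocr:
        return True  # mirrors --strip_existing_ocr: always redo OCR
    if not pages:
        return False  # nothing to OCR
    bad_pages = sum(page_is_bad(p) for p in pages)
    return bad_pages / len(pages) > max_bad_page_fraction


print(should_reocr(["Hello world", "\ufffd\ufffd\ufffd ab"]))  # True
```

Deciding at the document level (rather than per page) matches the notes' wording of re-OCRing "the document"; a real implementation could just as reasonably work page by page.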
Speed
- The layout model is now half the size, roughly 2x faster, and more accurate. Layout accounts for most of the runtime in the general case, so this should give a large overall speedup.
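Why a faster layout step speeds up the whole run follows from Amdahl's law: if layout takes a fraction p of total runtime and becomes s times faster, the overall speedup is 1 / ((1 - p) + p / s). The sketch below computes this; the 80% layout share is an illustrative assumption, not a measured figure for marker.

```python
def overall_speedup(layout_fraction: float, layout_speedup: float = 2.0) -> float:
    """Amdahl's law: only the layout share of the runtime gets faster."""
    if not 0.0 <= layout_fraction <= 1.0:
        raise ValueError("layout_fraction must be between 0 and 1")
    return 1.0 / ((1.0 - layout_fraction) + layout_fraction / layout_speedup)


# Assumed, not measured: layout takes 80% of the total runtime.
print(round(overall_speedup(0.8), 2))  # 1.67
```

The closer layout's share gets to 100%, the closer the overall gain gets to the full 2x of the layout model itself.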
Misc
- Pass the --disable_image_extraction flag to avoid extracting images.
- Pass --use_llm and --disable_image_extraction together to automatically convert images to descriptions.
- Individual block types can now be easily extracted from the document (for example, pulling out all tables).
Partial Changelog
- Add New OCR Heuristics Model by @tarun-menta in #427
- Vik dev by @VikParuchuri in #434
- High Quality Layout Builder and Text Processors by @iammosespaulr in #429
- Vik dev by @VikParuchuri in #438
- Vik dev by @VikParuchuri in #447
- Additional heuristics for bad PDF text extraction by @iammosespaulr in #446
- LLM based image captioning by @VikParuchuri in #454
New Contributors
- @tarun-menta made their first contribution in #427
Full Changelog: v1.1.0...v1.2.0