Support xlsx, docx, pptx, html, epub
Marker now supports additional document formats. To install all the required dependencies, run pip install marker-pdf[full].
Improved text detection
OCR should now work better due to an improved text detection model.
Inline math improvements
- Better inline math detection with an improved model.
- Inline math lines are now processed with model inference.
- --redo-inline-math option to enable the highest quality inline math detection.
Misc improvements
- Support for the Claude model
- Improved benchmarking scripts
- Better line merging with the new text detection model
What's Changed
- Inline math by @VikParuchuri in #571
- Add Support for DOCX, PPTX, XLSX, HTML and Epub by @iammosespaulr in #501
- Fix character encoding issues when loading JSON configuration files by @vicenciomf2 in #574
- Dev by @VikParuchuri in #573
New Contributors
- @vicenciomf2 made their first contribution in #574
Full Changelog: v1.5.5...v1.6.0