Releases: VikParuchuri/marker
Speedups, bug fixes
- Fix some edge-case OCR bugs
- ~20% end-to-end speedup from improved layout and text detection
Fix OCR bugs
- Fix bbox issue with OCR and resizing
- Fix issue with layout bboxes missing after OCR
Fix misc bugs
- Ensure we don't produce zero-area table boxes
- Ensure FullyMergedBlock gets valid input
Fix layout bugs
- Improve layout detection, which improves output quality
- Fix header level detection bugs
Fix OOM errors
- Add a batch size for the table recognition model to avoid OOM
- Enable configuring batch size (see the sketch after this list)
- Fix error with debugging
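A minimal sketch of how a batch size override might be applied, assuming it is read from an environment variable named TABLE_REC_BATCH_SIZE (an assumed name; check the settings module of your installed version for the exact option):

```python
# Hedged sketch: TABLE_REC_BATCH_SIZE is an assumed variable name, not a
# confirmed marker setting. Smaller batches trade throughput for lower peak
# GPU/MPS memory, which is what avoids the OOM errors.
import os

# Set the override before importing marker so its settings pick it up.
os.environ["TABLE_REC_BATCH_SIZE"] = "4"

from marker.settings import settings  # assumption: env vars override defaults here
```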
Bugfixes, output quality improvement
- Fix MPS bug with torch 2.5
- Fix heading bug with zero-line blocks
- Improve output quality when visual boxes and text boxes are offset
Better tables, improved output quality, header levels
Tables!
- Integrate a custom table model for better table rendering - this uses a new state-of-the-art open table model
Markdown output
- Adjust block detection to improve markdown output globally
- Assign layout labels to blocks more accurately, which improves quality globally
- Better line spacing in markdown output
- Push footnotes to end of page
Header levels
- Add detection for header levels like #, ##, etc. (see the sketch after this list)
- Add computed table of contents
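Header levels end up as standard markdown heading prefixes in the output. A rough illustration (not marker's actual code) of how a detected level maps to a run of # characters:

```python
def render_heading(text: str, level: int) -> str:
    # Illustrative only: clamp the detected level to markdown's six heading
    # depths and prefix the text with that many '#' characters.
    level = max(1, min(level, 6))
    return f"{'#' * level} {text}"

print(render_heading("Introduction", 1))  # -> "# Introduction"
print(render_heading("Background", 2))    # -> "## Background"
```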
Bugfixes/misc
- Fix bug with pagination not working
- Much better debugging with debug image output
- Python 3.13 support
OCR and misc improvements; demo app
- Language no longer needs to be specified (see the usage sketch after this list)
- Fix OCR memory leak
- Add marker GUI demo app to test out conversion
- Add progress reporting for equation detection
- Improve table recognition slightly
- Add table benchmark
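A minimal usage sketch based on marker's documented Python API around this release (function names and return values follow the README of that era and may differ in newer versions). Note that no language list is passed:

```python
# Hedged sketch of the convert_single_pdf API; treat names and return values
# as illustrative, not authoritative.
from marker.convert import convert_single_pdf
from marker.models import load_all_models

model_lst = load_all_models()

# No langs argument: the OCR language no longer needs to be specified.
full_text, images, out_meta = convert_single_pdf("paper.pdf", model_lst)

with open("paper.md", "w", encoding="utf-8") as f:
    f.write(full_text)
```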
Significant speedup
This release brings a roughly 15% speedup on GPU, 3x on CPU, and 7x on MPS. The speedup comes from new surya models for layout and text detection that are much more efficient.
This is a best-case speedup; if you need OCR or equation recognition, the gain will be smaller, but conversion will still be significantly faster.
Fix transformers bugs
- The new transformers version introduces a new kwarg in donut models. Handle this case by ignoring it.
- The new transformers version breaks MPS compatibility by using torch.isin for a comparison. Handle this by enabling the PyTorch MPS fallback (see the sketch below).
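A minimal sketch of the MPS workaround, assuming the setting referred to is PyTorch's PYTORCH_ENABLE_MPS_FALLBACK environment variable, which routes ops without an MPS kernel (such as torch.isin on affected versions) to the CPU:

```python
# Assumption: the "PyTorch MPS fallback setting" is PYTORCH_ENABLE_MPS_FALLBACK.
# It must be set before torch is imported to take effect.
import os
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
values = torch.tensor([1, 2, 3], device=device)
# On affected versions torch.isin has no MPS kernel; with the fallback enabled
# it runs on CPU transparently instead of raising an error.
print(torch.isin(values, torch.tensor([2], device=device)))
```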