PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Updated Jan 17, 2025 - Python
Open source Python library for converting PDF to DOCX.
A CLI toolset to generate table of contents for PDF files automatically.
Extract annotations (highlights and scribbles) from PDF, EPUB, and notebooks marked with reMarkable tablets. Export to Markdown, PDF, PNG, and SVG.
A pure-Python PDF viewer that offers the same functionality as other popular PDF viewers.
PDF translation: a multilingual PDF processing tool that supports online and offline translation while maintaining the original layout. It performs OCR on scanned PDFs, faster than ocrmypdf, provides a web UI for comparing original PDFs, and includes chat-with-PDF functionality and academic PDF search based on the Semantic Scholar API.
A simple implementation of a PDF-to-audio converter.
pdfgui_tools is a user interface tool developed in Qt and Python that integrates with poppler-utils and PyPDF2 for PDF document management. It's a simple and user-friendly tool that includes various utilities.
Multimodal LLM Application with PyMuPDF4LLM
A collection of PDF parsing libraries, including the AI-based docling, claude, openai, llama-vision, and unstructured-io as well as pdfminer, pymupdf, and pdfplumber, for efficient snapshot, text, table, and metadata extraction.
An open-source PDF editor with the ability to draw and add notes, filling a gap among open-source tools.
Useful PDF-related productivity tool.
Automated extraction of specific information from invoices, achieving over 95% accuracy.
Creates PDF annotations from Kindle clippings.
UVA Data Science Capstone project for Internet Archive. This project aimed to classify PDFs as research or non-research documents using an image-based and a text-based approach. For the image-based models, we leveraged CNN transfer learning, and we used XGBoost for the text-based approach.