Upgrade PDF2TXT to version PDFBox 2.0 (PDFTextStripper) #24

Open
petermr opened this issue Mar 30, 2016 · 0 comments


petermr commented Mar 30, 2016

Currently the pdf2txt transformation in norma uses PDF2TXT from PDFBox 1.8. Version 2.0 has now been released and is, I believe, much better.

Complete conversion of pdf2svg is a significant amount of work as signatures have changed, but there is a more-or-less standalone PDFTextStripper which should be usable. It may require version-aware hacking of the pom file, and may even require removal of old pdf2svg code. OTOH it may be simple to add the new PDFBox, depending on signatures.
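
For the simpler route of just adding the new PDFBox, the pom change might look roughly like the fragment below. This is a sketch only: it assumes the standard Apache coordinates `org.apache.pdfbox:pdfbox`, assumes 2.0.0 as the target release, and has not been checked against norma's actual pom or against whether it can coexist with the 1.8 dependency that the pdf2svg code relies on.

```xml
<!-- Assumed coordinates and version; adjust to whatever norma's pom actually needs. -->
<dependency>
  <groupId>org.apache.pdfbox</groupId>
  <artifactId>pdfbox</artifactId>
  <version>2.0.0</version>
</dependency>
```

One signature change to expect: in 2.0 `PDFTextStripper` lives in `org.apache.pdfbox.text` rather than `org.apache.pdfbox.util` as in 1.8, so imports in the pdf2txt code would need updating.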

jkbcm pushed a commit to jkbcm/norma that referenced this issue Jan 9, 2018
Update Peter from ContentMine Repo