Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lower minimum poppler-cpp version to 0.26.0 #13

Merged
merged 8 commits into from
Sep 9, 2020

Conversation

sandeepmistry
Copy link
Contributor

Amazon Linux ships with a super old version of poppler-cpp. The following pull request makes changes to lower the minimum poppler-cpp version to 0.26.0.

Copy link
Owner

@cbrunet cbrunet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Please ensure that tests on those functions are skipped when running with this poppler version.

@sandeepmistry
Copy link
Contributor Author

Hi @cbrunet,

Thanks for reviewing! I've made the requested changes in d02439a.

Comment on lines +176 to +179
if version() < (0, 46, 0):
assert date.astimezone(timezone.utc) == datetime(
2020, 3, 25, 21, 19, 50, tzinfo=timezone.utc
)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange... Do you think this is related to:

cpp:
* switched from detail::convert_date() to core's dateStringToTime()

in version 0.45?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are-you sure the wrong date is related with the poppler version? On my side, the date is the same with older poppler version.... Are you sure you computer has the right timezone?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a closer look, I'm using inside an AWS SageMaker notebook, so it might not be set. However, maybe there is somewhere else the timezone is being used incorrectly.

Comment on lines +188 to +189
if version() < (0, 46, 0):
assert info == "Charles"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to you explain this? Do you think it is related to

    cpp:
     * pass len to GooString constructor in detail::ustring_to_unicode_GooString(). Bug #96426

in poppler 0.46?

Copy link
Owner

@cbrunet cbrunet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to decorate the methods in document.py as well. You could also update the minimum version in the documentation.

@sandeepmistry
Copy link
Contributor Author

You need to decorate the methods in document.py as well. You could also update the minimum version in the documentation.

Sure, I've made the requested changes in 273daf6 and b68de38.

Copy link
Owner

@cbrunet cbrunet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • test_page.py::test_text is failing. I get "Page " instead of "Page 1". The test should probably be updated.
  • test_page.py::test_page_label is failing. I get "" instead of "1". Not sure why. Should we skip the test or mark it as xfail for this poppler version?
  • test_document.py::test_get_property_with_locked_document and test_set_property_with_locked_document are failing because the properties are not available in this version. I think that in document.py, we could fall back to info_key and info_date methods for lower versions of poppler, instead of disabling the author, creator, creation_date, keywords and other properties.

Comment on lines +176 to +179
if version() < (0, 46, 0):
assert date.astimezone(timezone.utc) == datetime(
2020, 3, 25, 21, 19, 50, tzinfo=timezone.utc
)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are-you sure the wrong date is related with the poppler version? On my side, the date is the same with older poppler version.... Are you sure you computer has the right timezone?

Comment on lines 72 to 76
@property
@since(0, 46)
@ensure_unlocked
def author(self):
return str(self._document.get_author())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure you understood my comment, here is an example:

Suggested change
@property
@since(0, 46)
@ensure_unlocked
def author(self):
return str(self._document.get_author())
@property
@ensure_unlocked
def author(self):
try:
value = self._document.get_author()
except AttributeError:
value = self._document.info_key('author')
return str(value)

And so on for other properties. This way, you should be able to re-enable some tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason not to always use .info_key all the time here?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good question. No, I don't think there is any reason, except that the function exists. We could definitively use info_key() and set_info_key() all the time.

@cbrunet
Copy link
Owner

cbrunet commented Sep 3, 2020

I think you could add poppler-0.26.0 to the test matrix in python-poppler/.github/workflows/pythonpackage.yml . This way, the github runner would run the tests against this version as well.

@sandeepmistry
Copy link
Contributor Author

Hi @cbrunet,

I've pushed the changes we've discussed.

The tests were still failing for the dates and string truncation issues with poppler 0.26.0 on the Github workflow server, so I left the changes for conditional checks based on version in the tests. Let me know if anything else needs to be addressed.

@cbrunet cbrunet merged commit 34c5572 into cbrunet:master Sep 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants