PdfPage.get_objects(): don't register objects as kids
This was especially problematic as weakrefs are not cleaned up when the
object in question is closed/collected, so we potentially store many
dead pointers.
Imagine a caller invoking get_objects() repeatedly for iterating and a
page handle living for a long time afterwards - that somewhat resembles
a memory leak.
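
The accumulation described above can be reproduced with a minimal sketch. `Parent`, `Child`, and this `_add_kid` are hypothetical stand-ins written for illustration, assuming a registry that is a plain list of `weakref.ref` objects; they are not pypdfium2's actual classes.

```python
import gc
import weakref


class Child:
    """Hypothetical stand-in for a per-object helper."""


class Parent:
    """Hypothetical stand-in for a long-lived page handle."""

    def __init__(self):
        self._kids = []  # weak references to registered children

    def _add_kid(self, kid):
        # Only a weakref is stored; nothing removes it when the kid dies.
        self._kids.append(weakref.ref(kid))

    def get_objects(self, count=3):
        for _ in range(count):
            kid = Child()
            self._add_kid(kid)
            yield kid


page = Parent()
for _ in range(100):
    for obj in page.get_objects():
        pass  # the caller only iterates and keeps nothing
del obj
gc.collect()

# Every registered weakref is still in the list, although its target is gone.
total = len(page._kids)
dead = sum(1 for ref in page._kids if ref() is None)
print(total, dead)
```

The weakrefs do not keep the children alive, but the list itself grows with every call for as long as the parent exists.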
mara004 committed May 18, 2024
1 parent 6f13da6 commit 38f5efe
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/pypdfium2/_helpers/page.py
@@ -283,8 +283,8 @@ def get_objects(self, filter=None, max_depth=2, form=None, level=0):
             if raw_obj is None:
                 raise PdfiumError("Failed to get page object.")
 
+            # Not a child object, because the lifetime of pageobjects that are part of a page is managed by pdfium. The .page reference is enough to keep the parent alive, unless the caller explicitly closes it (which may not merit storing countless of weakrefs).
             helper_obj = PdfObject(raw_obj, page=self, pdf=self.pdf, level=level)
-            self._add_kid(helper_obj)
             if not filter or helper_obj.type in filter:
                 yield helper_obj
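
The ownership relationship that the new comment relies on can be sketched as follows. `Page` and `PageObject` here are hypothetical stand-ins, assuming only that each helper stores a strong `.page` reference and that the page keeps no registry of its helpers. Explicit closing by the caller, which the comment mentions as the exception, is not modeled.

```python
import gc
import weakref


class Page:
    """Hypothetical page handle that keeps no registry of its objects."""

    def get_objects(self, count=3):
        for _ in range(count):
            # Each helper holds a strong reference back to the page.
            yield PageObject(page=self)


class PageObject:
    """Hypothetical helper for an object that belongs to the page."""

    def __init__(self, page):
        self.page = page


page = Page()
page_probe = weakref.ref(page)  # probe only, to observe the page's lifetime
first = next(page.get_objects())

del page
gc.collect()
kept_alive = page_probe() is not None  # first.page still references the page

del first
gc.collect()
released = page_probe() is None  # nothing references the page any more
print(kept_alive, released)
```

In this sketch the page lives exactly as long as some helper or caller references it, and repeated iteration leaves nothing behind on the page itself.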
