-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove dependency on ZipFile and remove GC call from read #273
Remove dependency on ZipFile and remove GC call from read #273
Conversation
To make this work, I had to set the `enable_cache` kwarg to `true` rather than false for `readdata()` and `readtable()`
These tests will fail with `enable_cache=false` for both `readdata()` and `readtable()` (as in the current master). This PR changes this kwarg for these functions to `true`.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #273 +/- ##
==========================================
+ Coverage 95.02% 95.06% +0.04%
==========================================
Files 15 15
Lines 2009 1985 -24
==========================================
- Hits 1909 1887 -22
+ Misses 100 98 -2 ☔ View full report in Codecov by Sentry. |
I'm not sure I understand the failed test on codecov. If I look at the details, 100% of my new code is covered by the tests. All the code that is not covered was there before this PR. Despite this, overall coverage has gone down and the test has failed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments on how performance can be improved.
src/read.jl
Outdated
xf.files[filename] = true # set file as read | ||
|
||
try | ||
xf.data[filename] = EzXML.readxml(f) | ||
xf.data[filename] = EzXML.parsexml(ZipArchives.zip_readentry(xf.io, f, String)) | ||
catch err | ||
@error("Failed to parse internal XML file `$filename`") | ||
rethrow() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole for loop and file_not_found
check can be simplified to:
try
xf.data[filename] = EzXML.parsexml(ZipArchives.zip_readentry(xf.io, filename))
catch err
@error("Failed to parse internal XML file `$filename`")
rethrow()
end
Because zip_readentry
will take care of looping through the entry names, reading a matching entry, or erroring if the filename isn't found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for all these suggestions. Where possible, I've accepted them here, as you can see.
Your last suggestion also seems to need the line
xf.files[filename] = true # set file as read
in the try
block, otherwise the file is recorded as not present.
I've added this to my fork and will figure out how to include it here.
This should be faster because it avoids creating a String and there is a check of the uncompressed_size in /~https://github.com/JuliaIO/ZipArchives.jl/blob/f955785e237a0a8b3607cf651eaebc1eb1037b8c/src/reader.jl#L344 Co-authored-by: Nathan Zimmerberg <39104088+nhz2@users.noreply.github.com>
zip_openentry can be used here to avoid decompressing the entire entry into memory. Also, the error on the line after this can be removed with this change. Co-authored-by: Nathan Zimmerberg <39104088+nhz2@users.noreply.github.com>
This should be faster because it avoids allocating all of the entry names at once. Co-authored-by: Nathan Zimmerberg <39104088+nhz2@users.noreply.github.com>
One reflection here is that my proposed changes have removed the I am working on adding it back in but, unfortunately, It has caused the original problem to arise, namely, cannot write after read:
I've spent some time on this now but - for me at least - it seems pretty intractable! :-( |
The segfaults are very strange, it seems like there is a bug in EzXML.jl. I wonder if it would be possible to switch to /~https://github.com/JuliaComputing/XML.jl. This would help fix the multi-threading issues as well. |
Gulp! |
Just found this on the hdf5.jl docs
Might this be something that is going on here? |
Replacing ZipFile with ZipArchive has also removed the need for the GC call in read.