-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement compression/decompression of filenames ending .gz #3963
Conversation
I am guessing this should be backported, right? |
> local out, str,prog; | ||
> str := ""; | ||
> out := OutputTextString(str, true); | ||
> Process(dir, Filename(DirectoriesSystemPrograms(),"cat"), InputTextNone(), out, [name]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, that's yucky. Can you please at least add a comment that you are not simply using StringFile
here because that would decompress the data?
I guess what we'd need to make this less ugly is an InputRawFile
or so which does not perform any transparent decompression. Or perhaps an optional flag to InputTextFile
which allows turning off the decompression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I didn't like this. I'm happy with either of those suggestions, if anyone has a strong prference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking a little more, the problem with introducing some kind of InputRawFile, or anything else, is it is another thing to document which (I think) no-one is ever going to want to use.
We could use the IO package to grab raw access, or introduce some undocumented kernel function to get raw access?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine by me; see comments above for some ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear: It's also OK by me to merge this now as-is; we can always revisit this later. Perhaps amend 1-2 of the minor optional remarks I made, if you feel like it; otherwise just merge it as-is shrug
syBuf[fid].type = gzip_socket; | ||
syBuf[fid].fp = -1; | ||
syBuf[fid].bufno = -1; | ||
} else if (0 <= (syBuf[fid].fp = open(name, flags, 0644))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, so we are patching the most low-level function in the kernel for opening a file to transparently access .gz files. That makes it really tough to bypass this, one has to either bypass SyFopen
or else copy or rename the file in question to look at its "true" content.
I think it is OK to do this now in this PR, but I'd like us to revisit this. I notice that SyFopen
still takes an old clunky "mode" string as second argument. That argument could be augmented to indicate that "raw" or "binary" mode (no automatic (de)compressing) is wanted. Indeed, we already supported on Cygwin using "rb"
and "wb"
modes (for savefiles).
But I'd also suggest to change it from a const char *
to an integer bit mask (basically similar to the POSIX O_RDONLY
etc. flags for open
, just out own portable version of those).
Anyway, either way it'd be trivial to add a simple kernel helper function which basically does the same as StringFile
but using this "raw" mode (StringFile
is a GAP function but one that's trivial to rewrite in a <10 lines kernel function, and then only the argument to SyFopen
in there needs to be changed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course the idea of changing the mode string to an int (or enum) is separate from the rest; this is just ramblings, don't feel obliged to implement any of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not opposed to your suggestion, but I'm a bit unsure about implementing it at all. I don't really like adding stuff just to support testing. If anyone is upset they can't access raw gzip files I'm very happy to fix it -- although at the moment we don't really support binary file access at all (on Windows we always do line ending conversion for example, so we'd probably corrupt raw gzipped files anyway).
> local out, str,prog; | ||
> str := ""; | ||
> out := OutputTextString(str, true); | ||
> Process(dir, Filename(DirectoriesSystemPrograms(),"cat"), InputTextNone(), out, [name]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine by me; see comments above for some ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChrisJefferson there is one whitespace formatting thingy; but feel free to ignore shrug
Backported to stable-4.11 in commit c69cda3 |
This implements automatic compression/decompression of files ending .gz.
The documentation claims we do this... the functionality was mostly added but the final code was not added to SyFopen.
There are a number of tests which make sure everything works correctly.
One note: If a file ends in .gz, but is not a compressed file, it is still opened correctly. This means anyone who has tried using compression will not find they have unreadable files (this was always intended behaviour, to not break any strange old code).