Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement compression/decompression of filenames ending .gz #3963

Merged
merged 1 commit into from
Apr 21, 2020

Conversation

ChrisJefferson
Copy link
Contributor

This implements automatic compression/decompression of files ending .gz.

The documentation claims we do this... the functionality was mostly added but the final code was not added to SyFopen.

There are a number of tests which make sure everything works correctly.

One note: If a file ends in .gz, but is not a compressed file, it is still opened correctly. This means anyone who has tried using compression will not find they have unreadable files (this was always intended behaviour, to not break any strange old code).

@fingolfin fingolfin added kind: bug Issues describing general bugs, and PRs fixing them kind: enhancement Label for issues suggesting enhancements; and for pull requests implementing enhancements topic: kernel labels Apr 15, 2020
@fingolfin
Copy link
Member

I am guessing this should be backported, right?

src/sysfiles.c Outdated Show resolved Hide resolved
src/sysfiles.c Outdated Show resolved Hide resolved
src/sysfiles.c Outdated Show resolved Hide resolved
tst/testinstall/compressed.tst Outdated Show resolved Hide resolved
tst/testinstall/compressed.tst Outdated Show resolved Hide resolved
tst/testinstall/compressed.tst Show resolved Hide resolved
> local out, str,prog;
> str := "";
> out := OutputTextString(str, true);
> Process(dir, Filename(DirectoriesSystemPrograms(),"cat"), InputTextNone(), out, [name]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, that's yucky. Can you please at least add a comment that you are not simply using StringFile here because that would decompress the data?

I guess what we'd need to make this less ugly is an InputRawFile or so which does not perform any transparent decompression. Or perhaps an optional flag to InputTextFile which allows turning off the decompression?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I didn't like this. I'm happy with either of those suggestions, if anyone has a strong prference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking a little more, the problem with introducing some kind of InputRawFile, or anything else, is it is another thing to document which (I think) no-one is ever going to want to use.

We could use the IO package to grab raw access, or introduce some undocumented kernel function to get raw access?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine by me; see comments above for some ideas.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear: It's also OK by me to merge this now as-is; we can always revisit this later. Perhaps amend 1-2 of the minor optional remarks I made, if you feel like it; otherwise just merge it as-is shrug

tst/testinstall/compressed.tst Outdated Show resolved Hide resolved
@coveralls
Copy link

coveralls commented Apr 15, 2020

Coverage Status

Coverage increased (+0.007%) to 84.337% when pulling ca7040c on ChrisJefferson:gzip into 6d1d650 on gap-system:master.

src/sysfiles.c Outdated Show resolved Hide resolved
src/sysfiles.c Outdated Show resolved Hide resolved
syBuf[fid].type = gzip_socket;
syBuf[fid].fp = -1;
syBuf[fid].bufno = -1;
} else if (0 <= (syBuf[fid].fp = open(name, flags, 0644))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, so we are patching the most low-level function in the kernel for opening a file to transparently access .gz files. That makes it really tough to bypass this, one has to either bypass SyFopen or else copy or rename the file in question to look at its "true" content.

I think it is OK to do this now in this PR, but I'd like us to revisit this. I notice that SyFopen still takes an old clunky "mode" string as second argument. That argument could be augmented to indicate that "raw" or "binary" mode (no automatic (de)compressing) is wanted. Indeed, we already supported on Cygwin using "rb" and "wb" modes (for savefiles).

But I'd also suggest to change it from a const char * to an integer bit mask (basically similar to the POSIX O_RDONLY etc. flags for open, just out own portable version of those).

Anyway, either way it'd be trivial to add a simple kernel helper function which basically does the same as StringFile but using this "raw" mode (StringFile is a GAP function but one that's trivial to rewrite in a <10 lines kernel function, and then only the argument to SyFopen in there needs to be changed)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course the idea of changing the mode string to an int (or enum) is separate from the rest; this is just ramblings, don't feel obliged to implement any of this.

Copy link
Contributor Author

@ChrisJefferson ChrisJefferson Apr 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not opposed to your suggestion, but I'm a bit unsure about implementing it at all. I don't really like adding stuff just to support testing. If anyone is upset they can't access raw gzip files I'm very happy to fix it -- although at the moment we don't really support binary file access at all (on Windows we always do line ending conversion for example, so we'd probably corrupt raw gzipped files anyway).

> local out, str,prog;
> str := "";
> out := OutputTextString(str, true);
> Process(dir, Filename(DirectoriesSystemPrograms(),"cat"), InputTextNone(), out, [name]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine by me; see comments above for some ideas.

Copy link
Member

@fingolfin fingolfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChrisJefferson there is one whitespace formatting thingy; but feel free to ignore shrug

@fingolfin
Copy link
Member

Backported to stable-4.11 in commit c69cda3

@ChrisJefferson ChrisJefferson deleted the gzip branch February 15, 2021 14:39
@ThomasBreuer ThomasBreuer self-assigned this Feb 17, 2021
@ThomasBreuer ThomasBreuer added the release notes: added PRs introducing changes that have since been mentioned in the release notes label Feb 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-to-4.11-DONE kind: bug Issues describing general bugs, and PRs fixing them kind: enhancement Label for issues suggesting enhancements; and for pull requests implementing enhancements release notes: added PRs introducing changes that have since been mentioned in the release notes topic: kernel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants