-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement compression/decompression of filenames ending .gz #3963
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
not compressed |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,169 @@ | ||
#@local dir,fname,isGzippedFile,stream,str | ||
gap> START_TEST("compressed.tst"); | ||
gap> dir := DirectoryTemporary();; | ||
gap> fname := Filename(dir, "test.g.gz");; | ||
|
||
# Let us check when we have written a compressed file by checking the gzip header | ||
gap> isGzippedFile := function(dir, name) | ||
> local out, str,prog; | ||
> str := ""; | ||
> out := OutputTextString(str, true); | ||
> Process(dir, Filename(DirectoriesSystemPrograms(),"cat"), InputTextNone(), out, [name]); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hrm, that's yucky. Can you please at least add a comment that you are not simply using I guess what we'd need to make this less ugly is an There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I didn't like this. I'm happy with either of those suggestions, if anyone has a strong prference. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thinking a little more, the problem with introducing some kind of InputRawFile, or anything else, is it is another thing to document which (I think) no-one is ever going to want to use. We could use the IO package to grab raw access, or introduce some undocumented kernel function to get raw access? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fine by me; see comments above for some ideas. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just to be clear: It's also OK by me to merge this now as-is; we can always revisit this later. Perhaps amend 1-2 of the minor optional remarks I made, if you feel like it; otherwise just merge it as-is shrug |
||
> return str{[1..2]} = "\037\213"; | ||
> end;; | ||
gap> str := "hello\ngoodbye\n";; | ||
|
||
# Write a compressed file | ||
gap> FileString( fname, str ) = Length(str); | ||
true | ||
|
||
# Check file really is compressed | ||
gap> isGzippedFile(dir, "test.g.gz"); | ||
true | ||
|
||
# Check reading compressed file | ||
gap> StringFile( fname ) = str; | ||
true | ||
|
||
# Check gz is added transparently | ||
gap> StringFile( Filename(dir, "test.g") ) = str; | ||
fingolfin marked this conversation as resolved.
Show resolved
Hide resolved
|
||
true | ||
|
||
# Test reading/seeking in a gzip compressed file | ||
gap> stream := InputTextFile(fname);; | ||
gap> ReadLine(stream); | ||
"hello\n" | ||
gap> ReadLine(stream); | ||
"goodbye\n" | ||
gap> ReadLine(stream); | ||
fail | ||
gap> SeekPositionStream(stream, -1); | ||
fail | ||
gap> SeekPositionStream(stream, 0); | ||
true | ||
gap> ReadLine(stream); | ||
"hello\n" | ||
gap> ReadLine(stream); | ||
"goodbye\n" | ||
gap> ReadLine(stream); | ||
fail | ||
gap> SeekPositionStream(stream, 2); | ||
true | ||
gap> PositionStream(stream); | ||
2 | ||
gap> ReadLine(stream); | ||
"llo\n" | ||
gap> ReadLine(stream); | ||
"goodbye\n" | ||
gap> SeekPositionStream(stream, 0); | ||
true | ||
gap> ReadAll(stream) = str; | ||
true | ||
gap> SeekPositionStream(stream, 0); | ||
true | ||
gap> PositionStream(stream); | ||
0 | ||
gap> ReadAll(stream) = str; | ||
true | ||
gap> CloseStream(stream); | ||
|
||
# Test multiple writes | ||
gap> stream := OutputTextFile( fname, false );; | ||
gap> PrintTo( stream, "1"); | ||
gap> AppendTo( stream, "2"); | ||
gap> PrintTo( stream, "3"); | ||
gap> CloseStream(stream); | ||
gap> stream; | ||
closed-stream | ||
gap> isGzippedFile(dir, "test.g.gz"); | ||
true | ||
|
||
# verify it | ||
gap> stream := InputTextFile( fname );; | ||
gap> ReadAll(stream); | ||
"123" | ||
gap> CloseStream(stream); | ||
gap> stream; | ||
closed-stream | ||
|
||
# partial reads | ||
gap> stream := InputTextFile( fname );; | ||
gap> ReadAll(stream, 2); | ||
"12" | ||
gap> CloseStream(stream); | ||
gap> stream; | ||
closed-stream | ||
|
||
# too long partial read | ||
gap> stream := InputTextFile( fname );; | ||
gap> ReadAll(stream, 5); | ||
"123" | ||
gap> CloseStream(stream); | ||
gap> stream; | ||
closed-stream | ||
|
||
# error partial read | ||
gap> stream := InputTextFile( fname );; | ||
gap> ReadAll(stream, -1); | ||
Error, ReadAll: negative limit is not allowed | ||
gap> CloseStream(stream); | ||
gap> stream; | ||
closed-stream | ||
|
||
# append to initial data | ||
gap> stream := OutputTextFile( fname, true );; | ||
gap> PrintTo( stream, "4"); | ||
gap> CloseStream(stream); | ||
|
||
# verify it | ||
gap> stream := InputTextFile( fname );; | ||
gap> ReadAll(stream); | ||
"1234" | ||
gap> CloseStream(stream); | ||
gap> stream; | ||
closed-stream | ||
|
||
# overwrite initial data | ||
gap> stream := OutputTextFile( fname, false );; | ||
gap> PrintTo( stream, "new content"); | ||
gap> CloseStream(stream); | ||
|
||
# verify it | ||
gap> stream := InputTextFile( fname );; | ||
gap> ReadAll(stream); | ||
"new content" | ||
gap> CloseStream(stream); | ||
gap> stream; | ||
closed-stream | ||
|
||
# ReadAll with length limit | ||
gap> stream := InputTextFile( fname );; | ||
gap> ReadAll(stream, 3); | ||
"new" | ||
gap> CloseStream(stream); | ||
|
||
# test PrintFormattingStatus | ||
gap> stream := OutputTextFile( fname, false );; | ||
gap> PrintFormattingStatus(stream); | ||
true | ||
gap> PrintTo( stream, "a very long line that GAP is going to wrap at 80 chars by default if we don't do anything about it\n"); | ||
gap> CloseStream(stream); | ||
gap> StringFile(fname); | ||
"a very long line that GAP is going to wrap at 80 chars by default if we don't\ | ||
\\\ndo anything about it\n" | ||
gap> stream := OutputTextFile( fname, false );; | ||
gap> SetPrintFormattingStatus(stream, false); | ||
gap> PrintFormattingStatus(stream); | ||
false | ||
gap> PrintTo( stream, "a very long line that GAP is going to wrap at 80 chars by default if we don't do anything about it\n"); | ||
gap> CloseStream(stream); | ||
gap> StringFile(fname); | ||
"a very long line that GAP is going to wrap at 80 chars by default if we don't\ | ||
do anything about it\n" | ||
|
||
# Test even if a file ends in .gz, if it is not compressed it can still be read | ||
gap> stream := InputTextFile(Filename(DirectoriesLibrary("tst"), "example-dir/compress/not-compressed.txt.gz"));; | ||
gap> ReadAll(stream) = "not compressed\n"; | ||
true | ||
gap> CloseStream(stream); | ||
gap> STOP_TEST("compressed.tst"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, so we are patching the most low-level function in the kernel for opening a file to transparently access .gz files. That makes it really tough to bypass this, one has to either bypass
SyFopen
or else copy or rename the file in question to look at its "true" content.I think it is OK to do this now in this PR, but I'd like us to revisit this. I notice that
SyFopen
still takes an old clunky "mode" string as second argument. That argument could be augmented to indicate that "raw" or "binary" mode (no automatic (de)compressing) is wanted. Indeed, we already supported on Cygwin using"rb"
and"wb"
modes (for savefiles).But I'd also suggest to change it from a
const char *
to an integer bit mask (basically similar to the POSIXO_RDONLY
etc. flags foropen
, just out own portable version of those).Anyway, either way it'd be trivial to add a simple kernel helper function which basically does the same as
StringFile
but using this "raw" mode (StringFile
is a GAP function but one that's trivial to rewrite in a <10 lines kernel function, and then only the argument toSyFopen
in there needs to be changed)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course the idea of changing the mode string to an int (or enum) is separate from the rest; this is just ramblings, don't feel obliged to implement any of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not opposed to your suggestion, but I'm a bit unsure about implementing it at all. I don't really like adding stuff just to support testing. If anyone is upset they can't access raw gzip files I'm very happy to fix it -- although at the moment we don't really support binary file access at all (on Windows we always do line ending conversion for example, so we'd probably corrupt raw gzipped files anyway).