-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Tar APIs #67883
Implement Tar APIs #67883
Conversation
Tagging subscribers to this area: @dotnet/area-system-io Issue DetailsFixes #65951 It's ready to review but I still have some small TODOs to address.
cc @baronfel
|
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
|
||
using System.IO; | ||
|
||
namespace System.Formats.Tar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we share the existing WrappedStream.cs by moving it into Common?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually have one already being shared, but I needed to slightly modify it for my own purposes and didn't want to break the other tests depending on it. So that's why I had to copy it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's very similar. I wonder if there's a way to parameterize a single copy.
I didn't review properly - someone else will need to 😃 |
src/libraries/Common/src/Interop/Unix/System.Native/Interop.DeviceFiles.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Formats.Tar/src/System/Formats/Tar/GnuTarEntry.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Formats.Tar/src/System/Formats/Tar/GnuTarEntry.cs
Outdated
Show resolved
Hide resolved
get => _header._aTime; | ||
set | ||
{ | ||
// TODO: Is there a max value? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general what's the plan for all these TODOs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address them before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we start getting close to the Preview 4 snap such that the merge is at-risk, I'd also be open to filing a follow-up issue for any of the TODOs that aren't blocking, for them to be addressed in Preview 5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can collect a list of the pending TODOs into a comment so we can document them in this preview as features that won't yet be fully working.
src/libraries/System.Formats.Tar/src/System/Formats/Tar/PaxTarEntry.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Read.cs
Outdated
Show resolved
Hide resolved
StringSplitOptions splitOptions = StringSplitOptions.RemoveEmptyEntries; | ||
|
||
string[] attributeArray = nextLine.Split(' ', 2, splitOptions); | ||
if (attributeArray.Length != 2) | ||
{ | ||
return false; | ||
} | ||
|
||
string[] keyAndValueArray = attributeArray[1].Split('=', 2, splitOptions); | ||
if (keyAndValueArray.Length != 2) | ||
{ | ||
return false; | ||
} | ||
|
||
key = keyAndValueArray[0]; | ||
value = keyAndValueArray[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be much more efficient to parse this with IndexOf rather than Split
private static readonly byte[] s_paxMagic = new byte[] { 0x75, 0x73, 0x74, 0x61, 0x72, 0x0 }; // "ustar\0" | ||
private static readonly byte[] s_paxVersion = new byte[] { 0x30, 0x30 }; // "00" | ||
|
||
private static readonly byte[] s_gnuMagic = new byte[] { 0x75, 0x73, 0x74, 0x61, 0x72, 0x20 }; // "ustar " | ||
private static readonly byte[] s_gnuVersion = new byte[] { 0x20, 0x0 }; // " \0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these be spans instead? e.g.
private static ReadOnlySpan<byte> PaxMagic => new byte[] { ... };
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed the suggestion is a property. Is this smart enough to cache the array, instead of returning new byte[]
every time it is called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this, but I'm still curious. I suspect the answer is that it gets cached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Roslyn has an optimization to refer to the static data segment of the assembly. So no allocation happens here.
See sharplab.
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Write.cs
Outdated
Show resolved
Hide resolved
checksum += WriteRightAlignedBytesAndGetChecksum(uidBytes, _uidBytes); | ||
|
||
byte[] gidBytes = TarHelpers.GetAsciiBytes(TarHelpers.ConvertDecimalToOctal(_gid)); | ||
checksum += WriteRightAlignedBytesAndGetChecksum(gidBytes, _gidBytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll stop commenting on these, too, but there are more efficient ways to do this rather than allocating new byte[]s each time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed with ArrayPool.
I seem to have generated the tar and tar.gz files from runtime-testdata with the wrong uid and gid, so some tests will fail due to that. I'll have to generate them again with the script and update the package here. Edit: Fixed. |
src/libraries/Common/src/Interop/Unix/System.Native/Interop.DeviceFiles.cs
Show resolved
Hide resolved
// If the permissions weren't set at all, don't write the file's permissions. | ||
if (permissions != 0) | ||
{ | ||
Interop.CheckIo(Interop.Sys.FChMod(handle, permissions), destinationFileName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike tar
this doesn't respect umask
while extracting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my elaborate comment here: #67883 (comment).
I think we want to respect the umask since that is something that protects the user for unintentionally opening up permissions.
To do that, you can call the internal
SafeHandle.Open
on Unix that accepts Interop.Sys.Permissions
:
runtime/src/libraries/System.Private.CoreLib/src/Microsoft/Win32/SafeHandles/SafeFileHandle.Unix.cs
Lines 185 to 190 in dbf5c58
internal static SafeFileHandle Open(string fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options, long preallocationSize, | |
Interop.Sys.Permissions openPermissions = DefaultOpenPermissions, | |
Func<Interop.ErrorInfo, Interop.Sys.OpenFlags, string, Exception?>? createOpenException = null) | |
{ | |
return Open(fullPath, mode, access, share, options, preallocationSize, openPermissions, out _, out _, createOpenException); | |
} |
Per that comment, at some point we'll also want to support applying all permissions (including sticky, setgroup, setuser), and setting ownership (uid, gid). The API will need to be extended to support that use-case.
cc @eerhardt
const int ExtractPermissionMask = 0x1FF; | ||
int permissions = (int)Mode & ExtractPermissionMask; | ||
|
||
// If the permissions weren't set at all, don't write the file's permissions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If the permissions weren't set at all
Why are we ignoring permissions when none are set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw your other comment. This is probably wrong so I'll fix it.
tar files have permissions and ownership information. By default:
The implementation isn't considering ownership at the moment, and the permission bits get filtered for priviledged bits but the umask gets ignored. I think we should default to the regular user To be able to opt into the other mode, the API will need to be extended. This doesn't need to happen now if only the default mode should be supported. Though maybe you want to take it into account already, and for example introduce |
src/libraries/Common/src/System/IO/Compression/Archiving.Utils.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Formats.Tar/src/System/Formats/Tar/FieldLengths.cs
Outdated
Show resolved
Hide resolved
/// <item>In all platforms: <see cref="TarEntryType.Directory"/>, <see cref="TarEntryType.HardLink"/>, <see cref="TarEntryType.SymbolicLink"/>, <see cref="TarEntryType.RegularFile"/>.</item> | ||
/// <item>In Unix platforms only: <see cref="TarEntryType.BlockDevice"/>, <see cref="TarEntryType.CharacterDevice"/> and <see cref="TarEntryType.Fifo"/>.</item> | ||
/// </list> | ||
/// </remarks> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be possible to use inheritdoc
and avoid the need of having dedicated docs for methods that just call base class implementation?
src/libraries/System.Formats.Tar/src/System/Formats/Tar/SubReadStream.cs
Show resolved
Hide resolved
src/libraries/System.Formats.Tar/src/System/Formats/Tar/SeekableSubReadStream.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Formats.Tar/src/System/Formats/Tar/SubReadStream.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarEntry.cs
Outdated
Show resolved
Hide resolved
|
||
public Task ExtractToFileAsync(string destinationFileName, bool overwrite, CancellationToken cancellationToken = default) | ||
{ | ||
throw new NotImplementedException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume it's on the TODO list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, async will have to be implemented later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're not going to implement these as part of this PR, I'd prefer they be removed, at least from the ref, until they're implemented. It's otherwise a bad experience to add new APIs that aren't yet implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem. Removing them from the ref.
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarEntry.cs
Outdated
Show resolved
Hide resolved
// - POSIX IEEE 1003.1-2001 ("POSIX.1") Pax Interchange Tar Format (pax). | ||
// - GNU Tar Format (gnu). | ||
// Documentation: https://www.freebsd.org/cgi/man.cgi?query=tar&sektion=5 | ||
internal partial struct TarHeader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this struct
seems to be very large and I would not expect it to get us some perf. What are we getting by keeping it a struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you prefer if I use class? And maybe split it into internal class TarHeaderRead
and internal class TarHeaderWrite
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect us to get no perf benefits from keeping it struct
, while we can run into issues like modifying a copy instead:
TarHeader header = new();
header._mode = 1;
SetMode(header);
Console.WriteLine(header._mode); // prints 1 as we have passed struct copy to SetMode and modified a copy
static void SetMode(TarHeader h) => h._mode = 2;
src/libraries/Common/src/Interop/Unix/System.Native/Interop.DeviceFiles.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/Interop/Unix/System.Native/Interop.DeviceFiles.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/Interop/Unix/System.Native/Interop.DeviceFiles.cs
Outdated
Show resolved
Hide resolved
…ntry keeps getting called.
…he first null entry.
…d condition to symlink tests that checks if system can create symlinks.
if (value < DateTimeOffset.UnixEpoch) | ||
{ | ||
throw new ArgumentOutOfRangeException(nameof(value)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking current time is expensive. IMO the current implementation (having only the UnixEpoch
check) is all we need.
// Does not support writing. | ||
internal sealed class SeekableSubReadStream : SubReadStream | ||
{ | ||
public SeekableSubReadStream(Stream superStream, long startPosition, long maxLength) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call these something other than "sub" and "super"? I find the terminology confusing.
@stephentoub SuperSubmarineStream
? ;)
|
||
_header = default; | ||
|
||
_header._extendedAttributes = new Dictionary<string, string>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Now I lazy-initialize this field.
@carlossanlop have you pushed these changes? I can still see _header._extendedAttributes = new Dictionary<string, string>();
public int Uid | ||
{ | ||
get => _header._uid; | ||
set => _header._uid = value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we re-propose this as uint instead of int?
That would be my preference, but get ready for a lot of pushback at the review :)
// If the path can be extracted in the specified destination directory, returns the full path with sanitized file name. Otherwise, throws. | ||
static string GetSanitizedFullPath(string destinationDirectoryFullPath, string path, string exceptionMessage) | ||
{ | ||
string actualPath = Path.Join(Path.GetDirectoryName(path), ArchivingUtils.SanitizeEntryFilePath(Path.GetFileName(path))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perf: since Path.GetFileName
returns only a substring of the provided path, we could use the overload that returns ReadOnlySpan<char>
here and make SanitizeEntryFilePath
accept ROS
and avoid additional string allocation
{ | ||
string actualPath = Path.Join(Path.GetDirectoryName(path), ArchivingUtils.SanitizeEntryFilePath(Path.GetFileName(path))); | ||
|
||
if (!Path.IsPathFullyQualified(actualPath)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just for my education: for what input this could be true?
{ | ||
throw new IOException(string.Format(SR.IO_AlreadyExists_Name, filePath)); | ||
} | ||
File.Delete(filePath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would we like to remove the file now, if the method can still throw few checks later?
If we want to overwrite a file we can use FileMode.Create
when creating a FileStream
:
Specifies that the operating system should create a new file. If the file already exists, it will be overwritten.
if (Path.Exists(destinationFileName)) | ||
{ | ||
throw new IOException(string.Format(SR.IO_FileExists_Name, destinationFileName)); | ||
} | ||
|
||
using FileStream fs = File.Create(destinationFileName, bufferSize: 0x1000, FileOptions.None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you create FileStream
with FileMode.CreateNew
it's going to throw if the file already exists.
if (Path.Exists(destinationFileName)) | |
{ | |
throw new IOException(string.Format(SR.IO_FileExists_Name, destinationFileName)); | |
} | |
using FileStream fs = File.Create(destinationFileName, bufferSize: 0x1000, FileOptions.None); | |
using FileStream fs = new (destinationFileName, FileMode.CreateNew, FileAccess.Write); |
Debug.Assert(entryNameLength > 0); | ||
|
||
bool isDirectory = file.Attributes.HasFlag(FileAttributes.Directory); | ||
string entryName = ArchivingUtils.EntryFromPath(file.FullName, basePath.Length, entryNameLength, ref entryNameBuffer, appendPathSeparator: isDirectory); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not related to your PR: I find it unintuitive that a buffer rented from array pool is passed by reference to a helper method. I needed to verify that the helper method returns the buffer to the pool in case it's too small.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything I should do on my code to help improve its readability? Or maybe should we open an issue to discuss it separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just do the rent/return inside EntryFromPath as an implementation detail.
// - POSIX IEEE 1003.1-2001 ("POSIX.1") Pax Interchange Tar Format (pax). | ||
// - GNU Tar Format (gnu). | ||
// Documentation: https://www.freebsd.org/cgi/man.cgi?query=tar&sektion=5 | ||
internal partial struct TarHeader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect us to get no perf benefits from keeping it struct
, while we can run into issues like modifying a copy instead:
TarHeader header = new();
header._mode = 1;
SetMode(header);
Console.WriteLine(header._mode); // prints 1 as we have passed struct copy to SetMode and modified a copy
static void SetMode(TarHeader h) => h._mode = 2;
The 3 test failures are unrelated to my changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, offline, let's merge it now, include in Preview4 and address remaining feedback later (#68230).
Big thanks for you work @carlossanlop !
Thanks Adam! I'll add your latest feedback to the follow-up issue checklist. |
/backport to release/7.0-preview4 |
Started backporting to release/7.0-preview4: /~https://github.com/dotnet/runtime/actions/runs/2203040454 |
@carlossanlop backporting to release/7.0-preview4 failed, the patch most likely resulted in conflicts: $ git am --3way --ignore-whitespace --keep-non-patch changes.patch
Applying: Implement Tar APIs
Applying: Address some comment suggestions.
Applying: Add SystemFormatsTarTestData package dependency entries and versions.
Applying: Merge the major and minor p/invokes into a single one.
Applying: Address exception and nullability suggestions.
Applying: Adjust tests for the latest suggestions.
Applying: Document elevation requirement to extract device files on Unix.
Applying: Add tests to verify extraction without elevation throws for device files. Rename file for TarFile Extraction to filesystem.
Applying: Add separate expected mode for special files from assets (644).
Applying: Bump assets version to one with the fix
error: sha1 information is lacking or useless (eng/Versions.props).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0010 Bump assets version to one with the fix
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128 Please backport manually! |
API proposal: dotnet#65951
* Implement Tar APIs (#67883) API proposal: #65951 * Add assembly to NetCoreAppLibrary.props * Remove <PackageDescription> from src csproj since it is not OOB. * Add NetCoreAppCurrent to src csproj TargetFrameworks * Additional src csproj changes. * Allow sharing of input tar file for read Co-authored-by: carlossanlop <carlossanlop@users.noreply.github.com> Co-authored-by: Dan Moseley <danmose@microsoft.com>
Fixes #65951
Approved proposal in this comment: #65951 (comment)
It's ready to review but I still have some small TODOs to address.
cc @baronfel