
fix(registry/storage/driver/s3-aws): use a consistent multipart chunk size #4424

Merged
merged 1 commit into distribution:main from 3873 on Nov 5, 2024

Conversation

uhthomas
Contributor

Some S3-compatible object storage systems like R2 require that all multipart chunks are the same size. This was mostly true before, except that the final chunk could be larger than the requested chunk size, which caused uploads to fail.

In addition, the two byte slices have been replaced with a single *bytes.Buffer and the surrounding code simplified significantly.

Fixes: #3873
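
For readers skimming the thread, a minimal sketch of the idea (type and field names here are assumptions for illustration, not the PR's exact code): writes accumulate in a single bytes.Buffer, and a part is only sent once at least ChunkSize bytes are buffered, so every part uploaded to S3 has exactly the same size.

package sketch

import "bytes"

// chunkedWriter is a hypothetical stand-in for the driver's writer type.
type chunkedWriter struct {
	buf        *bytes.Buffer           // single buffer replacing ready/pending
	chunkSize  int                     // configured multipart chunk size
	size       int64                   // bytes actually uploaded so far
	uploadPart func(part []byte) error // stands in for the S3 UploadPart call
}

// Write buffers p and flushes only full chunks, so every part sent to S3
// is exactly chunkSize bytes; backends like R2 that require equal-sized
// parts are then satisfied.
func (w *chunkedWriter) Write(p []byte) (int, error) {
	n, _ := w.buf.Write(p) // bytes.Buffer.Write never returns an error
	for w.buf.Len() >= w.chunkSize {
		part := w.buf.Next(w.chunkSize) // take exactly one chunk
		if err := w.uploadPart(part); err != nil {
			return n, err
		}
		w.size += int64(len(part)) // size reflects only flushed data
	}
	// Anything smaller than a chunk stays buffered until more data arrives
	// or the upload is committed.
	return n, nil
}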

@uhthomas
Contributor Author

@milosgajdos Would you be able to take a look?

@uhthomas
Contributor Author

So we're on the same page, the main issue was this:

buf := bytes.NewBuffer(w.ready.data)
if w.pending.Len() > 0 && w.pending.Len() < int(w.driver.ChunkSize) {
	if _, err := buf.Write(w.pending.data); err != nil {
		return err
	}
	w.pending.Clear()
}

The above would produce a huge final chunk (bigger than the specified chunk size) and would cause the upload to fail.
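
For concreteness, and assuming ready held roughly one full chunk when this ran: with a 10 MiB chunk size, ready contributes about 10 MiB and pending can append anything up to just under another 10 MiB, so the final part could approach 20 MiB, nearly double the size R2 expects every part to be.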

@ianseyer

ianseyer commented Jul 31, 2024

We have spun up an instance of Harbor backed by R2 using this PR and have pushed small and large (4.5 GB) images with no issues and no chunk size specified. The success appears to be consistent; an identical setup using 2.8.3 was failing.

@uhthomas uhthomas force-pushed the 3873 branch 2 times, most recently from 2580a22 to cb95fa1 on July 31, 2024 at 01:26
@uhthomas
Contributor Author

uhthomas commented Jul 31, 2024

also @corhere @thaJeztah

@uhthomas
Contributor Author

uhthomas commented Jul 31, 2024

I have done some reading, and understand this comment raises concerns about simply not writing all the data which has been uploaded.

#3940 (comment)

I would need to find a more reliable way to test resumable uploads, but I agree there is likely a discrepancy between what the registry says it has uploaded and what it has actually uploaded. The best thing to do here would be to ensure size matches exactly what has actually been flushed. The remaining data which does not align with a chunk boundary is safe to discard, as the client will just re-upload it. I'll wait for thoughts and feedback, but I imagine this shouldn't be a difficult change to make.

@uhthomas uhthomas force-pushed the 3873 branch 2 times, most recently from 13b95d9 to b1e86de on July 31, 2024 at 02:24
@uhthomas
Contributor Author

uhthomas commented Jul 31, 2024

I've pushed a change to address the above: essentially, flush is now responsible for keeping track of the upload size and will accurately reflect how much data has been written. Given that the registry returns this in the Range header, the client should resume at the correct offset, which should be a chunk boundary :)
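
To make the arithmetic concrete (illustrative numbers, not taken from the PR): with a 10 MiB chunk size, if a client has sent 37 MiB but only three full parts (30 MiB) have actually been flushed, the registry now reports 30 MiB of progress in the Range header; the client re-sends the remaining 7 MiB starting at the 30 MiB mark, which is exactly a chunk boundary.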


@uhthomas uhthomas force-pushed the 3873 branch 6 times, most recently from 1732bdc to e8b0e55 on July 31, 2024 at 14:00
@milosgajdos milosgajdos requested a review from corhere July 31, 2024 14:05
Member

@milosgajdos milosgajdos left a comment

I need to give this a proper look; I've only skimmed it. I also need to refresh my memory of the original issue, so it might take a bit to review.

// maxChunkSize large which fits in to int. The reason why
// we return int64 is to play nice with Go interfaces where
// the buffer implements io.ReaderFrom interface.
n, _ := w.buf.Write(p)
Member

We should probably make a note here saying that error is always nil when writing to bytes.Buffer.
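
Something along these lines would do (a suggested wording, not the PR's final comment):

	// bytes.Buffer.Write always returns a nil error; it either grows the
	// buffer or panics with bytes.ErrTooLarge, so the error is safe to drop.
	n, _ := w.buf.Write(p)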

@uhthomas uhthomas force-pushed the 3873 branch 2 times, most recently from 8ea9ba6 to 6993235 on July 31, 2024 at 14:26


-	n, err := w.ready.ReadFrom(resp.Body)
-	if err != nil {
+	if _, err := io.Copy(w.buf, resp.Body); err != nil {
Contributor Author

@uhthomas uhthomas Aug 4, 2024

I had concerns about this - but the above condition guarantees(?) this data will be less than minChunkSize, which is "tiny" (5MB).
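
For context: S3's multipart upload API requires every part except the last to be at least 5 MiB, which is presumably where the 5 MB minChunkSize bound comes from.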

@@ -1389,8 +1332,7 @@ func (d *driver) newWriter(ctx context.Context, key, uploadID string, parts []*s
 		uploadID: uploadID,
 		parts:    parts,
 		size:     size,
-		ready:    d.NewBuffer(),
-		pending:  d.NewBuffer(),
+		buf:      d.pool.Get().(*bytes.Buffer),
 	}
 }
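
As an aside on the buf: d.pool.Get().(*bytes.Buffer) line above: this suggests the driver now draws its buffer from a pool of reusable buffers. A hedged sketch of that pattern (the pool setup shown here is an assumption, not the PR's exact code):

package sketch

import (
	"bytes"
	"sync"
)

// A pool of reusable buffers, so each new writer does not allocate a
// fresh multi-megabyte buffer for every upload.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

func getBuffer() *bytes.Buffer {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // make sure nothing from a previous upload leaks through
	return buf
}

func putBuffer(buf *bytes.Buffer) {
	bufPool.Put(buf)
}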

Contributor Author

Question, unrelated to the changes here (sorry if distracting): Why/how could the parts ever become unordered? Shouldn't it be sufficient to sort the parts once when they are first fetched rather than when the upload is restarted or committed?

Member

I think I'm missing some context for this 🤔

@Vad1mo

Vad1mo commented Aug 12, 2024

This seems to solve the issue with Cloudflare. @tpoxa and I will test it on our side and see if we find anything.

@BrammyS

BrammyS commented Sep 26, 2024

I will also try and test this myself :)
Would really like this to be merged if everything is working fine.

@uhthomas
Contributor Author

uhthomas commented Oct 8, 2024

Hope your tests went okay. If it's helpful, there is a prebuilt image here /~https://github.com/users/uhthomas/packages/container/package/distribution%2Fregistry

@Vad1mo

Vad1mo commented Oct 11, 2024

@uhthomas @milosgajdos @corhere

Here is our feedback after testing this feature on R2 for our service offering, container-registry.com. My colleague @tpoxa was responsible for the similar PR #3940.

We have moved GiBs of data to R2 to test it. Among the tests were our test images vad1mo/1gb-random-file and vad1mo/10gb-random-file, which contain 1 GiB sized layers.

  • We didn't experience any issues or problems.
  • We also tested other S3-compatible backends, not only R2. No issues there.

So from our side the PR is good to go.

Member

@milosgajdos milosgajdos left a comment

On first look the PR looks ok to me. I left some comments. I need to give it another pass at some point.

docker-bake.hcl Outdated
Comment on lines 60 to 61
"linux/arm/v6",
"linux/arm/v7",
Member

Yeah, no. Let's not do this. We can't remove these. There are binaries that have been released for these architectures, as well as official images /~https://github.com/distribution/distribution-library-image/blob/a943e89c3efe06134cd9a4b439203c5341082cbc/Dockerfile shipping them.


agree


Member

@milosgajdos milosgajdos left a comment

I think the changes here look ok to me. @uhthomas do you wanna address some of my comments? I'd like to do an RC soon and I think it'd be great if we could sneak this change in. Also, this needs a rebase now.

@uhthomas uhthomas force-pushed the 3873 branch 4 times, most recently from de0a06c to 9ed1f97 on October 30, 2024 at 21:32
fix(registry/storage/driver/s3-aws): use a consistent multipart chunk size

Some S3 compatible object storage systems like R2 require that all
multipart chunks are the same size. This was mostly true before, except
the final chunk was larger than the requested chunk size which causes
uploads to fail.

In addition, the two byte slices have been replaced with a single
*bytes.Buffer and the surrounding code simplified significantly.

Fixes: distribution#3873

Signed-off-by: Thomas Way <thomas@6f.io>
Member

@milosgajdos milosgajdos left a comment

LGTM

Collaborator

@Jamstah Jamstah left a comment

All looks kosher to me, thanks for the additional testing as well.

@milosgajdos milosgajdos merged commit 099201a into distribution:main Nov 5, 2024
17 checks passed
@uhthomas
Contributor Author

uhthomas commented Nov 5, 2024

Thank you @milosgajdos, @Jamstah and everyone else who took the time to review and test. Really happy to see this finally get over the line!

wzshiming added a commit to DaoCloud/crproxy that referenced this pull request Feb 10, 2025
wzshiming added a commit to DaoCloud/crproxy that referenced this pull request Feb 10, 2025
Development

Successfully merging this pull request may close these issues.

Multipart upload issues with Cloudflare R2
6 participants