-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
File Cache is valid if checkout or contents hasn't changed #10507
Conversation
r? @ehuss (rust-highfive has picked a reviewer for you, use r? to override) |
cc @arlosi. Thanks for the set up work. |
393f15a
to
f2a1a0f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did not miss any discussion. This was an independent thought I had when I looked at this code for the first time after reviewing #10482. I'm not sure how much context to explain. If I end up with two little, I'm sorry, I can add more. Before #10482 the entire git checkout had one version number. We used for that version number the hash of the commit at HEAD. Once we read the data out of git we saved it too disc in a slightly optimized format. To make sure we didn't read an optimized file that was now out of date, the file started with the version number. If we fetched a new HEAD commit then we would overwrite all cache files. The big change from #10482 is to think of each file as having its own version number. This will be critical for HTTP based indexes, but was not intended to make a change for git based indexes. To maintain the same behavior, no matter what file is being looked at it's version number is "hash of the commit at HEAD". Thus it has basically the same performance, and maintains the same state on disk. The idea of this PR is that the file will be considered up to date if either it ends with "hash of the commit at HEAD" or starts with a "hash of the files contents". When enough time has passed we can start writing files with both hashes, and get cash hits more often. But we will continue to use files on disk that were created by older Cargos (like stable) that only have the commit hash. |
Wow, thanks! What I guessed is very close to your detailed explanation 😆 There are some places I believe we can improve:
|
Co-authored-by: Weihang Lo <weihanglo@users.noreply.github.com>
c9f2d66
to
d050865
Compare
Both excellent points! The code has been corrected, and a comment has been added. |
d050865
to
8058c3d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks really nice! Thank you!
@bors r+ |
📌 Commit 8058c3d has been approved by |
☀️ Test successful - checks-actions |
Update cargo 5 commits in 1ef1e0a12723ce9548d7da2b63119de9002bead8..e2e2dddebe66dfc1403a312653557e332445308b 2022-03-31 00:17:18 +0000 to 2022-04-05 17:04:53 +0000 - Part 2 of RFC2906 -- allow inheriting from a different `Cargo.toml` (rust-lang/cargo#10517) - File Cache is valid if checkout or contents hasn't changed (rust-lang/cargo#10507) - Fix how scrape-examples handles proc macros (rust-lang/cargo#10533) - tools: update checkout action on CI (rust-lang/cargo#10521) - Don't error if no binaries were installed (rust-lang/cargo#10508)
Cache index files based on contents hash Since #10507 Cargo has known how to read registry cached files whose index version starts with the hash of the file contents. Git makes it very cheap to determine the hash of a file. This PR switches cargo to start writing the new format. Cargoes from before #10507 will not know how to read, and therefore overwrite, cached files written by Cargos after this PR. Cargos after this PR can still read, and will consider up-to-date cached files written by all older Cargos. As I'm writing this out I'm thinking that there may not be any point in writing a file that has both. An alternative implementation just writes the file contents hash. 🤔
#10482 allow the registry to have cashe keys at a per file level.
On master if the currently checked out commit changes all cached files are regenerated.
After this commit we also check if GITs hash of the particular file has changed.
To avoid thrashing cached files this PR only checks for the presence of both hashes, but does not write them both. This means that nightly after this branch will still be able to share file caches with older versions, specifically current stable.
When sufficient versions of stable know how to read the new format, a one line change can be made to start writing the new format.