Proposal: Improve CI runtime #1864
Comments
Yeah we've discussed this a couple weeks back.
Either way I wholly support the cause.
I think we can keep using the action to install it. Maybe I'm wrong but I'll try it.
Same answer
oh, it runs a check job separately for each package in the workspace. Does this really make a difference to the features enabled for each crate on the test runs later?

A summary of all the crates with features and their deps on each other:

- iroh
- iroh-base
- iroh-bytes
- iroh-gossip
- iroh-metrics
- iroh-net
- iroh-sync
from this i think the For Finally reasoning through the So, as far as I understand this would not change things? I had to write this down to get to that conclusion though. update: I don't think this analysis is necessarily fully correct. It really depends on all the dependencies. And really we should check these combinations automatically. In the meantime I'm not sure how much changing this will affect things. I still suspect we'll catch most issues without having the duplicate
Running nextest does speed things up on my local machine, even without any other changes. Unless the runners are single-core machines I expect this to also speed up on CI? Did I get this wrong?
Once we have a working sccache setup, the caches for all the dependencies will be shared. Crucially, IIUC, the target directory currently stomps over itself between the different feature-combination runs. By strategically splitting the steps into jobs so that each job sticks to one feature combination, we avoid that stomping in the target directory while still getting the cache benefits for the deps via sccache. At least that is my current understanding.
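As a rough illustration of the "one feature combination per target directory" idea, here is a dry-run sketch. It is not what the PR does (the PR splits the work into separate jobs). It swaps in Cargo's `CARGO_TARGET_DIR` environment variable to get the same isolation on a single machine. The helper names `target_dir_for` and `flags_for`, the `target/ci-*` paths, and the mapping of "no-features" to `--no-default-features` are all assumptions made for the example.

```shell
# Hypothetical helper: give each feature combination its own target directory
# so that builds with different feature sets do not overwrite each other.
target_dir_for() {
    printf 'target/ci-%s\n' "$1"
}

# Hypothetical helper: cargo flags for each combination (assumed mapping).
flags_for() {
    case "$1" in
        no-features) printf '%s\n' '--no-default-features' ;;
        *)           printf '\n' ;;
    esac
}

# Dry run: print the commands instead of executing them.
for combo in default-features no-features; do
    echo "CARGO_TARGET_DIR=$(target_dir_for "$combo") cargo test --workspace $(flags_for "$combo")"
done
```

With sccache as the compiler wrapper, the dependency builds in both directories should still be served from the same cache, which is the benefit described above.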
I had not yet looked at release builds, I only looked at improving PR iteration in this issue. But we can totally improve release builds using our learnings from this here if this all works out.
Kind of defeating the purpose of sccache which is a Shared Cache. I am currently looking into running a WebDAV storage using nginx for Rust compilation cache for myself (https://github.com/mozilla/sccache/blob/main/docs/Webdav.md, https://bazel.build/remote/caching#nginx).
you are welcome to change them for
There is an Earthly blog post about speeding up Rust compilation by using a local cache: https://earthly.dev/blog/incremental-rust-builds/
Another option if you can set it up is to use a Redis cache with sccache. We had very good results with a single Redis and multiple build runners (on gitlab if that matters).
The approach I suggest here (and implemented in #1865) does the same: it uses sccache with the (default) local storage. This means that once each local runner has seen all our builds, it will have a local cache of the dependencies available, significantly speeding up future builds. In the PR above this results in compilation time improvements across the board. Filling the local caches takes a little longer than with a shared cache like the one you tried with WebDAV, but if the number of projects is low relative to the number of runners this works well I think, and it has less overhead than WebDAV (which is interesting, thanks for teaching me about it, though I'm surprised you didn't find it worked for you). I think the local cache is a good solution for our current situation. We can improve with WebDAV or Redis once we really need a shared cache.
@fabricedesre does redis force you to have all this in memory though? Or is there a way to run redis without storing its entire dataset in memory?
I'm pretty sure it's not a memory store, we have way more in the Redis cache than the available RAM on the redis instance :)
It works and stored 3.3 G on WebDAV already. But sccache does not add anything to https://github.com/Swatinem/rust-cache/, i.e. just storing a
makes sense, thanks for the explanation. I was sort of considering incremental compilation to not be very interesting for CI times, but I guess it could have an impact on the iroh crates themselves as PRs rarely touch everything. I think the main reason sccache tends to be faster than rust-cache is that the latter has to download all the cache files from the last run up-front, including plenty of files that might not be used, while sccache only fetches cache hits.
## Description

Building multiple combinations of features in the same jobs means they will stomp over the target directory. This is an approach at avoiding this. Furthermore by splitting this up into more jobs we can get more parallelism by adding more workers. However even without this we already get some improvements due to no longer stomping over the target directory and thanks to sccache helping the various runs.

## Notes & open questions

There's an alternative version of this where we change the sequence but don't increase the runners. Without sccache working this would be a step backwards, you can observe this with windows currently. See #1864 for bigger picture.

## Change checklist

- [x] Self-review.
- ~~Documentation updates if relevant.~~
- ~~Tests if relevant.~~

## Description

This enables sccache for the windows self-hosted runners, which are now configured for this.

## Notes & open questions

See #1864

## Change checklist

- [x] Self-review.
- ~~Documentation updates if relevant.~~
- ~~Tests if relevant.~~
## Description

This balances the tests across multiple cores better and speeds up our test run. Closes #1864

## Notes & open questions

This completes everything suggested in #1864. Runtimes can now be further improved by getting more runners. Probably 2 more of each for linux, mac and windows is a good start.

## Change checklist

- [x] Self-review.
- ~~Documentation updates if relevant.~~
- ~~Tests if relevant.~~
The build-and-test jobs are taking more than 10 minutes. Here are some proposals to speed this up:

- Configure sccache on the self-hosted runners: We run on self-hosted runners but use `SCCACHE_GHA_ENABLED`. We should instead set a `SCCACHE_CACHE_SIZE=80G` and use the (default) local cache mechanism. This should mean that soon all runners will have a local cache with all dependencies and we'll spend no more time on compiling dependencies.
- Reduce the number of steps: We do not need a separate `check` step, it achieves nothing. Instead run `test` straight away.
- Split default-features and no-features into separate jobs: Because these jobs are compiling iroh with different feature flags we can parallelise these more by moving the tests to two jobs: `test-default-features`, `test-no-features`. Only worth it if we have enough runners. Maybe we should get more runners if that's needed.
- Switch to cargo-nextest
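The sccache proposal could be expressed as a runner environment roughly like the following. This is a minimal sketch, assuming sccache is already installed and on `PATH` on each self-hosted runner. Where these variables actually get set (workflow file, runner service environment, shell profile) is not specified in this issue and is left open here.

```shell
# Assumption: sccache is installed on the runner and available on PATH.

# Route rustc invocations through sccache (Cargo honours RUSTC_WRAPPER).
export RUSTC_WRAPPER=sccache

# Cap the local on-disk cache at the size proposed in this issue.
export SCCACHE_CACHE_SIZE=80G

# Make sure the GitHub Actions cache backend is not selected,
# so sccache falls back to its default local storage.
unset SCCACHE_GHA_ENABLED
```

After a build, `sccache --show-stats` reports hits and misses, which is one way to check whether the local cache is actually being used.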