
Proposal: Improve CI runtime #1864

Closed
flub opened this issue Dec 5, 2023 · 11 comments · Fixed by #1867

Comments

@flub
Contributor

flub commented Dec 5, 2023

The build-and-test jobs are taking more than 10 minutes. Here are some proposals to speed this up:

  • Configure sccache on the self-hosted runners:

    We run on self-hosted runners but use SCCACHE_GHA_ENABLED. We should instead set a SCCACHE_CACHE_SIZE=80G and use the (default) local cache mechanism. This should mean that soon all runners will have a local cache with all dependencies and we'll spend no more time on compiling dependencies.

  • Reduce the number of steps:

    We do not need a separate check step; it achieves nothing. Instead, run the tests straight away.

  • Split default-features and no-features into separate jobs

    Because these jobs compile iroh with different feature flags, we can parallelise further by moving the tests into two jobs: test-default-features and test-no-features. This is only worth it if we have enough runners; maybe we should get more runners if that's needed.

  • Switch to cargo-nextest (a rough workflow sketch of these changes follows below this list)
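
A minimal sketch of how the sccache and nextest parts of this could look in the workflow, assuming sccache is already installed locally on the self-hosted runners (the job name, cache size and action versions are illustrative, not a final configuration):

```yaml
# Hypothetical excerpt of a .github/workflows job illustrating the proposal.
jobs:
  build-and-test:
    runs-on: [self-hosted, linux]
    env:
      RUSTC_WRAPPER: sccache      # route rustc invocations through sccache
      SCCACHE_CACHE_SIZE: 80G     # big local disk cache on the runner
      # SCCACHE_GHA_ENABLED is deliberately not set, so sccache falls back
      # to its default local-disk storage instead of the GitHub Actions cache.
    steps:
      - uses: actions/checkout@v4
      - name: Install cargo-nextest
        uses: taiki-e/install-action@nextest
      - name: Run tests
        run: cargo nextest run --workspace
```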

github-project-automation bot moved this to 📋 Backlog in iroh Dec 5, 2023
@Arqu
Collaborator

Arqu commented Dec 5, 2023

Yeah, we've discussed this a couple of weeks back.
I think this will need some testing, but correctly configuring sccache should get it to work.
Some caveats to the above proposal:

  • sccache is not installed on the Windows boxes right now as far as I know (nor on the x86_64 mac runner); I can remedy this
  • we generally want a system-configured sccache for this to work properly, but just setting the sccache env vars might be enough
  • the check step is necessary because it runs the full matrix of features individually, which doesn't suffer from the feature unification you get when mashing them all together (it does sadly add about 3-4 min to the total runtime)
  • doing cargo nextest alone will not automatically speed things up, but as discussed it will allow us to start splitting jobs up further and run them in parallel by testing in chunks
  • splitting into more jobs adds a little overhead to each job (caches aren't shared), which would be perfectly fine if we had enough nodes running. We don't, so for now all the parallel jobs will just get scheduled sequentially, plus that overhead; we're already maxing out the runners, and cross and regular builds wait for each other within a single PR
  • release jobs take a lot longer anyway since they build with LTO, which adds about 10 min to the whole thing out of the box and can't be split up further; this hogs runners when merging a lot to main.

Either way I wholly support the cause.

  • sccache will help regardless (given proper setup)
  • we need more runners
  • nextest is a good stepping stone

@flub
Contributor Author

flub commented Dec 6, 2023

> Yeah, we've discussed this a couple of weeks back. I think this will need some testing, but correctly configuring sccache should get it to work. Some caveats to the above proposal:

> sccache is not installed on the Windows boxes right now as far as I know (nor on the x86_64 mac runner); I can remedy this

I think we can keep using the action to install it. Maybe I'm wrong but I'll try it.

> we generally want a system-configured sccache for this to work properly, but just setting the sccache env vars might be enough

Same answer

> the check step is necessary because it runs the full matrix of features individually, which doesn't suffer from the feature unification you get when mashing them all together (it does sadly add about 3-4 min to the total runtime)

Oh, it runs a check job separately for each package in the workspace. Does this really make a difference to the features enabled for each crate in the later test runs?

A summary of all the crates with features and their deps on each other:

iroh

  • default: cli, metrics
  • cli: quic-rpc/quinn-transport, tokio/rt-multi-thread, clap, config, console, dirs-next, indicatif, multibase, tracing-subscriber, flat-db, shell-words, shellexpand, rustyline, colored, toml, human-time, comfy-table, dialoguer, time
  • metrics: iroh-metrics
  • flat-db: iroh-bytes/flat-db
  • test:

iroh-base

  • default: hash, base32
  • hash: bao-tree, multibase, data-encoding, postcard
  • base32: data-encoding

iroh-bytes

  • default: flat-db
  • flat-db: reflink-copy

iroh-gossip

  • default: net
  • net: iroh-net, futures, quinn, tokio, tokio-util

iroh-metrics

  • default: metrics
  • metrics: prometheus-client

iroh-net

  • default: metrics
  • derper: (non-iroh deps)
  • metrics: iroh-metrics/metrics

iroh-sync

  • default: net, fs-store, metrics
  • net: iroh-net, (non-iroh deps)
  • fs-store: (non-iroh deps)
  • metrics: iroh-metrics

From this I think the --no-default-features run will not change between doing this per-package or on the whole workspace.

For default features this is also true, since no default feature on one crate enables a non-default feature in another crate.

Finally, reasoning through the --all-features case, we also end up with the same deps in each crate whether we do it per-crate or for the whole workspace.

So, as far as I understand this would not change things? I had to write this down to get to that conclusion though.

Update: I don't think this analysis is necessarily fully correct; it really depends on all the dependencies, and really we should check these combinations automatically. In the meantime I'm not sure how much changing this would affect things. I still suspect we'll catch most issues without the duplicate cargo check runs, though I could be convinced to leave them for now.
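
For illustration, one existing tool that automates exactly this kind of per-crate, per-feature check is cargo-hack; a hypothetical pair of CI steps (the flags shown are a plausible selection, not a vetted configuration):

```yaml
# Hypothetical steps: check every feature of every crate individually with
# cargo-hack, so workspace-wide feature unification cannot hide a missing
# feature or dependency declaration.
- name: Install cargo-hack
  uses: taiki-e/install-action@cargo-hack
- name: Check each feature of each crate
  run: cargo hack check --workspace --each-feature --no-dev-deps
```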

> doing cargo nextest alone will not automatically speed things up, but as discussed it will allow us to start splitting jobs up further and run them in parallel by testing in chunks

Running nextest does speed things up on my local machine, even without any other changes. Unless the runners are single-core machines, I expect this to speed things up on CI as well. Did I get this wrong?

> splitting into more jobs adds a little overhead to each job (caches aren't shared), which would be perfectly fine if we had enough nodes running. We don't, so for now all the parallel jobs will just get scheduled sequentially, plus that overhead; we're already maxing out the runners, and cross and regular builds wait for each other within a single PR

Once we have a working sccache setup, the caches for all the dependencies will be shared. Crucially, if I understand correctly, the target directory currently stomps over itself between the different feature-combination runs. By strategically splitting the steps into jobs so that each job sticks to one feature combination, we avoid that stomping in the target directory while still getting the caching benefits for the dependencies via sccache.

At least that is my understanding currently.
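
As a rough illustration of that split (job names as proposed above; the env block repeats the earlier sccache sketch, and the runner labels are placeholders):

```yaml
# Hypothetical job split: each job builds exactly one feature combination,
# so its target/ directory never gets invalidated by a different feature set,
# while sccache's local disk cache still shares compiled dependencies.
jobs:
  test-default-features:
    runs-on: [self-hosted, linux]
    env: { RUSTC_WRAPPER: sccache, SCCACHE_CACHE_SIZE: 80G }
    steps:
      - uses: actions/checkout@v4
      - run: cargo nextest run --workspace
  test-no-features:
    runs-on: [self-hosted, linux]
    env: { RUSTC_WRAPPER: sccache, SCCACHE_CACHE_SIZE: 80G }
    steps:
      - uses: actions/checkout@v4
      - run: cargo nextest run --workspace --no-default-features
```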

> release jobs take a lot longer anyway since they build with LTO, which adds about 10 min to the whole thing out of the box and can't be split up further; this hogs runners when merging a lot to main.

I had not yet looked at release builds; in this issue I only looked at improving PR iteration. But we can totally improve release builds using what we learn here if this all works out.

@link2xt
Contributor

link2xt commented Dec 6, 2023

> We run on self-hosted runners but use SCCACHE_GHA_ENABLED. We should instead set a SCCACHE_CACHE_SIZE=80G and use the (default) local cache mechanism. This should mean that soon all runners will have a local cache with all dependencies and we'll spend no more time on compiling dependencies.

That is kind of defeating the purpose of sccache, which is a Shared Cache. I am currently looking into running WebDAV storage with nginx as a Rust compilation cache for myself (/~https://github.com/mozilla/sccache/blob/main/docs/Webdav.md, https://bazel.build/remote/caching#nginx).
Edit: tried it, it does not help much, closed: deltachat/deltachat-core-rust#5090
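
For reference, the WebDAV backend is configured purely through environment variables (per the sccache Webdav.md linked above; the endpoint and credentials below are placeholders):

```yaml
# Hypothetical env block pointing sccache at a self-hosted WebDAV (nginx) cache.
env:
  RUSTC_WRAPPER: sccache
  SCCACHE_WEBDAV_ENDPOINT: https://sccache.example.org/cache   # placeholder
  SCCACHE_WEBDAV_USERNAME: ci                                  # placeholder
  SCCACHE_WEBDAV_PASSWORD: ${{ secrets.SCCACHE_WEBDAV_PASSWORD }}
```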

@dignifiedquire
Contributor

> I still suspect we'll catch most issues without the duplicate cargo check runs, though I could be convinced to leave them for now.

You are welcome to change them to test calls, but we need an individual execution for each feature and each crate; there is no way around this. Before we had this, I was regularly running into issues when doing releases, because feature unification hid a dependency/inclusion issue.

@link2xt
Contributor

link2xt commented Dec 7, 2023

> We run on self-hosted runners but use SCCACHE_GHA_ENABLED. We should instead set a SCCACHE_CACHE_SIZE=80G and use the (default) local cache mechanism. This should mean that soon all runners will have a local cache with all dependencies and we'll spend no more time on compiling dependencies.

There is an Earthly blog post about speeding up Rust compilation by using a local cache: https://earthly.dev/blog/incremental-rust-builds/
The main idea is to use Rust's local caching with persistent build runners and not try to upload/download the cache anywhere at all.
The reduction from 22.5 minutes to 2.5 minutes looks strange; it looks like the baseline did not cache dependency builds at all: expressvpn/wolfssl-rs#44
But otherwise the idea of not trying to upload/download the Rust cache and simply keeping the target/ folder on the runner makes sense.

@fabricedesre
Contributor

Another option if you can set it up is to use a Redis cache with sccache. We had very good results with a single Redis and multiple build runners (on gitlab if that matters).

@flub
Contributor Author

flub commented Dec 7, 2023

> We run on self-hosted runners but use SCCACHE_GHA_ENABLED. We should instead set a SCCACHE_CACHE_SIZE=80G and use the (default) local cache mechanism. This should mean that soon all runners will have a local cache with all dependencies and we'll spend no more time on compiling dependencies.

> There is an Earthly blog post about speeding up Rust compilation by using a local cache: https://earthly.dev/blog/incremental-rust-builds/ The main idea is to use Rust's local caching with persistent build runners and not try to upload/download the cache anywhere at all. The reduction from 22.5 minutes to 2.5 minutes looks strange; it looks like the baseline did not cache dependency builds at all: expressvpn/wolfssl-rs#44 But otherwise the idea of not trying to upload/download the Rust cache and simply keeping the target/ folder on the runner makes sense.

The approach I suggest here (and implemented in #1865) is along the same lines: it uses sccache with the (default) local storage. This means that once each runner has seen all our builds, they will all have a local cache of the dependencies available, significantly speeding up future builds. In the PR above this results in compilation-time improvements across the board.

This takes a little longer to fill up the local caches than a shared cache like the WebDAV one you tried, but if the number of projects is low in proportion to the number of runners I think this works well, and it has less overhead than using WebDAV (which is interesting, thanks for teaching me about it, but I'm surprised you didn't find it worked for you). I think the local cache is a good solution for our current situation. We can improve with WebDAV or Redis once we really need a shared cache.

@flub
Contributor Author

flub commented Dec 7, 2023

> Another option if you can set it up is to use a Redis cache with sccache. We had very good results with a single Redis and multiple build runners (on gitlab if that matters).

@fabricedesre does Redis force you to have all of this in memory though? Or is there a way to run Redis without storing its entire dataset in memory?

@fabricedesre
Contributor

I'm pretty sure it's not a memory store; we have way more in the Redis cache than the available RAM on the Redis instance :)
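
For completeness, the Redis backend is likewise configured through an environment variable, SCCACHE_REDIS, which takes a Redis connection string (newer sccache releases may use a different variable name, so treat this as an assumption and check the docs for your version; the host below is a placeholder):

```yaml
# Hypothetical env block for an sccache Redis backend shared by all runners.
env:
  RUSTC_WRAPPER: sccache
  SCCACHE_REDIS: redis://sccache-redis.internal:6379   # placeholder connection string
```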

@link2xt
Contributor

link2xt commented Dec 7, 2023

> using WebDAV (which is interesting, thanks for teaching me about it, but I'm surprised you didn't find it worked for you)

It works and has already stored 3.3 G on WebDAV. But sccache does not add anything over /~https://github.com/Swatinem/rust-cache/, i.e. just storing the target/ folder in the GitHub Actions cache. The only problem is that the GHA cache has a 10 G maximum. cargo is better at caching than sccache because sccache only caches full compilations of dependencies (i.e. rustc invocations), while cargo has incremental compilation support. If you have a choice between 1) always losing target/ and using sccache for caching, and 2) using no sccache but keeping target/, the second variant is going to win. Earthly simply mounts the target/ and .cargo folders into the build containers and uses no sccache at all.
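
For comparison, the rust-cache approach mentioned here is a single action that saves and restores the cargo home and the target/ folder through the GitHub Actions cache; a minimal sketch (the shared-key value is illustrative):

```yaml
# Hypothetical steps using Swatinem/rust-cache instead of sccache: the action
# restores ~/.cargo and target/ from the GHA cache before the build and saves
# them again afterwards, so incremental compilation state survives between
# runs (subject to the ~10 G GHA cache limit mentioned above).
steps:
  - uses: actions/checkout@v4
  - uses: Swatinem/rust-cache@v2
    with:
      shared-key: ci-tests
  - run: cargo test --workspace
```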

@flub
Contributor Author

flub commented Dec 11, 2023

> using WebDAV (which is interesting, thanks for teaching me about it, but I'm surprised you didn't find it worked for you)

> It works and has already stored 3.3 G on WebDAV. But sccache does not add anything over /~https://github.com/Swatinem/rust-cache/, i.e. just storing the target/ folder in the GitHub Actions cache. The only problem is that the GHA cache has a 10 G maximum. cargo is better at caching than sccache because sccache only caches full compilations of dependencies (i.e. rustc invocations), while cargo has incremental compilation support. If you have a choice between 1) always losing target/ and using sccache for caching, and 2) using no sccache but keeping target/, the second variant is going to win. Earthly simply mounts the target/ and .cargo folders into the build containers and uses no sccache at all.

makes sense, thanks for the explanation.

I was sort of assuming incremental compilation was not very interesting for CI times, but I guess it could have an impact on the iroh crates themselves, as PRs rarely touch everything.

I think the main reason sccache tends to be faster than rust-cache is that the latter has to download all the cache files from the last run up front, including plenty of files that might not be used, while sccache only fetches cache hits.

Arqu pushed a commit that referenced this issue Dec 11, 2023
## Description

Building multiple combinations of features in the same job means they
will stomp over the target directory. This is an approach to avoiding
that.

Furthermore, by splitting this up into more jobs we can get more
parallelism by adding more workers. However, even without this we
already get some improvements, because we no longer stomp over the
target directory and sccache helps the various runs.

## Notes & open questions

There's an alternative version of this where we change the sequence but
don't increase the runners.

Without sccache working this would be a step backwards; you can observe
this with Windows currently.

See #1864 for bigger picture.

## Change checklist

- [x] Self-review.
- ~~Documentation updates if relevant.~~
- ~~Tests if relevant.~~
flub mentioned this issue Dec 11, 2023
github-merge-queue bot pushed a commit that referenced this issue Dec 11, 2023
## Description

This enables sccache for the Windows self-hosted runners, which are now configured for this.

## Notes & open questions

See #1864

## Change checklist

- [x] Self-review.
- ~~Documentation updates if relevant.~~
- ~~Tests if relevant.~~
github-merge-queue bot pushed a commit that referenced this issue Dec 15, 2023
## Description

This balances the tests across multiple cores better and speeds up our test run.

Closes #1864 

## Notes & open questions

This completes everything suggested in #1864. Runtimes can now be
further improved by getting more runners; probably 2 more of each
for Linux, macOS and Windows would be a good start.

## Change checklist

- [x] Self-review.
- ~~Documentation updates if relevant.~~
- ~~Tests if relevant.~~
github-project-automation bot moved this from 📋 Backlog to ✅ Done in iroh Dec 15, 2023
fubuloubu pushed a commit to ApeWorX/iroh that referenced this issue Feb 21, 2024
rklaehn pushed a commit to n0-computer/iroh-blobs that referenced this issue Oct 22, 2024
rklaehn pushed a commit to n0-computer/iroh-blobs that referenced this issue Oct 22, 2024
matheus23 pushed a commit to n0-computer/iroh-doctor that referenced this issue Oct 22, 2024
matheus23 pushed a commit to n0-computer/iroh-doctor that referenced this issue Oct 22, 2024
matheus23 pushed a commit that referenced this issue Nov 14, 2024
matheus23 pushed a commit that referenced this issue Nov 14, 2024
matheus23 pushed a commit that referenced this issue Nov 14, 2024