
Proposal: Improve CI runtime #1864

Closed
flub opened this issue Dec 5, 2023 · 11 comments · Fixed by #1867

Comments

@flub
Contributor

flub commented Dec 5, 2023

The build-and-test jobs are taking more than 10 minutes. Here are some proposals to speed this up:

  • Configure sccache on the self-hosted runners:

    We run on self-hosted runners but use SCCACHE_GHA_ENABLED. We should instead set a SCCACHE_CACHE_SIZE=80G and use the (default) local cache mechanism. This should mean that soon all runners will have a local cache with all dependencies and we'll spend no more time on compiling dependencies.

  • Reduce the number of steps:

    We do not need a separate check step; it achieves nothing. Instead, run the tests straight away.

  • Split default-features and no-features into separate jobs

    Because these jobs compile iroh with different feature flags, we can parallelise further by moving the tests into two jobs: test-default-features and test-no-features. This is only worth it if we have enough runners; maybe we should get more runners if that's needed.

  • Switch to cargo-nextest (a rough workflow sketch of these changes follows below this list)
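
A minimal sketch of how the sccache and nextest parts of this could look in the workflow, assuming sccache is already installed locally on the self-hosted runners (the job name, cache size and action versions are illustrative, not a final configuration):

```yaml
# Hypothetical excerpt of a .github/workflows job illustrating the proposal.
jobs:
  build-and-test:
    runs-on: [self-hosted, linux]
    env:
      RUSTC_WRAPPER: sccache      # route rustc invocations through sccache
      SCCACHE_CACHE_SIZE: 80G     # big local disk cache on the runner
      # SCCACHE_GHA_ENABLED is deliberately not set, so sccache falls back
      # to its default local-disk storage instead of the GitHub Actions cache.
    steps:
      - uses: actions/checkout@v4
      - name: Install cargo-nextest
        uses: taiki-e/install-action@nextest
      - name: Run tests
        run: cargo nextest run --workspace
```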

github-project-automation bot moved this to 📋 Backlog in iroh Dec 5, 2023
@Arqu
Collaborator

Arqu commented Dec 5, 2023

Yeah, we've discussed this a couple of weeks back.
I think this will need some testing, but correctly configuring sccache should get it to work.
Some caveats to the above proposal:

  • sccache is not installed on the Windows boxes right now as far as I know (nor on the x86_64 mac runner); I can remedy this
  • we generally want a system-configured sccache for this to work properly, but just setting the sccache env vars might be enough
  • the check step is necessary because it runs the full matrix of features individually, which doesn't suffer from the feature unification you get when mashing them all together (it does sadly add about 3-4 min to the total runtime)
  • doing cargo nextest alone will not automatically speed things up, but as discussed it will allow us to start splitting jobs up further and run them in parallel by testing in chunks
  • splitting into more jobs adds a little overhead to each job (caches aren't shared), which would be perfectly fine if we had enough nodes running. We don't, so for now all the parallel jobs will just get scheduled sequentially, plus that overhead; we're already maxing out the runners, and cross and regular builds wait for each other within a single PR
  • release jobs take a lot longer anyway since they build with LTO, which adds about 10 min to the whole thing out of the box and can't be split up further; this hogs runners when merging a lot to main.

Either way I wholly support the cause.

  • sccache will help regardless (given proper setup)
  • we need more runners
  • nextest is a good stepping stone

@flub
Contributor Author

flub commented Dec 6, 2023

> Yeah, we've discussed this a couple of weeks back. I think this will need some testing, but correctly configuring sccache should get it to work. Some caveats to the above proposal:

> sccache is not installed on the Windows boxes right now as far as I know (nor on the x86_64 mac runner); I can remedy this

I think we can keep using the action to install it. Maybe I'm wrong but I'll try it.

> we generally want a system-configured sccache for this to work properly, but just setting the sccache env vars might be enough

Same answer

> the check step is necessary because it runs the full matrix of features individually, which doesn't suffer from the feature unification you get when mashing them all together (it does sadly add about 3-4 min to the total runtime)

Oh, it runs a check job separately for each package in the workspace. Does this really make a difference to the features enabled for each crate in the later test runs?

A summary of all the crates with features and their deps on each other:

iroh

  • default: cli, metrics
  • cli: quic-rpc/quinn-transport, tokio/rt-multi-thread, clap, config, console, dirs-next, indicatif, multibase, tracing-subscriber, flat-db, shell-words, shellexpand, rustyline, colored, toml, human-time, comfy-table, dialoguer, time
  • metrics: iroh-metrics
  • flat-db: iroh-bytes/flat-db
  • test:

iroh-base

  • default: hash, base32
  • hash: bao-tree, multibase, data-encoding, postcard
  • base32: data-encoding

iroh-bytes

  • default: flat-db
  • flat-db: reflink-copy

iroh-gossip

  • default: net
  • net: iroh-net, futures, quinn, tokio, tokio-util

iroh-metrics

  • default: metrics
  • metrics: prometheus-client

iroh-net

  • default: metrics
  • derper: (non-iroh deps)
  • metrics: iroh-metrics/metrics

iroh-sync

  • default: net, fs-store, metrics
  • net: iroh-net, (non-iroh deps)
  • fs-store: (non-iroh deps)
  • metrics: iroh-metrics

From this I think the --no-default-features run will not change between doing this per-package or on the whole workspace.

For default features this is also true, since no default feature on one crate enables a non-default feature in another crate.

Finally, reasoning through the --all-features case, we also end up with the same deps in each crate whether we do it per-crate or for the whole workspace.

So, as far as I understand this would not change things? I had to write this down to get to that conclusion though.

Update: I don't think this analysis is necessarily fully correct; it really depends on all the dependencies, and really we should check these combinations automatically. In the meantime I'm not sure how much changing this would affect things. I still suspect we'll catch most issues without the duplicate cargo check runs, though I could be convinced to leave them for now.
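
For illustration, one existing tool that automates exactly this kind of per-crate, per-feature check is cargo-hack; a hypothetical pair of CI steps (the flags shown are a plausible selection, not a vetted configuration):

```yaml
# Hypothetical steps: check every feature of every crate individually with
# cargo-hack, so workspace-wide feature unification cannot hide a missing
# feature or dependency declaration.
- name: Install cargo-hack
  uses: taiki-e/install-action@cargo-hack
- name: Check each feature of each crate
  run: cargo hack check --workspace --each-feature --no-dev-deps
```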

> doing cargo nextest alone will not automatically speed things up, but as discussed it will allow us to start splitting jobs up further and run them in parallel by testing in chunks

Running nextest does speed things up on my local machine, even without any other changes. Unless the runners are single-core machines, I expect this to speed things up on CI as well. Did I get this wrong?

> splitting into more jobs adds a little overhead to each job (caches aren't shared), which would be perfectly fine if we had enough nodes running. We don't, so for now all the parallel jobs will just get scheduled sequentially, plus that overhead; we're already maxing out the runners, and cross and regular builds wait for each other within a single PR

Once we have a working sccache setup, the caches for all the dependencies will be shared. Crucially, if I understand correctly, the target directory currently stomps over itself between the different feature-combination runs. By strategically splitting the steps into jobs so that each job sticks to one feature combination, we avoid that stomping in the target directory while still getting the caching benefits for the dependencies via sccache.

At least that is my understanding currently.
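
As a rough illustration of that split (job names as proposed above; the env block repeats the earlier sccache sketch, and the runner labels are placeholders):

```yaml
# Hypothetical job split: each job builds exactly one feature combination,
# so its target/ directory never gets invalidated by a different feature set,
# while sccache's local disk cache still shares compiled dependencies.
jobs:
  test-default-features:
    runs-on: [self-hosted, linux]
    env: { RUSTC_WRAPPER: sccache, SCCACHE_CACHE_SIZE: 80G }
    steps:
      - uses: actions/checkout@v4
      - run: cargo nextest run --workspace
  test-no-features:
    runs-on: [self-hosted, linux]
    env: { RUSTC_WRAPPER: sccache, SCCACHE_CACHE_SIZE: 80G }
    steps:
      - uses: actions/checkout@v4
      - run: cargo nextest run --workspace --no-default-features
```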

> release jobs take a lot longer anyway since they build with LTO, which adds about 10 min to the whole thing out of the box and can't be split up further; this hogs runners when merging a lot to main.

I had not yet looked at release builds; in this issue I only looked at improving PR iteration. But we can totally improve release builds using what we learn here if this all works out.

@link2xt
Contributor

link2xt commented Dec 6, 2023

> We run on self-hosted runners but use SCCACHE_GHA_ENABLED. We should instead set a SCCACHE_CACHE_SIZE=80G and use the (default) local cache mechanism. This should mean that soon all runners will have a local cache with all dependencies and we'll spend no more time on compiling dependencies.

That is kind of defeating the purpose of sccache, which is a Shared Cache. I am currently looking into running WebDAV storage with nginx as a Rust compilation cache for myself (/~https://github.com/mozilla/sccache/blob/main/docs/Webdav.md, https://bazel.build/remote/caching#nginx).
Edit: tried it, it does not help much, closed: deltachat/deltachat-core-rust#5090
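
For reference, the WebDAV backend is configured purely through environment variables (per the sccache Webdav.md linked above; the endpoint and credentials below are placeholders):

```yaml
# Hypothetical env block pointing sccache at a self-hosted WebDAV (nginx) cache.
env:
  RUSTC_WRAPPER: sccache
  SCCACHE_WEBDAV_ENDPOINT: https://sccache.example.org/cache   # placeholder
  SCCACHE_WEBDAV_USERNAME: ci                                  # placeholder
  SCCACHE_WEBDAV_PASSWORD: ${{ secrets.SCCACHE_WEBDAV_PASSWORD }}
```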

@dignifiedquire
Contributor

> I still suspect we'll catch most issues without the duplicate cargo check runs, though I could be convinced to leave them for now.

You are welcome to change them to test calls, but we need an individual execution for each feature and each crate; there is no way around this. Before we had this, I was regularly running into issues when doing releases, because feature unification hid a dependency/inclusion issue.

@link2xt
Contributor

link2xt commented Dec 7, 2023

> We run on self-hosted runners but use SCCACHE_GHA_ENABLED. We should instead set a SCCACHE_CACHE_SIZE=80G and use the (default) local cache mechanism. This should mean that soon all runners will have a local cache with all dependencies and we'll spend no more time on compiling dependencies.

There is an Earthly blog post about speeding up Rust compilation by using a local cache: https://earthly.dev/blog/incremental-rust-builds/
The main idea is to use Rust's local caching with persistent build runners and not try to upload/download the cache anywhere at all.
The reduction from 22.5 minutes to 2.5 minutes looks strange; it looks like the baseline did not cache dependency builds at all: expressvpn/wolfssl-rs#44
But otherwise the idea of not trying to upload/download the Rust cache and simply keeping the target/ folder on the runner makes sense.

@fabricedesre
Contributor

Another option if you can set it up is to use a Redis cache with sccache. We had very good results with a single Redis and multiple build runners (on gitlab if that matters).

@flub
Contributor Author

flub commented Dec 7, 2023

> We run on self-hosted runners but use SCCACHE_GHA_ENABLED. We should instead set a SCCACHE_CACHE_SIZE=80G and use the (default) local cache mechanism. This should mean that soon all runners will have a local cache with all dependencies and we'll spend no more time on compiling dependencies.

> There is an Earthly blog post about speeding up Rust compilation by using a local cache: https://earthly.dev/blog/incremental-rust-builds/ The main idea is to use Rust's local caching with persistent build runners and not try to upload/download the cache anywhere at all. The reduction from 22.5 minutes to 2.5 minutes looks strange; it looks like the baseline did not cache dependency builds at all: expressvpn/wolfssl-rs#44 But otherwise the idea of not trying to upload/download the Rust cache and simply keeping the target/ folder on the runner makes sense.

The approach I suggest here (and implemented in #1865) is along the same lines: it uses sccache with the (default) local storage. This means that once each runner has seen all our builds, they will all have a local cache of the dependencies available, significantly speeding up future builds. In the PR above this results in compilation-time improvements across the board.

This takes a little longer to fill up the local caches than a shared cache like the WebDAV one you tried, but if the number of projects is low in proportion to the number of runners I think this works well, and it has less overhead than using WebDAV (which is interesting, thanks for teaching me about it, but I'm surprised you didn't find it worked for you). I think the local cache is a good solution for our current situation. We can improve with WebDAV or Redis once we really need a shared cache.

@flub
Contributor Author

flub commented Dec 7, 2023

> Another option if you can set it up is to use a Redis cache with sccache. We had very good results with a single Redis and multiple build runners (on gitlab if that matters).

@fabricedesre does Redis force you to have all of this in memory though? Or is there a way to run Redis without storing its entire dataset in memory?

@fabricedesre
Contributor

I'm pretty sure it's not a memory store; we have way more in the Redis cache than the available RAM on the Redis instance :)
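
For completeness, the Redis backend is likewise configured through an environment variable, SCCACHE_REDIS, which takes a Redis connection string (newer sccache releases may use a different variable name, so treat this as an assumption and check the docs for your version; the host below is a placeholder):

```yaml
# Hypothetical env block for an sccache Redis backend shared by all runners.
env:
  RUSTC_WRAPPER: sccache
  SCCACHE_REDIS: redis://sccache-redis.internal:6379   # placeholder connection string
```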

@link2xt
Contributor

link2xt commented Dec 7, 2023

> using WebDAV (which is interesting, thanks for teaching me about it, but I'm surprised you didn't find it worked for you)

It works and has already stored 3.3 G on WebDAV. But sccache does not add anything over /~https://github.com/Swatinem/rust-cache/, i.e. just storing the target/ folder in the GitHub Actions cache. The only problem is that the GHA cache has a 10 G maximum. cargo is better at caching than sccache because sccache only caches full compilations of dependencies (i.e. rustc invocations), while cargo has incremental compilation support. If you have a choice between 1) always losing target/ and using sccache for caching, and 2) using no sccache but keeping target/, the second variant is going to win. Earthly simply mounts the target/ and .cargo folders into the build containers and uses no sccache at all.
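
For comparison, the rust-cache approach mentioned here is a single action that saves and restores the cargo home and the target/ folder through the GitHub Actions cache; a minimal sketch (the shared-key value is illustrative):

```yaml
# Hypothetical steps using Swatinem/rust-cache instead of sccache: the action
# restores ~/.cargo and target/ from the GHA cache before the build and saves
# them again afterwards, so incremental compilation state survives between
# runs (subject to the ~10 G GHA cache limit mentioned above).
steps:
  - uses: actions/checkout@v4
  - uses: Swatinem/rust-cache@v2
    with:
      shared-key: ci-tests
  - run: cargo test --workspace
```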

@flub
Contributor Author

flub commented Dec 11, 2023

> using WebDAV (which is interesting, thanks for teaching me about it, but I'm surprised you didn't find it worked for you)

> It works and has already stored 3.3 G on WebDAV. But sccache does not add anything over /~https://github.com/Swatinem/rust-cache/, i.e. just storing the target/ folder in the GitHub Actions cache. The only problem is that the GHA cache has a 10 G maximum. cargo is better at caching than sccache because sccache only caches full compilations of dependencies (i.e. rustc invocations), while cargo has incremental compilation support. If you have a choice between 1) always losing target/ and using sccache for caching, and 2) using no sccache but keeping target/, the second variant is going to win. Earthly simply mounts the target/ and .cargo folders into the build containers and uses no sccache at all.

makes sense, thanks for the explanation.

I was sort of assuming incremental compilation was not very interesting for CI times, but I guess it could have an impact on the iroh crates themselves, as PRs rarely touch everything.

I think the main reason sccache tends to be faster than rust-cache is that the latter has to download all the cache files from the last run up front, including plenty of files that might not be used, while sccache only fetches cache hits.

Arqu pushed a commit that referenced this issue Dec 11, 2023
## Description

Building multiple combinations of features in the same job means they
will stomp over the target directory. This is an approach to avoiding
that.

Furthermore, by splitting this up into more jobs we can get more
parallelism by adding more workers. However, even without this we
already get some improvements, because we no longer stomp over the
target directory and sccache helps the various runs.

## Notes & open questions

There's an alternative version of this where we change the sequence but
don't increase the runners.

Without sccache working this would be a step backwards; you can observe
this with Windows currently.

See #1864 for bigger picture.

## Change checklist

- [x] Self-review.
- ~~Documentation updates if relevant.~~
- ~~Tests if relevant.~~
flub mentioned this issue Dec 11, 2023
github-merge-queue bot pushed a commit that referenced this issue Dec 11, 2023
## Description

This enables sccache for the Windows self-hosted runners, which are now configured for this.

## Notes & open questions

See #1864

## Change checklist

- [x] Self-review.
- ~~Documentation updates if relevant.~~
- ~~Tests if relevant.~~
github-merge-queue bot pushed a commit that referenced this issue Dec 15, 2023
## Description

This balances the tests across multiple cores better and speeds up our test run.

Closes #1864 

## Notes & open questions

This completes everything suggested in #1864. Runtimes can now be
further improved by getting more runners; probably 2 more of each
for Linux, macOS and Windows would be a good start.

## Change checklist

- [x] Self-review.
- ~~Documentation updates if relevant.~~
- ~~Tests if relevant.~~
github-project-automation bot moved this from 📋 Backlog to ✅ Done in iroh Dec 15, 2023
fubuloubu pushed a commit to ApeWorX/iroh that referenced this issue Feb 21, 2024
rklaehn pushed a commit to n0-computer/iroh-blobs that referenced this issue Oct 22, 2024
rklaehn pushed a commit to n0-computer/iroh-blobs that referenced this issue Oct 22, 2024
matheus23 pushed a commit to n0-computer/iroh-doctor that referenced this issue Oct 22, 2024
matheus23 pushed a commit to n0-computer/iroh-doctor that referenced this issue Oct 22, 2024
matheus23 pushed a commit that referenced this issue Nov 14, 2024
matheus23 pushed a commit that referenced this issue Nov 14, 2024
matheus23 pushed a commit that referenced this issue Nov 14, 2024