Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to 2.1.55 and refactor the build system #246

Merged
merged 28 commits into from
Dec 16, 2022

Conversation

dae
Copy link
Contributor

@dae dae commented Nov 29, 2022

The separate publish workflows have been removed; the "build all" one has the final publishing steps that should only trigger when the workflow is invoked manually.

I've left a few separate commits for reference, but recommend squashing when it comes time to merge in. Notes from the commit:

The 2.1.55 desktop release switched away from Bazel to a new build system,
which required some changes to this repo. Making the changes was a bit complicated
due to the complexity of the current build system, and I ended up shaving some yaks
to make things simpler while I was working on the changes.

Changes:

  • Building for the current architecture instead of all architectures is now the
    default, so getting started is easier, and a build in Android Studio no longer
    requires special flags.
  • The build-current.sh script has been split into build-aar and build-robo
    for the Android and Robolectric parts, and can be used for both single-arch
    and multi-arch builds.
  • On arm64 Macs, the build scripts now create arm binaries
  • In a multi-arch build, both x86 and arm64 Mac libs are built, and they're
    merged into a single library. This can be done in CI, so there is no manual step
    required for M1 machines anymore.
  • The build now uses protobuf and python binaries/libs that the desktop build
    downloads, so they don't need to be installed separately.
  • The pinned Rust version and Rust targets are automatically installed as
    required.
  • The per-platform CI builds now build in debug mode and are faster.
  • Updated the docs to explain how the NDK can be installed via Android Studio,
    instead of via separate command-line tools.
  • The cross/docker stuff has been stripped out, as it's of limited use as it
    can't target macOS legally. Easier to use GH actions for the multi-arch
    builds, and keep things simple for local development.
  • Fix lint not being run in CI; caught an API 23 reference.

Bumps #179 (builds on M1 already work, so this may be simpler than expected?)
Bumps #174 (a bunch of the doctor stuff is obsolete; updated HOWTO.md and GH actions should be consulted)
Bumps #27 (I recommend closing this; single-platform is the default for local builds, and
CI runners don't have any extra compute available)

Tentatively closes #109 (didn't see the flake when I was updating the actions)

Closes #235 (translation submodules now automatically synced with anki submodule)
Closes #213 (path based on script now)
Closes #211 (builds for Arm Mac on Arm Macs)
Closes #197 (single arch is the default)
Closes #196 (desktop venv is used)
Closes #195 (can be done via the GUI, and does not require separate cli download)
Closes #168 (latest Rust; easier changing via rust-toolchain.toml)
Closes #164 (universal dylib)
Closes #127 (no docker)
Closes #106 (most of those scripts obsolete; some commands moved into build scripts)
Closes #99 (no docker)
Closes #98 (no docker)
Closes #97 (submodule automatically updated)
Closes #96 (build will fail if commit unavailable)
Closes #53 (no docker)
Closes #40 (DEBUG=1 option documented)
Closes #9 (simpler OOTB experience, and updated docs)

dae added 4 commits November 27, 2022 23:58
It's not only used to build the bridge, but also to generate TS files,
so top-level makes more sense. It will also fit more nicely into a
cargo workspace.
The 2.1.55 desktop release switched away from Bazel to a new build system,
which required some changes to this repo. Making the changes was a bit complicated
due to the complexity of the current build system, and I ended up shaving some yaks
to make things simpler while I was working on the changes.

Changes:

- Building for the current architecture instead of all architectures is now the
default, so getting started is easier, and a build in Android Studio no longer
requires special flags.
- The build-current.sh script has been split into build-aar and build-robo
for the Android and Robolectric parts, and can be used for both single-arch
and multi-arch builds.
- On arm64 Macs, the build scripts now create arm binaries
- In a multi-arch build, both x86 and arm64 Mac libs are built, and they're
merged into a single library. This can be done in CI, so there is no manual step
required for M1 machines anymore.
- The build now uses protobuf and python binaries/libs that the desktop build
downloads, so they don't need to be installed separately.
- The pinned Rust version and Rust targets are automatically installed as
required.
- The per-platform CI builds now build in debug mode and are faster.
- Updated the docs to explain how the NDK can be installed via Android Studio,
instead of via separate command-line tools.
- The cross/docker stuff has been stripped out, as it's of limited use as it
can't target macOS legally. Easier to use GH actions for the multi-arch
builds, and keep things simple for local development.
- Fix lint not being run in CI; caught an API 23 reference.

Bumps ankidroid#179 (builds on M1 already work, so this may be simpler than expected?)
Bumps ankidroid#174 (a bunch of the doctor stuff is obsolete; updated HOWTO.md and GH actions should be consulted)
Bumps ankidroid#27 (I recommend closing this; single-platform is the default for local builds, and
CI runners don't have any extra compute available)

Tentatively closes ankidroid#109 (didn't see the flake when I was updating the actions)

Closes ankidroid#235 (translation submodules now automatically synced with anki submodule)
Closes ankidroid#213 (path based on script now)
Closes ankidroid#211 (builds for Arm Mac on Arm Macs)
Closes ankidroid#197 (single arch is the default)
Closes ankidroid#196 (desktop venv is used)
Closes ankidroid#195 (can be done via the GUI, and does not require separate cli download)
Closes ankidroid#168 (latest Rust; easier changing via rust-toolchain.toml)
Closes ankidroid#164 (universal dylib)
Closes ankidroid#127 (no docker)
Closes ankidroid#106 (most of those scripts obsolete; some commands moved into build scripts)
Closes ankidroid#99 (no docker)
Closes ankidroid#98 (no docker)
Closes ankidroid#97 (submodule automatically updated)
Closes ankidroid#96 (build will fail if commit unavailable)
Closes ankidroid#53 (no docker)
Closes ankidroid#40 (DEBUG=1 option documented)
Closes ankidroid#9 (simpler OOTB experience, and updated docs)
@dae
Copy link
Contributor Author

dae commented Nov 29, 2022

Copy link
Member

@mikehardy mikehardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an amazing simplification, that's a shiny smooth yak you've got there! Thank you immensely

Just to confirm, M1 robo builds fine now on a mac x86-64 free runner? If so, that's huge and yes of course, this is fantastic.

Had some thoughts on docs + some guardrails in a couple of the scripts and some other little things but in general, really cool.

Any thoughts on when 2.1.55 has a final commit hash to pin?

Comment on lines 14 to 20
serde = "1.0.114"
serde_json = "1.0.56"
serde_derive = "1.0.114"
lazy_static = "1.4.0"
num_enum = "0.5.0"
itertools = "0.10.0"
lexical-core = "0.7.5"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious if any of the other deps are out of date, what's the command to check ? I can look it up if you don't know immediately off-hand just figured you might

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are not precise pins; the exact versions will be in Cargo.lock. cargo update will update to the latest compatible versions, though I don't recommend doing that until the anki submodule is updated again, as the latest chrono will spit out some deprecation warnings that haven't been addressed yet (and thus has been pinned upstream for now).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up updating the submodule to fix some lints, so you can try this now after I push the changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

npm/yarn have only existed forever and rust had to invent a semver pinning / updating strategy that is different and almost exactly opposite for specific pins, nice https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#caret-requirements 😆

So no semver specifiers means "go ahead and ingest non-breaking changes, exact version will be in lock", while adding a ^ means "this version in the toml and this version only". ✔️

I'll play with it to see what happens, as I typically want the version in the toml itself to be bumped if a request for "latest version" does find a difference between version in toml and compatible version available. Some javascript / npmjs.com tools/incantations do that and some don't. I find skew between version specifier (in toml) and version used (in lock) to be a source of surprise that is not large but is non-zero leading to my preference there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure of the rationale for why they effectively reversed tilde and caret. I think the node style is a bit more logical as a caret points up, but that ship has sailed :-)

You can 'cargo install cargo-edit' and then use 'cargo upgrade' to update the .toml file in addition to the lock file, though unless the versions are tilde-pinned, it doesn't actually mean that older versions won't be allowed. If the lock file is deleted you'll end up with the latest version anyway; this mainly applies to niche cases where another crate in the same repo has pinned an exact older version for example. If one crate depends on "1.2.3" and another on "=1.2.2", a tilde pin would not allow them to compile together, whereas the default/caret style does. I'm guessing both versions could co-exist in a node project, but it's not possible to have multiple libraries with the same semver in Rust as they'd cause conflicts at link time.

@dae
Copy link
Contributor Author

dae commented Nov 30, 2022

Any thoughts on when 2.1.55 has a final commit hash to pin?

Hopefully within a few weeks - there are a few issues with the webpages that need working through still

@dae
Copy link
Contributor Author

dae commented Nov 30, 2022

(in case you weren't already aware: /~https://github.com/ankidroid/Anki-Android-Backend/actions/runs/3580634081)

@dae dae force-pushed the 2.1.55 branch 2 times, most recently from dcc8097 to 1541a29 Compare November 30, 2022 05:44
dae added 5 commits November 30, 2022 16:01
Opted to match the actual variable name rather than make it human
readable; the print statements are just there to help debug issues
when the user forgets to shut down gradle first.
Copy link
Member

@mikehardy mikehardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice thanks for all the clarification.

I wasn't aware of the workflow action not being in the allowed actions list on the org because I always forget that. I added the ninja action now and wanted to run the workflows but remembered (before I ran them!) that it would trigger a publish.

I proposed an optional string for manual dispatch that will trigger publish, based on a tested/working implementation I did for cache clearing in the Anki-Android repo, what do you think? (it's a string not a boolean as an implementation detail, I found testing booleans in the workflow steps on the if to be fussy vs just testing a string). Now I realize github finally added manual cache deletion so maybe it's obsolete on the other workflow but it serves as an example here so that every manual run doesn't have the serious side effect of a publish

What do you think?

Also a question on custom rslib loading vs local build testing on the Anki-Android side

@mikehardy
Copy link
Member

Other than the windows failure due to a unix-ism below this appears to be mostly environment setup / validation / DX ergonomics type stuff

linux

✔️ This passed testing on ubuntu 22

macOS

❌ I am checking on macOS with ALL_ARCHS=1 ./build-aar.sh and I get:

note: /Users/mike/work/AnkiDroid/Anki-Android-Backend/build/linker-wrapper/linker-wrapper.sh: line 4: python: command not found

Is there some way to force it to use python3? python no longer exists in base macOS distribution and is sort of ambiguous (changes from 2 to 3 have caused breaks all over in other projects...).

I was able to workaround it with a prefix (shared here https://stackoverflow.com/a/74630547/9910298) so for me this worked:

RUST_ANDROID_GRADLE_PYTHON_COMMAND=python3 ALL_ARCHS=1 ./build-aar.sh

❌ When I went to run ALL_ARCHS=1 ./build-robo.sh I got an error for missing cross-compiler:

error occurred: Failed to find tool. Is x86_64-unknown-linux-gnu-gcc installed?

On my local system, this had not been performed (ripped from build all workflow yml):

          brew tap SergioBenitez/osxct
          brew install x86_64-unknown-linux-gnu
          x86_64-unknown-linux-gnu-gcc -v

This is why we have the ./tools/doctor.sh script I suppose ;-)

Perhaps build-robo could attempt to execute that with an error message indicating the missing thing if x86_64-unknown-linux-gnu-gcc -v 2>&1 > /dev/null had non-zero exit?

❌ When I tried to run ./tools/doctor.sh it now contains some obsolete things, it does not seem to have been altered to match this PR (yet another hairy yak...) as it talks about docker and references a deleted docker script

❌ At that point ALL_ARCHS=1 ./build-robo.sh failed on missing mingw, covered here in yaml:

brew install mingw-w64 && x86_64-w64-mingw32-gcc -v

Could probably have a similar "is it actually there?" check + failure (with these checks maybe killing the need for doctor, which would be nice since it would reduce duplication...)

✔️ Now ALL_ARCHS=1 ./build-robo.sh seemed to work (though I took a temporary DNS resolution failure to crates.io even 💩 , seriously, what a pain testing all this stuff 😅 )

❌ gradle target :rsdroid:lint seems to have a silent failure here, could be peeled to a separate multi-issue (fix silent failures passing + fix failure) or could shave another yak here to add to the herd

> Task :rsdroid:lint
Could not load custom lint check jar file /Users/mike/.gradle/caches/transforms-3/40bf2c11c8f0fb09cac3b5552b0978fc/transformed/timber-5.0.1/jars/lint.jar
java.lang.NoClassDefFoundError: com/android/tools/lint/client/api/Vendor

❌ took what looks like a flake failure, my machine was very heavily loaded and this popped out from gradle target :rsdroid-instrumented:connectedDebugAndroidTest:

net.ankiweb.rsdroid.BackendIntegrationTests > collectionIsVersion11AfterOpen[TestingAVD(AVD) - 12] FAILED 
        org.junit.runners.model.TestTimedOutException: test timed out after 3 seconds
        at com.google.protobuf.FieldSet.newFieldSet(FieldSet.java:99)

Maybe alter this to be 6 since it is arbitrary anyway?

Must be rare, hasn't happened before, but no need to fail it so fast just because I've got a billion different projects running tests at once

✔️ ./check-droid.sh and ./check-rust.sh appear to work fine at this point on macOS

windows

❌ I required this in order to run things:

Set-ExecutionPolicy -ExecutionPolicy Unrestricted -Scope Process

...as it would complain about the chained scripts as it called in to build web assets. I guess that could be in a doctor.bat file or perhaps build-aar.bat could check it somehow? Or just run the thing (it does not require special privileges to alter execution policy for that scope, and it's temporary + fairly focused so it does not seem like a big error from "increasing security attack surface area" perspective. Famous last words)

❌ At that point build-aar.bat was running along and failed on this ninja step (appears to be 13/24):

FAILED: out/ts/lib/backend_proto.js out/ts/lib/backend_proto.d.ts

Error message contained the tell-tale unix-ism rm out/ts/lib/backend_proto_static.js instead of something I'd expect on windows like del

Could be this? /~https://github.com/ankitects/anki/blob/5e0a761b875fff4c9e4b202c08bd740c7bb37763/build/ninja_gen/src/node.rs#L299

I'm stuck on windows until that's resolved

@dae

This comment was marked as resolved.

@mikehardy
Copy link
Member

Ooo - it silently failed this time when I went to run it again from PowerShell after running the Set-ExecutionPolicy. So there is some more silent failure going on - it was on the same rm line as I have not done msys2 yet (I wanted to retry it first - glad I did - silent failures are a problem). You might want to spin up windows VM and eat the dog food now and again :-). I don't have any "real" windows machine around but a VM with community Visual Code installed + choco etc comes in handy.

@dae

This comment was marked as outdated.

@dae
Copy link
Contributor Author

dae commented Dec 2, 2022

Ok, I see what you mean. Sigh, setting up AS.

@mikehardy
Copy link
Member

Sorry for the pain, that's a bunch of sideways motion, but hopefully it helps the upstream stuff for you? Just to be clear + positive: this PR is amazing and I don't want this 1%-level fiddling to diminish that at all. Such a huge huge simplification here, thank you

@dae
Copy link
Contributor Author

dae commented Dec 2, 2022

Does it work correctly for you now?

dae added 3 commits December 2, 2022 13:34
Avoids the need to change the execution policy, and the new script
checks execution success.
Avoids the need to install it separately
@@ -65,7 +65,7 @@ jobs:
~/.cargo/registry
~/.cargo/git
target
key: ${{ runner.os }}-rust-release-v2-${{ hashFiles('rslib-bridge/**/Cargo.lock') }}
key: ${{ runner.os }}-rust-release-v2-${{ hashFiles('Cargo.lock', 'rslib-bridge/**', 'anki/rslib/**', 'anki/ftl/**', 'anki/proto/**') }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really want to hash every single file here instead of just the Cargo.lock files? That seems really expensive - it's already pretty expensive just doing the file scanning (in my experience it can add up to quite a bit of CI time). Best perhaps to focus these on just known Cargo.lock files that conclusively imply something has changed and warrants a cache upload?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a preference. The lockfiles should be sufficient of the majority of the cached content; changes to the other files only affect the compilation output of our first-party crates.

Copy link
Member

@mikehardy mikehardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested and working on windows now! Excellent.

Made some display suggestions on README I noticed when reading the file full/separate to see it render. They are trivial

Down to just the question on cache hashing all files vs just lock files and the also-trivial extra expansion on test file timeout

This is really really close to going.

Should we just merge this and do the anki sha bump later? As long as we don't force a publish with manual workflow run + optional argument, I think it is safe to have it merged. And it clears the decks on this one

@dae
Copy link
Contributor Author

dae commented Dec 3, 2022

No objections to merging earlier, though currently the anki submodule points to a commit that's sitting in an unmerged PR on the ankidroid/anki repo. Come release time that branch will be rebased, and I'm not sure if github keep the old commit indefinitely or not. If they don't, it's possible the merged changes will fail to build in the future if the pending PR is not merged into a branch/tag.

@mikehardy
Copy link
Member

Ah! I thought it was pointing to main but just wasn't where you wanted it yet. If it's pointing to a branch better to wait for reasons you state (commit hashes might move around)

Copy link
Member

@mikehardy mikehardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worked this thing over fully, that's for sure. Looks great, just waiting on a stable anki submodule sha then. What a great change to get rid of all the Docker / cross stuff, wow. It was good while it lasted but now that M1 has fully permeated the ecosystem + going with most people just wanting to build their arch, this is so much better.

@dae dae marked this pull request as ready for review December 16, 2022 11:05
@dae
Copy link
Contributor Author

dae commented Dec 16, 2022

Ok, good to go. ankidroid/anki#6 should be pulled first. Recommend a squash merge for this one, then triggering the release workflow.

@mikehardy
Copy link
Member

Fantastic! That was a pretty big changelog on the anki side (I read through almost all of it and ran out of stamina!)
Thanks again for working it through here, I'll see if I can't grab the baton and take it the rest of the way

@mikehardy
Copy link
Member

"all" workflow running now via manual dispatch with string "shipit" (:shipit: !) it takes forever so there might be a delay but I'll re-check at a later date to finalize the maven repo and everything. Hopefully it works...

@mikehardy
Copy link
Member

closed + released, maven's got the new artifact, 🏃‍♂️ is what I look like as I mosey on over to the ankidroid side to ingest the artifact and get this out. Vamos!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment