-
-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[commitgraph] Implement commit-graph-verify plumbing command. #26
Conversation
Thanks a lot! This is indeed a nice increment, and I will take a closer look soon. edit: just saw it's a draft. Will take a look when ready. Happy coding! |
8db5e51
to
c418871
Compare
Commit graphs are optional, so commit-graph plumbing is guarded by a `commitgraph` feature that is enabled by default.
It doesn't look like `quick_error` supports type parameters on error enums, and I want to use type parameters for verification errors.
This constructor tries to accept as many commit-graph-looking paths as possible, such as `.git/objects/info`, `.git/objects/info/commit-graph`, or `.git/objects/info/commit-graphs`.
* It doesn't use evil `as`. * It's a bit less code. * It should be a tiny bit faster. * It is slightly more strict in that it will panic if the base graphs list is not evenly divisible into hashes. The file parser's error checking should ensure that this panic never happens, but maybe maybe maybe...
c418871
to
8dad3ff
Compare
Completed the non-parallel, no-progress implementation of I think the most iffy part of this is I'm also unsure how closely you want to match git in terms of verification error messages and test cases. |
That sounds like great progress! I wonder how fast verification already is. After all, I wouldn't expect the graph files to be too large, to the point where single-threaded operation might already be fast enough for most. After all, using git-features::parallel will drive the code complexity up a notch. The same applies to adding progress, to some extend, even though it's usually much easier to accomplish. It might be a waste though if the average verify operation takes less than 1s, the time the 'line renderer' will wait until it shows up at all. Edit: I just noticed that git also looks up the actual commit object for additional validation. This is something the
I took a look and understand why that is necessary. Something leaving a bitter taste is the asymmetry this causes in the API. One could (probably) argue that A somewhat less complex way of restoring symmetry could be to put the To me that doesn't sound too bad, at least if a symmetric API is considered valuable, and is my preferred solution. Please let me know if I am missing something though, as what's there is definitely doing the job well enough already.
This is entirely up to you. When it comes to test-cases, I tend to be very pragmatic and go with a happy-path test for every feature (or command line flag), along with at least one sad-path test to see that verification actually works while producing acceptable error messages, without the need to test for every variant in the error enum at all. Thus git's tests are certainly more exhaustive, but there is value to keep it simple and lean at the beginning to help moving fast, at the cost of being reactive in case bugs inevitably come in with tests for their fixes only. Great work thus far, thanks so much! I am super interested in seeing this in a stress test, and of course to compare its performance with git, which I think is single-threaded too. |
Here are some quick measurements on the most pathological repo I could find (/~https://github.com/cirosantilli/test-many-commits-1m):
I'll revisit the Edit: If we go with the Can |
This looks very much like progress should be implemented then. It's quite swift given that it probably makes 1 million object lookups… if that's true, it would do 71k lookups per second on a single thread, which seems like a lot judging from my experience with git's performance with pack lookups. Alternatively your CPU is ~2.5x of what I am using (also not too unlikely :D). The other timings will also be interesting especially once we have a similar command.
That's true, it looks like the nesting level gets a little deeper than I thought. I hoped it would look like this when matched: match err {
graph::verify::Error::File(file::verify::Error::Processor(err)) => {case 1: I thought it would be this…},
graph::verify::Error::File(file::verify::Error::Processor(graph::verify::ProcessorError::Processor(err))) => {case 2: …and agree this seems a bit unwieldy, despite being lossless and correct},
_ => {}
} Case 1 seems quite alright still, even though case 2 might be a bit too much, especially if you see a way to sidestep the issue.
That's an interesting one! Thus far What about putting the 'detailed verification' behind a feature flag in let db = git_odb::compound::Db::at(object_dir)?;
graph.verify_integrity(…, |commit| verify_detailed(commit, &db), …); Thinking about it, there might be another way. Looking up an object could feasibly be put behind an impl of a function that does the lookup, completely avoiding having to know git-odb. let db = git_odb::compound::Db::at(object_dir)?;
let mut buf = Vec::new();
graph.verify_integrity(…, |commit| verify_detailed(commit, |id| db.locate(id, &mut buf).transpose().ok().flatten().and_then(borrowed::Object::as_commit).map(borrowed::Commit::to_owned), …); The above would not work in a multi-threaded environment, but could be made to work by making a closure returning a closure for lookups per thread (this is common practice in git-odb already, but complicates things as you can imagine). It's certainly alright to aim for single-threadedness first to get first performance results. I would certainly prefer the second one as it is general, but can live with the first one as it's simpler to get started with. |
dc536ad
to
67a144e
Compare
Not gonna lie, that line keeps me up at night. Almost as much as I'd like to follow this up with implementing |
Missing features: 1. It operates on commit-graph files only, so it doesn't verify that commit-graph data matches `git-odb` data. 2. No progress reporting or parallelization. This shouldn't be needed until until we need to check against `git-odb` data. Example output for Linux repo: ``` $ time ./target/release/gixp commit-graph-verify -s ~/src/linux/.git/objects/info number of commits with the given number of parents 0: 4 1: 878988 2: 67800 3: 652 4: 408 5: 382 6: 454 7: 95 8: 65 9: 47 10: 25 11: 26 12: 14 13: 4 14: 3 18: 1 19: 1 20: 1 21: 1 24: 1 27: 1 30: 1 32: 1 66: 1 ->: 948976 longest path length between two commits: 160521 real 0m0.196s user 0m0.180s sys 0m0.016s ```
Get rid of `file::verify::EitherError` in favor of having `graph::verify::Error` manually copy file errors into the graph error object. So: 1. `graph::verify::Error`'s `File` variant uses a dummy type parameter for its nested `file::verify::Error` value. This dummy type parameter is essentially a never type, as `graph::verify::Error::File(file::verify::Error::Processor(...))` is not valid. 2. `Graph::verify_integrity` calls `File::traverse` with its normal error type (`graph::verify::Error`) also serving as the processor's error type. 3. `Graph::traverse` propagates processor errors to its caller via `Error::Processor(err)`. 4. `Graph::verify_integrity` manually transcribes `file::verify::Error<T>` variants into `file::verify::Error<NeverType>` variants before embedding the file error into `graph::verify::Error::File` variants.
67a144e
to
fa22cab
Compare
That sounds like it's still (a tad) better then :). Actually there is already a precedence for doing this, as a similar problem exists when resolving objects foreign to a thin pack. Technically the code could prescribe the use of
That sounds good, thanks for all the work you put into this! I will hopefully find some more time tomorrow to do a thorough review. |
Fantastic work, thank you! I just applied tiny refactorings, with two ones worth mentioning (or at least in recent memory :D):
Please share your thoughts in case you would want things to look differently, or make the changes in the next PR. In any case, I am very much looking forward to your next PR :)! |
Oh, and before I forget: Now would be a good time to add, with minimal effort, a stress test for commit-graph-verify. I think it could be two lines around here to run the verification on a big commit-graph like the one of the linux kernel repo. |
Both of your changes look good! |
The new URL should trigger an overflow check but it only happens when `url::Url::parse()` is called directly as our code doesn't let it through anymore. Here is the log from the fuzzer run as reported: ``` [Environment] ASAN_OPTIONS=handle_abort=2 +----------------------------------------Release Build Stacktrace----------------------------------------+ Command: /mnt/scratch0/clusterfuzz/resources/platform/linux/unshare -c -n /mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_gitoxide_9a561c2a19701ceb3cded247e9ae8f349711bbca/revisions/gix-url-parse -rss_limit_mb=2560 -timeout=60 -runs=100 /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases/f508abd59698de9914f2b8894cc135f55208e494873d456c6c19828509103805 Time ran: 0.12717413902282715 INFO: Running with entropic power schedule (0xFF, 100). INFO: Seed: 1001182178 INFO: Loaded 1 modules (90683 inline 8-bit counters): 90683 [0x5d0c0597cce0, 0x5d0c05992f1b), INFO: Loaded 1 PC tables (90683 PCs): 90683 [0x5d0c05992f20,0x5d0c05af52d0), /mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_gitoxide_9a561c2a19701ceb3cded247e9ae8f349711bbca/revisions/gix-url-parse: Running 1 inputs 100 time(s) each. Running: /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases/f508abd59698de9914f2b8894cc135f55208e494873d456c6c19828509103805 thread '<unnamed>' panicked at /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/punycode.rs:272:17: attempt to add with overflow note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace AddressSanitizer:DEADLYSIGNAL ================================================================= ==1200==ERROR: AddressSanitizer: ABRT on unknown address 0x0539000004b0 (pc 0x7c51742fb00b bp 0x7ffcd1eadd80 sp 0x7ffcd1eadaf0 T0) #0 0x7c51742fb00b in raise /build/glibc-SzIz7B/glibc-2.31/sysdeps/unix/sysv/linux/raise.c:51:1 #1 0x7c51742da858 in abort /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:79:7 #2 0x5d0c0572a1e6 in std::sys::unix::abort_internal::he854d2f74b119e66 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/sys/unix/mod.rs:375:14 #3 0x5d0c0518cda6 in std::process::abort::h68c27a968dc7c74f /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/process.rs:2271:5 #4 0x5d0c0564d3d4 in libfuzzer_sys::initialize::_$u7b$$u7b$closure$u7d$$u7d$::h1e76e422e0c48db0 /rust/registry/src/index.crates.io-6f17d22bba15001f/libfuzzer-sys-0.4.3/src/lib.rs:57:9 #5 0x5d0c0571dcf7 in _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..Fn$LT$Args$GT$$GT$::call::h0c028c5af3475e03 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/alloc/src/boxed.rs:2021:9 #6 0x5d0c0571dcf7 in std::panicking::rust_panic_with_hook::hd26c5407fbf20d71 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panicking.rs:735:13 #7 0x5d0c0571da05 in std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h944e23ea90982f5a /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panicking.rs:601:13 #8 0x5d0c0571aee5 in std::sys_common::backtrace::__rust_end_short_backtrace::h8a3632d339dd3313 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/sys_common/backtrace.rs:170:18 #9 0x5d0c0571d781 in rust_begin_unwind /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panicking.rs:597:5 #10 0x5d0c05190634 in core::panicking::panic_fmt::h85c36fc727234039 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/core/src/panicking.rs:72:14 #11 0x5d0c051906d2 in core::panicking::panic::h6a47ed7881a36f4d /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/core/src/panicking.rs:127:5 #12 0x5d0c053e09c6 in idna::punycode::encode_into::hd674630fb161bf5b /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/punycode.rs:0 #13 0x5d0c053eacbc in idna::uts46::Idna::to_ascii_inner::h69c52eb69ae48276 /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/uts46.rs:469:34 #14 0x5d0c053eb793 in idna::uts46::Idna::to_ascii::h76237795045112f3 /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/uts46.rs:481:26 #15 0x5d0c053eda7a in idna::uts46::Config::to_ascii::h423c722ab2fa9813 /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/uts46.rs:572:9 #16 0x5d0c053f0070 in idna::domain_to_ascii::h93e94e995d03e9ef /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/lib.rs:64:5 #17 0x5d0c0530374a in url::host::Host::domain_to_ascii::h6cb1ae8fe42a1e42 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/host.rs:166:9 #18 0x5d0c0530374a in url::host::Host::parse::h962d3990e0ff5091 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/host.rs:86:22 #19 0x5d0c0532e87d in url::parser::Parser::parse_host::h89faea9182ce2512 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:1024:20 #20 0x5d0c0532ba8d in url::parser::Parser::parse_host_and_port::heb44bd7ebd2593f6 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:944:33 #21 0x5d0c0532896f in url::parser::Parser::after_double_slash::hbb313f562f0978a2 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:843:13 #22 0x5d0c0531a129 in url::parser::Parser::parse_with_scheme::h54a417e4650ea024 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:453:17 #23 0x5d0c05317824 in url::parser::Parser::parse_url::hfa6b21c53cd0ac1c /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:366:20 #24 0x5d0c05350ba0 in url::ParseOptions::parse::hb8b3309b3b920457 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/lib.rs:257:9 #25 0x5d0c052b1ffc in url::Url::parse::h82a965c69df59bba /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/lib.rs:292:9 #26 0x5d0c052b1ffc in gix_url::parse::input_to_utf8_and_url::h14b70a32a8884316 gitoxide/gix-url/src/parse.rs:252:5 #27 0x5d0c052a9b9d in gix_url::parse::url::h6f7f7b0bddf4b8d7 gitoxide/gix-url/src/parse.rs:99:24 #28 0x5d0c052b34cc in gix_url::parse::hfaab74909f01c9cc gitoxide/gix-url/src/lib.rs:38:46 #29 0x5d0c05270b4a in rust_fuzzer_test_input gitoxide/gix-url/fuzz/fuzz_targets/parse.rs:5:14 #30 0x5d0c0564d537 in __rust_try libfuzzer_sys.f28e88650cadb2d4-cgu.0:0 #31 0x5d0c0564c79f in std::panicking::try::h90783eeef7e35925 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panicking.rs:468:19 #32 0x5d0c0564c79f in std::panic::catch_unwind::h041b281a0e92d580 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panic.rs:142:14 #33 0x5d0c0564c79f in LLVMFuzzerTestOneInput /rust/registry/src/index.crates.io-6f17d22bba15001f/libfuzzer-sys-0.4.3/src/lib.rs:28:22 #34 0x5d0c0566bb83 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15 #35 0x5d0c056572e2 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6 #36 0x5d0c0565cb8c in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9 #37 0x5d0c056860c2 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10 #38 0x7c51742dc082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/libc-start.c:308:16 #39 0x5d0c05191c4d in _start ``` `
Commit graphs are optional, so commit-graph plumbing is guarded by a
commitgraph
feature that is enabled by default.