Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[commitgraph] implement basic, low-level read API #21

Merged
merged 25 commits into from
Oct 1, 2020

Conversation

avoidscorn
Copy link
Contributor

This is just the first pass at ensuring we can read commit-graph
files correctly. It does not support anything interesting like
reachability, merge-base, or ahead-behind queries.

This is just the first pass at ensuring we can read commit-graph
files correctly. It does not support anything interesting like
reachability, merge-base, or ahead-behind queries.
@avoidscorn
Copy link
Contributor Author

TODO before merging:

  1. Add tests for non-happy-path cases.
  2. Fill in docs.
  3. Integrate into CI.

I wanted to settle on the test fixtures before making a PR, as I wanted to avoid polluting git history with too many fixture changes. But I can still split this into multiple PRs if desired.

@Byron
Copy link
Member

Byron commented Sep 15, 2020

(Related to #17 )

Thanks a lot! This is a ton of work and a massive, much appreciated contribution!

I wanted to settle on the test fixtures before making a PR, as I wanted to avoid polluting git history with too many fixture changes. But I can still split this into multiple PRs if desired.

Let's keep things as easy for you as possible, and keeping everything in one PR is absolutely alright with me.
Regarding the TODOs:

  • Is it truly necessary to test for non-happy cases? If there are no unwrap() calls, I believe we are sure the program will not panic unexpectedly. Maybe an opportunity to save time.
  • Docs are nice, and I usually write them this early only if I have to keep intricate, difficult knowledge (relative to myself at the time) somewhere. Maybe the tests are already doing this, but I do recommend waiting with excessive docs until there is a tool using and validating the current API (something that could be added in future PRs).
  • For CI integration, have a look at the makefile. I would expect commit-graph tests to be picked up already, but there is where you would setup anything special.

In any case, I would hope we can focus on making this PR mergable soon, and not extend its scope too much, rather the opposite.
I would also take a more thorough look in the next days if you are interested in suggestions and questions as they come up.

@Byron Byron marked this pull request as ready for review September 15, 2020 08:52
Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a very rough look at a few pieces, and really like that the implementation tries to follow the overall style of what's already there in the other crates. If you have any feedback about that of things you would rather want to do differently, please let me know.

When I saw the V1 mentioned in the fixture directories, I was a little confused as there seems to be only one version of the commit-graph format. Maybe this was done akin to the fixtures in git-protocol, which does have to known formats right now (unfortunately).

I would be very interested to learn if you see interesting applications even in the context of a 'plumbing' tool, something like statistics maybe.

git-commitgraph/Cargo.toml Show resolved Hide resolved
git-commitgraph/tests/access/octopus_merges.rs Outdated Show resolved Hide resolved
git-commitgraph/src/graph/init.rs Outdated Show resolved Hide resolved
git-commitgraph/src/graph/init.rs Outdated Show resolved Hide resolved
@avoidscorn
Copy link
Contributor Author

I guess I should follow up with some terminology/style questions:

  1. Are you okay with fully-spelled out type names like GraphPosition/LexPosition? Would you prefer GraphPos/LexPos?
  2. I've been ordering struct fields/impl methods/enum variants in alphabetical order and putting private methods in separate impls. Any preference on this?
  3. Do you have guidance on which symbols the top-level module should re-export?

@Byron
Copy link
Member

Byron commented Sep 16, 2020

Great! Let's do this.

Are you okay with fully-spelled out type names like GraphPosition/LexPosition? Would you prefer GraphPos/LexPos?

I prefer fully spelled out names, but am fine if the shortened name is canonical. Notable example might be the term ref, which is also used in this codebase. That probably doesn't apply to GraphPos though.

I've been ordering struct fields/impl methods/enum variants in alphabetical order and putting private methods in separate impls. Any preference on this?

No. I find myself using multiple impls as well, not for private/public separation, but to (arbitrarily) put related methods into a 'category' impl block which can even carry its own documentation.

Do you have guidance on which symbols the top-level module should re-export?

I am usually keeping everything in their own modules until code that uses the crate really 'asks' for some top-level exports for convenience. An example for this was git_packetline::PacketLine, which is a rename of git_packetline::Borrowed. Maybe that's another hint at how the gitoxide crates name things - they try not to repeat their parent crate name or their module name.

I hope that helps!

This gives callers more freedom in how their directories are structured.
Commit-graph files only have one version for now, so no need to indicate
file versions. Still return an error if attempting to parse a file that
is not at version 1.
Always generate test repos on demand instead.

This saves some repo space and keeps us from having to worry about
fixture scripts being out of sync with the generated fixtures. It also
lets us use git to inspect repos instead of hardcoding expected commit
data.

On the flip side, tests take ~400ms instead of ~20ms.
I had to change `#[forbid(rust_2018_idioms)]` to
`#[deny(rust_2018_idioms)]` in order for `quick_error!` to compile.
Users generally don't access individual commit-graph files.
@Byron
Copy link
Member

Byron commented Sep 23, 2020

I saw some changes were made, and besides the windows test failures, is there anything I should look at in particular?

@avoidscorn
Copy link
Contributor Author

I believe this PR is ready for real review now, although it looks like the odb tests are bamboozling me.

I'd prefer these commits to get squashed when or before merging, but I don't know if you are okay doing that via github or whether I should do it after review is complete. I don't plan on signing commits anytime soon, so any merge strategy is okay with me.

@Byron
Copy link
Member

Byron commented Sep 28, 2020

Alright, I will be working on it, pushing some changes to make the PR mergeable. I will let you know when done for a chance to hear what you think about the changes.

Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally managed to have a thorough look at each fo the ~1000 lines of code you are generously contributing, and am absolutely amazed about the level of quality I believe to see even with my limited understanding of the commit-graph data as a whole.

I dared to make minor refactorings mostly around Position and LexPosition.
The only question remaining is if there is anything you can imagine doing about the commented-out code in file::init. My suggestion is to remove it, while possibly keeping the note.

If there is nothing you want to change, I am happy to merge this PR and release a v0.1 to crates.io.

Hopefully you will find the time to continue your work on this crate or gitoxide as a whole! Something I would love to see is a stress-test which runs some plumbing command on big commit-graphs on real-world repositories, like the linux or rust repositories. Infrastructure for that exists already.

Below is my notes

  • adjust tests to disable gpgsignatures
  • do graph results need a reference to their owning file?
    • Yes, as it allows to obtain additional information related to the item in the file itself, like File::commit_at(…)
  • feature-toggled support for serde
  • make tests depend on checked-in fixtures, instead of generating them (and depend on git on CI), making it easy to recreate them
    • the tests currently rely on calling git, see inspect_refs(…)
  • Questions
    • How can Commit return Graph positions? It doesn't seem to learn about an offset.
      • Parent IDs are indeed specified as graph positions, not file positions, as they may be in previous commit graph files.
    • What to do with the 'extra-garbage',
      some code is commented out.
  • Future Work
    • A plumbing command to extract some value from the current implementation, maybe statistics, or verification
    • Application of the command above in a stress test

@Byron Byron merged commit 8b003a0 into GitoxideLabs:main Oct 1, 2020
@avoidscorn avoidscorn deleted the commit-graph branch October 2, 2020 00:49
@avoidscorn
Copy link
Contributor Author

All your changes look good!

Plumbing for verification sounds like a good next step. I'll try to be more incremental next time, e.g. stubs in gitoxide-core, then a naive verifier, then a parallelized verifier, then a verifier with progress.

Byron added a commit that referenced this pull request Oct 15, 2023
The new URL should trigger an overflow check but it only
happens when `url::Url::parse()` is called directly as our
code doesn't let it through anymore.

Here is the log from the fuzzer run as reported:
```
	[Environment] ASAN_OPTIONS=handle_abort=2
+----------------------------------------Release Build Stacktrace----------------------------------------+
Command: /mnt/scratch0/clusterfuzz/resources/platform/linux/unshare -c -n /mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_gitoxide_9a561c2a19701ceb3cded247e9ae8f349711bbca/revisions/gix-url-parse -rss_limit_mb=2560 -timeout=60 -runs=100 /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases/f508abd59698de9914f2b8894cc135f55208e494873d456c6c19828509103805
Time ran: 0.12717413902282715
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 1001182178
INFO: Loaded 1 modules   (90683 inline 8-bit counters): 90683 [0x5d0c0597cce0, 0x5d0c05992f1b),
INFO: Loaded 1 PC tables (90683 PCs): 90683 [0x5d0c05992f20,0x5d0c05af52d0),
/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_gitoxide_9a561c2a19701ceb3cded247e9ae8f349711bbca/revisions/gix-url-parse: Running 1 inputs 100 time(s) each.
Running: /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases/f508abd59698de9914f2b8894cc135f55208e494873d456c6c19828509103805
thread '<unnamed>' panicked at /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/punycode.rs:272:17:
attempt to add with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1200==ERROR: AddressSanitizer: ABRT on unknown address 0x0539000004b0 (pc 0x7c51742fb00b bp 0x7ffcd1eadd80 sp 0x7ffcd1eadaf0 T0)
    #0 0x7c51742fb00b in raise /build/glibc-SzIz7B/glibc-2.31/sysdeps/unix/sysv/linux/raise.c:51:1
    #1 0x7c51742da858 in abort /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:79:7
    #2 0x5d0c0572a1e6 in std::sys::unix::abort_internal::he854d2f74b119e66 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/sys/unix/mod.rs:375:14
    #3 0x5d0c0518cda6 in std::process::abort::h68c27a968dc7c74f /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/process.rs:2271:5
    #4 0x5d0c0564d3d4 in libfuzzer_sys::initialize::_$u7b$$u7b$closure$u7d$$u7d$::h1e76e422e0c48db0 /rust/registry/src/index.crates.io-6f17d22bba15001f/libfuzzer-sys-0.4.3/src/lib.rs:57:9
    #5 0x5d0c0571dcf7 in _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..Fn$LT$Args$GT$$GT$::call::h0c028c5af3475e03 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/alloc/src/boxed.rs:2021:9
    #6 0x5d0c0571dcf7 in std::panicking::rust_panic_with_hook::hd26c5407fbf20d71 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panicking.rs:735:13
    #7 0x5d0c0571da05 in std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h944e23ea90982f5a /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panicking.rs:601:13
    #8 0x5d0c0571aee5 in std::sys_common::backtrace::__rust_end_short_backtrace::h8a3632d339dd3313 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/sys_common/backtrace.rs:170:18
    #9 0x5d0c0571d781 in rust_begin_unwind /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panicking.rs:597:5
    #10 0x5d0c05190634 in core::panicking::panic_fmt::h85c36fc727234039 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/core/src/panicking.rs:72:14
    #11 0x5d0c051906d2 in core::panicking::panic::h6a47ed7881a36f4d /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/core/src/panicking.rs:127:5
    #12 0x5d0c053e09c6 in idna::punycode::encode_into::hd674630fb161bf5b /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/punycode.rs:0
    #13 0x5d0c053eacbc in idna::uts46::Idna::to_ascii_inner::h69c52eb69ae48276 /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/uts46.rs:469:34
    #14 0x5d0c053eb793 in idna::uts46::Idna::to_ascii::h76237795045112f3 /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/uts46.rs:481:26
    #15 0x5d0c053eda7a in idna::uts46::Config::to_ascii::h423c722ab2fa9813 /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/uts46.rs:572:9
    #16 0x5d0c053f0070 in idna::domain_to_ascii::h93e94e995d03e9ef /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/lib.rs:64:5
    #17 0x5d0c0530374a in url::host::Host::domain_to_ascii::h6cb1ae8fe42a1e42 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/host.rs:166:9
    #18 0x5d0c0530374a in url::host::Host::parse::h962d3990e0ff5091 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/host.rs:86:22
    #19 0x5d0c0532e87d in url::parser::Parser::parse_host::h89faea9182ce2512 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:1024:20
    #20 0x5d0c0532ba8d in url::parser::Parser::parse_host_and_port::heb44bd7ebd2593f6 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:944:33
    #21 0x5d0c0532896f in url::parser::Parser::after_double_slash::hbb313f562f0978a2 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:843:13
    #22 0x5d0c0531a129 in url::parser::Parser::parse_with_scheme::h54a417e4650ea024 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:453:17
    #23 0x5d0c05317824 in url::parser::Parser::parse_url::hfa6b21c53cd0ac1c /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:366:20
    #24 0x5d0c05350ba0 in url::ParseOptions::parse::hb8b3309b3b920457 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/lib.rs:257:9
    #25 0x5d0c052b1ffc in url::Url::parse::h82a965c69df59bba /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/lib.rs:292:9
    #26 0x5d0c052b1ffc in gix_url::parse::input_to_utf8_and_url::h14b70a32a8884316 gitoxide/gix-url/src/parse.rs:252:5
    #27 0x5d0c052a9b9d in gix_url::parse::url::h6f7f7b0bddf4b8d7 gitoxide/gix-url/src/parse.rs:99:24
    #28 0x5d0c052b34cc in gix_url::parse::hfaab74909f01c9cc gitoxide/gix-url/src/lib.rs:38:46
    #29 0x5d0c05270b4a in rust_fuzzer_test_input gitoxide/gix-url/fuzz/fuzz_targets/parse.rs:5:14
    #30 0x5d0c0564d537 in __rust_try libfuzzer_sys.f28e88650cadb2d4-cgu.0:0
    #31 0x5d0c0564c79f in std::panicking::try::h90783eeef7e35925 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panicking.rs:468:19
    #32 0x5d0c0564c79f in std::panic::catch_unwind::h041b281a0e92d580 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panic.rs:142:14
    #33 0x5d0c0564c79f in LLVMFuzzerTestOneInput /rust/registry/src/index.crates.io-6f17d22bba15001f/libfuzzer-sys-0.4.3/src/lib.rs:28:22
    #34 0x5d0c0566bb83 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #35 0x5d0c056572e2 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
    #36 0x5d0c0565cb8c in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
    #37 0x5d0c056860c2 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #38 0x7c51742dc082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/libc-start.c:308:16
    #39 0x5d0c05191c4d in _start
```
`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants