-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Seastar friendly protobuf parser #22714
Conversation
fire emoji |
7b9b529
to
a88ddeb
Compare
Split out #22733 The rest I will leave here |
d0757a4
to
dcb1104
Compare
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52498#019124da-66ac-4786-95c8-f68e904c0201 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52504#019125cc-bfde-480f-b463-4e4921f78677 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52562#019129dd-3494-478d-b71e-0ac1ea5ff923 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52562#019129dd-3493-4635-9b34-36d1786cb60f ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52630#01912ec0-7ea3-41e7-9915-622751e17ab4 |
new failures in https://buildkite.com/redpanda/redpanda/builds/52504#019125cc-41f4-41ee-92a5-0dc7f67438c9:
|
7780c8e
to
a866564
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is legendary. not done reviewing, but will revisit. is the plan to merge this then fix panda proxy, or pandaproxy then pandaproxy protobuf upgrade, then this?
return _frags.at(index / elems_per_frag).at(index % elems_per_frag); | ||
} | ||
|
||
T& at(size_t index) { | ||
reference at(size_t index) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do i understand correctly that we are returning a value?
so something like
auto& x = foo.at(3)
wouldn't actually compile because we can't have a reference to temporary?
if so, then presumably it makes sense to mimic vector<bool>
semantics, but do want to do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do i understand correctly that we are returning a value?
Only for chunked_vector<bool>
, for everything else this is unchanged (we return a reference).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code as is only fails to compile for chunked_vector<bool>
which I am the first user of in the protobuf parser, so this fixes bool
as a parameter to fragmented_vector
. Really quite a silly template specification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess my question was more related to vector<bool>
's weirdness, and if we wanted to carry that forward. in particular, looking at the spec for it, it seems like the above statement I wrote wouldn't compile, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems like the above statement I wrote wouldn't compile, right?
Well if foo
is a std::vector<bool>
then correct it wouldn't compile, which is why I had to make this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we wanted to carry that forward
I think we're pretty much forced too unless we change the inner container type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting this would be great to be folded back into the commit message. It wasn't immediately obvious to me what the problem was until I read this thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to explain myself a bit better in the commit message. PTAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, thank you!
I'm assuming that your PR will go in, then we'll fix pandaproxy, then we'll merge this. |
fc66de0
to
375846a
Compare
Force pushes:
|
Force pushes: review feedback and rebase |
src/v/utils/vint.h
Outdated
@@ -54,7 +54,7 @@ struct var_decoder { | |||
template<typename Range> | |||
inline std::pair<uint64_t, size_t> | |||
deserialize(Range&& r, size_t max_bytes) noexcept { | |||
auto limit = ((max_bytes - 1) * 7); | |||
uint8_t limit = ((max_bytes - 1) * 7); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: belongs in the other commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I wonder if it makes sense to make max_bytes a template/compile-time arg so it's more obvious that this won't overflow uint8_max
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I don't understand why this isn't uint64_t that is what we're comparing it against. I'm just going to make everything uint64_t.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah this is a mess. I'm going to change it to uint64_t (we're comparing to a vint64_t) for now. Making it a template parameter is a larger change
Previously reading data from frag_vector returned a reference (T& and const T&). However std::vector<bool> does not return references from methods like `at` and `operator[]`. So the following code FAILS to compile: ``` std::vector<bool> vec; vec.push_back(false); bool& elem = vec[0]; // ERROR: cannot take a reference to a temporary ``` In order to fix this we return copies instead of references for `fragmented_vector<bool>`. This is fine because the copies are trivial, and there are no instances of this in the tree at the moment (by definition because it doesn't compile). The only way to keep everything as references is to remove `std::vector` as the inner container to fragmented_vector.
Incase you want to peek a small amount of contiguous data
Protobuf needs the ability to read all int sizes with/without zigzag decoding, so move the limit into the parser. Ideally we do this without the `internal` namespace usage, but doing that would be a larger refactoring that should take place independantly of this patchset
f047b2b
to
4952014
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small note about what looks like a missing throw but otherwise lgtm! Awesome work!
Force push: add comments and fix compat issue with packed fields |
Force push: reorder switch cases |
In support of the iceberg project which needs to be able to parse arbitrary protocol buffers, create a parser that works within the constraints of seastar - namely it yields cooperatively when parsing very large (or complex) protocol buffers, it is zero copy for string and byte types, and supports non-contiguous allocations of data for `repeated` and `map` types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥
Changes:
Backports Required
Release Notes