Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seastar friendly protobuf parser #22714

Merged
merged 4 commits into from
Aug 16, 2024
Merged

Conversation

rockwotj
Copy link
Contributor

@rockwotj rockwotj commented Aug 2, 2024

Changes:

  • container: fix frag_vector
  • iobuf/parser: support peek_bytes
  • utils/vint: Support all of {,u}int{64,32}_t
  • serde/proto: seastar friendly protobuf parser

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

  • none

@dotnwat
Copy link
Member

dotnwat commented Aug 3, 2024

fire emoji

@rockwotj rockwotj force-pushed the protobuf branch 3 times, most recently from 7b9b529 to a88ddeb Compare August 4, 2024 03:55
@rockwotj rockwotj changed the title WIP: Seastar friendly protobuf parser Seastar friendly protobuf parser Aug 5, 2024
@rockwotj
Copy link
Contributor Author

rockwotj commented Aug 5, 2024

Split out #22733

The rest I will leave here

@rockwotj rockwotj force-pushed the protobuf branch 2 times, most recently from d0757a4 to dcb1104 Compare August 5, 2024 21:30
@rockwotj rockwotj marked this pull request as ready for review August 5, 2024 21:32
@rockwotj rockwotj requested review from jcipar and andrwng August 5, 2024 21:43
@rockwotj rockwotj self-assigned this Aug 5, 2024
@rockwotj rockwotj added the area/cloud-storage Shadow indexing subsystem label Aug 5, 2024
@vbotbuildovich
Copy link
Collaborator

new failures in https://buildkite.com/redpanda/redpanda/builds/52504#019125cc-41f4-41ee-92a5-0dc7f67438c9:

"rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTestWithDisruptions.test_write_with_node_failures.cloud_storage_type=CloudStorageType.ABS"

@rockwotj rockwotj force-pushed the protobuf branch 2 times, most recently from 7780c8e to a866564 Compare August 6, 2024 13:51
@rockwotj rockwotj requested a review from dotnwat August 6, 2024 17:14
Copy link
Member

@dotnwat dotnwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is legendary. not done reviewing, but will revisit. is the plan to merge this then fix panda proxy, or pandaproxy then pandaproxy protobuf upgrade, then this?

return _frags.at(index / elems_per_frag).at(index % elems_per_frag);
}

T& at(size_t index) {
reference at(size_t index) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do i understand correctly that we are returning a value?

so something like

auto& x = foo.at(3)

wouldn't actually compile because we can't have a reference to temporary?

if so, then presumably it makes sense to mimic vector<bool> semantics, but do want to do that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do i understand correctly that we are returning a value?

Only for chunked_vector<bool>, for everything else this is unchanged (we return a reference).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code as is only fails to compile for chunked_vector<bool> which I am the first user of in the protobuf parser, so this fixes bool as a parameter to fragmented_vector. Really quite a silly template specification.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess my question was more related to vector<bool>'s weirdness, and if we wanted to carry that forward. in particular, looking at the spec for it, it seems like the above statement I wrote wouldn't compile, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like the above statement I wrote wouldn't compile, right?

Well if foo is a std::vector<bool> then correct it wouldn't compile, which is why I had to make this change.

Copy link
Contributor Author

@rockwotj rockwotj Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we wanted to carry that forward

I think we're pretty much forced too unless we change the inner container type

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting this would be great to be folded back into the commit message. It wasn't immediately obvious to me what the problem was until I read this thread

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to explain myself a bit better in the commit message. PTAL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thank you!

src/v/bytes/include/bytes/iobuf_parser.h Outdated Show resolved Hide resolved
src/v/utils/vint.h Outdated Show resolved Hide resolved
@rockwotj
Copy link
Contributor Author

rockwotj commented Aug 6, 2024

is the plan to merge this then fix panda proxy, or pandaproxy then pandaproxy protobuf upgrade, then this?

I'm assuming that your PR will go in, then we'll fix pandaproxy, then we'll merge this.

@rockwotj rockwotj force-pushed the protobuf branch 2 times, most recently from fc66de0 to 375846a Compare August 6, 2024 20:59
@rockwotj
Copy link
Contributor Author

rockwotj commented Aug 6, 2024

Force pushes:

  • constify peek_bytes
  • fix name shadowing in vint

@rockwotj
Copy link
Contributor Author

rockwotj commented Aug 14, 2024

Force pushes: review feedback and rebase

@@ -54,7 +54,7 @@ struct var_decoder {
template<typename Range>
inline std::pair<uint64_t, size_t>
deserialize(Range&& r, size_t max_bytes) noexcept {
auto limit = ((max_bytes - 1) * 7);
uint8_t limit = ((max_bytes - 1) * 7);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: belongs in the other commit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I wonder if it makes sense to make max_bytes a template/compile-time arg so it's more obvious that this won't overflow uint8_max

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I don't understand why this isn't uint64_t that is what we're comparing it against. I'm just going to make everything uint64_t.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this is a mess. I'm going to change it to uint64_t (we're comparing to a vint64_t) for now. Making it a template parameter is a larger change

src/v/serde/protobuf/parser.cc Outdated Show resolved Hide resolved
Previously reading data from frag_vector returned a reference (T& and
const T&). However std::vector<bool> does not return references from
methods like `at` and `operator[]`. So the following code FAILS to
compile:

```
std::vector<bool> vec;
vec.push_back(false);
bool& elem = vec[0]; // ERROR: cannot take a reference to a temporary
```

In order to fix this we return copies instead of references for
`fragmented_vector<bool>`. This is fine because the copies are trivial,
and there are no instances of this in the tree at the moment (by
definition because it doesn't compile). The only way to keep everything
as references is to remove `std::vector` as the inner container to
fragmented_vector.
Incase you want to peek a small amount of contiguous data
Protobuf needs the ability to read all int sizes with/without zigzag
decoding, so move the limit into the parser.

Ideally we do this without the `internal` namespace usage, but doing
that would be a larger refactoring that should take place independantly
of this patchset
@rockwotj rockwotj force-pushed the protobuf branch 2 times, most recently from f047b2b to 4952014 Compare August 15, 2024 17:35
andrwng
andrwng previously approved these changes Aug 15, 2024
Copy link
Contributor

@andrwng andrwng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small note about what looks like a missing throw but otherwise lgtm! Awesome work!

src/v/serde/protobuf/parser.cc Outdated Show resolved Hide resolved
andrwng
andrwng previously approved these changes Aug 15, 2024
src/v/serde/protobuf/parser.h Outdated Show resolved Hide resolved
src/v/serde/protobuf/parser.cc Show resolved Hide resolved
src/v/serde/protobuf/parser.cc Show resolved Hide resolved
@rockwotj
Copy link
Contributor Author

rockwotj commented Aug 15, 2024

Force push: add comments and fix compat issue with packed fields

@rockwotj
Copy link
Contributor Author

Force push: reorder switch cases

In support of the iceberg project which needs to be able to parse
arbitrary protocol buffers, create a parser that works within the
constraints of seastar - namely it yields cooperatively when parsing
very large (or complex) protocol buffers, it is zero copy
for string and byte types, and supports non-contiguous allocations of
data for `repeated` and `map` types.
Copy link
Contributor

@jcipar jcipar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥

@rockwotj rockwotj merged commit 6a68058 into redpanda-data:dev Aug 16, 2024
17 checks passed
@rockwotj rockwotj deleted the protobuf branch August 16, 2024 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cloud-storage Shadow indexing subsystem area/redpanda
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants