serde/proto: seastar friendly protobuf parser
In support of the iceberg project, which needs to be able to parse
arbitrary protocol buffers, create a parser that works within the
constraints of Seastar: it yields cooperatively when parsing very
large (or complex) protocol buffers, it is zero-copy for string and
byte types, and it supports non-contiguous allocations of data for
`repeated` and `map` types.
rockwotj committed Aug 15, 2024
1 parent cde79b6 commit c0c0dc6
Showing 13 changed files with 1,841 additions and 1 deletion.
1 change: 1 addition & 0 deletions MODULE.bazel
@@ -72,6 +72,7 @@ use_repo(non_module_dependencies, "hwloc")
use_repo(non_module_dependencies, "jsoncons")
use_repo(non_module_dependencies, "krb5")
use_repo(non_module_dependencies, "libpciaccess")
use_repo(non_module_dependencies, "libprotobuf_mutator")
use_repo(non_module_dependencies, "libxml2")
use_repo(non_module_dependencies, "lksctp")
use_repo(non_module_dependencies, "numactl")
205 changes: 204 additions & 1 deletion MODULE.bazel.lock

Some generated files are not rendered by default.

8 changes: 8 additions & 0 deletions bazel/repositories.bzl
@@ -80,6 +80,14 @@ def data_dependency():
        url = "https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/archive/2ec2576cabefef1eaa5dd9307c97de2e887fc347/libpciaccess-2ec2576cabefef1eaa5dd9307c97de2e887fc347.tar.gz",
    )

    http_archive(
        name = "libprotobuf_mutator",
        build_file = "//bazel/thirdparty:libprotobuf-mutator.BUILD",
        integrity = "sha256-KWUbFgNpDJtAO6Kr0eTo1v6iczEOta72jSle9oivFhg=",
        strip_prefix = "libprotobuf-mutator-b922c8ab9004ef9944982e4f165e2747b13223fa",
        url = "https://github.com/google/libprotobuf-mutator/archive/b922c8ab9004ef9944982e4f165e2747b13223fa.zip",
    )

    http_archive(
        name = "libxml2",
        build_file = "//bazel/thirdparty:libxml2.BUILD",
25 changes: 25 additions & 0 deletions bazel/thirdparty/libprotobuf-mutator.BUILD
@@ -0,0 +1,25 @@
# See google/libprotobuf-mutator#91

cc_library(
    name = "libprotobuf_mutator",
    testonly = 1,
    srcs = glob(
        [
            "src/*.cc",
            "src/*.h",
        ],
        exclude = [
            "**/*_test.cc",
            "src/mutator.h",
        ],
    ) + [
        "port/protobuf.h",
    ],
    hdrs = [
        "src/mutator.h",
    ],
    include_prefix = "protobuf_mutator",
    strip_include_prefix = "src",
    visibility = ["//visibility:public"],
    deps = ["@protobuf"],
)
21 changes: 21 additions & 0 deletions src/v/serde/protobuf/BUILD
@@ -0,0 +1,21 @@
load("//bazel:build.bzl", "redpanda_cc_library")

redpanda_cc_library(
    name = "protobuf",
    srcs = [
        "parser.cc",
    ],
    hdrs = [
        "parser.h",
    ],
    include_prefix = "serde/protobuf",
    visibility = ["//visibility:public"],
    deps = [
        "//src/v/bytes:iobuf",
        "//src/v/bytes:iobuf_parser",
        "//src/v/container:chunked_hash_map",
        "//src/v/container:fragmented_vector",
        "//src/v/utils:vint",
        "@protobuf",
    ],
)
19 changes: 19 additions & 0 deletions src/v/serde/protobuf/README.md
@@ -0,0 +1,19 @@
# Protobuf Parser

This directory contains a Seastar-friendly protobuf parser.
It should adhere to the same compatibility guarantees as the official C++ library,
but has a few notable differences:

1. It does not make contiguous allocations for repeated fields, maps, or string/bytes types.
2. It is reactor friendly: on deeply nested or large protobufs it yields control to the reactor.
3. It is a stackless parser, so it is not bound by the smallish 1MB stacks that Seastar uses for threads.
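
To illustrate what "stackless" means here, the following is a minimal sketch and not the code in `parser.cc`. It assumes a hypothetical schema `message Node { repeated Node children = 1; }` and computes the maximum nesting depth of an encoded `Node`. The names `sketch::read_varint` and `sketch::max_depth` are made up for this example. Nesting is tracked in a heap-allocated vector instead of through recursive calls, so input depth does not consume call stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

namespace sketch {

// Reads a base-128 varint starting at `pos`, never reading at or past `end`.
// Advances `pos`. Returns nullopt if the varint is truncated or longer than
// 10 bytes.
inline std::optional<std::uint64_t>
read_varint(const std::vector<std::uint8_t>& buf, std::size_t& pos, std::size_t end) {
    std::uint64_t value = 0;
    for (int shift = 0; shift < 70 && pos < end; shift += 7) {
        const std::uint8_t byte = buf[pos++];
        value |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {
            return value;
        }
    }
    return std::nullopt;
}

// Returns the maximum nesting depth of an encoded Node (a Node without
// children has depth 1), or nullopt if the input is malformed.
// Assumption for brevity: any field other than field 1 with wire type LEN
// is rejected, whereas a real parser would skip unknown fields.
inline std::optional<std::size_t> max_depth(const std::vector<std::uint8_t>& buf) {
    constexpr std::uint64_t child_tag = (1u << 3) | 2u; // field 1, wire type LEN
    // Explicit stack: the end offset of every message that is currently open.
    std::vector<std::size_t> ends{buf.size()};
    std::size_t pos = 0;
    std::size_t deepest = 1;
    while (!ends.empty()) {
        assert(pos <= ends.back());
        if (pos == ends.back()) {
            ends.pop_back(); // finished this message, resume its parent
            continue;
        }
        const auto tag = read_varint(buf, pos, ends.back());
        if (!tag || *tag != child_tag) {
            return std::nullopt;
        }
        const auto len = read_varint(buf, pos, ends.back());
        if (!len || *len > ends.back() - pos) {
            return std::nullopt; // child would overrun its parent
        }
        // "Descend" by pushing a frame instead of making a recursive call.
        ends.push_back(pos + static_cast<std::size_t>(*len));
        if (ends.size() > deepest) {
            deepest = ends.size();
        }
    }
    return deepest;
}

} // namespace sketch
```

A loop of this shape also has a natural place to yield between iterations, which is harder to arrange in a recursive descent parser.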

## Development

If you are tasked with updating this code, here are a few helpful links:

1. [Encoding spec](https://protobuf.dev/programming-guides/encoding/) (note this elides some important details about how invalid/corrupted data is handled)
2. [Golang protobuf parser](https://github.com/protocolbuffers/protobuf-go/blob/master/proto/decode.go)
3. [Java protobuf parser](https://github.com/protocolbuffers/protobuf/tree/main/java/core/src/main/java/com/google/protobuf)
4. [C++ protobuf parser](https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/wire_format_lite.cc)
5. [Protobuf Zero](https://github.com/mapbox/protozero/blob/master/include/protozero/pbf_reader.hpp)
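
When reading the encoding spec, it can help to have the varint and tag rules written out. The block below is a minimal sketch of those two rules and is not taken from this directory. `varint_result`, `decode_varint`, `tag_parts` and `split_tag` are hypothetical names, and the handling of overlong input is only one possible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

struct varint_result {
    std::uint64_t value;
    std::size_t bytes_read;
};

// Decodes one base-128 varint: 7 payload bits per byte, least significant
// group first, high bit set on every byte except the last.
// Returns nullopt if the input ends before a terminating byte is seen or if
// no terminating byte appears within 10 bytes.
// Assumption: payload bits that do not fit in 64 bits are dropped. How strict
// to be here is exactly the kind of detail the spec leaves open.
std::optional<varint_result> decode_varint(const std::uint8_t* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    constexpr std::size_t max_varint_bytes = 10;
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < size && i < max_varint_bytes; ++i) {
        const std::uint8_t byte = data[i];
        value |= static_cast<std::uint64_t>(byte & 0x7f) << (7 * i);
        if ((byte & 0x80) == 0) {
            return varint_result{value, i + 1};
        }
    }
    return std::nullopt;
}

struct tag_parts {
    std::uint64_t field_number;
    std::uint8_t wire_type;
};

// A field tag is itself a varint: the low 3 bits hold the wire type and the
// remaining bits hold the field number.
tag_parts split_tag(std::uint64_t tag) {
    return tag_parts{tag >> 3, static_cast<std::uint8_t>(tag & 0x7)};
}
```

For example, the two bytes `0x96 0x01` decode to 150, and the tag byte `0x0a` splits into field number 1 with wire type 2 (length-delimited).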