Introduce GetRangeAndFlatMap to push computations down to FDB #5609

nblintao · 2021-09-16T06:25:08Z

It introduces the GetRangeAndHop semantic to FDB. With this, the SS is able to generate the keys for a query based on the results of another query. In other words, we can push some computations from the upper layer down to FDB. For example, querying an index and fetching the records can be done in a single query! This is expected to improve latency and bandwidth when reading from FDB.
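To make the semantics concrete, here is a minimal sketch that models the key space as an in-memory ordered map. It is not the FDB API: `getRangeAndHop`, `hopKey`, and the `I/<value>/<pk>` to `R/<pk>` key layout are hypothetical, and only illustrate "scan an index range, derive each record key, and look it up in the same call".

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the FDB key space: an ordered map from key to value.
using KV = std::map<std::string, std::string>;
using HopResult = std::vector<std::pair<std::string, std::optional<std::string>>>;

// Hypothetical hop rule: the primary key is the last '/'-separated component
// of the index key, and the record is stored at recordPrefix + primaryKey.
std::string hopKey(const std::string& indexKey, const std::string& recordPrefix) {
    return recordPrefix + indexKey.substr(indexKey.rfind('/') + 1);
}

// Scan the index range [begin, end) and, in the same call, look up the record
// each index entry points to. A missing record yields std::nullopt.
HopResult getRangeAndHop(const KV& db, const std::string& begin, const std::string& end,
                         const std::string& recordPrefix) {
    HopResult out;
    for (auto it = db.lower_bound(begin); it != db.end() && it->first < end; ++it) {
        auto rec = db.find(hopKey(it->first, recordPrefix));
        out.emplace_back(it->first, rec == db.end() ? std::nullopt : std::optional<std::string>(rec->second));
    }
    return out;
}
```

For example, with index entries `I/apple/1` and `I/banana/2` and records `R/1` and `R/2`, scanning `["I/a", "I/c")` with record prefix `R/` returns both index keys paired with their record values, without a second client round trip.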

See the tests for a taste of how this feature could be used.

This change doesn't affect existing code paths as long as you don't use the new APIs.

Code-Reviewer Section

The general guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

The PR has a description, explaining both the problem and the solution.
The description mentions which forms of testing were done and the testing seems reasonable.
Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or master if this is the youngest branch)
There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

foundationdb-ci · 2021-09-16T06:47:02Z

AWS CodeBuild CI Report for macOS Catalina 10.15

CodeBuild project: foundationdb-pull-request-build-macos
Commit ID: 47c7f2a
Result: SUCCEEDED
Error: N/A
Build Logs (available for 30 days)

foundationdb-ci · 2021-09-16T07:09:44Z

AWS CodeBuild CI Report for Linux CentOS 7

CodeBuild project: foundationdb-pull-request-build
Commit ID: 47c7f2a
Result: FAILED
Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
Build Logs (available for 30 days)

xumengpanda · 2021-09-16T21:18:15Z

fdbserver/storageserver.actor.cpp

+	state Transaction tr(data->cx);
+	tr.setVersion(version);
+	// TODO: is DefaultPromiseEndpoint the best priority for this?
+	tr.info.taskID = TaskPriority::DefaultPromiseEndpoint;


This means the second read goes through the read path's front door.

Can we craft something like the way you do the first read?

GetValueRequest req(Span().context, key, version, Optional<TagSet>(), Optional<UID>());
data->actors.add(data->readGuard(req, getValueQ));
GetValueReply reply = wait(req.reply.getFuture());

If the second read is also on the same SS, you will save the overhead of getKeyLocation() and the serialization and deserialization on the SS.

If it is in the same SS, it should have returned here /~https://github.com/apple/foundationdb/pull/5609/files#diff-c4dbfae809f757f2061099ee5021ce2cbbf0f4dbc2f0f18480a4856d63d033ceR1623. Note that /~https://github.com/apple/foundationdb/pull/5609/files#diff-c4dbfae809f757f2061099ee5021ce2cbbf0f4dbc2f0f18480a4856d63d033ceR1630-R1637 is a fallback for when the data is not in the current SS.

Maybe you meant quickGetValue = the current two-hop solution?

The first query's code is in getKeyValuesAndHopQ, which is effectively the same as a normal range scan, a.k.a. getKeyValuesQ.

hop() issues the second-round queries based on the results returned by the first query.

quickGetValue is the second query.

xumengpanda · 2021-09-16T21:19:57Z

fdbserver/storageserver.actor.cpp

+	// TODO: is DefaultPromiseEndpoint the best priority for this?
+	tr.info.taskID = TaskPriority::DefaultPromiseEndpoint;
+	Future<Optional<Value>> valueFuture = tr.get(key, Snapshot::True);
+	// TODO: async in case it needs to read from other servers.


Can we reply with an error if it needs to hop to another server and let the client side retry?

Doing it this way means the first SS does the same amount of work the client would do if the second-hop key is not colocated with the first-hop key.

Yes, that's another design choice. See the discussion in the design doc.

I'm currently using this one because it has the simplest API and requires the fewest client modifications.

@xumengpanda I would prefer that the first SS "do the work". If we can bypass JNI and not have to serde index records, it is a win.

Ah, I forgot about the JNI overhead when I wrote the comment.

Letting the first SS forward the requests and do the work can avoid JNI and is likely better.
I'd like to point out its tradeoff as well:

Suppose the data flow is client -> SS1 (to find the index) -> SS2 (to find the value) -> SS2 replies directly to the client, which means the first SS forwards the request to SS2.

SS1 will need to cache the shard-to-SS mapping, which costs extra memory. That means we use some extra memory on the storage node to save some CPU time on the compute node.
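A sketch of the extra state SS1 would need under this design, assuming a hypothetical cache from each shard's begin key to the ID of the storage server that owns it (`ShardMap`, `ownerOf`, and `isLocal` are illustrative names, not FDB code):

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>

// Hypothetical cache on SS1: each entry maps the begin key of a shard to the
// ID of the storage server that owns [begin, nextBegin).
using ShardMap = std::map<std::string, std::string>;

// Returns the owner of `key`: the entry with the greatest begin key <= key.
// Assumes the map has an entry for "" so that every key is covered.
std::string ownerOf(const ShardMap& shards, const std::string& key) {
    auto it = shards.upper_bound(key);  // first shard that starts after key
    return std::prev(it)->second;       // the shard that contains key
}

// SS1 can serve the hop itself only if it owns the hopped-to key; otherwise it
// must forward the request to the owner (or fall back).
bool isLocal(const ShardMap& shards, const std::string& key, const std::string& self) {
    return ownerOf(shards, key) == self;
}
```

Every SS that forwards hops would have to keep a map like this reasonably fresh, which is the memory-for-CPU tradeoff described above.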

We probably don't have to decide the corner case now.

There are three possibilities, I believe:

1. The first SS does all the proxy calls to wherever.

2. The C client does any proxy calls when hopping within the SS is not possible.

3. The API client does any proxy calls.

Option 3 has the JNI overhead, but option 2 doesn't. It requires more modification, as @nblintao says, but I think it might be the most robust, particularly once we undertake to do range scans of the hopped-to place.

xumengpanda · 2021-09-16T21:41:27Z

fdbserver/storageserver.actor.cpp

+	state Transaction tr(data->cx);
+	tr.setVersion(version);
+	// TODO: is DefaultPromiseEndpoint the best priority for this?
+	tr.info.taskID = TaskPriority::DefaultPromiseEndpoint;


Maybe you meant quickGetValue = the current two-hop solution?

xumengpanda · 2021-09-16T21:55:31Z

fdbserver/storageserver.actor.cpp

+			GetKeyValuesReply _r =
+			    wait(readRange(data, version, KeyRangeRef(begin, end), req.limit, &remainingLimitBytes, span.context));
+
+			std::cout << "read range done, start hopping" << std::endl;


TraceEvent is better because it can be ingested into our log search sys.

Yeah, I will change to TraceEvent when I finalize the code.

xumengpanda · 2021-09-16T21:56:03Z

fdbserver/storageserver.actor.cpp

+			    changeCounter,
+			    KeyRangeRef(std::min<KeyRef>(begin, std::min<KeyRef>(req.begin.getKey(), req.end.getKey())),
+			                std::max<KeyRef>(end, std::max<KeyRef>(req.begin.getKey(), req.end.getKey()))));
+			if (EXPENSIVE_VALIDATION) {


Making it a knob will make your perf and correctness evaluation on real clusters easier.

xumengpanda · 2021-09-16T22:00:22Z

fdbserver/storageserver.actor.cpp

+		result.arena.dependsOn(hopKey.arena());
+
+		std::cout << "quickGetValue start key: " << hopKey.toString() << std::endl;
+		Optional<Value> valueOption = wait(quickGetValue(data, hopKey, input.version));


If we can make sure the data indexed by input.data is still on the same SS, quickGetValue() will not need the transaction inside it, right?

Yes, see /~https://github.com/apple/foundationdb/pull/5609/files#diff-c4dbfae809f757f2061099ee5021ce2cbbf0f4dbc2f0f18480a4856d63d033ceR1619-R1622. Most of the queries should hit this and return, without transactions.

xumengpanda · 2021-09-16T22:01:00Z

fdbserver/workloads/IndexPrefetchDemo.actor.cpp

@@ -0,0 +1,171 @@
+/*
+ * IndexPrefetchDemo.actor.cpp


It means you can run this in multi-tester for perf tests.

I don't understand. Could you elaborate on it?

Multi-tester is a load-generator framework FDB uses to generate load.
Each fdbserver can start in a tester role, which picks up the specified workloads, like the one you wrote.
These testers behave as clients that generate traffic against the test cluster.

Let's sync offline on how to set it up in our internal infra.

foundationdb-ci · 2021-09-17T02:57:36Z

AWS CodeBuild CI Report for macOS Catalina 10.15

CodeBuild project: foundationdb-pull-request-build-macos
Commit ID: 33ad45e
Result: SUCCEEDED
Error: N/A
Build Logs (available for 30 days)

foundationdb-ci · 2021-09-17T03:27:58Z

AWS CodeBuild CI Report for Linux CentOS 7

CodeBuild project: foundationdb-pull-request-build
Commit ID: 33ad45e
Result: SUCCEEDED
Error: N/A
Build Logs (available for 30 days)

foundationdb-ci · 2021-09-17T03:28:26Z

AWS CodeBuild CI Report for Linux CentOS 7

CodeBuild project: foundationdb-pull-request-build
Commit ID: 626e517
Result: SUCCEEDED
Error: N/A
Build Logs (available for 30 days)

MMcM · 2021-09-24T21:24:38Z

fdbclient/StorageServerInterface.h

@@ -278,11 +283,35 @@ struct GetKeyValuesReply : public LoadBalancedReply {
 	}
 };

+// Describe how to hop. Assume the keys are encoded as Tuple, hence there are concept of "elements".
+struct HopInfo {
+	// Do GetKeyValues, extract the last suffixLen elements from each keys.


I think we could make this a bit more efficient with modest additional cost in request size by adding a number of bytes to skip before starting to parse a tuple. This would also cover cases where, for some reason, the entire key isn't a well-formed tuple, but the same secondary index pattern is followed after some point.
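A sketch of the skip-bytes idea, under loud assumptions: `HopInfoSketch`, its `skipBytes` field, and `makeHopKey` are hypothetical, and '/'-separated strings stand in for the real binary tuple encoding. The first `skipBytes` bytes are ignored (so they need not be a well-formed tuple), and the last `suffixLen` parsed elements are appended to `hopPrefix`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-in for tuple decoding: elements are separated by '/'. The real
// tuple layer uses a typed binary encoding; this is for illustration only.
std::vector<std::string> splitElements(const std::string& s) {
    std::vector<std::string> elems;
    size_t start = 0;
    while (true) {
        size_t pos = s.find('/', start);
        elems.push_back(s.substr(start, pos == std::string::npos ? std::string::npos : pos - start));
        if (pos == std::string::npos) break;
        start = pos + 1;
    }
    return elems;
}

// Hypothetical HopInfo variant with the suggested byte skip.
struct HopInfoSketch {
    size_t skipBytes;       // opaque leading bytes, not parsed as a tuple
    size_t suffixLen;       // number of trailing elements to copy
    std::string hopPrefix;  // prefix that the copied elements are appended to
};

std::string makeHopKey(const HopInfoSketch& h, const std::string& scannedKey) {
    if (h.skipBytes > scannedKey.size()) throw std::invalid_argument("key shorter than skipBytes");
    std::vector<std::string> elems = splitElements(scannedKey.substr(h.skipBytes));
    if (h.suffixLen > elems.size()) throw std::invalid_argument("not enough elements");
    std::string out = h.hopPrefix;
    for (size_t i = elems.size() - h.suffixLen; i < elems.size(); ++i) out += "/" + elems[i];
    return out;
}
```

With `skipBytes = 4`, a key like `HDR:I/red/42/7` maps to `R/42/7` for `suffixLen = 2` and `hopPrefix = "R"`, even though `HDR:` is not part of the tuple.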

MMcM · 2021-09-24T21:40:30Z

fdbclient/StorageServerInterface.h

+	int suffixLen;
+	// Form hop keys by appending the fetched suffixes to hopPrefix. Do GetValue using the hop keys.
+	// TODO: Support getting ranges using the hop keys as prefixes.
+	KeyRef hopPrefix;


I think this covers the important cases, but I want to make the scope explicit.

The simplest case is where the scanned keys are

dir, 'I', value, id1, id2

and the key-values to be fetched are

dir, 'R', id1, id2, n

(n might be a split record counter or an exploded per-field key)

This also handles cases where the scanned keys are

dir, 'I', id1, value, id2

and the key-values to be fetched are still

dir, 'R', id1, id2, n

provided that the scan fixes id1 so that it can be included in the skip prefix and the hop prefix. That is,

WHERE id1 = ? AND value BETWEEN ? AND ? ORDER BY value

It is not sufficient for cases where id1 needs to be extracted for the hop. Such as,

WHERE id1 > ? ORDER BY id1, value

Nor for the case where the index is

dir, 'I', id2, id1

for scans like

id2 > ? ORDER BY id2

Again, I am not sure more complicated tuple element shuffling descriptors are worth the trouble. The two supported cases match the ones I have seen in real life.
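The two supported layouts can be illustrated with a toy suffix-based hop, assuming a tuple is just a `std::vector<std::string>` (`hopPrefixFor` is a hypothetical helper). In both cases the result is the prefix `dir, 'R', id1, id2` under which the `n` entries live; in the second case `id1` is fixed by the scan, so it moves into the hop prefix and only `id2` is copied.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

using Tuple = std::vector<std::string>;  // toy tuple: one string per element

// Append the last suffixLen elements of the scanned key to hopPrefix. The
// result is the prefix of the record's key-values (dir, 'R', id1, id2, n).
Tuple hopPrefixFor(const Tuple& scanned, std::size_t suffixLen, const Tuple& hopPrefix) {
    if (suffixLen > scanned.size()) throw std::invalid_argument("suffixLen too large");
    Tuple out = hopPrefix;
    out.insert(out.end(), scanned.end() - static_cast<std::ptrdiff_t>(suffixLen), scanned.end());
    return out;
}
```

The unsupported cases are exactly those where an element that must be copied is not part of a contiguous suffix, such as `id1` in `dir, 'I', id1, value, id2` when the scan does not fix `id1`.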

Yeah, I can update int suffixLen to a vector of picked item indexes in the expected order.

To make it more expressive (Is there a use case of interleaving the fetched items with the constant items?), and probably more elegant, I'm also thinking about using a single tuple to express both suffixLen and hopPrefix. It could be something like:

dir, "R", "%2", "%4", "%.*"

"%.*" represents doing a range query rather than a point query.
Escaping of the strings is needed, of course.
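A sketch of how the proposed single-tuple template could be evaluated. Assumptions: indexes are zero-based (so `"%2"` and `"%4"` pick `id1` and `id2` from `dir, 'I', id1, value, id2`), a literal that begins with `%` is escaped as `%%`, and `HopTarget`/`applyTemplate` are hypothetical names rather than anything in this PR.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

using Tuple = std::vector<std::string>;

struct HopTarget {
    Tuple key;     // the generated hop key (or range prefix)
    bool isRange;  // true if the template ended with "%.*"
};

HopTarget applyTemplate(const Tuple& tmpl, const Tuple& scanned) {
    HopTarget t{{}, false};
    for (size_t i = 0; i < tmpl.size(); ++i) {
        const std::string& e = tmpl[i];
        if (e == "%.*") {
            if (i + 1 != tmpl.size()) throw std::invalid_argument("%.* must be the last element");
            t.isRange = true;
        } else if (e.size() > 1 && e[0] == '%' && e[1] == '%') {
            t.key.push_back(e.substr(1));  // escaped literal: "%%x" -> "%x"
        } else if (e.size() > 1 && e[0] == '%') {
            t.key.push_back(scanned.at(std::stoul(e.substr(1))));  // "%N" -> scanned[N]
        } else {
            t.key.push_back(e);  // plain constant
        }
    }
    return t;
}
```

Applied to the scanned key `dir, 'I', A, v, B`, the template above yields the range prefix `dir, 'R', A, B`.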

Is there a use case of interleaving the fetched items with the constant items?

We might distinguish three kinds of hop elements:

variable, fetched from the scanned keys

constant, occurring in both the scan range and the hop prefix

constant, occurring only in the hop prefix

The scanned range might arbitrarily permute its elements for the sake of ordering or inequality comparisons. So the first two interleave arbitrarily, too.

Placing the last someplace other than at the front is maybe a little more contrived, but suppose you have

dir, "I", value, group, id

and

dir, "R", group, "widget", id

where the constant type (widget) is implied for the original scan -- it's an index on that type only -- and so not part of its keys, but the typing is normalized within some form of group.
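The widget example can also be expressed with a structured descriptor instead of string escapes. In this sketch (hypothetical `HopElement`, `var`, `lit`, and `buildHop`), each element either copies a position from the scanned key or emits a literal, so variable and constant elements can be interleaved in any order; constants that also appear in the scan range can be written either way.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Tuple = std::vector<std::string>;

// One element of a hypothetical structured hop descriptor.
struct HopElement {
    bool fromScan;        // true: copy scanned[index]; false: emit literal
    std::size_t index;    // position in the scanned key (used if fromScan)
    std::string literal;  // constant value (used if !fromScan)
};

HopElement var(std::size_t i) { return {true, i, ""}; }
HopElement lit(const std::string& s) { return {false, 0, s}; }

// Build the hop key element by element, in descriptor order.
Tuple buildHop(const std::vector<HopElement>& desc, const Tuple& scanned) {
    Tuple out;
    for (const HopElement& e : desc) out.push_back(e.fromScan ? scanned.at(e.index) : e.literal);
    return out;
}
```

For the scanned key `dir, "I", value, group, id`, the descriptor `lit("dir"), lit("R"), var(3), lit("widget"), var(4)` produces `dir, "R", group, "widget", id`.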

MMcM · 2021-09-24T21:52:53Z

fdbserver/storageserver.actor.cpp

+ACTOR Future<Optional<Value>> quickGetValue(StorageServer* data, StringRef key, Version version) {
+	if (data->shards[key]->isReadable()) {
+		try {
+			// TODO: Use a lower level API may be better? Or tweak priorities?


I also think this needs to be a range request, to get all the key-value pairs starting with the hop prefix.
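A range hop means reading every key-value pair under the hop prefix, i.e. the half-open range `[prefix, strinc(prefix))`. The sketch below reimplements that end-key computation (`strincSketch`, modeled on the `strinc` helper that FDB's bindings expose) over an in-memory `std::map`; `readPrefix` is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Smallest key greater than every key that starts with `prefix`: drop
// trailing 0xFF bytes, then increment the last remaining byte.
std::string strincSketch(std::string prefix) {
    while (!prefix.empty() && static_cast<unsigned char>(prefix.back()) == 0xFF) prefix.pop_back();
    if (prefix.empty()) throw std::invalid_argument("prefix has no byte to increment");
    prefix.back() = static_cast<char>(static_cast<unsigned char>(prefix.back()) + 1);
    return prefix;
}

// Read all key-value pairs in [prefix, strinc(prefix)). std::string compares
// bytes as unsigned char, which matches FDB's lexicographic key order.
std::vector<std::pair<std::string, std::string>> readPrefix(const std::map<std::string, std::string>& db,
                                                            const std::string& prefix) {
    std::vector<std::pair<std::string, std::string>> out;
    const std::string end = strincSketch(prefix);
    for (auto it = db.lower_bound(prefix); it != db.end() && it->first < end; ++it) out.push_back(*it);
    return out;
}
```

This is what lets one hop return several key-value pairs per index entry, e.g. a split record or one key per field.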

MMcM · 2021-09-24T21:56:55Z

fdbserver/storageserver.actor.cpp

+	// TODO: is DefaultPromiseEndpoint the best priority for this?
+	tr.info.taskID = TaskPriority::DefaultPromiseEndpoint;
+	Future<Optional<Value>> valueFuture = tr.get(key, Snapshot::True);
+	// TODO: async in case it needs to read from other servers.


There are three possibilities, I believe:

1. The first SS does all the proxy calls to wherever.

2. The C client does any proxy calls when hopping within the SS is not possible.

3. The API client does any proxy calls.

Option 3 has the JNI overhead, but option 2 doesn't. It requires more modification, as @nblintao says, but I think it might be the most robust, particularly once we undertake to do range scans of the hopped-to place.

foundationdb-ci · 2021-10-07T05:29:27Z

AWS CodeBuild CI Report for Linux CentOS 7

CodeBuild project: foundationdb-pull-request-build
Commit ID: aec1782
Result: FAILED
Error: Error while executing command: ctest -j ${NPROC} --no-compress-output -T test --output-on-failure. Reason: exit status 8
Build Logs (available for 30 days)

foundationdb-ci · 2021-10-07T05:35:08Z

AWS CodeBuild CI Report for macOS Catalina 10.15

CodeBuild project: foundationdb-pull-request-build-macos
Commit ID: aec1782
Result: FAILED
Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
Build Logs (available for 30 days)

foundationdb-ci · 2021-10-11T18:35:53Z

AWS CodeBuild CI Report for Linux CentOS 7

CodeBuild project: foundationdb-pull-request-build
Commit ID: 3ef815c
Result: FAILED
Error: Error while executing command: ctest -j ${NPROC} --no-compress-output -T test --output-on-failure. Reason: exit status 8
Build Logs (available for 30 days)

foundationdb-ci · 2021-10-12T01:46:28Z

AWS CodeBuild CI Report for macOS Catalina 10.15

CodeBuild project: foundationdb-pull-request-build-macos
Commit ID: d9a6aa3
Result: SUCCEEDED
Error: N/A
Build Logs (available for 30 days)

foundationdb-ci · 2021-10-12T01:57:16Z

AWS CodeBuild CI Report for Linux CentOS 7

CodeBuild project: foundationdb-pull-request-build
Commit ID: d9a6aa3
Result: FAILED
Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
Build Logs (available for 30 days)

foundationdb-ci · 2021-10-21T01:11:07Z

AWS CodeBuild CI Report for Linux CentOS 7

CodeBuild project: foundationdb-pr
Commit ID: 99f99a0
Result: FAILED
Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; exit 1; fi. Reason: exit status 1
Build Logs (available for 30 days)

foundationdb-ci · 2021-10-21T01:23:25Z

AWS CodeBuild CI Report for macOS Catalina 10.15

CodeBuild project: foundationdb-pr-macos
Commit ID: 99f99a0
Result: FAILED
Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
Build Logs (available for 30 days)

foundationdb-ci · 2021-10-23T01:10:52Z

AWS CodeBuild CI Report for macOS Catalina 10.15

CodeBuild project: foundationdb-pr-macos
Commit ID: bed03c6
Result: SUCCEEDED
Error: N/A
Build Logs (available for 30 days)

foundationdb-ci · 2021-10-23T01:22:59Z

AWS CodeBuild CI Report for Linux CentOS 7

CodeBuild project: foundationdb-pr
Commit ID: bed03c6
Result: FAILED
Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
Build Logs (available for 30 days)

foundationdb-ci · 2021-10-26T17:24:53Z

AWS CodeBuild CI Report for Linux CentOS 7

CodeBuild project: foundationdb-pr
Commit ID: 36091b9
Result: FAILED
Error: Error while executing command: ctest -j ${NPROC} --no-compress-output -T test --output-on-failure. Reason: exit status 8
Build Logs (available for 30 days)

foundationdb-ci · 2021-10-26T18:42:57Z

AWS CodeBuild CI Report for Linux CentOS 7

CodeBuild project: foundationdb-pr
Commit ID: cc57303
Result: FAILED
Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
Build Logs (available for 30 days)

foundationdb-ci · 2021-10-29T23:56:22Z

AWS CodeBuild CI Report for Linux CentOS 7

CodeBuild project: foundationdb-pr
Commit ID: fbb116f
Result: SUCCEEDED
Error: N/A
Build Logs (available for 30 days)

xumengpanda · 2021-10-31T00:33:51Z

fdbserver/storageserver.actor.cpp

@@ -2880,9 +2870,8 @@ ACTOR Future<Void> getKeyValuesAndHopQ(StorageServer* data, GetKeyValuesAndHopRe
 			GetKeyValuesReply _r = wait(
 			    readRange(data, version, KeyRangeRef(begin, end), req.limit, &remainingLimitBytes, span.context, type));

-			//			std::cout << "read range done, start hopping" << std::endl;
+			// Hop!!!


nit: I feel that directly turning the std::cout message into a comment is more reader-friendly than "Hop":
read range done, start hopping

xumengpanda · 2021-10-31T00:34:35Z

fdbserver/storageserver.actor.cpp

 		++data->counters.quickGetKeyValuesHit;

 		// Convert GetKeyValuesReply to RangeResult.
 		return RangeResult(RangeResultRef(reply.data, reply.more), reply.arena);
 	} catch (Error& e) {
-		//			std::cout << "quickGetValue fallback because of exception or not managed by the shard " << e.name()
-		//<< std::endl;
+		// Fallback.


nit: your cout message body is a better comment line.

xumengpanda · 2021-10-31T00:35:01Z

fdbserver/storageserver.actor.cpp

 			++data->counters.quickGetValueHit;
 			return reply.value;
 		} catch (Error& e) {
-			//			std::cout << "quickGetValue fallback because of exception " << e.name() << std::endl;


nit: your cout message body is a better comment line.

halfprice

Very complex change. I still don't fully understand the implementation details.

halfprice · 2021-11-01T16:37:42Z

bindings/c/test/unit/unit_tests.cpp

+	// than the prefix of the key. So we don't use key() or create_data() in this test.
+	std::map<std::string, std::string> data;
+	for (int i = 0; i < 3; i++) {
+		data[indexEntryKey(i)] = EMPTY;


How do you link the index to the actual key? It looks like in this example, as long as the primary key is a suffix of the index key, it will work. So it is not necessary for the indexed field to appear in the record's value?

When users actually use it, the indexed field is likely to be stored in the value of the record and encoded in some way (e.g. Protobuf in Record Layer). But that is not necessary for using this API.

halfprice · 2021-11-01T16:52:53Z

fdbclient/NativeAPI.actor.cpp

-                                        Reverse reverse,
-                                        TransactionInfo info,
-                                        TagSet tags) {
+template <class GetKeyValuesMaybeHopRequest>


The client API has getRange and getRangeAndHop. The storage server also has getKeyValues and getKeyValuesAndHop. Why merge the two code paths into GetKeyValuesMaybeHop in the middle? It seems unnecessary.

If the main idea is to reuse the getRange implementation for both getRange and getRangeAndHop, is there a way to tell which of the two is being called inside the getRange function?

Good observation. Ideally, we should "reuse getRange implementation for both getRange and getRangeAndHop" to make the code clean and avoid duplicated code. But it turns out to be very hard (if not impossible) to make templates work with the Flow compiler. This is one of the few places where we were able to merge them.

I guess it makes sense to duplicate here as well, to keep things consistent and avoid confusion.

halfprice · 2021-11-01T17:04:25Z

fdbclient/NativeAPI.actor.h

@@ -289,6 +289,21 @@ class Transaction : NonCopyable {
 		                reverse);
 	}

+	[[nodiscard]] Future<RangeResult> getRangeAndHop(const KeySelector& begin,


Can you add some documentation how to use this API? Especially how to set hopInfo?

Yeah, there should be documentation. But I'm not going to add it for now because the interface is going to change very soon in the next PR.

halfprice · 2021-11-01T17:09:42Z

fdbclient/StorageServerInterface.h

@@ -296,6 +301,9 @@ struct GetKeyValuesRequest : TimedRequest {
 	SpanID spanContext;
 	Arena arena;
 	KeySelectorRef begin, end;
+	// This is a dummy field there has never been used.


Because of this, I think we should consider sharing the implementation of getRange across different interfaces rather than using a unified interface (the ...MaybeHop...).

halfprice · 2021-11-01T17:19:56Z

fdbserver/storageserver.actor.cpp

+					referenceTuple = &valueTuple.get();
+				} else {
+					ASSERT(false);
+					throw internal_error();


Can we throw an error with more info here, so that we can quickly identify where the internal_error() is thrown?

This piece of code will be removed in the next PR, so I'd prefer not to change it for now. nblintao@2b627b9

halfprice · 2021-11-01T17:20:43Z

fdbserver/storageserver.actor.cpp

+		Tuple hopInfoTuple = Tuple()
+		                         .append("normal"_sr)
+		                         .append("{{escaped}}"_sr)
+		                         .append("{K[2]}"_sr)


This K[], V[] syntax is a little hard to read. I still don't know how the hopInfo is constructed and used.

jzhou77 · 2021-11-01T20:34:31Z

Some comments coming up during the discussion:

Documentation about this feature
Try to avoid slow tasks or too much work

sfc-gh-satherton · 2021-11-01T20:46:46Z

I think the ideal implementation of this feature would be that the storage server only returns to the client the results for secondary lookups it was able to resolve locally. Then, the FDB client can resolve the rest of the lookups with parallelism before the C API returns.

Internally, the return type to this request to the StorageServer interface could be logically equivalent to

std::vector<std::pair<Key, Optional<Optional<Value>>>>

and then the client can scan the vector and dispatch any !present() keys to resolve their values, which can themselves be non-present once resolved (hence the double Optional).
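A sketch of the client-side half of this design, assuming hypothetical type names (`MaybeValue`, `Resolved`, `SSReply`, `completeReply`) and a caller-supplied `lookup` function standing in for a normal point read. The outer optional records whether the SS resolved the entry; the inner one records whether the value exists. For brevity the lookups run sequentially here, whereas the proposal is to dispatch them in parallel.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using MaybeValue = std::optional<std::string>;  // inner: does the value exist?
using Resolved = std::optional<MaybeValue>;     // outer: did the SS resolve it?
using SSReply = std::vector<std::pair<std::string, Resolved>>;

// Client-side completion: every entry the SS could not resolve locally (outer
// optional empty) is looked up with `lookup`; resolved entries pass through.
template <class Lookup>
std::vector<std::pair<std::string, MaybeValue>> completeReply(const SSReply& reply, Lookup lookup) {
    std::vector<std::pair<std::string, MaybeValue>> out;
    for (const auto& [key, r] : reply) out.emplace_back(key, r.has_value() ? *r : lookup(key));
    return out;
}
```

Note that an entry resolved as "key absent" (`Resolved(std::in_place)`) is not looked up again, which is exactly the distinction the double Optional buys.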

alecgrieser

I left one inline comment about naming, but I have a few questions as well around the semantics of this (which I did try to find in the code itself, but figured it might be easier to just ask):

Regarding continuations: I was originally a bit concerned that if a get-range-and-hop scan was terminated prematurely (e.g., to split a range scan across multiple transactions to avoid the 5 second limit) that it would be difficult
Does the key info language support returning a range of kv-pairs (per source key)? The way that the Record Layer stores records, it will both need to be able to concatenate various values together (from a range read that is currently done per index entry) as well as differentiate which record key is which (in order to differentiate the record version from the protobuf record data). (This request is fairly record layer specific, but you could imagine other layers with slightly different data models still wanting similar control. For example, you could imagine a different layer that stored each field in a record in its own key-value pair, grouped per record, so a query of the form SELECT field_1, field_2 FROM type WHERE field_3 > x might be satisfied by scanning an index on field_3 and then looking up only the values for field_1 and field_2 for each entry, which live on separate keys.)
How does the key info language handle escape characters? For example, if I legitimately had the value {K[4]} in my query string, I wouldn't want that to be accidentally interpreted as looking for value 4 in the K tuple. (If there's a spec for the language, that would probably answer both this question and the preceding.)
What's the behavior of empty records (e.g., as might result from there being a dangling index entry that points to a record that has since been deleted)? In particular, I could see users wanting to sometimes error on those situations or sometimes ignore them, but that implies being able to distinguish an empty value from a missing value.

alecgrieser · 2021-11-01T20:44:38Z

bindings/c/foundationdb/fdb_c.h

@@ -244,6 +244,24 @@ DLLEXPORT WARN_UNUSED_RESULT FDBFuture* fdb_transaction_get_range(FDBTransaction
                                                                  fdb_bool_t reverse);
 #endif

+DLLEXPORT WARN_UNUSED_RESULT FDBFuture* fdb_transaction_get_range_and_hop(FDBTransaction* tr,


minor: I don't want to bike shed too much about names, but this is in the public API, so it would be difficult to change in the future. I'm not sure "hop" really captures what this does. Maybe "resolve" or even "lookup"--in particular, when I hear "hop", I assume it's going to make a network hop, which (ideally) it won't have to when used optimally.

Thanks so much for pointing this out. Changed to "flat map" after discussion.

nblintao · 2021-11-01T21:23:56Z

@alecgrieser Good questions.

Regarding continuations: I was originally a bit concerned that if a get-range-and-hop scan was terminated prematurely (e.g., to split a range scan across multiple transactions to avoid the 5 second limit) that it would be difficult

The current version doesn't have good support or tests for continuation. It will fail. Clients should have a fallback that uses the old way. This is planned to be improved in a future version.

Does the key info language support returning a range of kv-pairs (per source key)?

Yes, it is supported (apparently I simplified this in the presentation). If you add a special element {...} as the last element of hop info, it will do a range query using the generated key as the prefix. /~https://github.com/apple/foundationdb/pull/5609/files#diff-c4dbfae809f757f2061099ee5021ce2cbbf0f4dbc2f0f18480a4856d63d033ceR2618 Record Layer doesn't need it, but we can change HopInfo to define a range by begin/end, rather than just a prefix, in the future.

How does the key info language handle escape characters? ... (If there's a spec for the language, that would probably answer both this question and the preceding.)

{ and } are escaped by {{ and }}. Code: /~https://github.com/apple/foundationdb/pull/5609/files#diff-c4dbfae809f757f2061099ee5021ce2cbbf0f4dbc2f0f18480a4856d63d033ceR2559, test: /~https://github.com/apple/foundationdb/pull/5609/files#diff-c4dbfae809f757f2061099ee5021ce2cbbf0f4dbc2f0f18480a4856d63d033ceR2676
I know this HopInfo thing has become hacky, so there is a separate PR to encode HopInfo in flatbuffer for future versions. nblintao@2b627b9
There is a spec. I should have documented it with the API, but it was a bit rushed.
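A sketch of how a single HopInfo element might be resolved under the escaping rule described above. Assumptions: placeholders are whole elements of the form `{K[n]}` or `{V[n]}` with a zero-based `n` into the key or value tuple, a lone brace in a literal is an error, and the `{...}` range marker is handled by the caller and not shown. `resolveElement` and `unescapeLiteral` are hypothetical names, not the functions in the PR.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

using Tuple = std::vector<std::string>;

// Unescape a literal element: "{{" -> "{", "}}" -> "}". A lone brace is rejected.
std::string unescapeLiteral(const std::string& s) {
    std::string out;
    for (size_t i = 0; i < s.size(); ++i) {
        char c = s[i];
        if (c == '{' || c == '}') {
            if (i + 1 < s.size() && s[i + 1] == c) { out += c; ++i; continue; }
            throw std::invalid_argument("unescaped brace in literal: " + s);
        }
        out += c;
    }
    return out;
}

// "{K[n]}" / "{V[n]}" pick element n of the key / value tuple; anything else
// is a literal, so a user string "{K[4]}" must be written as "{{K[4]}}".
std::string resolveElement(const std::string& e, const Tuple& keyTuple, const Tuple& valueTuple) {
    bool placeholder = e.size() >= 5 && e[0] == '{' && (e[1] == 'K' || e[1] == 'V') && e[2] == '[' &&
                       e.compare(e.size() - 2, 2, "]}") == 0;
    if (!placeholder) return unescapeLiteral(e);
    size_t n = std::stoul(e.substr(3, e.size() - 5));
    return (e[1] == 'K' ? keyTuple : valueTuple).at(n);
}
```

Under these rules, the literal `{K[4]}` from a query string can never be mistaken for a placeholder once it is escaped.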

What's the behavior of empty records (e.g., as might result from there being a dangling index entry that points to a record that has since been deleted)? In particular, I could see users wanting to sometimes error on those situations or sometimes ignore them, but that implies being able to distinguish an empty value from a missing value.

In the current version, it doesn't raise any problem. I had a TODO here /~https://github.com/apple/foundationdb/pull/5609/files#diff-c4dbfae809f757f2061099ee5021ce2cbbf0f4dbc2f0f18480a4856d63d033ceR2763. We could consider adding this as an option in HopInfo too.
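One way such an option could behave, sketched with a hypothetical `MissingPolicy` flag and `resolveRecords` helper over an in-memory map: a present-but-empty value is returned as a real (empty) record, while a missing value is treated as a dangling index entry and is either skipped or reported as an error.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

enum class MissingPolicy { Ignore, Error };  // hypothetical HopInfo option

// Resolve record keys. A present-but-empty value is kept (it is a real record);
// a missing key is a dangling index entry and is handled according to `policy`.
std::vector<std::string> resolveRecords(const std::map<std::string, std::string>& db,
                                        const std::vector<std::string>& recordKeys, MissingPolicy policy) {
    std::vector<std::string> out;
    for (const auto& k : recordKeys) {
        auto it = db.find(k);
        if (it != db.end()) {
            out.push_back(it->second);  // may be "" for an empty record
            continue;
        }
        if (policy == MissingPolicy::Error) throw std::runtime_error("dangling index entry: " + k);
        // MissingPolicy::Ignore: skip the dangling entry.
    }
    return out;
}
```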

alecgrieser · 2021-11-01T21:57:59Z

Okay, I see. I think that sounds good to me, with a few more responses below.

The current version doesn't have good support or tests for continuation. It will fail.

What does that look like? Does the user get some kind of error (somehow), or is it more like if the user knows that they won't be able to guarantee that they read everything in one transaction, they have to pick a different way to read the data.

I know this HopInfo thing has become hacky, so there is a separate PR to encode HopInfo in flatbuffer

A more structured HopInfo sounds like a good improvement. I think my concern with adding it later (rather than sooner) is mainly around backwards-compatibility (in that we wouldn't want to make this a breaking API change when we go from Tuples to FlatBuffer objects...I think) which could also mean that we'd need to have translation logic to go from one to the other, which maybe is messy (or maybe it's actually not that bad).

A lot of this might actually come down to how comfortable we are with essentially releasing this as a "beta" feature that we intend to make backwards incompatible changes to that users have to use "at their own risk".

nblintao · 2021-11-01T23:28:03Z

@sfc-gh-satherton, thanks for pointing this out. @MMcM has also mentioned this as method 2 at #5609 (comment). That's exactly what I'm planning to do on the road map. This avoids degrading SS throughput when fallback is needed (which is also summarized in @jzhou77's comment). In addition, I can imagine that users should be able to define how many requests an FDB client can dispatch simultaneously during fallback (like the "pipeline size" concept in Record Layer), which adds complexity.

The current implementation handles fallback at the SS because it was easier to implement at first. The SS is still protected from degraded performance by a knob that makes it fail rather than fall back when the key is not on the same SS (so that the fallback can happen at the application level).
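A sketch of that knob's control flow, with invented names (`Knobs::failOnNonLocalHop`, `HopNotLocal`, `hopRead`) and the local and remote reads passed in as callables: a local key takes the fast path, and a non-local key either throws so the application can fall back, or is read remotely at the cost of extra SS work.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical knob (name invented for this sketch).
struct Knobs {
    bool failOnNonLocalHop = true;  // fail instead of falling back on the SS
};

struct HopNotLocal : std::runtime_error {
    using std::runtime_error::runtime_error;
};

template <class LocalRead, class RemoteRead>
std::optional<std::string> hopRead(const std::string& key, bool keyIsLocal, const Knobs& knobs,
                                   LocalRead localRead, RemoteRead remoteRead) {
    if (keyIsLocal) return localRead(key);  // fast path on this SS
    if (knobs.failOnNonLocalHop) throw HopNotLocal("hop key not on this SS: " + key);
    return remoteRead(key);  // SS-side fallback: costs extra SS work
}
```

With the knob on, the application catches the error and retries with the old two-step read, so a misplaced hop never turns into extra load on the storage server.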

Thanks @jzhou77 for summarizing. I wish I had been in the discussion to explain:

This is an initial PR of a large feature that is going to evolve over months, if not years. Before I pushed out the "finalized" version of this PR for review, I compiled a list of more than a dozen items I wanted. Surprisingly, I see that every major item in the comments had been included in my list (so we'll probably reach a very good consensus on the ultimate design). For many of them, it would be messy and tiring to change them in the future rather than get them right in the first place (for example, a new HopInfo schema, @alecgrieser). But I had to choose a set of things that can be done before the 7.0 release and are safe and usable for beta users.

I'll later post a document about the road map and the ultimate design. I hope to have more discussions in a meeting or offline.

I'm still getting comfortable with this get-features-in-first-and-improve-later approach, and trying to balance it with my desire for quality and good design. It's not easy.

foundationdb-ci · 2021-11-03T20:25:00Z

AWS CodeBuild CI Report for Linux CentOS 7

CodeBuild project: foundationdb-pr
Commit ID: 3f15686
Result: FAILED
Error: Error while executing command: ninja -v -C ${BUILD_DIR} -j ${NPROC} all packages strip_targets. Reason: exit status 1
Build Logs (available for 30 days)

foundationdb-ci · 2021-11-03T21:01:01Z

AWS CodeBuild CI Report for macOS Catalina 10.15

CodeBuild project: foundationdb-pr-macos
Commit ID: 6c98e35
Result: SUCCEEDED
Error: N/A
Build Logs (available for 30 days)

foundationdb-ci · 2021-11-03T21:16:47Z

AWS CodeBuild CI Report for Linux CentOS 7

CodeBuild project: foundationdb-pr
Commit ID: 6c98e35
Result: SUCCEEDED
Error: N/A
Build Logs (available for 30 days)

halfprice

SS logic LGTM. Please make sure you make the API clear before merging to 7.0 (documentation, making contract clear, etc.).

halfprice · 2021-11-04T03:46:15Z

fdbclient/ReadYourWrites.actor.cpp

+                                                                  GetRangeLimits limits,
+                                                                  Snapshot snapshot,
+                                                                  Reverse reverse) {
+	if (getDatabase()->apiVersionAtLeast(630)) {


Is 630 right? Not 700?

Yes, the 630 is copied from getRange. It is 630 because special keys were handled differently before that version. Since the new API is added in 700, getDatabase()->apiVersionAtLeast(630) should always be true. We could simplify the code by removing the else, but I would prefer to keep it consistent with getRange so we can deduplicate in the future.

halfprice · 2021-11-04T03:49:00Z

fdbclient/StorageServerInterface.h

@@ -310,6 +318,43 @@ struct GetKeyValuesRequest : TimedRequest {
 	}
 };

+struct GetKeyValuesAndFlatMapReply : public LoadBalancedReply {


I think it is worth noting that anyone who changes GetKeyValuesAndFlatMapRequest/Reply also has to change GetKeyValuesRequest/Reply.

Good point. I'll add a comment in a separate PR. I really wish I could find a way to make templates work better with Flow.
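For readers wondering what the template approach could look like, here is a plain C++ sketch (not Flow) in which both replies are instantiations of one template, so their fields cannot drift apart. All type and field names are hypothetical and illustrative; the sketch does not model Flow's actor compiler, arenas, or serialization, which is exactly where the author notes templates become awkward.

```cpp
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical, simplified payload. FDB's real reply structs use Flow arenas,
// VectorRef, and serializer(); none of that is modeled here.
using KeyValue = std::pair<std::string, std::string>;

// Shared shape for both replies: adding a field here changes both types at once,
// instead of relying on a comment to keep two hand-written structs in sync.
template <class Tag>
struct KeyValuesReplyT {
    std::vector<KeyValue> data;  // returned rows
    long long version = 0;       // illustrative: read version the rows reflect
    bool more = false;           // illustrative: true if limits truncated the range
};

// Empty tag types keep the two replies distinct so they cannot be mixed up.
struct PlainGetRangeTag {};
struct FlatMapGetRangeTag {};

using SketchGetKeyValuesReply = KeyValuesReplyT<PlainGetRangeTag>;
using SketchGetKeyValuesAndFlatMapReply = KeyValuesReplyT<FlatMapGetRangeTag>;

static_assert(!std::is_same<SketchGetKeyValuesReply, SketchGetKeyValuesAndFlatMapReply>::value,
              "the two replies should remain distinct types");
```

The tag parameter is the design choice here: the types share one definition but stay distinct, so overloads and request/reply pairings can still tell them apart.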

halfprice · 2021-11-04T03:51:09Z

fdbserver/storageserver.actor.cpp

+	return Void();
+}
+
+ACTOR Future<GetKeyValuesAndFlatMapReply> flatMap(StorageServer* data, GetKeyValuesReply input, StringRef mapper) {


Make sure you add documentation to these functions at some point.
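As a rough mental model of what a storage-server-side flat map does (the function under review takes a `GetKeyValuesReply` and a `mapper`), here is a simplified in-memory C++ sketch. It assumes each input row (for example, an index entry) is mapped to a key prefix, every key under that prefix is read, and the results are concatenated in input order. `Store`, `Mapper`, `readPrefix`, and `flatMapSketch` are hypothetical; FDB's real mapper is an encoded `StringRef`, not a C++ callback, and the real function runs as a Flow actor against versioned storage.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, std::string>;
// Ordered map standing in for one storage server's key space (hypothetical).
using Store = std::map<std::string, std::string>;
// Hypothetical mapper: derives the key prefix of the target record from one input row.
using Mapper = std::function<std::string(const KeyValue&)>;

// Returns every key-value whose key starts with `prefix`, in key order.
std::vector<KeyValue> readPrefix(const Store& store, const std::string& prefix) {
    std::vector<KeyValue> out;
    for (auto it = store.lower_bound(prefix);
         it != store.end() && it->first.compare(0, prefix.size(), prefix) == 0; ++it) {
        out.emplace_back(it->first, it->second);
    }
    return out;
}

// Flat map: each input row expands into zero or more rows, and the expansions
// are concatenated in the order of the input rows.
std::vector<KeyValue> flatMapSketch(const Store& store, const std::vector<KeyValue>& input,
                                    const Mapper& mapper) {
    std::vector<KeyValue> result;
    for (const KeyValue& row : input) {
        std::vector<KeyValue> expanded = readPrefix(store, mapper(row));
        result.insert(result.end(), expanded.begin(), expanded.end());
    }
    return result;
}
```

In this model, an index scan that returns record IDs and the per-record reads that follow collapse into one call on the server, which is where the expected latency and bandwidth improvement would come from.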

Re-introduce apple#5609

xumengpanda reviewed Sep 16, 2021

nblintao force-pushed the index-prefetch-demo branch from 47c7f2a to 33ad45e September 17, 2021 02:35

nblintao changed the title ~~[WIP] Index Prefetch Demo~~ Index Prefetch Demo Sep 17, 2021

MMcM suggested changes Sep 24, 2021

nblintao force-pushed the index-prefetch-demo branch from 626e517 to aec1782 October 7, 2021 05:13

nblintao force-pushed the index-prefetch-demo branch from aec1782 to 3ef815c October 11, 2021 18:20

nblintao force-pushed the index-prefetch-demo branch from 3ef815c to d9a6aa3 October 12, 2021 01:23

nblintao force-pushed the index-prefetch-demo branch from d9a6aa3 to 99f99a0 October 21, 2021 01:06

nblintao force-pushed the index-prefetch-demo branch from 99f99a0 to bed03c6 October 23, 2021 00:45

nblintao changed the title ~~Index Prefetch Demo~~ Add GetRangeAndHop to support index prefetch Oct 23, 2021

nblintao changed the title ~~Add GetRangeAndHop to support index prefetch~~ Introduce GetRangeAndHop to support index prefetch optimization Oct 26, 2021

nblintao force-pushed the index-prefetch-demo branch from bed03c6 to 36091b9 October 26, 2021 17:09

nblintao requested a review from xumengpanda October 29, 2021 23:50

xumengpanda reviewed Oct 31, 2021

halfprice reviewed Nov 1, 2021

jzhou77 assigned xumengpanda Nov 1, 2021

alecgrieser reviewed Nov 1, 2021

Introduce getRangeAndHop to push computations down to FDB

0853661

nblintao force-pushed the index-prefetch-demo branch from fbb116f to 3f15686 November 3, 2021 20:21

Rename Hop to FlatMap

6c98e35

nblintao force-pushed the index-prefetch-demo branch from 3f15686 to 6c98e35 November 3, 2021 20:32

nblintao changed the title ~~Introduce GetRangeAndHop to support index prefetch optimization~~ Introduce GetRangeAndFlatMap to push computations down to FDB Nov 3, 2021

xumengpanda assigned halfprice and unassigned xumengpanda Nov 4, 2021

halfprice approved these changes Nov 4, 2021

nblintao requested a review from xumengpanda November 4, 2021 05:11

nblintao merged commit 679023a into apple:master Nov 4, 2021

nblintao mentioned this pull request Nov 4, 2021

Revert "Introduce GetRangeAndFlatMap to push computations down to FDB" #5912

Merged

nblintao added a commit to nblintao/foundationdb that referenced this pull request Nov 9, 2021

Introduce GetRangeAndFlatMap to push computations down to FDB

bad9ddf

Re-introduce apple#5609

nblintao added a commit to nblintao/foundationdb that referenced this pull request Nov 9, 2021

Introduce GetRangeAndFlatMap to push computations down to FDB

fdb3b72

Re-introduce apple#5609

This was referenced Nov 9, 2021

Introduce GetRangeAndFlatMap to push computations down to FDB #5945

Merged

(#5945 to 7.0) Introduce GetRangeAndFlatMap to push computations down to FDB #5954

Merged

xnomagichash mentioned this pull request Jul 21, 2022

Add FoundationDB 7.1 Support crclark/foundationdb-haskell#52

Closed

Introduce GetRangeAndFlatMap to push computations down to FDB #5609

nblintao commented Sep 16, 2021

Code-Reviewer Section

For Release-Branches

foundationdb-ci commented Sep 16, 2021

AWS CodeBuild CI Report for macOS Catalina 10.15

foundationdb-ci commented Sep 16, 2021

AWS CodeBuild CI Report for Linux CentOS 7

foundationdb-ci commented Sep 17, 2021

AWS CodeBuild CI Report for macOS Catalina 10.15

foundationdb-ci commented Sep 17, 2021

AWS CodeBuild CI Report for Linux CentOS 7

foundationdb-ci commented Sep 17, 2021

AWS CodeBuild CI Report for Linux CentOS 7

nblintao Sep 25, 2021

foundationdb-ci commented Oct 7, 2021

AWS CodeBuild CI Report for Linux CentOS 7

foundationdb-ci commented Oct 7, 2021

AWS CodeBuild CI Report for macOS Catalina 10.15

foundationdb-ci commented Oct 11, 2021

AWS CodeBuild CI Report for Linux CentOS 7

foundationdb-ci commented Oct 12, 2021

AWS CodeBuild CI Report for macOS Catalina 10.15

foundationdb-ci commented Oct 12, 2021

AWS CodeBuild CI Report for Linux CentOS 7

foundationdb-ci commented Oct 21, 2021

AWS CodeBuild CI Report for Linux CentOS 7

foundationdb-ci commented Oct 21, 2021

AWS CodeBuild CI Report for macOS Catalina 10.15

foundationdb-ci commented Oct 23, 2021

AWS CodeBuild CI Report for macOS Catalina 10.15

foundationdb-ci commented Oct 23, 2021

AWS CodeBuild CI Report for Linux CentOS 7

foundationdb-ci commented Oct 26, 2021

AWS CodeBuild CI Report for Linux CentOS 7

foundationdb-ci commented Oct 26, 2021

AWS CodeBuild CI Report for Linux CentOS 7

foundationdb-ci commented Oct 29, 2021

AWS CodeBuild CI Report for Linux CentOS 7

halfprice left a comment

nblintao Nov 1, 2021