-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft: Test that /messages
works on remote homeserver and can be backfilled properly after many batches (MSC2716)
#214
Conversation
/messages
works on remote homeserver and can be backfilled properly after many batches/messages
works on remote homeserver and can be backfilled properly after many batches (MSC2716)
tests/msc2716_test.go
Outdated
} | ||
|
||
for { | ||
if time.Since(start) > 200*c.SyncUntilTimeout { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This timeout is currently very large to accommodate the long ~20s /messages
durations. We also have to make that request 11 times during the test which adds up very fast.
Synapse really has to chug for those requests 👹 and ideally wouldn't have to modify this at all.
I would need to look into optimizing Synapse to make this fast which we should probably do anyway as this is painfully slow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is currently set to a whopping 1000s by default
complement/internal/docker/deployment.go
Line 57 in 136fd60
SyncUntilTimeout: 5 * time.Second, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kegsay Sorry this wasn't clear as undrafting indicates "marked this pull request as ready for review" but I didn't assign you this one yet specifically because of this problem.
The test itself is good to go (timeout can be switched to normal and numBatches
could be switched back to 11
) and shipped but want to make it actually acceptable time-wise to run against Synapse before merging.
I used numBatches=2
during testing because it's much faster to see results while developing.
Thanks for the review pass though and I'll fix up these other spots ⏩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MadLittleMods what is the status of this PR? It's still marked as a Draft.
Status is still the same as the last update in this thread. It's too slow on Synapse for me to be comfortable merging it yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of optimizing Synapse to make this test viable to run time-wise, I'm a bit blocked on a race condition in some recent code, matrix-org/synapse#12394 (comment) -> matrix-org/synapse#12646
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of optimizing Synapse to make this test viable to run time-wise, I'm a bit blocked on a race condition in some recent code, matrix-org/synapse#12394 (comment) -> matrix-org/synapse#12646
Now matrix-org/synapse#12988 I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made progress on optimizing Synapse:
- Saves 10s per
/messages
call: Fixhave_seen_event
cache not being invalidated synapse#13863 - Still need to figure out Optimize backfill receiving to have less missing
prev_event
thrashing (scratch) synapse#13864
/messages
works on remote homeserver and can be backfilled properly after many batches (MSC2716)/messages
works on remote homeserver and can be backfilled properly after many batches (MSC2716)
tests/msc2716_test.go
Outdated
// could pick for other homeserver implementations so I just did 10+1 to | ||
// make sure it also worked in that case. | ||
//numBatches := 11 | ||
numBatches := 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lovely comment but then you seem to use 2. Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See:
I used
numBatches=2
during testing because it's much faster to see results while developing.
Running this test against Synapse is extremely slow atm, #214 (comment)
TODO: Reminder to revert this to 11 before merge
tests/msc2716_test.go
Outdated
} | ||
|
||
// Make sure we see the events at the very start of the message history | ||
expectedEventIDs = append(expectedEventIDs, eventIDsBefore...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is new -> old?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order doesn't matter because it's just a list of expected ID's, expectedEventIDs
We also just use it in check off function (paginateUntilMessageCheckOff
) which doesn't care about the order.
There is a similar variable in another test where order does matter but we call it expectedEventIDOrder
tests/msc2716_test.go
Outdated
} | ||
|
||
for { | ||
if time.Since(start) > 200*c.SyncUntilTimeout { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is currently set to a whopping 1000s by default
complement/internal/docker/deployment.go
Line 57 in 136fd60
SyncUntilTimeout: 5 * time.Second, |
tests/msc2716_test.go
Outdated
actualEventIDList = append(actualEventIDList, eventID) | ||
|
||
if _, keyExists := workingExpectedEventIDMap[eventID]; keyExists { | ||
delete(workingExpectedEventIDMap, eventID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this function doesn't actually check the events are in the correct order, just that they exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, just a checkoff function
tests/msc2716_test.go
Outdated
if !keyRes.Exists() { | ||
t.Fatalf("missing key '%s'", wantKey) | ||
} | ||
if !keyRes.IsArray() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use JSONArrayEach
?
complement/internal/match/json.go
Line 164 in 136fd60
func JSONArrayEach(wantKey string, fn func(gjson.Result) error) JSON { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, I think it was because we can't break early with JSONArrayEach
(can only break early if you throw an error which fails the test). JSONArrayEach
could be refactored to have the item function return keepGoing, err
but rather do that in another PR since it touches so many other tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the early return really a deal breaker? I don't think it'll affect runtime performance that much?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to re-use JSONArrayEach
. It's not so bad ⏩
/messages
works on remote homeserver and can be backfilled properly after many batches (MSC2716)/messages
works on remote homeserver and can be backfilled properly after many batches (MSC2716)
…ugging Split off from #214
…till-works-with-many-batches Conflicts: tests/msc2716_test.go
…till-works-with-many-batches Conflicts: tests/msc2716_test.go
…ll-works-with-many-batches
@MadLittleMods what is the status of this PR? It's still marked as a Draft. |
Also includes `paginateUntilMessageCheckOff` from #214
…ll-works-with-many-batches
Synapse changes: matrix-org/synapse#12718 Part of MSC2716: matrix-org/matrix-spec-proposals#2716 We're testing to make sure historical messages show up for a remote federated homeserver even when the homeserver is missing the part of the timeline where the marker events were sent and it paginates before they occurred to see if the history is available. Making sure the homeserver processes all of the markers from the current state instead of just when it sees them in the timeline. Sending marker events as state now so they are always able to be seen by homeservers (not lost in some timeline gap). Marker events should be sent with a unique `state_key` so that they can all resolve in the current state to easily be discovered. - If we re-use the same `state_key` (like `""`), then we would have to fetch previous snapshots of state up through time to find all of the marker events. This way we can avoid all of that. - Also avoids state resolution conflicts where only one of the marker events win As a homeserver, when we see new marker state, we know there is new history imported somewhere back in time and should process it to fetch the insertion event where the historical messages are and set it as an insertion extremity. This way we know where to backfill more messages when someone asks for scrollback. Also includes `paginateUntilMessageCheckOff` from #214
…ll-works-with-many-batches Conflicts: tests/msc2716_test.go
Marked as stale; this PR will be closed in 7 days if there isn't any further progress. Please re-open when you get time. Long-lived branches are fine, but long-lived PRs like this always being in |
…ll-works-with-many-batches
Fix #13856 Fix #13865 > Discovered while trying to make Synapse fast enough for [this MSC2716 test for importing many batches](matrix-org/complement#214 (comment)). As an example, disabling the `have_seen_event` cache saves 10 seconds for each `/messages` request in that MSC2716 Complement test because we're not making as many federation requests for `/state` (speeding up `have_seen_event` itself is related to #13625) > > But this will also make `/messages` faster in general so we can include it in the [faster `/messages` milestone](/~https://github.com/matrix-org/synapse/milestone/11). > > *-- #13856 ### The problem `_invalidate_caches_for_event` doesn't run in monolith mode which means we never even tried to clear the `have_seen_event` and other caches. And even in worker mode, it only runs on the workers, not the master (AFAICT). Additionally there was bug with the key being wrong so `_invalidate_caches_for_event` never invalidates the `have_seen_event` cache even when it does run. Because we were using the `@cachedList` wrong, it was putting items in the cache under keys like `((room_id, event_id),)` with a `set` in a `set` (ex. `(('!TnCIJPKzdQdUlIyXdQ:test', '$Iu0eqEBN7qcyF1S9B3oNB3I91v2o5YOgRNPwi_78s-k'),)`) and we we're trying to invalidate with just `(room_id, event_id)` which did nothing.
// Since this will throw an error if they key does not exist, | ||
// do this at the end of the loop. It's a valid scenario to be at the end | ||
// of the room and have no more to paginate so we want the `return` when | ||
// we've found all of the expected events to have a chance to run first. | ||
messageResEnd = client.GetJSONFieldStr(t, messsageResBody, "end") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split out to #489
…#489) As spec'd: > If no further events are available (either because we have reached the start of the timeline, or because the user does not have permission to see any more events), this property is omitted from the response. > > *-- https://spec.matrix.org/v1.1/client-server-api/#get_matrixclientv3roomsroomidmessages* Split out from #214 Previously, it would throw the following error and potentially we found all of the events within that last response to satisfy everything, ``` JSONFieldStr: key 'end' missing ```
Test that
/messages
works on remote homeserver and can be backfilled properly after many batchesPart of MSC2716: matrix-org/matrix-spec-proposals#2716
Related to #212
Used while developing matrix-org/synapse#11114
Dev notes