Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't waste time processing backfill events that we never show to the client #15632

Open
matrixbot opened this issue Dec 21, 2023 · 0 comments
Open

Comments

@matrixbot
Copy link
Collaborator

matrixbot commented Dec 21, 2023

This issue has been migrated from #15632.


As discovered while working on matrix-org/synapse#15617 and a natural extension of matrix-org/synapse#15585 to process previously failed backfill events in the background (follow-up to matrix-org/synapse#13623),

It turns out, when we make a /messages request and trigger a backfill, Synapse will happily waste time trying hard to process events (_process_pulled_events) in the foreground that we end up never showing to clients. The one particularly bad culprit is the org.matrix.dummy_event which Synapse automatically puts in the DAG to resolve outstanding forward extremities. Since it has 10x random prev_events, it takes a while to fetch the state for each prev_event and persist. We determine whether some events should be shown to clients in the _check_filter_send_to_client(...) function

/~https://github.com/matrix-org/synapse/blob/ad50510a06d035a674f0eeed5db5dd3060bc0b1c/synapse/visibility.py#L482-L519

org.matrix.dummy_event can be particularly bad because they usually include a lot of disparate prev_events which take a while for us to work out the state for (long _get_state_groups_from_groups times).

Potential solution

We should use this same _check_filter_send_to_client(...) criteria when determining whether we should backfill a given event in the background or foreground.

We should be mindful an event that is filtered from the client, might still be used as a prev_event from another event in the backfill response that we will show to clients. And if we kick off the function to process org.matrix.dummy_event in the background, while the other event in the foreground, we will probably end up duplicating the work depending on the race.

We don't really have these same race -> duplicate work concerns with matrix-org/synapse#15585 (events which previously failed to backfill) because it's a "fool me once" sort of situation and they are already potentially disconnected from what we were trying to backfill anyway given they failed before.


Also of interest, filter_events_for_server(...)

@matrixbot matrixbot changed the title Dummy issue Don't waste time processing backfill events that we never show to the client Dec 21, 2023
@matrixbot matrixbot reopened this Dec 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant