-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Fix bug in device list caching when remote users leave rooms #13749
Fix bug in device list caching when remote users leave rooms #13749
Conversation
When a remote user leaves the last room shared with the homeserver, we have to mark their device list as unsubscribed, otherwise we will hold on to a stale device list in our cache. Crucially, the device list will remain cached even after the remote user rejoins the room, which can lead to E2EE failures until the remote user's device list next changes. Signed-off-by: Sean Quah <seanq@matrix.org>
Now that we are handling device list unsubscriptions in the event persistence code, it's no longer necessary to mark device lists as unsubscribed elsewhere.
Signed-off-by: Sean Quah <seanq@matrix.org>
synapse/handlers/e2e_keys.py
Outdated
|
||
# Check that the homeserver still shares a room with all cached users. | ||
# Note that this check may be slightly racy when a remote user leaves a | ||
# room after we have fetched their cached device list. In the worst case | ||
# we will do extra federation queries for devices that we had cached. | ||
cached_users = set(remote_results.keys()) | ||
valid_cached_users = ( | ||
await self.store.get_users_server_still_shares_room_with( | ||
remote_results.keys() | ||
) | ||
) | ||
invalid_cached_users = cached_users - valid_cached_users | ||
if invalid_cached_users: | ||
# Fix up results. If we get here there may be bugs in device list | ||
# tracking. | ||
# TODO: is this actually useful? it doesn't fix the case where a | ||
# remote user rejoins a room and we query their device list | ||
# after. | ||
user_ids_not_in_cache.update(invalid_cached_users) | ||
for invalid_user_id in invalid_cached_users: | ||
remote_results.pop(invalid_user_id) | ||
logger.error( | ||
"Devices for %s were cached, but the server no longer shares " | ||
"any rooms with them. The cached device lists are stale.", | ||
invalid_cached_users, | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not fully convinced this check is useful, because it'll only catch /keys/query
requests done while we no longer share a room with a remote user, which I wouldn't expect to happen often, or at all.
It's also slightly racy and can trigger spuriously when a remote user leaves a room while we're in the middle of handling the request.
Do we want to keep it or remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its valid for clients to do a /keys/query
for users they don't share a room with, e.g. because they're about to start a DM or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds plausible!
user_id | ||
for event_type, user_id in delta.to_delete | ||
if event_type == EventTypes.Member | ||
and not self.is_mine_id(user_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs #13746, otherwise we can raise errors when there are malformed user IDs in the room state.
4b2a14b
to
5a12cb7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good, other than I don't think we want to log an error?
Do we not want sentry issues when we discover we've cached a device list we shouldn't have? |
Synapse 1.68.0 (2022-09-27) =========================== Please note that Synapse will now refuse to start if configured to use a version of SQLite older than 3.27. In addition, please note that installing Synapse from a source checkout now requires a recent Rust compiler. Those using packages will not be affected. On most platforms, installing with `pip install matrix-synapse` will not be affected. See the [upgrade notes](https://matrix-org.github.io/synapse/v1.68/upgrade.html#upgrading-to-v1680). Bugfixes -------- - Fix packaging to include `Cargo.lock` in `sdist`. ([\matrix-org#13909](matrix-org#13909)) Synapse 1.68.0rc2 (2022-09-23) ============================== Bugfixes -------- - Fix building from packaged sdist. Broken in v1.68.0rc1. ([\matrix-org#13866](matrix-org#13866)) Internal Changes ---------------- - Fix the release script not publishing binary wheels. ([\matrix-org#13850](matrix-org#13850)) - Lower minimum supported rustc version to 1.58.1. ([\matrix-org#13857](matrix-org#13857)) - Lock Rust dependencies' versions. ([\matrix-org#13858](matrix-org#13858)) Synapse 1.68.0rc1 (2022-09-20) ============================== Features -------- - Keep track of when we fail to process a pulled event over federation so we can intelligently back off in the future. ([\matrix-org#13589](matrix-org#13589), [\matrix-org#13814](matrix-org#13814)) - Add an [admin API endpoint to fetch messages within a particular window of time](https://matrix-org.github.io/synapse/v1.68/admin_api/rooms.html#room-messages-api). ([\matrix-org#13672](matrix-org#13672)) - Add an [admin API endpoint to find a user based on their external ID in an auth provider](https://matrix-org.github.io/synapse/v1.68/admin_api/user_admin_api.html#find-a-user-based-on-their-id-in-an-auth-provider). ([\matrix-org#13810](matrix-org#13810)) - Cancel the processing of key query requests when they time out. ([\matrix-org#13680](matrix-org#13680)) - Improve validation of request bodies for the following client-server API endpoints: [`/account/3pid/msisdn/requestToken`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3account3pidmsisdnrequesttoken), [`/org.matrix.msc3720/account_status`](/~https://github.com/matrix-org/matrix-spec-proposals/blob/babolivier/user_status/proposals/3720-account-status.md#post-_matrixclientv1account_status), [`/account/3pid/add`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3account3pidadd), [`/account/3pid/bind`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3account3pidbind), [`/account/3pid/delete`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3account3piddelete) and [`/account/3pid/unbind`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3account3pidunbind). ([\matrix-org#13687](matrix-org#13687), [\matrix-org#13736](matrix-org#13736)) - Document the timestamp when a user accepts the consent, if [consent tracking](https://matrix-org.github.io/synapse/latest/consent_tracking.html) is used. ([\matrix-org#13741](matrix-org#13741)) - Add a `listeners[x].request_id_header` configuration option to specify which request header to extract and use as the request ID in order to correlate requests from a reverse proxy. ([\matrix-org#13801](matrix-org#13801)) Bugfixes -------- - Fix a bug introduced in Synapse 1.41.0 where the `/hierarchy` API returned non-standard information (a `room_id` field under each entry in `children_state`). ([\matrix-org#13506](matrix-org#13506)) - Fix a long-standing bug where previously rejected events could end up in room state because they pass auth checks given the current state of the room. ([\matrix-org#13723](matrix-org#13723)) - Fix a long-standing bug where Synapse fails to start if a signing key file contains an empty line. ([\matrix-org#13738](matrix-org#13738)) - Fix a long-standing bug where Synapse would fail to handle malformed user IDs or room aliases gracefully in certain cases. ([\matrix-org#13746](matrix-org#13746)) - Fix a long-standing bug where device lists would remain cached when remote users left and rejoined the last room shared with the local homeserver. ([\matrix-org#13749](matrix-org#13749), [\matrix-org#13826](matrix-org#13826)) - Fix a long-standing bug that could cause stale caches in some rare cases on the first startup of Synapse with replication. ([\matrix-org#13766](matrix-org#13766)) - Fix a long-standing spec compliance bug where Synapse would accept a trailing slash on the end of `/get_missing_events` federation requests. ([\matrix-org#13789](matrix-org#13789)) - Delete associated data from `event_failed_pull_attempts`, `insertion_events`, `insertion_event_extremities`, `insertion_event_extremities`, `insertion_event_extremities` when purging the room. ([\matrix-org#13825](matrix-org#13825)) Improved Documentation ---------------------- - Note that `libpq` is required on ARM-based Macs. ([\matrix-org#13480](matrix-org#13480)) - Fix a mistake in the config manual introduced in Synapse 1.22.0: the `event_cache_size` _is_ scaled by `caches.global_factor`. ([\matrix-org#13726](matrix-org#13726)) - Fix a typo in the documentation for the login ratelimiting configuration. ([\matrix-org#13727](matrix-org#13727)) - Define Synapse's compatability policy for SQLite versions. ([\matrix-org#13728](matrix-org#13728)) - Add docs for the common fix of deleting the `matrix_synapse.egg-info/` directory for fixing Python dependency problems. ([\matrix-org#13785](matrix-org#13785)) - Update request log format documentation to mention the format used when the authenticated user is controlling another user. ([\matrix-org#13794](matrix-org#13794)) Deprecations and Removals ------------------------- - Synapse will now refuse to start if configured to use SQLite < 3.27. ([\matrix-org#13760](matrix-org#13760)) - Don't include redundant `prev_state` in new events. Contributed by Denis Kariakin (@dakariakin). ([\matrix-org#13791](matrix-org#13791)) Internal Changes ---------------- - Add a stub Rust crate. ([\matrix-org#12595](matrix-org#12595), [\matrix-org#13734](matrix-org#13734), [\matrix-org#13735](matrix-org#13735), [\matrix-org#13743](matrix-org#13743), [\matrix-org#13763](matrix-org#13763), [\matrix-org#13769](matrix-org#13769), [\matrix-org#13778](matrix-org#13778)) - Bump the minimum dependency of `matrix_common` to 1.3.0 to make use of the `MXCUri` class. Use `MXCUri` to simplify media retention test code. ([\matrix-org#13162](matrix-org#13162)) - Add and populate the `event_stream_ordering` column on the `receipts` table for future optimisation of push action processing. Contributed by Nick @ Beeper (@Fizzadar). ([\matrix-org#13703](matrix-org#13703)) - Rename the `EventFormatVersions` enum values so that they line up with room version numbers. ([\matrix-org#13706](matrix-org#13706)) - Update trial old deps CI to use Poetry 1.2.0. ([\matrix-org#13707](matrix-org#13707), [\matrix-org#13725](matrix-org#13725)) - Add experimental configuration option to allow disabling legacy Prometheus metric names. ([\matrix-org#13714](matrix-org#13714), [\matrix-org#13717](matrix-org#13717), [\matrix-org#13718](matrix-org#13718)) - Fix typechecking with latest types-jsonschema. ([\matrix-org#13724](matrix-org#13724)) - Strip number suffix from instance name to consolidate services that traces are spread over. ([\matrix-org#13729](matrix-org#13729)) - Instrument `get_metadata_for_events` for understandable traces in Jaeger. ([\matrix-org#13730](matrix-org#13730)) - Remove old queries to join room memberships to current state events. Contributed by Nick @ Beeper (@Fizzadar). ([\matrix-org#13745](matrix-org#13745)) - Avoid raising an error due to malformed user IDs in `get_current_hosts_in_room`. Malformed user IDs cannot currently join a room, so this error would not be hit. ([\matrix-org#13748](matrix-org#13748)) - Update the docstrings for `get_users_in_room` and `get_current_hosts_in_room` to explain the impact of partial state. ([\matrix-org#13750](matrix-org#13750)) - Use an additional database query when persisting receipts. ([\matrix-org#13752](matrix-org#13752)) - Preparatory work for storing thread IDs for notifications and receipts. ([\matrix-org#13753](matrix-org#13753)) - Re-type hint some collections as read-only. ([\matrix-org#13754](matrix-org#13754)) - Remove unused Prometheus recording rules from `synapse-v2.rules` and add comments describing where the rest are used. ([\matrix-org#13756](matrix-org#13756)) - Add a check for editable installs if the Rust library needs rebuilding. ([\matrix-org#13759](matrix-org#13759)) - Tag traces with the instance name to be able to easily jump into the right logs and filter traces by instance. ([\matrix-org#13761](matrix-org#13761)) - Concurrently fetch room push actions when calculating badge counts. Contributed by Nick @ Beeper (@Fizzadar). ([\matrix-org#13765](matrix-org#13765)) - Update the script which makes full schema dumps. ([\matrix-org#13770](matrix-org#13770)) - Deduplicate `is_server_notices_room`. ([\matrix-org#13780](matrix-org#13780)) - Simplify the dependency DAG in the tests workflow. ([\matrix-org#13784](matrix-org#13784)) - Remove an old, incorrect migration file. ([\matrix-org#13788](matrix-org#13788)) - Remove unused method in `synapse.api.auth.Auth`. ([\matrix-org#13795](matrix-org#13795)) - Fix a memory leak when running the unit tests. ([\matrix-org#13798](matrix-org#13798)) - Use partial indices on SQLite. ([\matrix-org#13802](matrix-org#13802)) - Check that portdb generates the same postgres schema as that in the source tree. ([\matrix-org#13808](matrix-org#13808)) - Fix Docker build when Rust .so has been built locally first. ([\matrix-org#13811](matrix-org#13811)) - Complement: Initialise the Postgres database directly inside the target image instead of the base Postgres image to fix building using Buildah. ([\matrix-org#13819](matrix-org#13819)) - Support providing an index predicate clause when doing upserts. ([\matrix-org#13822](matrix-org#13822)) - Minor speedups to linting in CI. ([\matrix-org#13827](matrix-org#13827)) # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEE1508oLYUKainYFJakD7OEIo53t0FAmMy4FoACgkQkD7OEIo5 # 3t009g/6A4S26H6NG4GM44JD9+OB25fO59m9UAWWLrePmOKsBaGXVp86scPq9epI # vQbr4Czi8WEqCJlMRxIWLXv7BL3TLXnLF1vC0wSE6YiJqrPU9vMZ0UYxWNErl8Sr # eFBpuHXDlfppQUXs903iNmXVbdTpXCVjdTEwaZmgU1/FKydgU0o90PQseb/hnegX # hcJrepL6xhcs37fP2zdlixissLQ85WE10x6h7FX+SkCEHGvkiKrqSvXsS4ZN0Scn # vGCy3GD69j/ZpRu7RczdDwwzCRerg6r7spokRK/b6pzz9sXmCyY8SrrUyEEdzveN # uEEe4A8vmR3v0sR4Ao2cGZ7zy/jq6WyrWjmfOd+hfYD9tXx/g8190RFkRQrrkYVU # jTNdD6Zom0rtENEgHuFQ8joD96MNsaq4dvDefYYpcXDOh3YZnA/fiwfmG9XZRq6u # B42RZEtUZ3sjZ3VdRb3AvhPTTrY4kiwEqVBFSUTBKEAKXQtdrsiqv2QPvQTSC5BJ # YFXMwj32E7Zi3mTYjl60yggtBAxYt49KsHL8qAq75t6A38HpnYEyEW1R3S6VDmtp # rIR5ktxyzoGvxpJ4YPUdC4A2hbaOztwPvGE3iEDiPgUdkfb6m8x3MhaWhShid4Df # v2BQu+SFIl1wXPLxjlX2rmkQMhUGD2RGYmkUgYfoWnjdTtQV10w= # =4eYj # -----END PGP SIGNATURE----- # gpg: Signature made Tue Sep 27 12:36:58 2022 BST # gpg: using RSA key D79D3CA0B61429A8A760525A903ECE108A39DEDD # gpg: Can't check signature: No public key # Conflicts: # docker/Dockerfile # poetry.lock # synapse/push/bulk_push_rule_evaluator.py # synapse/push/push_tools.py # synapse/storage/databases/main/event_push_actions.py # synapse/storage/databases/main/events_worker.py # tests/replication/slave/storage/test_events.py
When a remote user leaves the last room shared with the homeserver, we
have to mark their device list as unsubscribed, otherwise we would hold
on to a stale device list in our cache. Crucially, the device list would
remain cached even after the remote user rejoined the room, which could
lead to E2EE failures until the next change to the remote user's device
list.
Fixes #13651.
Signed-off-by: Sean Quah seanq@matrix.org
Complement tests: matrix-org/complement#459