Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a query param participant to admin endpoint /members to filter for participants in a room #18040

Open
wants to merge 7 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/18040.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Add a query param `participant` to `/members` admin api to filter for room participants.
7 changes: 7 additions & 0 deletions docs/admin_api/rooms.md
Original file line number Diff line number Diff line change
Expand Up @@ -358,6 +358,13 @@ The API is:
GET /_synapse/admin/v1/rooms/<room_id>/members
```

**Parameters**

The following query parameters are available:

* `participant` - Optional. If provided and set to true, only returns currently joined members who have
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* `participant` - Optional. If provided and set to true, only returns currently joined members who have
* `participant` (bool) - Optional. If provided and set to true, only returns currently joined members who have

also posted to the room. Defaults to false.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
also posted to the room. Defaults to false.
also posted to the room. Having "posted" is defined as sending an `m.room.message`
event at some point in the room's history. Defaults to false.

anoadragon453 marked this conversation as resolved.
Show resolved Hide resolved

A response body like the following is returned:

```json
Expand Down
7 changes: 6 additions & 1 deletion synapse/rest/admin/rooms.py
Original file line number Diff line number Diff line change
Expand Up @@ -434,7 +434,12 @@ async def on_GET(
if not room:
raise NotFoundError("Room not found")

members = await self.store.get_users_in_room(room_id)
participant = parse_boolean(request, "participant", False)

if participant:
members = await self.store.get_participants_in_room(room_id)
else:
members = await self.store.get_users_in_room(room_id)
ret = {"members": members, "total": len(members)}

return HTTPStatus.OK, ret
Expand Down
28 changes: 28 additions & 0 deletions synapse/storage/databases/main/roommember.py
Original file line number Diff line number Diff line change
Expand Up @@ -1606,6 +1606,34 @@ def _get_rooms_for_user_by_join_date_txn(
from_ts,
)

async def get_participants_in_room(self, room_id: str) -> Sequence[str]:
"""
Return a list of all currently joined room members who have posted
in the given room.

Args:
room_id: room ID of the room to search in
"""

def _get_participants_in_room_txn(
txn: LoggingTransaction, room_id: str
) -> List[str]:
sql = """
SELECT DISTINCT c.state_key
FROM current_state_events AS c
INNER JOIN events AS e USING(room_id)
WHERE room_id = ?
AND c.membership = 'join'
AND e.type = 'm.room.message'
AND c.state_key = e.sender
Comment on lines +1622 to +1628
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this query is currently quite slow. Testing this on matrix.org with Matrix HQ as the room, this query took about 50 seconds to run. It would be much slower on typical homeserver hardware. Postgres' EXPLAIN command measures the cost at cost=438880.86..440110.15 rows=122929 width=29.

Rather than iterating through current_state_events for room membership, the room_memberships table could be used to check which users are currently joined to a given room. We can rewrite the query as follows:

SELECT DISTINCT rm.user_id
FROM room_memberships AS rm
INNER JOIN events AS e USING(room_id)
WHERE room_id = ?
    AND rm.membership = 'join'
    AND e.type = 'm.room.message'
    AND rm.user_id = e.sender;

This cuts our cost down a little bit, but still not much; psql rates the cost as cost=438882.94..440112.23 rows=122929 width=29, and still taking about 30 seconds.

The majority of the cost comes from scanning events for m.room.message-type events from the given room:

               ->  Parallel Bitmap Heap Scan on events e  (cost=4517.63..242635.10 rows=26950 width=58) (actual time=409.882..2154.587 rows=142547 loops=3)
                     Recheck Cond: (room_id = '!OGEhHVWSdvArJzumhm:matrix.org'::text)
                     Rows Removed by Index Recheck: 7928064
                     Filter: (type = 'm.room.message'::text)
                     Rows Removed by Filter: 138493
                     Heap Blocks: exact=31299 lossy=164255
                     ->  Bitmap Index Scan on events_room_stream  (cost=0.00..4501.46 rows=120367 width=0) (actual time=317.490..317.491 rows=843191 loops=1)
                           Index Cond: (room_id = '!OGEhHVWSdvArJzumhm:matrix.org'::text)

One possible way we could move forward is to add another column to room_memberships called participant that starts off as false, and is set to true once a user sends a message in a room. You would need a background job to populate this table from the current database, but going forwards it could be updated trivially, and read from very quickly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a column sounds practical given I don't think there's another way around scanning the events table for evidence of participation - in terms of this PR, I am thinking that it would make the most sense to open a separate PR to add the column and the background job to populate it, and then once that is complete add logic to this PR to read from the column - would that be a sensible approach?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@H-Shay that sounds right to me! #17512 may serve as a good example of adding the background job. I don't believe you'll need to bump SCHEMA_COMPAT_VERSION as we're not removing any columns/tables; only adding.

The background updates documentation is useful here, and remember that boolean columns need to be noted in the port script.

You may find that you want to block the Admin API from returning any results until the background update is done - lest you get incomplete results. Up to you though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is #18068

"""
txn.execute(sql, (room_id,))
return [r[0] for r in txn]

return await self.db_pool.runInteraction(
"_get_participants_in_room_txn", _get_participants_in_room_txn, room_id
)


class RoomMemberBackgroundUpdateStore(SQLBaseStore):
def __init__(
Expand Down
76 changes: 76 additions & 0 deletions tests/rest/admin/test_room.py
Original file line number Diff line number Diff line change
Expand Up @@ -2018,6 +2018,82 @@ def test_room_members(self) -> None:
)
self.assertEqual(channel.json_body["total"], 3)

def test_room_members_participants_param(self) -> None:
"""
Test that /_synapse/admin/v1/rooms/{room_id}/members with a query param value of
participant=true returns a list of currently joined users who have posted, and
that the list does not contain room members who have not posted
"""

participant = self.register_user("participant", "pass")
participant_tok = self.login("participant", "pass")
room_id = self.helper.create_room_as(participant, tok=participant_tok)

participant2 = self.register_user("participant2", "pass")
participant2_tok = self.login("participant2", "pass")
self.helper.join(room_id, participant2, tok=participant2_tok)

# have some lurkers join the room
for i in range(5):
user = self.register_user(f"user{i}", "pass")
user_tok = self.login(f"user{i}", "pass")
self.helper.join(room_id, user, tok=user_tok)

# double-check the room membership total, should be 7
channel = self.make_request(
"GET",
f"/_synapse/admin/v1/rooms/{room_id}/members",
access_token=self.admin_user_tok,
)
self.assertEqual(200, channel.code)
self.assertEqual(channel.json_body["total"], 7)

# participants send messages
self.helper.send_event(
room_id,
"m.room.message",
content={
"msgtype": "m.text",
"body": "nefarious activity",
},
tok=participant_tok,
)

self.helper.send_event(
room_id,
"m.room.message",
content={
"msgtype": "m.text",
"body": "more nefarious activity",
},
tok=participant2_tok,
)

channel = self.make_request(
"GET",
f"/_synapse/admin/v1/rooms/{room_id}/members?participant=true",
access_token=self.admin_user_tok,
)
self.assertEqual(200, channel.code)
participants = channel.json_body["members"]
self.assertEqual(channel.json_body["total"], 2)
self.assertIn(participant, participants)
self.assertIn(participant2, participants)

# have participants leave room
self.helper.leave(room_id, participant, tok=participant_tok)
self.helper.leave(room_id, participant2, tok=participant2_tok)

# there should no longer be any active participants
channel = self.make_request(
"GET",
f"/_synapse/admin/v1/rooms/{room_id}/members?participant=true",
access_token=self.admin_user_tok,
)
self.assertEqual(200, channel.code)
participants = channel.json_body["total"]
self.assertEqual(participants, 0)

def test_room_state(self) -> None:
"""Test that room state can be requested correctly"""
# Create two test rooms
Expand Down
Loading