Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

LockStore: fix acquiring a lock via LockStore.try_acquire_lock #12832

Merged
merged 3 commits into from
May 30, 2022

Conversation

sumnerevans
Copy link
Contributor

@sumnerevans sumnerevans commented May 21, 2022

This PR fixes acquiring a lock via LockStore.try_acquire_lock which is currently broken.

Outstanding questions:

  • This change causes a regression in LockTestCase.test_drop and I'm not quite sure what to do about it. It's not obvious where the error is coming from (seems to be in the internals of twisted which seems concerning):
old error
$ trial tests.storage.databases.main.test_lock
tests.storage.databases.main.test_lock
  LockTestCase
    test_acquire_contention ...                                            [OK]
    test_drop ...                                                       [ERROR]
    test_maintain_lock ...                                                 [OK]
    test_shutdown ...                                                      [OK]
    test_simple_lock ...                                                   [OK]
    test_timeout_lock ...                                                  [OK]

===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/home/sumner/projects/beeper/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
    result = current_context.run(gen.send, result)
builtins.RuntimeError: cannot reuse already awaited coroutine

tests.storage.databases.main.test_lock.LockTestCase.test_drop
-------------------------------------------------------------------------------
Ran 6 tests in 5.219s

FAILED (errors=1, successes=5)

Note to Reviewer

If you checkout the branch at the commit where I add the initial test case, and run the test, it will fail.

$ trial tests.storage.databases.main.test_lock 

After fast-forwarding to the HEAD of the branch, running the same tests will pass.

Sign-off

Signed-off-by: Sumner Evans <sumner@beeper.com>

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Pull request includes a sign off
  • Code style is correct
    (run the linters)

@sumnerevans sumnerevans force-pushed the lock-store-fixes branch 2 times, most recently from bf1843f to b565acf Compare May 21, 2022 22:39
@squahtx
Copy link
Contributor

squahtx commented May 23, 2022

It turns out that the test I posted is poorly written: it doesn't run the three tasks to completion.

Previously all three tasks would acquire the worker lock and be waiting for a database transaction to release the lock at the end of the test case.

After the fix, at the end of test_acquire_contention:

  • task 1 is waiting for a database transaction to release the worker lock(?)
  • task 2 holds the Linearizer lock and is waiting for a database transaction. It hasn't actually acquired the worker lock yet.
  • task 3 is waiting for the Linearizer lock and also hasn't acquired the worker lock.

Python's GC runs and throws a GeneratorExit exception inside each task. Task 3 gets the exception, resumes and finishes.
Task 2 gets the exception after and exits the Linearizer lock, which tries to resume task 3. But task 3 has already completed, so we get "RuntimeError: cannot reuse already awaited coroutine".

We can ensure all three tasks have finished like so:

--- a/tests/storage/databases/main/test_lock.py
+++ b/tests/storage/databases/main/test_lock.py
@@ -12,7 +12,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-from twisted.internet import defer
+from twisted.internet import defer, reactor
+from twisted.internet.base import ReactorBase
 from twisted.internet.defer import Deferred

 from synapse.server import HomeServer
@@ -51,9 +52,9 @@ class LockTestCase(unittest.HomeserverTestCase):
                 in_lock -= 1

         # Start 3 tasks.
-        defer.ensureDeferred(task())
-        defer.ensureDeferred(task())
-        defer.ensureDeferred(task())
+        task1 = defer.ensureDeferred(task())
+        task2 = defer.ensureDeferred(task())
+        task3 = defer.ensureDeferred(task())

         # All three are now waiting for the database.
         # Give the reactor a kick so that the database transaction returns.
@@ -61,6 +62,18 @@ class LockTestCase(unittest.HomeserverTestCase):

         release_lock.callback(None)

+        # Run the tasks to completion.
+        # To work around `Linearizer`s using a different reactor to sleep when
+        # contended (#12841), we call `runUntilCurrent` on
+        # `twisted.internet.reactor`, which is a different reactor to that used
+        # by the homeserver.
+        assert isinstance(reactor, ReactorBase)
+        self.get_success(task1)
+        reactor.runUntilCurrent()
+        self.get_success(task2)
+        reactor.runUntilCurrent()
+        self.get_success(task3)
+
         # At most one task should have held the lock at a time.
         self.assertEqual(max_in_lock, 1)

but note that the "All three are now waiting for the database." comment is no longer accurate.

@sumnerevans sumnerevans force-pushed the lock-store-fixes branch 2 times, most recently from 4dde928 to 778bc7b Compare May 23, 2022 14:43
@sumnerevans sumnerevans marked this pull request as ready for review May 23, 2022 15:15
@sumnerevans sumnerevans requested a review from a team as a code owner May 23, 2022 15:15
@squahtx
Copy link
Contributor

squahtx commented May 23, 2022

nb: currently blocks #12484

@sumnerevans sumnerevans mentioned this pull request May 23, 2022
9 tasks
@MadLittleMods MadLittleMods added the T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. label May 24, 2022
sumnerevans and others added 3 commits May 27, 2022 10:49
This commit proves that there is a bug in LockStore.try_acquire_lock,
and provides a test case that must pass.

Signed-off-by: Sumner Evans <sumner@beeper.com>
Co-authored-by: Sean Quah <seanq@matrix.org>
This fixes the "test_acquire_contention" test case added in the previous
commit.

Signed-off-by: Sumner Evans <sumner@beeper.com>
Signed-off-by: Sumner Evans <sumner@beeper.com>
@erikjohnston erikjohnston merged commit bda4600 into matrix-org:develop May 30, 2022
sumnerevans added a commit to sumnerevans/synapse that referenced this pull request May 30, 2022
This depends on the new newly-fixed locking mechanism from
matrix-org#12832 to prevent races.

Signed-off-by: Sumner Evans <sumner@beeper.com>
sumnerevans added a commit to sumnerevans/synapse that referenced this pull request May 30, 2022
This depends on the new newly-fixed locking mechanism from
matrix-org#12832 to prevent races.

Signed-off-by: Sumner Evans <sumner@beeper.com>
sumnerevans added a commit to sumnerevans/synapse that referenced this pull request May 30, 2022
This depends on the new newly-fixed locking mechanism from
matrix-org#12832 to prevent races.

Signed-off-by: Sumner Evans <sumner@beeper.com>
sumnerevans added a commit to sumnerevans/synapse that referenced this pull request May 31, 2022
This depends on the new newly-fixed locking mechanism from
matrix-org#12832 to prevent races.

Signed-off-by: Sumner Evans <sumner@beeper.com>
sumnerevans added a commit to sumnerevans/synapse that referenced this pull request Jun 3, 2022
This depends on the new newly-fixed locking mechanism from
matrix-org#12832 to prevent races.

Signed-off-by: Sumner Evans <sumner@beeper.com>
Fizzadar added a commit to Fizzadar/synapse that referenced this pull request Jun 15, 2022
Synapse 1.61.0 (2022-06-14)
===========================

This release removes support for the non-standard feature known both as 'groups' and as 'communities', which have been superseded by *Spaces*.

See [the upgrade notes](/~https://github.com/matrix-org/synapse/blob/develop/docs/upgrade.md#upgrading-to-v1610)
for more details.

Improved Documentation
----------------------

- Mention removed community/group worker endpoints in [upgrade.md](/~https://github.com/matrix-org/synapse/blob/develop/docs/upgrade.md#upgrading-to-v1610s). Contributed by @olmari. ([\matrix-org#13023](matrix-org#13023))

Synapse 1.61.0rc1 (2022-06-07)
==============================

Features
--------

- Add new `media_retention` options to the homeserver config for routinely cleaning up non-recently accessed media. ([\matrix-org#12732](matrix-org#12732), [\matrix-org#12972](matrix-org#12972), [\matrix-org#12977](matrix-org#12977))
- Experimental support for [MSC3772](matrix-org/matrix-spec-proposals#3772): Push rule for mutually related events. ([\matrix-org#12740](matrix-org#12740), [\matrix-org#12859](matrix-org#12859))
- Update to the `check_event_for_spam` module callback: Deprecate the current callback signature, replace it with a new signature that is both less ambiguous (replacing booleans with explicit allow/block) and more powerful (ability to return explicit error codes). ([\matrix-org#12808](matrix-org#12808))
- Add storage and module API methods to get monthly active users (and their corresponding appservices) within an optionally specified time range. ([\matrix-org#12838](matrix-org#12838), [\matrix-org#12917](matrix-org#12917))
- Support the new error code `ORG.MATRIX.MSC3823.USER_ACCOUNT_SUSPENDED` from [MSC3823](matrix-org/matrix-spec-proposals#3823). ([\matrix-org#12845](matrix-org#12845), [\matrix-org#12923](matrix-org#12923))
- Add a configurable background job to delete stale devices. ([\matrix-org#12855](matrix-org#12855))
- Improve URL previews for pages with empty elements. ([\matrix-org#12951](matrix-org#12951))
- Allow updating a user's password using the admin API without logging out their devices. Contributed by @jcgruenhage. ([\matrix-org#12952](matrix-org#12952))

Bugfixes
--------

- Always send an `access_token` in `/thirdparty/` requests to appservices, as required by the [Application Service API specification](https://spec.matrix.org/v1.1/application-service-api/#third-party-networks). ([\matrix-org#12746](matrix-org#12746))
- Implement [MSC3816](matrix-org/matrix-spec-proposals#3816): sending the root event in a thread should count as having 'participated' in it. ([\matrix-org#12766](matrix-org#12766))
- Delete events from the `federation_inbound_events_staging` table when a room is purged through the admin API. ([\matrix-org#12784](matrix-org#12784))
- Fix a bug where we did not correctly handle invalid device list updates over federation. Contributed by Carl Bordum Hansen. ([\matrix-org#12829](matrix-org#12829))
- Fix a bug which allowed multiple async operations to access database locks concurrently. Contributed by @sumnerevans @ Beeper. ([\matrix-org#12832](matrix-org#12832))
- Fix an issue introduced in Synapse 0.34 where the `/notifications` endpoint would only return notifications if a user registered at least one pusher. Contributed by Famedly. ([\matrix-org#12840](matrix-org#12840))
- Fix a bug where servers using a Postgres database would fail to backfill from an insertion event when MSC2716 is enabled (`experimental_features.msc2716_enabled`). ([\matrix-org#12843](matrix-org#12843))
- Fix [MSC3787](matrix-org/matrix-spec-proposals#3787) rooms being omitted from room directory, room summary and space hierarchy responses. ([\matrix-org#12858](matrix-org#12858))
- Fix a bug introduced in Synapse 1.54.0 which could sometimes cause exceptions when handling federated traffic. ([\matrix-org#12877](matrix-org#12877))
- Fix a bug introduced in Synapse 1.59.0 which caused room deletion to fail with a foreign key violation error. ([\matrix-org#12889](matrix-org#12889))
- Fix a long-standing bug which caused the `/messages` endpoint to return an incorrect `end` attribute when there were no more events. Contributed by @Vetchu. ([\matrix-org#12903](matrix-org#12903))
- Fix a bug introduced in Synapse 1.58.0 where `/sync` would fail if the most recent event in a room was a redaction of an event that has since been purged. ([\matrix-org#12905](matrix-org#12905))
- Fix a potential memory leak when generating thumbnails. ([\matrix-org#12932](matrix-org#12932))
- Fix a long-standing bug where a URL preview would break if the image failed to download. ([\matrix-org#12950](matrix-org#12950))

Improved Documentation
----------------------

- Fix typographical errors in documentation. ([\matrix-org#12863](matrix-org#12863))
- Fix documentation incorrectly stating the `sendToDevice` endpoint can be directed at generic workers. Contributed by Nick @ Beeper. ([\matrix-org#12867](matrix-org#12867))

Deprecations and Removals
-------------------------

- Remove support for the non-standard groups/communities feature from Synapse. ([\matrix-org#12553](matrix-org#12553), [\matrix-org#12558](matrix-org#12558), [\matrix-org#12563](matrix-org#12563), [\matrix-org#12895](matrix-org#12895), [\matrix-org#12897](matrix-org#12897), [\matrix-org#12899](matrix-org#12899), [\matrix-org#12900](matrix-org#12900), [\matrix-org#12936](matrix-org#12936), [\matrix-org#12966](matrix-org#12966))
- Remove contributed `kick_users.py` script. This is broken under Python 3, and is not added to the environment when `pip install`ing Synapse. ([\matrix-org#12908](matrix-org#12908))
- Remove `contrib/jitsimeetbridge`. This was an unused experiment that hasn't been meaningfully changed since 2014. ([\matrix-org#12909](matrix-org#12909))
- Remove unused `contrib/experiements/cursesio.py` script, which fails to run under Python 3. ([\matrix-org#12910](matrix-org#12910))
- Remove unused `contrib/experiements/test_messaging.py` script. This fails to run on Python 3. ([\matrix-org#12911](matrix-org#12911))

Internal Changes
----------------

- Test Synapse against Complement with workers. ([\matrix-org#12810](matrix-org#12810), [\matrix-org#12933](matrix-org#12933))
- Reduce the amount of state we pull from the DB. ([\matrix-org#12811](matrix-org#12811), [\matrix-org#12964](matrix-org#12964))
- Try other homeservers when re-syncing state for rooms with partial state. ([\matrix-org#12812](matrix-org#12812))
- Resume state re-syncing for rooms with partial state after a Synapse restart. ([\matrix-org#12813](matrix-org#12813))
- Remove Mutual Rooms' ([MSC2666](matrix-org/matrix-spec-proposals#2666)) endpoint dependency on the User Directory. ([\matrix-org#12836](matrix-org#12836))
- Experimental: expand `check_event_for_spam` with ability to return additional fields. This enables spam-checker implementations to experiment with mechanisms to give users more information about why they are blocked and whether any action is needed from them to be unblocked. ([\matrix-org#12846](matrix-org#12846))
- Remove `dont_notify` from the `.m.rule.room.server_acl` rule. ([\matrix-org#12849](matrix-org#12849))
- Remove the unstable `/hierarchy` endpoint from [MSC2946](matrix-org/matrix-spec-proposals#2946). ([\matrix-org#12851](matrix-org#12851))
- Pull out less state when handling gaps in room DAG. ([\matrix-org#12852](matrix-org#12852), [\matrix-org#12904](matrix-org#12904))
- Clean-up the push rules datastore. ([\matrix-org#12856](matrix-org#12856))
- Correct a type annotation in the URL preview source code. ([\matrix-org#12860](matrix-org#12860))
- Update `pyjwt` dependency to [2.4.0](/~https://github.com/jpadilla/pyjwt/releases/tag/2.4.0). ([\matrix-org#12865](matrix-org#12865))
- Enable the `/account/whoami` endpoint on synapse worker processes. Contributed by Nick @ Beeper. ([\matrix-org#12866](matrix-org#12866))
- Enable the `batch_send` endpoint on synapse worker processes. Contributed by Nick @ Beeper. ([\matrix-org#12868](matrix-org#12868))
- Don't generate empty AS transactions when the AS is flagged as down. Contributed by Nick @ Beeper. ([\matrix-org#12869](matrix-org#12869))
- Fix up the variable `state_store` naming. ([\matrix-org#12871](matrix-org#12871))
- Faster room joins: when querying the current state of the room, wait for state to be populated. ([\matrix-org#12872](matrix-org#12872))
- Avoid running queries which will never result in deletions. ([\matrix-org#12879](matrix-org#12879))
- Use constants for EDU types. ([\matrix-org#12884](matrix-org#12884))
- Reduce database load of `/sync` when presence is enabled. ([\matrix-org#12885](matrix-org#12885))
- Refactor `have_seen_events` to reduce memory consumed when processing federation traffic. ([\matrix-org#12886](matrix-org#12886))
- Refactor receipt linearization code. ([\matrix-org#12888](matrix-org#12888))
- Add type annotations to `synapse.logging.opentracing`. ([\matrix-org#12894](matrix-org#12894))
- Remove PyNaCl occurrences directly used in Synapse code. ([\matrix-org#12902](matrix-org#12902))
- Bump types-jsonschema from 4.4.1 to 4.4.6. ([\matrix-org#12912](matrix-org#12912))
- Rename storage classes. ([\matrix-org#12913](matrix-org#12913))
- Preparation for database schema simplifications: stop reading from `event_edges.room_id`. ([\matrix-org#12914](matrix-org#12914))
- Check if we are in a virtual environment before overriding the `PYTHONPATH` environment variable in the demo script. ([\matrix-org#12916](matrix-org#12916))
- Improve the logging when signature checks on events fail. ([\matrix-org#12925](matrix-org#12925))

# -----BEGIN PGP SIGNATURE-----
#
# iQFEBAABCgAuFiEEBTGR3/RnAzBGUif3pULk7RsPrAkFAmKoaa0QHGVyaWtAbWF0
# cml4Lm9yZwAKCRClQuTtGw+sCVn3B/sF8hdhrZ7hWW40ST3eG9cEKNFrj/xZXiaI
# zho3ryrxaQF68BSKot15AvZSprEdwBXWrb8WeTjyw+QH7vTKrCQDZ0p7qubn10Z7
# BuKq9hyYjyCLjBZrgy8d4U3Y8gsSByuO59YKHNLn+UTJLOs5GTH8Wprwh4mpU3Jl
# +o+cC+lMSVcyZij2hihFymcSxWq/I9WL0dsjRif8x0BUQwRXwmXc6+mhlgLBe2Zs
# 2dUouzJ8NVZcjfWvsg4noXPrNQ/IiyCVZlSIgaDftDIxVSPk5/rXiUUNex8Tn1+I
# TnOgnXhpOQD1vwVGYS8LrcfA0ubSili7xUJ8k2e5TkCjpkaVnYNu
# =q+7N
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue Jun 14 11:57:49 2022 BST
# gpg:                using RSA key 053191DFF4670330465227F7A542E4ED1B0FAC09
# gpg:                issuer "erik@matrix.org"
# gpg: WARNING: server 'dirmngr' is older than us (2.2.34 < 2.3.6)
# gpg: Note: Outdated servers may lack important security fixes.
# gpg: Note: Use the command "gpgconf --kill all" to restart them.
# gpg: Can't check signature: No public key

# Conflicts:
#	docs/workers.md
#	synapse/handlers/message.py
#	synapse/handlers/pagination.py
#	synapse/push/bulk_push_rule_evaluator.py
#	synapse/push/push_rule_evaluator.py
#	synapse/rest/client/receipts.py
#	synapse/storage/databases/main/appservice.py
#	tests/push/test_push_rule_evaluator.py
@sumnerevans sumnerevans deleted the lock-store-fixes branch November 26, 2022 16:08
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants