refactor(iroh-sync): Add actor to iroh-sync, remove deadlocks (#1612)
## Description

* Add an actor to `iroh_sync` that processes async requests for operations on Replicas.
* Major refactoring of how we access the iroh sync API from iroh. Before this, our `live::Actor` would easily deadlock, because inserting entries from gossip and processing the insert events coming back from the replica happened in the same actor loop. Also, because iroh-sync is fully synchronous, we have to take care not to lock or block the async executor. To solve both issues, all operations on replicas and the iroh sync store are moved into a separate actor, which runs in a loop on a `std::thread` and just blocks while waiting for actions from a channel. This, in turn, moves some state keeping from iroh to iroh-sync, which is a good thing I think.
* Remove `Clone` and all locks from `Replica`. As replicas are now usually owned by the `iroh_sync::actor` actor, there is no need for locks. We can get unique mut refs instead.
* Change the subscription model on `Replica`s to be a `Vec<flume::Sender<Event>>`. This makes it possible to subscribe to replica events directly from multiple places, so subscriptions to replica events from the client no longer go through any actor. The replica events are merged together with the events from the `sync_engine::live` actor, and that's it. This should improve latency and ease work on the actors.

  Because we merge the channels for replica events and live actor events for client subscriptions, closing a replica from the client does not end the subscription stream: the subscription on the live actor remains active. To end the subscription, the client would need to drop the receiver. I think this is fine for now. What we might want to do instead is split the client subscription into two methods and channels, `subscribe_insert_events` and `subscribe_sync_events` or so; then we could close them separately. However, the one for sync events would still stay open indefinitely, because you wouldn't want to drop and recreate it, I think, in case you do `join` / `leave` / `join`. Or do we?
* Track the open state of Replicas in the `iroh_sync::actor`. I opted for a simple implementation to start: the actor increments a counter on calls to `open`, decrements it on calls to `close`, and closes the replica once the count reaches zero. This works fine. However, for replicas opened from the RPC client, because there is no async drop, I spawn a tokio task in drop to send the close call to the node. This means that if a client is force-killed, the replica remains open indefinitely. It would be better to solve this more cleanly. The only idea I had so far was to give out something like `ReplicaDescriptor`s on `open` and then require regular keep-alive calls from the RPC client. I think it's fine to defer this change to a followup (which will be straightforward with the architecture in place now), because the impl in this PR is already quite an improvement over the state in `main` or #1612, where we don't ever close replicas.
* Event subscriptions for replica insert events are now handled directly in the `iroh_sync::actor`, which is much nicer IMO. For the RPC subscription, this event stream is merged with a subscription for sync events from the `iroh::sync_engine::live` actor.

## Notes & open questions

Replicas may remain open indefinitely if the RPC client dies without calling `Doc::close`. This should be fixed in a followup and will need work in quic-rpc to get a notification once connections close.

## Change checklist

- [x] Self-review.
- [x] Documentation updates if relevant.
- [x] Tests if relevant.

---------

Co-authored-by: Friedel Ziegelmayer <me@dignifiedquire.com>
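
For readers unfamiliar with the pattern, the actor design described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Action`, `OpenReplica`, `spawn_actor`, and the fields of `Replica` and `Event` are hypothetical names, and `std::sync::mpsc` stands in for `flume` so the example compiles without external crates. It shows the three ideas together: a dedicated `std::thread` that blocks on a channel, a lock-free `Replica` owned by that thread with a `Vec` of event senders, and open/close reference counting.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Hypothetical event type; the real replica events carry more data.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Insert { key: String },
}

/// Hypothetical replica: no locks and no `Clone`, owned by the actor thread.
#[derive(Default)]
struct Replica {
    entries: Vec<String>,
    subscribers: Vec<Sender<Event>>,
}

impl Replica {
    fn insert(&mut self, key: String) {
        self.entries.push(key.clone());
        // Notify every subscriber and forget those whose receiver was dropped.
        self.subscribers
            .retain(|tx| tx.send(Event::Insert { key: key.clone() }).is_ok());
    }
}

/// Requests sent to the actor; each carries its own reply channel.
pub enum Action {
    Open { id: u64, reply: Sender<()> },
    /// Replies `true` if this call closed the replica (count reached zero).
    Close { id: u64, reply: Sender<bool> },
    Insert { id: u64, key: String, reply: Sender<Result<(), String>> },
    Subscribe { id: u64, sender: Sender<Event>, reply: Sender<Result<(), String>> },
    Shutdown,
}

/// A replica plus the number of outstanding `open` calls.
struct OpenReplica {
    replica: Replica,
    handles: usize,
}

/// Spawns the actor on a plain OS thread, so blocking never stalls an async executor.
pub fn spawn_actor() -> (Sender<Action>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Action>();
    let handle = thread::spawn(move || {
        let mut open: HashMap<u64, OpenReplica> = HashMap::new();
        // Blocking receive: the loop ends on `Shutdown` or when all senders are dropped.
        while let Ok(action) = rx.recv() {
            match action {
                Action::Open { id, reply } => {
                    let state = open.entry(id).or_insert_with(|| OpenReplica {
                        replica: Replica::default(),
                        handles: 0,
                    });
                    state.handles += 1;
                    let _ = reply.send(());
                }
                Action::Close { id, reply } => {
                    // Entries in `open` always have `handles >= 1`, so this cannot underflow.
                    let remaining = open.get_mut(&id).map(|s| {
                        s.handles -= 1;
                        s.handles
                    });
                    let closed = remaining == Some(0);
                    if closed {
                        // In this sketch closing drops the in-memory replica and its subscribers.
                        open.remove(&id);
                    }
                    let _ = reply.send(closed);
                }
                Action::Insert { id, key, reply } => {
                    let res = match open.get_mut(&id) {
                        Some(s) => {
                            s.replica.insert(key);
                            Ok(())
                        }
                        None => Err(format!("replica {id} is not open")),
                    };
                    let _ = reply.send(res);
                }
                Action::Subscribe { id, sender, reply } => {
                    let res = match open.get_mut(&id) {
                        Some(s) => {
                            s.replica.subscribers.push(sender);
                            Ok(())
                        }
                        None => Err(format!("replica {id} is not open")),
                    };
                    let _ = reply.send(res);
                }
                Action::Shutdown => break,
            }
        }
    });
    (tx, handle)
}
```

Because the subscriber receives events straight from the replica's sender list, a client subscription in this sketch does not pass through any second actor. An async caller would send an `Action` and await the reply on an async-capable channel instead of blocking on `recv`, which is one reason a channel type with both sync and async APIs is convenient here.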