-
Notifications
You must be signed in to change notification settings - Fork 790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor availability-recovery strategies #1457
Refactor availability-recovery strategies #1457
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alindima . I like the direction this is going. There are a few things I think can be improved and also we should add some tests to cover usage of multiple strategies.
/// Intermediate/common data that must be passed between `RecoveryStrategy`s belonging to the | ||
/// same `RecoveryTask`. | ||
pub struct State { | ||
/// Chunks received so far. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can move the RecoveryParams
here. These shouldn't change across strategies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could, but I purposefully only placed mutable data into the State
.
as you said, params don't change across strategies, so we only borrow them immutably. That's why I left them separate.
They are still shared across strategies, but by virtue of the RecoveryTask
struct
Thanks!
In regards to tests: I saw that there are arguably a good amount of tests that already exercise the multiple recovery paths. Sounds good for now? |
…vailability-recovery-strategies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks reasonable
@@ -222,7 +222,7 @@ impl State { | |||
sender | |||
.send_message(NetworkBridgeTxMessage::SendRequests( | |||
requests, | |||
IfDisconnected::TryConnect, | |||
IfDisconnected::ImmediateError, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! according to what I read on the issue, the cumulus 0002-pov_recovery
test should be failing with the current code of the PR, right? but it's not
I'll roll back to using TryConnect
just to be safe in this PR.
I'll think about the optimisations we can do in upcoming PRs, because I'd rather have an incremental approach and keep this refactoring backwards-compatible. @eskimor @sandreim sounds good?
If so, can we have this PR merged if it looks good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, sounds good. The fact that test is passing with ImmediateError is concerning, please open an issue - this needs further investigation cc @skunert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I investigated this a bit, to me it looks like zombienet omits some of the args we pass to the collator. In the logs, I see no trace of the --reserved-only
flag that we pass in the toml. paritytech/zombienet#1360
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #1647 for tracking purposes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @alindima
Refactors availability-recovery strategies to allow for easily adding new hotpaths and failover mechanisms.
The new interface allows for chaining multiple
RecoveryStrategy
-es together, to cleanly express the relationship between them and share state and code where neccessary/possible:This was done in order to aid in implementing new hotpaths like systematic chunks recovery and fetching from approval checkers.
Thanks to this design, intermediate state can be shared between the strategies. For example, if the systematic chunks recovery retrieved less than the needed amount of chunks, pass them over to the next FetchChunks strategy, which will only need to recover the remaining number of chunks.
Draft example of how a systematic chunk recovery strategy would look: 667d870 (notice how easy it was to add and reuse code)
Note that this PR doesn't itself add any new strategy, it should fully preserve backwards compatiblity in terms of functionality. Follow-up PRs to add new strategies will come.