fix(request-response): Report failure when streams are at capacity #5417

oblique · 2024-05-24T13:45:04Z

Description

Fixes a potential hang when the user relies on responses or failures to make progress.

Notes & open questions

We are investigating a bug in our project (eigerco/lumina#256), and @zvolin found that when an outbound request cannot be scheduled, no error is produced.

Inbound requests do not need to produce this kind of error because they are only reported to the user once they have been successfully scheduled.
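
To make the failure mode concrete, here is a minimal, standard-library-only sketch of a handler with a fixed number of stream slots. It is not the libp2p implementation: `Handler`, `Event`, `RequestId` and the `reason` string are hypothetical names chosen for illustration. The commit log in this PR ("report error as OutboundError::Io") indicates the real change surfaces the condition as an I/O error rather than through a dedicated variant as shown here.

```rust
use std::collections::VecDeque;

/// Hypothetical request identifier (not a libp2p type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RequestId(u64);

/// Hypothetical events that the handler surfaces to the user.
#[derive(Debug, PartialEq, Eq)]
enum Event {
    Scheduled(RequestId),
    OutboundFailure { id: RequestId, reason: &'static str },
}

/// Toy handler with a fixed limit on concurrent outbound streams.
struct Handler {
    max_streams: usize,
    active: usize,
    events: VecDeque<Event>,
}

impl Handler {
    fn new(max_streams: usize) -> Self {
        Handler { max_streams, active: 0, events: VecDeque::new() }
    }

    /// Try to schedule an outbound request.
    fn send_request(&mut self, id: RequestId) {
        if self.active < self.max_streams {
            self.active += 1;
            self.events.push_back(Event::Scheduled(id));
        } else {
            // Dropping the request silently here would leave a caller that
            // waits for a response or a failure stuck forever. Reporting a
            // failure lets the caller make progress.
            self.events.push_back(Event::OutboundFailure {
                id,
                reason: "max concurrent streams reached",
            });
        }
    }

    /// Hand the next pending event to the user, if any.
    fn poll(&mut self) -> Option<Event> {
        self.events.pop_front()
    }
}

fn main() {
    let mut handler = Handler::new(1);
    handler.send_request(RequestId(1));
    handler.send_request(RequestId(2)); // no free stream slot left
    while let Some(event) = handler.poll() {
        println!("{event:?}");
    }
}
```

With a limit of one stream, the second request yields a failure event instead of disappearing, so every request sent by the caller ends in some event.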

Change checklist

I have performed a self-review of my own code
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
A changelog entry has been made in the appropriate crates

oblique · 2024-05-24T13:57:12Z

@jxs We would appreciate it if this could make it into 0.54.

protocols/request-response/src/lib.rs

jxs

LGTM thanks Yianis!

protocols/request-response/tests/error_reporting.rs

mergify · 2024-06-04T10:02:57Z

This pull request has merge conflicts. Could you please resolve them @oblique? 🙏

Approvals have been dismissed because the PR was updated after the send-it label was applied.

nazar-pc · 2024-06-04T16:50:34Z

@jxs I was under the impression that 0.26.3 was already released, but it is not on crates.io 🤔

jxs · 2024-06-04T21:59:44Z

no no, sorry @nazar-pc I meant the present, that I was going to release. It's now released, see here cc @oblique

Fixes a potential hang when the user relies on responses or failures to make progress. Pull-Request: libp2p#5417.

@alexggh

This PR enforces that outbound requests are finished within the specified protocol timeout.

The stable2412 version running libp2p 0.52.4 contains a bug which does not track request timeouts properly:
- libp2p/rust-libp2p#5429

The issue has been detected while submitting libp2p -> litep2p requests in kusama. This aims to check that pending outbound requests have not timed out. Although the issue has been fixed in libp2p, there might be other cases where this may happen. For example:
- libp2p/rust-libp2p#5417

For more context see: #7076 (comment)

1. Ideally, the force-timeout mechanism in this PR should never be triggered in production. However, origin/stable2412 occasionally encounters this issue. When this happens, 2 warnings may be generated:
   - one warning introduced by this PR wrt force timeout terminating the request
   - possibly one warning when libp2p decides (if at all) to provide the response back to substrate (as mentioned by @alexggh [here](https://github.com/paritytech/polkadot-sdk/pull/7222/files#diff-052aeaf79fef3d9a18c2cfd67006aa306b8d52e848509d9077a6a0f2eb856af7L769) and [here](https://github.com/paritytech/polkadot-sdk/pull/7222/files#diff-052aeaf79fef3d9a18c2cfd67006aa306b8d52e848509d9077a6a0f2eb856af7L842))

2. This implementation does not propagate to the substrate service the `RequestFinished { error: .. }`. That event is only used internally by substrate to increment metrics. However, we don't have the peer information available to propagate the event properly when we force-timeout the request.

Considering this should most likely not happen in production (origin/master) and that we'll be able to extract information by warnings, I would say this is a good tradeoff for code simplicity:

https://github.com/paritytech/polkadot-sdk/blob/06e3b5c6a7696048d65f1b8729f16b379a16f501/substrate/client/network/src/service.rs#L1543

### Testing

Added a new test to ensure the timeout is reached properly, even if libp2p does not produce a response in due time.

I've also transitioned the tests to using `tokio::test` due to a limitation of [CI](https://github.com/paritytech/polkadot-sdk/actions/runs/12832055737/job/35784043867)

```
--- TRY 1 STDERR: sc-network request_responses::tests::max_response_size_exceeded ---
thread 'request_responses::tests::max_response_size_exceeded' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/time/interval.rs:139:26:
there is no reactor running, must be called from the context of a Tokio 1.x runtime
```

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Bastian Köcher <git@kchr.de>
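
The force-timeout idea described in the commit message above can be sketched roughly as follows. This is an illustrative, standard-library-only model and not the polkadot-sdk code: `PendingRequests`, its methods, and the use of plain `u64` identifiers are all assumptions made for the example. The caller passes the current time in explicitly so the behaviour is deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical tracker for outbound requests that are awaiting a response.
struct PendingRequests {
    timeout: Duration,
    started: HashMap<u64, Instant>,
}

impl PendingRequests {
    fn new(timeout: Duration) -> Self {
        PendingRequests { timeout, started: HashMap::new() }
    }

    /// Record that request `id` was sent at time `now`.
    fn insert(&mut self, id: u64, now: Instant) {
        self.started.insert(id, now);
    }

    /// Mark a request as answered. Returns `false` if it was no longer
    /// pending, for example because it had already been force-timed-out.
    fn complete(&mut self, id: u64) -> bool {
        self.started.remove(&id).is_some()
    }

    /// Remove and return, in ascending order, every request that has been
    /// pending for at least `timeout`, regardless of what the underlying
    /// network layer reports.
    fn force_timeout(&mut self, now: Instant) -> Vec<u64> {
        let timeout = self.timeout;
        let mut expired: Vec<u64> = self
            .started
            .iter()
            .filter(|(_, started)| now.saturating_duration_since(**started) >= timeout)
            .map(|(id, _)| *id)
            .collect();
        expired.sort_unstable();
        for id in &expired {
            self.started.remove(id);
        }
        expired
    }
}

fn main() {
    let t0 = Instant::now();
    let mut pending = PendingRequests::new(Duration::from_secs(10));
    pending.insert(1, t0);
    pending.insert(2, t0 + Duration::from_secs(8));

    // Twelve seconds in, only request 1 has exceeded the 10 s timeout.
    let expired = pending.force_timeout(t0 + Duration::from_secs(12));
    println!("force-timed-out: {expired:?}");

    // A late response for request 1 finds nothing pending; in the scenario
    // above this is where the second warning would be logged.
    println!("late response accepted: {}", pending.complete(1));
}
```

In this model a late response for an already expired request is simply rejected by `complete`, which mirrors the second warning mentioned in point 1.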

oblique added 2 commits May 24, 2024 16:35

fix(request-response)!: Report outbound failure when max concurrent s…

d223a56

…treams is reached

fix changelog and version

e3c3bba

oblique changed the title ~~fix(request-response)!: Report outbound failure when max concurrent streams is reached~~ fix(request-response)!: Report outbound failure when max concurrent streams reached May 24, 2024

oblique changed the title ~~fix(request-response)!: Report outbound failure when max concurrent streams reached~~ fix(request-response)!: Report outbound failure when streams are at capacity May 24, 2024

oblique changed the title ~~fix(request-response)!: Report outbound failure when streams are at capacity~~ fix(request-response)!: Report failure when streams are at capacity May 24, 2024

add timeout for swarm2_config in test case

f9a42a1

jxs requested a review from thomaseizinger May 24, 2024 16:24

thomaseizinger reviewed May 25, 2024

protocols/request-response/src/lib.rs (outdated)

oblique mentioned this pull request May 25, 2024

fix(request-response): Avoid hanging at capacity and on dial IO errors #5419

Closed

4 tasks

oblique added 2 commits May 29, 2024 13:33

Merge remote-tracking branch 'origin/master' into fix/req-resp-max-st…

822ce25

…reams-failure

report error as OutboundError::Io

8deab25

oblique changed the title ~~fix(request-response)!: Report failure when streams are at capacity~~ fix(request-response): Report failure when streams are at capacity May 29, 2024

oblique added 2 commits May 31, 2024 11:48

Merge branch 'master' into fix/req-resp-max-streams-failure

f962a11

Merge branch 'master' into fix/req-resp-max-streams-failure

c7902f3

thomaseizinger requested a review from jxs June 1, 2024 22:00

jxs previously approved these changes Jun 3, 2024

protocols/request-response/tests/error_reporting.rs (outdated)

protocols/request-response/tests/error_reporting.rs (outdated)

protocols/request-response/tests/error_reporting.rs (outdated)

Apply suggestions from code review

f23fee5

jxs added the send-it label Jun 4, 2024

Merge branch 'master' into fix/req-resp-max-streams-failure

661e140

Merge branch 'master' into fix/req-resp-max-streams-failure

762137c

jxs approved these changes Jun 4, 2024

mergify bot merged commit af42122 into libp2p:master Jun 4, 2024
72 checks passed

oblique deleted the fix/req-resp-max-streams-failure branch June 4, 2024 14:46

oblique mentioned this pull request Jun 7, 2024

bug: synchronization sometimes hangs while still reporting connected peers eigerco/lumina#256

Closed

lexnv mentioned this pull request Jan 17, 2025

net/libp2p: Enforce outbound request-response timeout limits paritytech/polkadot-sdk#7222

Merged
