net/libp2p: Enforce outbound request-response timeout limits (#7222)
This PR enforces that outbound requests are finished within the
specified protocol timeout.

The stable2412 version running libp2p 0.52.4 contains a bug which does
not track request timeouts properly:
- libp2p/rust-libp2p#5429

The issue was detected while submitting libp2p -> litep2p requests
on Kusama. This PR checks that pending outbound requests have not
timed out. Although the issue has been fixed in libp2p, there might be
other cases where this may happen. For example:
- libp2p/rust-libp2p#5417

For more context see:
#7076 (comment)
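
For illustration only, the check described above can be sketched roughly as follows. This is not the code from this PR: `RequestId`, `PendingRequest`, `PendingRequests` and `force_timeout` are hypothetical names, and the sketch uses only the standard library rather than the libp2p types. It assumes each outbound request is recorded with its start time and protocol timeout, and that a periodic sweep removes the ones whose deadline has passed.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical request identifier (not the real libp2p type).
type RequestId = u64;

/// Hypothetical bookkeeping for one in-flight outbound request.
struct PendingRequest {
    protocol: String,
    started_at: Instant,
    timeout: Duration,
}

/// Tracks in-flight requests and force-expires those past their protocol timeout.
#[derive(Default)]
struct PendingRequests {
    inner: HashMap<RequestId, PendingRequest>,
}

impl PendingRequests {
    /// Records a new outbound request that started at `now`.
    fn insert(&mut self, id: RequestId, protocol: &str, timeout: Duration, now: Instant) {
        let request = PendingRequest { protocol: protocol.to_string(), started_at: now, timeout };
        self.inner.insert(id, request);
    }

    /// Called when the network layer reports a response or failure on its own.
    /// Returns `false` if the request was unknown (for example, already force-timed out).
    fn finish(&mut self, id: RequestId) -> bool {
        self.inner.remove(&id).is_some()
    }

    /// Removes and returns every request whose timeout has elapsed at `now`,
    /// emitting one warning per terminated request.
    fn force_timeout(&mut self, now: Instant) -> Vec<(RequestId, String)> {
        let expired: Vec<RequestId> = self
            .inner
            .iter()
            .filter(|(_, req)| now.saturating_duration_since(req.started_at) >= req.timeout)
            .map(|(id, _)| *id)
            .collect();

        let mut terminated = Vec::new();
        for id in expired {
            if let Some(req) = self.inner.remove(&id) {
                eprintln!("warning: request {id} on {} force-timed out", req.protocol);
                terminated.push((id, req.protocol));
            }
        }
        // HashMap iteration order is unspecified; sort for a stable result.
        terminated.sort();
        terminated
    }
}
```

In this sketch a late response for a request that was already force-timed out makes `finish` return `false`, which is the point where a caller could log a second warning.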


1. Ideally, the force-timeout mechanism in this PR should never be
triggered in production. However, origin/stable2412 occasionally
encounters this issue. When this happens, two warnings may be generated:
- one warning, introduced by this PR, reporting that the force timeout
terminated the request
- possibly a second warning, if and when libp2p decides to provide
the response back to substrate (as mentioned by @alexggh
[here](https://github.com/paritytech/polkadot-sdk/pull/7222/files#diff-052aeaf79fef3d9a18c2cfd67006aa306b8d52e848509d9077a6a0f2eb856af7L769)
and
[here](https://github.com/paritytech/polkadot-sdk/pull/7222/files#diff-052aeaf79fef3d9a18c2cfd67006aa306b8d52e848509d9077a6a0f2eb856af7L842))

2. This implementation does not propagate the
`RequestFinished { error: .. }` event to the substrate service. That
event is only used internally by substrate to increment metrics.
However, we don't have the peer information available to propagate the
event properly when we force-timeout the request. Considering that this
should most likely not happen in production (origin/master) and that
we'll be able to extract information from the warnings, I would say
this is a good tradeoff for code simplicity:


https://github.com/paritytech/polkadot-sdk/blob/06e3b5c6a7696048d65f1b8729f16b379a16f501/substrate/client/network/src/service.rs#L1543


### Testing

Added a new test to ensure the timeout is reached properly, even if
libp2p does not produce a response in due time.
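
As a simplified stand-in for the idea behind that test (not the test added in this PR), the sketch below models the network layer as a standard-library channel whose sender stays alive but never sends anything. `Outcome` and `await_response` are hypothetical names introduced for illustration; the real test exercises the actual request-response behaviour.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical result of waiting for an outbound request to complete.
#[derive(Debug, PartialEq)]
enum Outcome {
    Response(Vec<u8>),
    TimedOut,
    SenderDropped,
}

/// Waits for a response on `rx`, giving up once `timeout` has elapsed.
fn await_response(rx: &mpsc::Receiver<Vec<u8>>, timeout: Duration) -> Outcome {
    match rx.recv_timeout(timeout) {
        Ok(bytes) => Outcome::Response(bytes),
        // The sender is still alive but produced nothing in time.
        Err(RecvTimeoutError::Timeout) => Outcome::TimedOut,
        // The sender went away without ever responding.
        Err(RecvTimeoutError::Disconnected) => Outcome::SenderDropped,
    }
}
```

Keeping the sender alive without sending is what mimics a network layer that holds the request open and never answers; the caller still gets `Outcome::TimedOut` once the timeout elapses.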

I've also transitioned the tests to use `tokio::test`, because without
a Tokio runtime they panic in
[CI](https://github.com/paritytech/polkadot-sdk/actions/runs/12832055737/job/35784043867):

```
--- TRY 1 STDERR:        sc-network request_responses::tests::max_response_size_exceeded ---
thread 'request_responses::tests::max_response_size_exceeded' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/time/interval.rs:139:26:
there is no reactor running, must be called from the context of a Tokio 1.x runtime
```



cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Bastian Köcher <git@kchr.de>
lexnv and bkchr authored Jan 22, 2025
1 parent 634a17b commit fd64a1e
Showing 2 changed files with 612 additions and 427 deletions.
19 changes: 19 additions & 0 deletions prdoc/pr_7222.prdoc
@@ -0,0 +1,19 @@
title: Enforce libp2p outbound request-response timeout limits

doc:
  - audience: Node Dev
    description: |
      This PR enforces that outbound requests are finished within the specified protocol timeout.
      The stable2412 version running libp2p 0.52.4 contains a bug which does not track request timeouts properly
      https://github.com/libp2p/rust-libp2p/pull/5429.

      The issue has been detected while submitting libp2p to litep2p requests in Kusama.
      This aims to check that pending outbound requests have not timed out.
      Although the issue has been fixed in libp2p, there might be other cases where this may happen.
      For example, https://github.com/libp2p/rust-libp2p/pull/5417.

      For more context see https://github.com/paritytech/polkadot-sdk/issues/7076#issuecomment-2596085096.

crates:
  - name: sc-network
    bump: patch