Investigate flaky test-http-server-request-timeouts-mixed #43465
Comments
It seems this only occurs on macOS but I am not able to reproduce locally on macOS 12.4. |
@ShogunPanda could you take a look? |
Yup, I will take a look on Monday! |
Yes, I also tried to reproduce it locally on macOS 12.4 but never managed to. |
I was able to repro on Mac 12.4:
Let me know if there's anything I can do to help, even if it's just being a lab rat :) |
Thanks, sir! Do you mind sharing which machine (the hardware, I mean) you were running on, along with any relevant environment information (for example, whether you were on battery or whether the machine was overloaded)? |
Here's some awfully crude dumps (sorry!), printing out the value of Failed Run:
Passing Run
My passing runs are close to 2500ms after client is created, when the Could it be that the check is just too early on slow/burdened systems? Update: I'm getting a bunch of runs around the 3000ms mark when I remove
|
The times don't quite match up, which is causing this issue.
Request 2 is started at Timeline The problem is that the connectionsCheckingInterval is slightly too high. The connection checking interval could tick just after 2400ms, and then just after 2900ms. Example
The fix is to ensure the connection checking interval is less than 500ms, and to decouple the check timeout from the connection checking interval. I'll put up a fix for this shortly. |
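To make the timing argument concrete, here is a minimal sketch of the race. The tick offset (401 ms), the request deadline (2402 ms) and the helper name `detectionTime` are assumptions chosen to mirror the "just after 2400ms, then just after 2900ms" pattern described above; they are not taken from the test or from Node.js internals.

```javascript
'use strict';

// Hypothetical model: periodic checks fire at offset + k * interval (k >= 1).
// Returns the time (ms) of the first check at or after `deadline`,
// i.e. when an expired request would actually be noticed.
function detectionTime(deadline, interval, offset) {
  const ticks = Math.max(1, Math.ceil((deadline - offset) / interval));
  return offset + ticks * interval;
}

// Assumed numbers: checks drift 1 ms late (2401, 2901, ...), and the
// request expires at 2402, just after the 2401 tick.
const deadline = 2402;
const offset = 401;

for (const interval of [500, 100]) {
  const detectedAt = detectionTime(deadline, interval, offset);
  // The test asserted exactly 500 ms after the expected timeout.
  const slack = deadline + 500 - detectedAt;
  console.log(`interval=${interval}ms detectedAt=${detectedAt}ms slack=${slack}ms`);
}
```

Under these assumed numbers, a 500 ms interval leaves only 1 ms of slack before the assertion fires, so any extra timer jitter makes the test fail. A 100 ms interval leaves roughly 400 ms.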
parallel/http-server-request-timeouts-mixed test was sometimes failing due to insufficient tolerance between the connection timeout checking interval, and the expected timeout specified in the test. The checking interval was 500ms, and the request was checked for timeout exactly 500ms after the request was expected to timeout. This led to a timing condition where the next check would occur slightly after the request was expected to timeout. fixes: nodejs#43465
parallel/http-server-request-timeouts-mixed test was sometimes failing due to insufficient tolerance between the connection timeout checking interval, and the expected timeout specified in the test. The checking interval was 500ms, and the request was checked for timeout exactly 500ms after the request was expected to timeout. This led to a timing condition where the next check would occur slightly after the request was expected to timeout. This change makes the checking interval more frequent, and decouples the timeout for the check from the checking interval, otherwise the issue would persist. fixes: nodejs#43465
parallel/http-server-request-timeouts-mixed test was sometimes failing due to insufficient tolerance between the connection timeout checking interval, and the expected timeout specified in the test. This change makes the checking interval more frequent, and decouples the timeout for the check from the checking interval. fixes: nodejs#43465
PR-URL: nodejs/node#43597 Refs: nodejs/node#43465 Reviewed-By: Paolo Insogna <paolo@cowtech.it> Reviewed-By: LiviaMedeiros <livia@cirno.name>
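The approach described in the commit messages above (a more frequent checking interval, and an assertion delay that no longer depends on it) might look roughly like the sketch below. All values and the names `assertionDelay`, `createTimeoutServer`, `worstCaseDetection` and `slack` are illustrative assumptions, not the ones used in the actual test; only the `http.createServer` options are real Node.js API.

```javascript
'use strict';
const http = require('node:http');

// Illustrative values (assumptions), not those of the real test.
const requestTimeout = 2000;
// Check much more often than the slack allowed by the assertion.
const connectionsCheckingInterval = 100;
// Chosen independently of the checking interval.
const assertionDelay = requestTimeout + 1000;

// Builds (but does not start) a server that times out incomplete requests.
function createTimeoutServer() {
  return http.createServer({
    headersTimeout: 0, // disabled so only requestTimeout applies
    requestTimeout,
    keepAliveTimeout: 0,
    connectionsCheckingInterval,
  }, (req, res) => {
    res.end('ok');
  });
}

// In this simplified model, an expired request is noticed at most one
// full checking interval after its deadline.
const worstCaseDetection = requestTimeout + connectionsCheckingInterval;
const slack = assertionDelay - worstCaseDetection;
console.log(`worst-case detection: ${worstCaseDetection}ms, slack: ${slack}ms`);
```

With these assumed values the assertion has 900 ms of headroom over the modeled worst case, instead of a margin that shrinks to nearly zero when the delay equals the checking interval.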
Failed on Ubuntu 22 - https://ci.nodejs.org/job/node-test-commit-linuxone/40388/ |
No, this one failed on rhel8-s390x. |
Ref: nodejs#43465 PR-URL: nodejs#50227 Refs: nodejs#43465 Reviewed-By: Richard Lau <rlau@redhat.com> Reviewed-By: Vinícius Lourenço Claro Cardoso <contact@viniciusl.com.br> Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com> Reviewed-By: Filip Skokan <panva.ip@gmail.com>
Seen this several times on osx - https://ci.nodejs.org/job/node-test-commit-osx-arm/nodes=osx11/14659/ as an example |
PR-URL: nodejs#45722 Fixes: nodejs#43465 Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Paolo Insogna <paolo@cowtech.it> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Is it possible to run wireshark (or any other pcap generating network capturing software) on CI for that test and export the pcap file for debugging? Then if this test fails we can inspect what actually went wrong. I have never used Jenkins, does it have some sort of artifacts like GitHub Actions? I wonder if it actually ever failed on GitHub Actions. |
@nicksia-vgw If you're still able to reproduce the issue, could you send me a pcap file (you can use Wireshark)? |
After more than 2 years, this is still flaking the CI, and has failed 19 out of 100 recent testing runs. I think it's time to mark it as flaky instead of hoping anyone would have the time to fix it in the near future. |
This has been flaking the CI for more than 2 years with various attempts to fix without success. It has still been flaking the CI (failed 19 out of 100 recent testing CI runs). It's time to mark it as flaky. PR-URL: #56503 Refs: #43465 Reviewed-By: Richard Lau <rlau@redhat.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Jake Yuesong Li <jake.yuesong@gmail.com>
Test
test-http-server-request-timeouts-mixed
Platform
macos
Console output
Build links
Failed PR
10 (nodejs/node#43329, nodejs/node#43380, nodejs/node#43366, nodejs/node#43455, nodejs/node#43190, nodejs/node#43176, nodejs/node#43374, nodejs/node#43363, nodejs/node#43417, nodejs/node#43216)
Appeared
test-orka-macos11-x64-1, test-nearform-macos10.15-x64-3, test-orka-macos10.15-x64-2, test-nearform-macos10.15-x64-1, test-orka-macos11-x64-2
First CI
https://ci.nodejs.org/job/node-test-pull-request/44587/
Last CI
https://ci.nodejs.org/job/node-test-pull-request/44664/
Example
https://ci.nodejs.org/job/node-test-commit-osx/nodes=osx11-x64/45608/console
Additional information
Build links info is from CI Reliability 2022-06-18
This test was introduced on 2022-05-03: #42893
This test started flaking on 2022-05-04: nodejs/reliability#271
Query all flaky dates: https://github.com/nodejs/reliability/issues?q=is%3Aissue+is%3Aopen+test-http-server-request-timeouts-mixed
cc @ShogunPanda (test author) @lpinca (reviewer; I saw you mention the possibility of flakiness in a PR comment)