Memory leak in the fetch instrumentation when used with an infinite fetch request #4888
Comments
One possible suggestion is to close the `resClone4Hook` body. That would probably resolve this leak.
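In the Fetch spec, `Response.clone()` tees the body stream, so this suggestion amounts to canceling the tee branch that nobody reads. The TypeScript sketch below models that with a plain `ReadableStream` rather than a real `Response`. `sampleBody` and `readUsedCancelUnused` are hypothetical helpers. It shows only the Streams-level behavior: the canceled branch closes, and the other branch still delivers the whole body. It says nothing about how a given browser manages the underlying memory, and a later comment reports that this change made no observable difference in Chrome.

```typescript
// Hypothetical helper: a small finite byte stream standing in for a response body.
function sampleBody(chunks: number, chunkSize: number): ReadableStream<Uint8Array> {
  return new ReadableStream<Uint8Array>({
    start(controller) {
      for (let i = 0; i < chunks; i++) {
        controller.enqueue(new Uint8Array(chunkSize));
      }
      controller.close();
    },
  });
}

// Tee the body (as Response.clone() does), cancel the branch that will never
// be read, then consume the other branch normally.
async function readUsedCancelUnused(
  chunks: number,
  chunkSize: number,
): Promise<{ usedBytes: number; unusedDone: boolean }> {
  const [used, unused] = sampleBody(chunks, chunkSize).tee();

  // Once a tee branch is canceled, tee() stops enqueuing chunks into it.
  // The promise returned here only settles after the source finishes (or both
  // branches are canceled), so it is awaited after `used` has been drained.
  const cancelled = unused.cancel("nobody reads this clone");

  const reader = used.getReader();
  let usedBytes = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    usedBytes += value.byteLength; // count the data, then drop it
  }
  await cancelled;

  // The canceled branch is already closed, so the first read reports `done`.
  const { done: unusedDone } = await unused.getReader().read();
  return { usedBytes, unusedDone };
}
```

Under these assumptions, `readUsedCancelUnused(4, 1024)` resolves to `{ usedBytes: 4096, unusedDone: true }`. Canceling before the used branch is drained matters: chunks that were already enqueued into the unread branch stay queued until it is released.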
I looked into this a bit and made a reproduction for this: https://github.com/tildeio/otel-js-demos/tree/4888-fetch-memory-leak

@shuhaowu did a good job identifying the issue, and once you see the pattern, it's pretty obvious why this would happen. IMO it would be clearer to distill the problematic pattern down into a standalone demo without involving the instrumentation code, so that's what I did.

I did some digging and this was introduced in #2497. IMO, the feature was poorly motivated and is fundamentally incompatible with infinite/long streaming responses. The context here is that […]

Now, there is perhaps an argument here that infinite/long-running streams are also fundamentally incompatible with instrumentation anyway, as they hold up the span/trace, which can cause other problems. However, because this is done automatically for all […]

This current design isn't only problematic with infinite/long-running streams. For example, I imagine [citation needed] if the body is consumed via […]

Overall, I tend to think that #2497 was a mistake. "Response body can only be read once" is a fundamental design choice of the web platform and should be expected/respected. Eagerly cloning and holding on to the response object essentially defeats, on purpose, the optimizations afforded by this design choice. There is a pretty high bar to clear there, and the original feature request just didn't provide a good reason for it. IMO, it probably should just be reverted. A possible compromise is to add an extra hook that gets called with the original response object before the […]
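The core of the problem can be reproduced with streams alone. This is in the same spirit as the standalone demo mentioned above, but it is not the linked repository's code. `Response.clone()` tees the body, and a tee branch that is never read keeps a queued copy of every chunk that the other branch consumes. In this TypeScript sketch, `pulledSource`, `drain`, and `retainedByUnreadBranch` are hypothetical helpers. With an endless source instead of a finite one, the `forgotten` queue would simply never stop growing.

```typescript
// Hypothetical helper: a pull-based source that produces `chunks` chunks on demand.
function pulledSource(chunks: number, chunkSize: number): ReadableStream<Uint8Array> {
  let produced = 0;
  return new ReadableStream<Uint8Array>({
    pull(controller) {
      if (produced === chunks) {
        controller.close();
        return;
      }
      controller.enqueue(new Uint8Array(chunkSize));
      produced++;
    },
  });
}

// Reads a stream to the end, keeping only a byte count.
async function drain(stream: ReadableStream<Uint8Array>): Promise<number> {
  const reader = stream.getReader();
  let bytes = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) return bytes;
    bytes += value.byteLength;
  }
}

// Mirrors the leak: one branch is consumed, the other is created and forgotten.
async function retainedByUnreadBranch(
  chunks: number,
  chunkSize: number,
): Promise<{ consumedBytes: number; retainedBytes: number }> {
  const [consumed, forgotten] = pulledSource(chunks, chunkSize).tee();
  const consumedBytes = await drain(consumed);
  // `forgotten` was never touched while `consumed` was drained, yet its
  // internal queue now holds a copy of every chunk that passed through.
  const retainedBytes = await drain(forgotten);
  return { consumedBytes, retainedBytes };
}
```

With this sketch, `retainedByUnreadBranch(10, 1000)` should report 10000 bytes for both branches, meaning the unread branch ended up buffering the entire body.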
I added an option to test this in my reproduction/demo app. As far as I can tell, in Chrome at least, it doesn't do anything. But in any case, I think this is moot: the only reason that extra clone exists was to enable the hook to read the response body, and this suggestion would break that "feature". If we are walking back that "feature" anyway, we could just pass the original response object through to the hook at that point.
…CustomAttributes` hook
Previously, the fetch instrumentation code unconditionally cloned every `fetch()` response in order to preserve the ability for the `applyCustomAttributes` hook to consume the response body. This is fundamentally unsound, as it forces the browser to buffer and retain the response body until it is fully received, which creates unnecessary memory pressure on long-running response streams.
Fixes open-telemetry#4888
…CustomAttributes` hook
Previously, the fetch instrumentation code unconditionally cloned every `fetch()` response in order to preserve the ability for the `applyCustomAttributes` hook to consume the response body. This is fundamentally unsound, as it forces the browser to buffer and retain the response body until it is fully received and read, which creates unnecessary memory pressure on large or long-running response streams. In extreme cases, this is effectively a memory leak and can cause the browser tab to crash.
Fixes open-telemetry#4888
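Here is a minimal TypeScript sketch of the direction this commit message describes. It is not the actual patch. `observeResponse`, `ResponseHook`, and `endSpan` are hypothetical stand-ins for the instrumentation's internals and for the `applyCustomAttributes` hook. The sketch keeps a single clone only to learn when the body has been fully received (the role `resClone` plays) and reads that clone to the end, discarding each chunk. The hook receives the original response, so no clone is left holding a body that nobody reads.

```typescript
// Hypothetical stand-in for a user hook such as applyCustomAttributes.
type ResponseHook = (response: Response) => void;

// Waits for the body to finish arriving without keeping an unread copy around.
async function observeResponse(
  response: Response,
  hook: ResponseHook,
  endSpan: () => void,
): Promise<void> {
  // Exactly one clone, and it is always read to the end.
  const tracker = response.clone();
  const reader = tracker.body?.getReader();
  if (reader) {
    for (;;) {
      const { done } = await reader.read(); // chunks are discarded immediately
      if (done) break;
    }
  }
  // The hook sees the original response: status and headers are available,
  // but its body belongs to the application and must not be read here.
  hook(response);
  endSpan();
}
```

`observeResponse` has to be called before the application starts reading the body, because `clone()` throws a `TypeError` once a body has been used.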
What happened?
Steps to Reproduce
Make a `fetch` request to a server that streams a large amount of data. Read the data via `response.body.getReader().read()` and discard it (i.e., do not store the data in JS memory).
Expected Result
No memory leak occurs.
Actual Result
Memory leaks; the browser/OS eventually kills the tab due to memory exhaustion.
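The reproduction steps can be sketched in TypeScript as a consumer that reads the body chunk by chunk and keeps nothing but a running byte count. `readAndDiscard` and its optional `maxBytes` budget are illustrative assumptions, since the issue doesn't include the original reproduction code. With the instrumentation's extra unread clone in place, a loop like this still leaks, because the retained copy lives in that clone rather than in the application.

```typescript
// Hypothetical helper: reads a (possibly endless) response body and drops each
// chunk, so the application itself never accumulates the data in JS memory.
async function readAndDiscard(response: Response, maxBytes = Infinity): Promise<number> {
  const body = response.body;
  if (body === null) return 0;
  const reader = body.getReader();
  let received = 0;
  while (received < maxBytes) {
    const { done, value } = await reader.read();
    if (done) return received;
    received += value.byteLength; // keep only a running count, drop the chunk
  }
  // Stop an endless stream once the byte budget is used up.
  await reader.cancel();
  return received;
}
```

In a browser this might be called as `await readAndDiscard(await fetch(streamUrl))`, where `streamUrl` is a placeholder for a server that streams indefinitely.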
Additional Details
Looking at the implementation of `patchConstructor`, we see these lines where the `response` is cloned into 2 additional variables:
opentelemetry-js/experimental/packages/opentelemetry-instrumentation-fetch/src/fetch.ts
Lines 351 to 352 in 2e42181
The `body` of `resClone` is read. The body data is not used by the code, but reading it allows the data to be freed from memory:
opentelemetry-js/experimental/packages/opentelemetry-instrumentation-fetch/src/fetch.ts
Lines 357 to 364 in 2e42181
However, the `body` of `resClone4Hook` is never read. This causes the browser to keep the response data in memory, even though it has been consumed both by the user through the original stream and by the `resClone` stream. `resClone4Hook` is only passed to `endSpanOnSuccess`:
opentelemetry-js/experimental/packages/opentelemetry-instrumentation-fetch/src/fetch.ts
Line 360 in 2e42181
The code within OpenTelemetry doesn't seem to use `response.body` in any way. It does, however, seem to pass the `response` to potentially user-defined functions.
In any case, enabling auto-instrumentation causes memory-leak-induced browser tab crashes when used with infinite `fetch` requests. This is very difficult to debug, because the leaked memory is not even in the JS heap: since `resClone4Hook`'s body is never read into JS, the memory doesn't show up in JS heap dumps, and instead it accumulates in Chrome's private memory. The regular tab JS heap OOM killer doesn't even catch it. Something else in Chrome kills the tab after ~13GB of RAM usage.
OpenTelemetry Setup Code
No response
package.json
No response
Relevant log output
No response