-
Notifications
You must be signed in to change notification settings - Fork 834
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(instrumentation-fetch)! Passthrough original request to applyCustomAttributes
#5281
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #5281 +/- ##
=======================================
Coverage 94.62% 94.62%
=======================================
Files 323 323
Lines 8068 8068
Branches 1637 1637
=======================================
Hits 7634 7634
Misses 434 434 |
Tests are failing because this PR currently targets |
applyCustomAttributes
applyCustomAttributes
Feature was originally contributed by @echoontheway. There isn't a description of the original use case in the PR /~https://github.com/open-telemetry/opentelemetry-js/pull/2497/files It has been raised several times in the past that this feature causes memory thrash and leaks. In my opinion, the motivating use case would have to be very convincing in order to keep this feature when it has caused so many issues. |
541dbe5
to
4b9a12e
Compare
Next steps (for me):
|
4b9a12e
to
ae44a47
Compare
Rebased ✅ Added this to the SIG meeting agenda tomorrow, code wise it should be good to go. |
…CustomAttributes` hook Previously, the fetch instrumentation code unconditionally clones every `fetch()` response in order to preserve the ability for the `applyCustomAttributes` hook to consume the response body. This is fundamentally unsound, as it forces the browser to buffer and retain the response body until it is fully received and read, which crates unnecessary memory pressure on large or long-running response streams. In extreme cases, this is effectively a memory leak and can cause the browser tab to crash. Fixes open-telemetry#4888
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks 🙌
Which problem is this PR solving?
Currently, the browser
fetch()
instrumentation unconditionally clones the response object for every request, in order to preserve the ability for theapplyCustomAttributes
to consume the response body. This cloned response doesn't get consumed/released until the body has been streamed and read entirely, which forces the browser to buffer and retain the entire response body, creating unnecessary memory pressure on large or long-running response streams. In extreme cases, this is effectively a memory leak and can cause the browser tab to crash.Fixes #4888
Short description of the changes
The offending code was added in #2497. This commit effectively reverts that commit and breaks the feature. This is certainly a breaking change, and we can discuss the mitigations below.
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Checklist:
Background – the browser
fetch()
APIWhen
fetch()
was added to the browser, they made some deliberate design choices/improvements:onreadystatechange
hook/event).Concretely:
fetch()
returns a promise that resolves with aResponse
object, which can happen as soon as the headers are fully received.Response
object for working with the body, all of which are async. In addition, those APIs are methods that returns the data, rather than exposing the body as a property on the response itself. This allows the browser to stream the response in the background, and have more liberty in how the implement/fulfill those APIs. For example, inawait response.json()
, a browser could choose to use a streaming JSON parser and discard/move the underlying bytes as they arrive, reducing the need for buffering/copying bytes unnecessarily.Response
object. This is necessary to achieve the goals in the previous bullet point – if you are allowed to callawait response.json()
followed byawait response.text()
sometime later, the optimization in (2) is no longer possible, and the browser must buffer the response bytes for the lifetime of the response object, and each call to those APIs will have to copy the underlying data.Response.clone()
API provides a way to opt-in to buffering/copying the response body in cases where there are multiple consumers. This must be called before the response body has been read.Background – the original feature request #2497
As mentioned above, the current instrumentation code unconditionally call
.clone()
on every response:opentelemetry-js/experimental/packages/opentelemetry-instrumentation-fetch/src/fetch.ts
Line 375 in 8ab52d5
It then holds onto the cloned response object, until the response body has been fully streamed/read:
opentelemetry-js/experimental/packages/opentelemetry-instrumentation-fetch/src/fetch.ts
Line 383 in 8ab52d5
This was introduced way back in #2497. Effectively, this code "reserves the right" to read the response body in
applyCustomAttributesOnSpan()
, which doesn't get called until the entire response is available, forcing the browser to buffer and hold onto the underlying bytes in the meantime.Note
There are actually two
.clone()
s here.resClone
predates #2497, and was added in #2203 to ensure the span timing covers the whole lifecycle of the request/response, rather than just up to the point where the headers are sent. This.clone()
is still problematic in that it forces the browser to copy the bytes as there are now more than one consumer, but since it uses the streaming-based API which discard the read bytes as soon as they are become available, it at least shouldn't contribute to the memory pressure problem which is the focus of this PR. We should probably still revisit that to see if there are better ways to accomplish this by hooking into resource timing API, but it's nontrivial and out-of-scope here. Removing the second clone alone should be sufficient to address the memory pressure problem reported in #4888.This behavior/"feature" is fundamentally unsound, as it creates unnecessary memory pressure for all
fetch()
requests. This may be negligible in small responses, but when streaming a large response body this can be especially problematic. Worse, in case of long-running response streams, this is effectively a memory leak. At the extremes, as reported in #4888, this can cause the browser tab to run against its allowed memory limit and crash.What now?
I don't see how we can reasonably support this feature without running into the fundamental problem described above. To that end, and considering that this is still an experimental package, I suggest that we drop it the feature and send it back to the drawing board.
The original feature request was too sparse on details. It was unclear what it was trying to accomplish – specifically why would it be useful to be able to read the response body in
applyCustomAttributes()
. I could speculate on the use cases, but I also think the design of the feature is itself flawed –applyCustomAttributes()
is meant to be a synchronous hook, but by design, doing anything with the response body necessitates async operations. This means code like the test added in #2497 is also fundamentally broken, and probably only working due to this internal delay, which is certainly unintentional/unreliable. Future refactors (e.g. the fix for #5122) could very well remove the need for this delay and the code will subtly break.So, I think the best course of action is to open an issue representing the breakage/feature request, and make a note in the CHANGELOG pointing to the issue, so we can evaluate what, if any, the replacement APIs should look like.