-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] At startup, send all locally stored traces instead of waiting the first batch + some time #1305
Comments
Before creating this issue, I tried to see if I could find a way to force the trace push "manually". I found 3, but this is absolutely not a good way to do it... 1st solution
--> We force the app to stay alive in a possible unstable state 2nd solution
--> We force the app to stay alive in a possible unstable state 3rd solution
|
What do you think @JacksonWeber ? I'm up for doing the PR, but I'd rather have an opinion before I start working on it. |
@Apokalypt Before you begin any work on something like this, we're migrating away from the Application Insights 2.X SDK to an OpenTelemetry based solution that will be the foundation going forward. So if you look at the behavior of /~https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/monitor/monitor-opentelemetry and find you have the same issue there (I believe this logic is the same), it would be best to discuss the suggestion in that repository. Haven't gotten much time to investigate this suggestion, but I wanted to make you aware of that change first. |
@JacksonWeber Thanks for your answer. Nevertheless, if version 3.x is close to a stable release (and the behavior is the same/similar) I'm okay with proposing an implementation directly on 3.x. Do you have an ETA of a 3.x stable release? |
@Apokalypt Version 3 is expected to release this week. |
Ok thx, I will take a look at the version 3.x to implement this feature 👍 |
Unfortunately, I don't think I'll have time to look at the implementation at the moment. Especially as I've had a quick look at the project and I have the impression that the functionality for sending missing traces following a crash has been removed... From there, I have 2 questions:
|
@Apokalypt I believe the functionality you're referring too still exists. You should be able to see how we handle persisting here. Regarding 2.X SDK maintenance, yes, the 2.X SDK will be updated in the case of security flaws to support customers on that version. I can work on testing this scenario and determine if we can prioritize it. I'll let you know if development begins on this feature. |
Context
Explication
Right now, when a NodeJS application crashes, the library gives the possibility to call the <TelemetryClient>.flush method with the
isAppCrashing
option set totrue
. This can be doneThis method tells the Sender class to store traces in a local file if disk retry caching is enabled (cf code).
Problem
The problem is that these files stored locally will only be sent after the first batch of traces (after a restart) and after having wait for the
resendInterval
(by default, set to 1 minute). You can check this yourself in the code: hereThis means that if there's a major bug (leading to a crash) in the application a few seconds after start-up, the traces will never be sent, making it difficult to analyze the problem. Even if the bug doesn't arrive so quickly, this can disrupt the analysis of the engineers, who will see the logs due to the application restart but not the crash logs (which will surely arrive 1 minute later).
Implementation expected
I'd like the library to force the push of ALL local files (not just the first one) right after client startup. This would make debugging easier when the application crashes and reboots...
At the same time, it might be a good idea to make sure that these files are only deleted if the send is successful, and not before sending (this will prevent loss of traces in the event of the application crashing just after the call has started).
Proposal
If the above implementation looks right to you, I'd be very happy to work on it and create a PR to improve the behavior as explained above. Let me know what you think
The text was updated successfully, but these errors were encountered: