UTF-8 encodings are broken #54543
Comments
Update: UTF-8 in general is fine. The corruption is specific to characters in the extended ASCII set. I tried other versions:
Our best guess is that UTF-8-encoded extended ASCII characters and plain single-byte extended ASCII characters are getting mixed up and corrupted.
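To illustrate the guess above: characters in the Latin-1 range (U+0080 to U+00FF) take one byte each in Latin-1 but two bytes each in UTF-8, so mixing the two encodings would damage exactly these characters while plain ASCII survives. A minimal sketch using the sample string from the report; the helper name `encodeBoth` is made up for illustration and is not from the thread:

```javascript
// Hypothetical helper (not from the thread): encode the same string both ways
// and return the resulting bytes as hex.
function encodeBoth(text) {
  return {
    utf8: Buffer.from(text, "utf8").toString("hex"),
    latin1: Buffer.from(text, "latin1").toString("hex"),
  };
}

const sample = encodeBoth("éñüçßÆ");
console.log(sample.utf8);   // two bytes per character
console.log(sample.latin1); // one byte per character

// These Latin-1 bytes are not valid UTF-8, so decoding them as UTF-8
// yields U+FFFD replacement characters instead of the original text.
const garbled = Buffer.from(sample.latin1, "hex").toString("utf8");
console.log(garbled.includes("\ufffd"));

// Plain ASCII is byte-identical in both encodings, so it is unaffected.
const ascii = encodeBoth("jrge");
console.log(ascii.utf8 === ascii.latin1);
```

This would also be consistent with the symptom that only accented characters turn into garbage while the rest of a response stays readable.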
Hi! v22.7.0 has a few known buffer issues, so could you provide a minimal reproduction so the issue can be narrowed down? Additionally, could you self-moderate your comment containing curse words, as it may be offensive to some viewers? Edit: Thanks! Possibly a duplicate of: #54521
Can you check if this fixes it? #54526
Thanks for confirming both. I'll roll back to 22.6.0 for now.
We also ran into this issue, and it took us forever to find the culprit. We reproduced it with a simple Express HTTP handler deployed on AWS App Runner:
Further info:
I'm seeing this show up as failed PostgreSQL queries that contain an umlaut as a parameter, with a cryptic-looking error:
edit: the test case provided by @blexrob fails reliably in 22.7.0 on both Apple Silicon and Linux (22.6.0 works as expected):

    let i = 0;
    const testStr = "jürge";
    // Reference encoding, taken from the first call.
    const expected = Buffer.from(testStr).toString("hex");
    for (; i < 1_000_000; i++) {
      const buf = Buffer.from(testStr);
      const ashex = buf.toString("hex");
      // Stop at the first iteration whose bytes differ from the reference.
      if (ashex !== expected) {
        console.log(`Decoding changed in iteration ${i} when changing to FastWriteStringUTF8, got ${ashex}, expected ${expected}`);
        break;
      }
    }
    if (i < 1_000_000) {
      console.error("FAILED after %d iterations", i);
    } else {
      console.log("PASSED after %d iterations", i);
    }
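A variant of the test above, sketched as a reusable function. It compares against a hard-coded UTF-8 byte sequence rather than the result of the first call, so it does not depend on the first call being correct. The function name `firstCorruptIteration` and the constant `EXPECTED_HEX` are assumptions for illustration, not part of the original test case:

```javascript
// Expected UTF-8 bytes of "jürge": 6a (j), c3 bc (ü), 72 (r), 67 (g), 65 (e).
const EXPECTED_HEX = "6ac3bc726765";

// Hypothetical helper: returns the first iteration at which the encoding
// differs from the expected bytes, or -1 if every iteration matched.
function firstCorruptIteration(str, expectedHex, iterations) {
  for (let i = 0; i < iterations; i++) {
    if (Buffer.from(str).toString("hex") !== expectedHex) {
      return i;
    }
  }
  return -1;
}

const failedAt = firstCorruptIteration("jürge", EXPECTED_HEX, 1_000_000);
console.log(failedAt === -1 ? "PASSED" : `FAILED at iteration ${failedAt}`);
```

The high iteration count mirrors the original test, where the failure only showed up after many repetitions.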
I'd like to remind everyone that "me too" comments only add noise to this already noisy topic. Please refrain from commenting until you have something to add to the conversation. Edit: this isn't directed at any existing comments. It is meant to deter future "me too" comments, as they occur often with issues like this.
I feel like one of the patches in v22.8.0 (#54560), when it lands, will resolve this issue. Once that lands, please comment on whether it resolves this issue. Given its current state, that could be a few days.
I assume you have already tracked this down, but I believe the issue is basically:
This kept us up a couple of nights. Thank you for fixing it!
Since Chainguard only provides the latest Node, and as a consequence of this bug (nodejs/node#54543), we have concluded that we no longer want to be on Chainguard. Co-authored-by: Thomas Dufourd <thomas.dufourd@nav.no>
I am not familiar with the internals of Node, but I just lost a few days because of this. What gave it away was that the same request lifecycle returned an intact UTF-8 string to the browser and a corrupted UTF-8 string to the logger service, which spins up a new worker for log transport.
This PR has landed. Expect the release to follow shortly:
I hope that the test coverage gets improved with the fix in 22.8.
If this broke your data when saving to Mongo, here is a script to help fix it: https://github.com/nicholas-long/mongo-node-fix-54543
An [important UTF-8 bug](nodejs/node#54543) was discovered in v22.7 and fixed in v22.8. We should only allow v22.8 to avoid this issue for end users. Also updates Node version used by various CI tooling to be compliant with the new setting
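As a rough sketch of the same idea at runtime, a process could check its own Node version at startup and warn when it is on the affected release line. The function name `isAffectedByIssue54543` is hypothetical, and treating every 22.7.x version as affected is an assumption based on the thread's statement that the bug appeared in v22.7 and was fixed in v22.8:

```javascript
// Hypothetical helper: true for Node versions on the 22.7 release line,
// which this thread reports as affected. Accepts "22.7.0" or "v22.7.0".
function isAffectedByIssue54543(version) {
  const [major, minor] = version.replace(/^v/, "").split(".").map(Number);
  return major === 22 && minor === 7;
}

// process.versions.node holds the running version without a leading "v".
if (isAffectedByIssue54543(process.versions.node)) {
  console.warn(
    `Node ${process.versions.node} is affected by nodejs/node#54543; ` +
      "consider 22.6.0 or 22.8.0 and later."
  );
}
```

A startup warning like this only complements a version constraint enforced at install or CI time; it does not replace it.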
Version
22.7.0
Platform
Subsystem
No response
What steps will reproduce the bug?
Hey everyone, I'm not sure how to reproduce this, but the latest Node can't parse UTF-8 anymore. It works for the first minute or two (or a couple of hours if I remove Datadog APM instrumentation) but then returns garbage on the same request. I'm just using postgres.js to fetch and nest.js for the HTTP server. No fancy buffer manipulation.
Note how éñüçßÆ gets corrupted.
How often does it reproduce? Is there a required condition?
Restart the process. It works for some time and then starts returning corrupted text.
What is the expected behavior? Why is that the expected behavior?
It should keep returning the correct text.
What do you see instead?
Garbage text
Additional information
No response