-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add TextEncoder.prototype.encodeInto #26904
Comments
Should be fairly straightforward to implement. Interested in giving it a try? |
Potentially, but currently pretty busy with another higher-level project (in which I found out this perf differentiator between running my WASM in different environments). If nobody takes this before, can give it a try sometime later. Do you think it would be reasonable if I put |
Labeling it good first issue should be fine but it will require a bit deeper knowledge than a beginner might have, and would likely require a bit of C++ |
Yeah, depends whether that targets beginners in general or just beginners to the project... I think I can put "good first issue" + "C++" to make it more clear. |
One difference with |
Node.js doesn't currently implement `TextEncoder::encodeInto`. I've raised an upstream issue to add it - nodejs/node#26904 - but it's likely to take some time and will be available only in new releases. In the meanwhile, it's worth noting that Node.js already has `Buffer::write` which has pretty similar semantics, but doesn't require creating an intermediate view using `.subarray` and instead accepts pointer and length directly. Also, Node.js has `Buffer::byteLength` helper which allows to efficiently retrieve an encoded byte length of a string upfront, and so allows us to avoid a loop with reallocations. This change takes leverage of these methods by generating an additional Buffer-based view into the WASM memory and using it for string operations. I'm seeing up to 35% increase in performance in string-heavy library benchmarks.
going to take care of this |
This would be really helpful for anyone dealing with binary. As much as I like Buffer, it's not an option for isomorphic code. Is there any ETA on this one yet? |
@RReverser ,I tried to implement this task and I did it. but I am confused by the test case in
and the dest is a uint8array, the byteOffset is 0 and the length is 4. so I wonder the test case is right or wrong, thanks! |
@AtticusYang TextEncoder#encodeInto returns the number of code points not the byte length--"\u{1D306}" is encoded with two code points in UTF-16, and indeed four bytes in UTF-8. |
@zandaqo thank for your advice. |
Add TextEncoder.prototype.encodeInto. Refs: nodejs#26904
Add function encodeInto to TextEncoder, fix bug MessageChannel is not defined in encodeInto.any.js. Fixes: nodejs#28851 Refs: nodejs#26904
Add function encodeInto to TextEncoder, fix bug MessageChannel is not defined in encodeInto.any.js. Fixes: nodejs#28851 Refs: nodejs#26904
Add function encodeInto to TextEncoder, and add MessageChannel to the encodeInto.any.js test. Fixes: nodejs#28851 Fixes: nodejs#26904 Refs: nodejs#28862 Co-authored-by: AtticusYang <yyongtai@163.com>
Add function encodeInto to TextEncoder, and add MessageChannel to the encodeInto.any.js test. Fixes: #28851 Fixes: #26904 Refs: #28862 Co-authored-by: AtticusYang <yyongtai@163.com> PR-URL: #29524 Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
I was playing with the latest nightly build (Sep 23) and noticed an odd behavior when non-string is used as a source: const encoder = new TextEncoder();
const destination = new Uint8Array(10);
encoder.encodeInto(null, destination);
//=> TypeError: The "src" argument must be of type string. Received type object While the spec does say that the source should be a USVString, its current implementations in both Chrome and Firefox opt for converting the source into a string first instead of throwing an error. So, are we sticking with the current implementation, that is, Node.js going to differ from browsers and throw if a non-string source is provided? Also, this throwing makes less sense considering that we are ok with non-string values in const encoder = new TextEncoder();
encoder.encode(null)
//=> Uint8Array [ 110, 117, 108, 108 ] |
Add function encodeInto to TextEncoder, and add MessageChannel to the encodeInto.any.js test. Fixes: #28851 Fixes: #26904 Refs: #28862 Co-authored-by: AtticusYang <yyongtai@163.com> PR-URL: #29524 Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
Is your feature request related to a problem? Please describe.
TextEncoder.prototype.encodeInto
is a new method available in latest Chrome and Firefox that allows efficient encoding of strings into a given buffer, avoiding intermediate allocation and copying.Its primary usecase is a more efficient way to pass strings to WASM (one of the most expensive things one has to do when communicating with JavaScript) and it's already leveraged by Rust wasm-bindgen where available. wasm-bindgen has support for a Node.js target, so it would be great to speed things up here too.
Describe the solution you'd like
Adding native
encodeInto
method to theTextEncoder
class already available inutil
module.Please describe the desired behavior.
MDN: https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder/encodeInto
Describe alternatives you've considered
Sending PR to wasm-bindgen to wrap memory into a Node-specific
Buffer
and useBuffer.prototype.write
instead.It has very similar semantics, and produces similar speed-ups (up to 35% on some of my string-interaction-heavy WASM libraries), but I wanted to see if it's possible to solve this on the Node.js side first so that the same method could be used for all platforms.
cc @jasnell as the author of the original implementation of WHATWG Encoding API in Node.js (#13644)
The text was updated successfully, but these errors were encountered: