
ignore buffer in browsers #67

Merged · 2 commits · Jan 22, 2023
Conversation

@jimmywarting (Contributor) commented Jan 21, 2023

Buffer is optional; adding this prevents e.g. jsDelivr from automatically injecting a polyfill just because the Buffer keyword appears in the source.

https://cdn.jsdelivr.net/npm/cbor-x@1.5.0/browser.js/+esm

@kriszyp (Owner) commented Jan 22, 2023

Thank you for the PR, this looks good! Out of curiosity, is the second commit (using globalThis.Buffer) necessary for this as well?

@kriszyp kriszyp merged commit 38b3eed into kriszyp:master Jan 22, 2023
@jimmywarting (Contributor, Author) commented Jan 22, 2023

> is the second commit (using globalThis.Buffer) necessary for this as well?

I'm not 100% sure, but I think it's preferable to declare where Buffer comes from as a variable, so that compilers see Buffer as an ordinary binding rather than assuming it comes from the global scope. Otherwise they will polyfill Buffer just because the code uses the global Buffer.allocUnsafeSlow, Buffer.from, etc. throughout.
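The pattern described might be sketched like this (`BufferImpl` and `toHex` are illustrative names, not the actual commit's code):

```javascript
// Hypothetical sketch: resolve Buffer from globalThis into a local binding,
// so bundlers see an ordinary variable instead of a bare global identifier
// that triggers automatic polyfill injection.
const BufferImpl = typeof globalThis.Buffer !== 'undefined' ? globalThis.Buffer : null;

function toHex(bytes) {
  // Use the native Buffer when present, otherwise a portable fallback.
  return BufferImpl
    ? BufferImpl.from(bytes).toString('hex')
    : Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

console.log(toHex(new Uint8Array([1, 255]))); // "01ff"
```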

@jimmywarting deleted the ignore-buffer branch Jan 22, 2023, 17:17
@jimmywarting (Contributor, Author)

Still wish Buffer could be removed completely... although, you have already expressed that you do not want to do that #44

I'm not sure I could persuade you to drop it. If a user really wants a Buffer, they could use Buffer.from(result). I just see it as unnecessary overhead for the engine.

@kriszyp mentioned this pull request Jan 27, 2023
@kriszyp (Owner) commented Jan 27, 2023

> I just see it as unnecessary overhead by the engine.

The reality is that it's the other way around: using Node's utf8Write function is critical for the performance of writing strings within cbor-x. Having to adapt it to a Uint8Array (with utf8Write.call(uint8Array, ...)) adds about 12% overhead for string serialization, and you really, really don't want to use TextEncoder, which is about 90% slower. (And the #1 goal of cbor-x is performance.)
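The adaptation being described, borrowing Node's undocumented Buffer.prototype.utf8Write and invoking it on a plain Uint8Array, looks roughly like this sketch (assumes a Node.js runtime, where utf8Write exists):

```javascript
// Sketch: utf8Write is an undocumented Node internal with the shape
// utf8Write(string, offset, maxLength), returning the bytes written.
// Because it only needs a backing ArrayBuffer, it can be .call()'d on a
// plain Uint8Array, at a small cost versus calling it on a real Buffer.
const utf8Write = Buffer.prototype.utf8Write;

const target = new Uint8Array(64);
// Write "hello" starting at byte offset 3.
const written = utf8Write.call(target, 'hello', 3, target.length - 3);

console.log(written); // 5
console.log(Buffer.from(target.subarray(3, 3 + written)).toString()); // "hello"
```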

@jimmywarting (Contributor, Author)

> you really really don't want to use TextEncoder, which is about 90% slower

Recent changes in newer Node.js by @anonrig in nodejs/node#45701 might turn this around.

utf8Write is also undocumented, so I guess it was never intended for developers to depend on it.


On another note: it seems they have made TextDecoder faster than buffer.toString(), and plan to make Buffer.toString as fast as TextDecoder.
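For reference, the two decode paths being compared are these (a sketch, assuming Node.js; the relative speed of the two changes across Node versions as noted above):

```javascript
// Decoding the same UTF-8 bytes back to a string via the two APIs.
const td = new TextDecoder();
const buf = Buffer.from('hello world');

const viaDecoder = td.decode(buf);        // WHATWG TextDecoder path
const viaToString = buf.toString('utf8'); // Node Buffer path

console.log(viaDecoder === viaToString); // true
```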

@kriszyp (Owner) commented Jan 28, 2023

The moment TextEncoder.encodeInto is faster (than Buffer.write), I will gladly switch!

> utf8Write is also undocumented, so i guess it never was intended for developer to depend on this.

utf8Write is basically just Buffer.write (saving an extra call), which is documented.
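That equivalence is easy to check (a sketch, assuming Node.js; utf8Write is undocumented, so this relies on current Node behavior):

```javascript
// Both calls write the same UTF-8 bytes at the same offset and return the
// number of bytes written; utf8Write just skips a layer of argument handling.
const a = Buffer.alloc(16);
const b = Buffer.alloc(16);

const n1 = a.write('hej', 2);                  // documented Buffer.write
const n2 = b.utf8Write('hej', 2, b.length - 2); // undocumented fast path

console.log(n1 === n2 && a.equals(b)); // true
```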

@anonrig commented Jan 28, 2023

> The moment TextEncoder.encodeInto is faster (than Buffer.write), I will gladly switch!

Can you share a benchmark of this comparison in the nodejs/performance repo? I'll be happy to take a look at it.

@kriszyp (Owner) commented Jan 30, 2023

Here is the script I have been using to test, annotated with the latest perf numbers I see on my machine across different versions of Node. Node v18 actually more than doubled the performance of the string encoding methods (awesome!), although the improvement was across the board, so it didn't really change the relative ranking. Node v19 seems to regress slightly, but not by much:

```js
let u = new Uint8Array(2048);
let b = Buffer.alloc(2048);
let uf8Write = Buffer.prototype.utf8Write;
let start = performance.now();
let te = new TextEncoder();
let result;
for (let i = 0; i < 10000000; i++) {
	//result = b.write('hello world', 10); // 78ns in v16, 32ns in v18, 36ns in v19
	//result = b.utf8Write('hello world', 10, 0xffffff); // 78ns in v16, 32ns in v18
	//result = uf8Write.call(u, 'hello world', 10, 0xffffff); // 90ns in v16, 42ns in v18
	//result = te.encodeInto('hello world', u.subarray(10)); // 146ns in v16, 57ns in v18, 60ns in v19
	// note that this is on par with buffer.write, but is useless since you can't specify a starting index; subarray time needs to be included in a fair comparison
	//result = te.encodeInto('hello world', u); // 79ns in v16, 33ns in v18
}
console.log(performance.now() - start, result);
```

@anonrig Did you want me to file an issue in nodejs/performance? I would be glad to do that if you think it would be helpful, although I hadn't really considered this a "bug", just the reality of different functions' performance characteristics.

@kriszyp (Owner) commented Jan 30, 2023

Also, as an additional data point, Bun can do the buffer.write in about 11ns, plain encodeInto in 20ns, and subarray+encodeInto in 58ns (so buffer.write is actually much faster than Node, and subarray+encodeInto is about the same as Node).

@anonrig commented Jan 30, 2023

> @anonrig Did you want me to file an issue in nodejs/performance?

I'd appreciate it if you could share your findings, and I'll make it our priority to look over them in the next meeting. (And/or you're welcome to share them at the meeting with us)

@jimmywarting (Contributor, Author)

Using the browserified buffer module was way slower:

```js
let { Buffer } = await import('https://esm.sh/buffer')
let str = document.body.outerHTML.repeat(4)
let size = new Blob([str]).size + 100 // 7116
let offset = 10
let u = new Uint8Array(size)
let b = Buffer.alloc(size)
let te = new TextEncoder()

let n = 10000

console.time('write')
for (let i = 0; i < n; i++) b.write(str, 10)
console.timeEnd('write')

console.time('encodeInto subarray')
for (let i = 0; i < n; i++) te.encodeInto(str, u.subarray(10))
console.timeEnd('encodeInto subarray')
```

Chrome 109:
write: 360.074951171875 ms
encodeInto subarray: 50.158203125 ms

FYI, the browserified node:buffer doesn't have utf8Write.

@kriszyp (Owner) commented Feb 5, 2023

Yeah, definitely never want to use the Buffer shim, that's why cbor-x uses TextEncoder when Buffer is not available. BTW, are you also suggesting that checking for the presence of Buffer.prototype.utf8Write might be a better way of checking for a "real" (fast) Buffer implementation? (so we aren't fooled by crummy shims)

@jimmywarting (Contributor, Author)

> are you also suggesting that checking for the presence of Buffer.prototype.utf8Write

Nah, that was just something I noticed when I tried running your benchmark inside a browser.

b.utf8Write('hello world', 10, 0xffffff) just threw an error saying it was not a function.

But if you think that checking for utf8Write's existence would be good, then sure 🤷‍♂️

@jimmywarting (Contributor, Author)

Hmm, have you also tried running your benchmark with a larger payload? When I tried a payload of 14806 bytes, there were barely any noticeable differences:

```js
const res = await fetch('https://nodejs.org/')
const ab = (await res.arrayBuffer())
const text = await new Response(ab).text()

let u = new Uint8Array(ab.byteLength);
let b = Buffer.alloc(ab.byteLength);
let uf8Write = Buffer.prototype.utf8Write;
let te = new TextEncoder();
let n = 10000000
console.log(text.length)
console.time('write');
for (let i = 0; i < n; i++) b.write(text, 10)
console.timeEnd('write');

console.time('utf8Write');
for (let i = 0; i < n; i++) b.utf8Write(text, 10, 0xffffff)
console.timeEnd('utf8Write');

console.time('uf8Write.call');
for (let i = 0; i < n; i++) uf8Write.call(u, text, 10, 0xffffff)
console.timeEnd('uf8Write.call');

console.time('encodeInto subarray');
for (let i = 0; i < n; i++) te.encodeInto(text, u.subarray(10))
console.timeEnd('encodeInto subarray');

// note that this is on par with buffer.write, but is useless since you can't specify a starting index; subarray time needs to be included in a fair comparison
console.time('encodeInto');
for (let i = 0; i < n; i++) te.encodeInto(text, u)
console.timeEnd('encodeInto');
```

write: 975.146ms
utf8Write: 967.701ms
uf8Write.call: 968.02ms
encodeInto subarray: 975.427ms
encodeInto: 970.195ms

Node.js v19.4.0

@kriszyp (Owner) commented Feb 5, 2023

Right, once the actual C-level encoder takes over, they are all doing the same thing. This is all about measuring the overhead to get the encoding going. And real-world data has a lot of small strings, so this overhead is extremely important for real performance.
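The shape of that argument can be illustrated with a sketch that measures per-call cost for a short versus a long string (`nsPerCall` is an illustrative helper; absolute numbers vary by runtime and machine):

```javascript
// For short strings the fixed per-call overhead dominates; for long strings
// the C-level encoder dominates, so the competing APIs converge.
const te = new TextEncoder();
const out = new Uint8Array(1 << 16);

function nsPerCall(str, n) {
  const start = performance.now();
  for (let i = 0; i < n; i++) te.encodeInto(str, out);
  return ((performance.now() - start) / n) * 1e6; // ns per call
}

const shortNs = nsPerCall('hello world', 1e5);    // 11 bytes: overhead-bound
const longNs = nsPerCall('x'.repeat(10000), 1e4); // ~10 KB: throughput-bound
console.log({ shortNs, longNs });
```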
