
Compact import section format #1514

Open
tlively opened this issue Apr 1, 2024 · 9 comments

Comments

@tlively
Member

tlively commented Apr 1, 2024

The binary format of the import section is a vector of entries, each containing a module name, an item name, and the kind and type of the imported item. This means that a module that imports one thousand items from a module named "env" repeats the string "env" one thousand times in its import section. As the number of imports grows, this repetition becomes extremely wasteful, even if the module name is very short or even empty (an empty name still costs a one-byte length prefix per import). With JS string builtins, the number of imports a module might reasonably have is about to increase dramatically, so it would be nice to avoid this duplication as much as possible.
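To make the repetition concrete, here is a minimal Python sketch of the current layout: section id 2, a LEB128 byte size, then a vector of entries, each holding two length-prefixed UTF-8 names and an import description. It assumes every entry is a function import (kind byte 0x00 plus a type index) and uses made-up field names `f0` to `f999`; the helper names are illustrative and not taken from any tool.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (wasm's u32 encoding)."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)  # high bit set: more bytes follow
        if not n:
            return bytes(out)


def encode_name(s: str) -> bytes:
    """A wasm name: LEB128 byte length followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return uleb128(len(data)) + data


def encode_import_section(imports: list[tuple[str, str, int]]) -> bytes:
    """Encode (module, field, typeidx) function imports in today's layout."""
    body = bytearray(uleb128(len(imports)))
    for module, field, typeidx in imports:
        body += encode_name(module)         # repeated for every single entry
        body += encode_name(field)
        body += b"\x00" + uleb128(typeidx)  # importdesc: func kind, type index
    return b"\x02" + uleb128(len(body)) + bytes(body)  # section id 2 = import


imports = [("env", f"f{i}", 0) for i in range(1000)]
section = encode_import_section(imports)
module_name_bytes = len(imports) * len(encode_name("env"))
print(len(section), module_name_bytes)  # prints: 10895 4000
```

In this example, 4000 of the 10895 section bytes are copies of the encoded `env` name.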

It would be possible to design an alternative import section binary format that avoided this duplication, either by grouping multiple imports from a single module together, or by using indices to refer to strings in a table, or by some other mechanism.
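One way to picture the grouping option is the sketch below, which compares today's section body with a hypothetical layout in which each run of consecutive imports from the same module stores the module name once, followed by a count and the per-import entries. The grouped layout is invented here for illustration and is not taken from any proposal; like the rest of the sketch, it assumes function imports only.

```python
from itertools import groupby


def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (wasm's u32 encoding)."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)  # high bit set: more bytes follow
        if not n:
            return bytes(out)


def encode_name(s: str) -> bytes:
    data = s.encode("utf-8")
    return uleb128(len(data)) + data


def entry(field: str, typeidx: int) -> bytes:
    """Field name plus a function import description (kind 0x00, type index)."""
    return encode_name(field) + b"\x00" + uleb128(typeidx)


def classic_body(imports):
    """Today's layout: every entry carries its own copy of the module name."""
    body = bytearray(uleb128(len(imports)))
    for module, field, typeidx in imports:
        body += encode_name(module) + entry(field, typeidx)
    return bytes(body)


def grouped_body(imports):
    """Hypothetical layout: each run of consecutive same-module imports shares one name."""
    runs = [(module, list(items)) for module, items in groupby(imports, key=lambda imp: imp[0])]
    body = bytearray(uleb128(len(runs)))
    for module, items in runs:
        body += encode_name(module) + uleb128(len(items))
        for _, field, typeidx in items:
            body += entry(field, typeidx)
    return bytes(body)


imports = [("env", f"f{i}", 0) for i in range(1000)]
print(len(classic_body(imports)), len(grouped_body(imports)))  # prints: 10892 6897
```

Grouping only consecutive runs, rather than sorting by module, keeps the original import order, which matters because imports take the lowest indices of each index space in declaration order. The 3995-byte saving here is essentially the 4-byte encoded `env` name that the classic layout repeats per entry.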

Potentially we could go even farther. For example, we could provide a succinct encoding for a sequence of n imports from a module, all of the same type, and with increasing decimal indices as names (i.e. "0", "1", "2", etc.). This would allow modules to declare an arbitrary number of imports in just a handful of bytes, at the cost of the design being hyper-specialized to that one particular pattern of imports.
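The hyper-specialized variant could be as small as a single entry that names the module, gives a count and a shared import description, and leaves the decoder to synthesize the decimal field names. This Python sketch assumes that hypothetical entry layout (module name, LEB128 count, function kind 0x00, type index); nothing like it exists in the current binary format, and the function names are illustrative.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (wasm's u32 encoding)."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)  # high bit set: more bytes follow
        if not n:
            return bytes(out)


def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode unsigned LEB128 at `pos`; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_name(s: str) -> bytes:
    data = s.encode("utf-8")
    return uleb128(len(data)) + data


def read_name(data: bytes, pos: int) -> tuple[str, int]:
    length, pos = read_uleb128(data, pos)
    return data[pos:pos + length].decode("utf-8"), pos + length


def encode_indexed_run(module: str, count: int, typeidx: int) -> bytes:
    """Hypothetical entry: `count` function imports of one type named "0", "1", ..."""
    return encode_name(module) + uleb128(count) + b"\x00" + uleb128(typeidx)


def decode_indexed_run(data: bytes) -> list[tuple[str, str, int]]:
    module, pos = read_name(data, 0)
    count, pos = read_uleb128(data, pos)
    if data[pos] != 0x00:
        raise ValueError("this sketch only handles function imports")
    typeidx, pos = read_uleb128(data, pos + 1)
    # Expand into the same (module, field, typeidx) triples the classic format spells out.
    return [(module, str(i), typeidx) for i in range(count)]


run = encode_indexed_run("env", 1000, 0)
expanded = decode_indexed_run(run)
print(len(run), len(expanded))  # prints: 8 1000
```

The 1000 imports fit in 8 bytes, but the format can only describe this one naming pattern, which is the trade-off noted above.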

Would there be appetite for specifying a more compact alternate encoding of the import section? Obviously we would continue supporting the existing encoding, but we could allocate a new section ID or another unused bit to differentiate the compact and existing formats.

@lukewagner
Member

FWIW, I've also wanted to define a new strings section that allowed common strings to be factored out and referred to by stringidx anywhere an inline string literal can be used today. (We could do this uniformly and backwards-compatibly by (ab)using the existing binary encoding of string literals, which requires valid UTF-8, to allow a stringidx to be encoded as an invalid UTF-8 byte sequence.) That being said, it sounds like you might want to do even fancier things than just factoring out common strings, so maybe that requires doing something import-section-specific.
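A sketch of how the invalid-UTF-8 trick could look in a decoder, assuming, purely for illustration, that a reference is a `name` whose payload is the byte 0xFF followed by a LEB128 stringidx. The comment does not say which invalid sequence to use; 0xFF is chosen here only because it never occurs in well-formed UTF-8, so existing decoders, which validate names, would reject such a module rather than misread it. The `strings` list stands in for a hypothetical strings section.

```python
STRINGIDX_MARKER = 0xFF  # assumption: any byte that never occurs in valid UTF-8


def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (wasm's u32 encoding)."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)  # high bit set: more bytes follow
        if not n:
            return bytes(out)


def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode unsigned LEB128 at `pos`; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_name(s: str) -> bytes:
    data = s.encode("utf-8")
    return uleb128(len(data)) + data


def encode_string_ref(idx: int) -> bytes:
    """Hypothetical: a name whose payload is the marker byte plus a LEB128 stringidx."""
    payload = bytes([STRINGIDX_MARKER]) + uleb128(idx)
    return uleb128(len(payload)) + payload


def read_name_or_ref(data: bytes, pos: int, strings: list[str]) -> tuple[str, int]:
    """Decode a name at `pos`, resolving references through the strings section."""
    length, pos = read_uleb128(data, pos)
    raw, pos = data[pos:pos + length], pos + length
    if raw[:1] == bytes([STRINGIDX_MARKER]):
        idx, end = read_uleb128(raw, 1)
        if end != len(raw):
            raise ValueError("trailing bytes after stringidx")
        return strings[idx], pos
    # Ordinary name. A pre-existing decoder only has this path, so a 0xFF
    # payload would fail UTF-8 validation there instead of being misread.
    return raw.decode("utf-8"), pos


strings = ["env"]  # contents of a hypothetical strings section
data = encode_string_ref(0) + encode_name("memory")
first, pos = read_name_or_ref(data, 0, strings)
second, pos = read_name_or_ref(data, pos, strings)
print(first, second)  # prints: env memory
```

A reference costs 3 bytes here against 4 for an inline `env`, so the gain grows with the length of the shared string and the number of places that use it.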

@tlively
Member Author

tlively commented Apr 1, 2024

anywhere an inline string literal can be used today

Off the top of my head, this would be the import section, the export section, custom section names, and possibly the contents of custom sections such as the name section. Are there other locations I'm missing?

@eqrion

eqrion commented Apr 2, 2024

I think this could make sense if it showed good size reductions. As for the linked js-string-builtins issue, that discussion isn't finished yet, and we may settle on something that doesn't require a huge number of imports, so the hyper-specialization around array indices might not be necessary.

One issue related to the binary size of imports is simply how slow it is to perform 'read the imports'. For the string-constant use case for imports, a lot of time is spent performing the two fully generic property lookups the spec requires for each import (one for modName, the other for fieldName).

We've discussed ways internally to optimize this by avoiding the repeated modName lookup through some hashing, but we weren't sure if that's technically allowed due to property lookup being observable through proxies/etc. A compact import section might make that optimization more feasible.
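The observability concern can be modeled with a small Python sketch in which plain dicts stand in for the JS import object and `CountingNamespace` (a hypothetical helper, not a JS API) plays the role of a Proxy that counts property reads. Reading per spec does two lookups per import; caching the module lookup yields the same values but a different number of reads on the import object, which is exactly what a Proxy could detect. This models only the lookup counts, not the full JS API algorithm.

```python
class CountingNamespace(dict):
    """Stand-in for a JS object whose property reads are observable, e.g. via a Proxy."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.gets = 0

    def __getitem__(self, key):
        self.gets += 1
        return super().__getitem__(key)


def read_imports_per_spec(import_object, imports):
    """Two generic lookups per import: import_object[module], then [field]."""
    return [import_object[module][field] for module, field in imports]


def read_imports_cached(import_object, imports):
    """Look each module name up once and reuse the result (the optimization discussed)."""
    modules = {}
    values = []
    for module, field in imports:
        if module not in modules:
            modules[module] = import_object[module]
        values.append(modules[module][field])
    return values


def fresh_import_object(n):
    env = CountingNamespace({f"s{i}": f"string constant {i}" for i in range(n)})
    return CountingNamespace({"env": env}), env


imports = [("env", f"s{i}") for i in range(1000)]
obj, env = fresh_import_object(1000)
spec_values = read_imports_per_spec(obj, imports)
print(obj.gets, env.gets)  # prints: 1000 1000
obj, env = fresh_import_object(1000)
cached_values = read_imports_cached(obj, imports)
print(obj.gets, env.gets)  # prints: 1 1000
```

An encoding that names each module once would give the spec a natural place to specify a single module lookup, making the cheaper count the specified behavior rather than an observable deviation.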

@lukewagner
Member

Are there other locations I'm missing?

Yep! If we get back to adding module-linking to core wasm, then instance definitions (which supply import arguments by name) would be another case.

Reading this comment, I wondered if perhaps we could allow string constants to initialize an elem section of an externref table, in which case that could be another use of string indices.

@dschuff
Member

dschuff commented Apr 4, 2024

Generally speaking I like this idea, and it seems very straightforward.
Back in ancient history, one reason we invested a little bit (but not too heavily) in module size was that we expected duplication of this type to be well-compressible; both by gz/brotli today and possibly even more in the future with some hypothetical improved wasm-specific compression scheme (but let's ignore that for now since it hasn't materialized and isn't on anyone's radar right now).
When considering the bang-for-buck for this idea (and for that matter, things like the binary size implications of different ideas for string builtins), we should probably also make sure we're primarily considering compressed size rather than uncompressed size.
Having said that of course, @eqrion raised a good point above, and there are still reasons that uncompressed size matters; e.g. speed and memory requirements of tools that process uncompressed files, and even memory used by JS engines if they need to keep any module bytes around to implement the JS module introspection APIs.
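To compare compressed rather than raw sizes, a quick harness like the one below can be used. It relies on Python's `zlib` (DEFLATE, the algorithm gzip uses) as a rough proxy, since Brotli is not in the standard library, and it compares today's layout against a hypothetical grouped layout that stores each run's module name once. It reports sizes without asserting any particular ratio.

```python
import zlib
from itertools import groupby


def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (wasm's u32 encoding)."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)  # high bit set: more bytes follow
        if not n:
            return bytes(out)


def encode_name(s: str) -> bytes:
    data = s.encode("utf-8")
    return uleb128(len(data)) + data


def entry(field: str, typeidx: int) -> bytes:
    """Field name plus a function import description (kind 0x00, type index)."""
    return encode_name(field) + b"\x00" + uleb128(typeidx)


def classic_body(imports):
    """Today's layout: every entry carries its own copy of the module name."""
    body = bytearray(uleb128(len(imports)))
    for module, field, typeidx in imports:
        body += encode_name(module) + entry(field, typeidx)
    return bytes(body)


def grouped_body(imports):
    """Hypothetical layout: each run of consecutive same-module imports shares one name."""
    runs = [(module, list(items)) for module, items in groupby(imports, key=lambda imp: imp[0])]
    body = bytearray(uleb128(len(runs)))
    for module, items in runs:
        body += encode_name(module) + uleb128(len(items))
        for _, field, typeidx in items:
            body += entry(field, typeidx)
    return bytes(body)


imports = [("env", f"f{i}", 0) for i in range(1000)]
sizes = {}
for label, body in (("classic", classic_body(imports)), ("grouped", grouped_body(imports))):
    packed = zlib.compress(body, 9)  # DEFLATE at the maximum level, plus zlib framing
    sizes[label] = (len(body), len(packed))
    print(f"{label}: {len(body)} bytes raw, {len(packed)} bytes compressed")
```

The interesting number is the gap between the two compressed sizes, since repeated module names are the kind of redundancy DEFLATE is expected to remove well.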

@lukewagner
Member

Would it be safe to say that compact import sections would allow string constants in the js-builtins proposal to use the same wasm:... module string as all the other JS built-ins without hurting binary size, addressing the concerns folks had over

(import "'" "my string constant" (global externref))

being a bit too cute?

@eqrion

eqrion commented Jun 12, 2024

Would it be safe to say that compact import sections would allow string constants in the js-builtins proposal to use the same wasm:... module string as all the other JS built-ins without hurting binary size, addressing the concerns folks had over

(import "'" "my string constant" (global externref))

being a bit too cute?

Yes, that's the main reason I think this would be interesting. I plan on presenting a proposal for this at the 7-02 CG meeting.

In the meantime, before this exists (if it ever does exist) we'd still need an alternative for the js-string use-case, but that's being discussed there.

@fitzgen
Contributor

fitzgen commented Jun 12, 2024

It addresses the concern for me.

(While I do think a wasm: or js: namespace would be "better" than ' aesthetically, I don't personally consider the issue to be a blocker for the js builtins proposal, and would never change a vote based on that.)

Regardless, I am in favor of developing a new, compact import section encoding.

@lygstate
Contributor

lygstate commented Jan 7, 2025

FWIW, I've also wanted to define a new strings section that allowed common strings to be factored out and referred to by stringidx anywhere an inline string literal can be used today. (We could do this uniformly and backwards-compatibly by (ab)using the existing binary encoding of string literals, which requires valid UTF-8, to allow a stringidx to be encoded as an invalid UTF-8 byte sequence.) That being said, it sounds like you might want to do even fancier things than just factoring out common strings, so maybe that requires doing something import-section-specific.

Only valid ASCII should be allowed, for performance, so that space can be saved for WTF-16 ASCII strings on memory-restricted systems.

Is https://github.com/WebAssembly/compact-import-section related to this?


7 participants