remove BOM? #432
Comments
True, there shouldn't be a BOM for UTF-8 documents, but that's not a big deal because you can set the no_autobom flag when saving: saveAs(blob, 'file.txt', true);
Flags should be reserved for overriding the expected default behaviour, so no BOM should be written unless explicitly requested, as per Unicode's recommendation.
Just ran into this: we're designing a replacement system that creates a text file for a downstream process, and that process is now bombing on the file because it reads it as utf-8-bom instead of utf-8.
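For anyone stuck with a consumer that chokes on the BOM, here is a minimal sketch of stripping it on the reading side. `stripUtf8Bom` is a hypothetical helper, not part of FileSaver.js, and it assumes the file contents are available as a `Uint8Array`.

```javascript
// Hypothetical helper for a downstream consumer; not part of FileSaver.js.
// Returns the same bytes without a leading UTF-8 BOM (EF BB BF), if present.
function stripUtf8Bom(bytes) {
  const hasBom =
    bytes.length >= 3 &&
    bytes[0] === 0xef &&
    bytes[1] === 0xbb &&
    bytes[2] === 0xbf;
  // subarray() returns a view on the same buffer, so nothing is copied.
  return hasBom ? bytes.subarray(3) : bytes;
}

const encoder = new TextEncoder();
const withBom = encoder.encode('\uFEFFid,name\n'); // U+FEFF encodes to EF BB BF
const withoutBom = encoder.encode('id,name\n');

console.log(stripUtf8Bom(withBom).length === withoutBom.length); // true
```

This only works around the symptom; the file on disk still carries the three extra bytes.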
I second this; adding a BOM should be opt-in.
On what grounds? Source?
http://www.unicode.org/versions/Unicode10.0.0/ch02.pdf page 40,
A statement that has been in effect since 2003 with the introduction of Unicode 4.0 (http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf, p. 33), because that's when UTF-8 was brought in line with UTF-16 through https://tools.ietf.org/html/rfc3629 and became an official encoding scheme for Unicode data. http://unicode.org/faq/utf_bom.html#utf8-3 gives the less formal explanation that the byte order mark has no meaning for UTF-8 because UTF-8 isn't a byte-ordered encoding. Systems that read or write Unicode content using the UTF-8 scheme must all do so in exactly the same way, irrespective of their endianness. But of course, the FAQ is not the authority; the spec is. As a BOM for UTF-8 is formally neither required nor recommended, writing one by default is essentially a bug because it contravenes the spec. Thankfully, it's an easy-to-fix bug, too: flip the default, spin a new major release for that single change (because it's a breaking change), and everyone wins.
It's for browser charset sniffing, because no browser ever implemented support for the charset MIME parameter in blobs. If a blob being saved loads in a new tab instead of downloading, it will not display properly unless the charset is sniffed through the BOM.
@Pomax The problem here is that this isn't a BOM for UTF-8; it's a UTF-8 BOM for ASCII→UTF-8 coalescing, as the charset parameter is ignored on Windows. Try the following code:
The auto-BOM code is a workaround for an OS bug.
I should probably change the behavior to only apply this mitigation on Windows user agents, and provide a global config option that keeps the behavior disabled unless opted in.
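The policy proposed above could look roughly like this. It is only a sketch of the idea: `shouldAutoBom` and the `autoBom` config key are hypothetical names, and the user-agent test is a deliberately crude assumption rather than how FileSaver.js detects Windows.

```javascript
// Hypothetical policy helper; `shouldAutoBom` and `config.autoBom` are
// illustrative names, not FileSaver.js API.
function shouldAutoBom(userAgent, config = {}) {
  const optedIn = config.autoBom === true;      // off unless explicitly enabled
  const onWindows = /Windows/i.test(userAgent); // crude user-agent check
  return optedIn && onWindows;
}

const windowsUa = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)';
const macUa = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)';

console.log(shouldAutoBom(windowsUa));                    // false: not opted in
console.log(shouldAutoBom(windowsUa, { autoBom: true })); // true
console.log(shouldAutoBom(macUa, { autoBom: true }));     // false: not Windows
```

In a browser the first argument would typically be `navigator.userAgent`.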
I don't see… Edit: @jimmywarting also makes a good point about the content change invalidating any digests that the user might compute for their content in parallel.
If we automatically add a BOM, we are changing the content of the source they are trying to save. A hash sum of the original wouldn't match that of the file you download. Isn't the BOM only necessary when viewing it in a new tab?
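To make the digest point concrete, here is a small sketch comparing a checksum of the original bytes with a checksum of the same bytes preceded by a BOM. FNV-1a is used purely as a dependency-free stand-in for whatever digest a user would actually run (SHA-256, for instance); the helper name `fnv1a` is ours.

```javascript
// FNV-1a (32-bit), used here only as a small stand-in for a real digest.
function fnv1a(bytes) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const byte of bytes) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep 32 bits
  }
  return hash;
}

const encoder = new TextEncoder();
const original = encoder.encode('hello');
const saved = encoder.encode('\uFEFF' + 'hello'); // what an auto-BOM save would write

console.log(saved.length - original.length);   // 3 (the bytes EF BB BF)
console.log(fnv1a(original) === fnv1a(saved)); // false
```

Any digest computed over the source will disagree with one computed over the saved file for the same reason: the inputs differ by three leading bytes.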
@jimmywarting Correct. Also thanks for the latest PR!
I've begun to think that noAutoBom should be reversed too. People might wonder "what the heck is a BOM?", ignore it, and just use the first two arguments. A change like this should be a major version update.
https://github.com/eligrey/FileSaver.js/blob/master/src/FileSaver.js#L69 notes that the auto_bom function "prepend[s] BOM for UTF-8 XML and text/* types (including HTML)", but UTF-8-encoded documents don't need a byte order mark: unlike UTF-16/32, UTF-8 does not consist of "ordered bytes" but is a byte-aligned bit sequence with the same ordering on all systems.
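For illustration, an opt-in version of the behaviour described in that comment might look like the sketch below. `maybePrependBom`, its `autoBom` option, and the exact MIME pattern are assumptions made for this example, not the library's actual code. It relies on a global `Blob`, which browsers and recent Node versions provide.

```javascript
// Sketch only: `maybePrependBom`, `autoBom`, and this pattern are hypothetical.
// Matches text/* and XML types that declare charset=utf-8.
const UTF8_TEXT_OR_XML =
  /^\s*(?:text\/\S*|application\/xml|\S*\/\S*\+xml)\s*;.*charset\s*=\s*utf-8/i;

function maybePrependBom(blob, { autoBom = false } = {}) {
  if (autoBom && UTF8_TEXT_OR_XML.test(blob.type)) {
    // U+FEFF is encoded as the three bytes EF BB BF when the Blob is built.
    return new Blob(['\uFEFF', blob], { type: blob.type });
  }
  return blob; // default: content is left untouched
}

const textBlob = new Blob(['hello'], { type: 'text/plain;charset=utf-8' });

console.log(maybePrependBom(textBlob).size);                    // 5
console.log(maybePrependBom(textBlob, { autoBom: true }).size); // 8
```

With the default flipped like this, callers who never pass the option get byte-for-byte the content they supplied.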