remove BOM? #432

Pomax · 2018-03-29T16:35:31Z

/~https://github.com/eligrey/FileSaver.js/blob/master/src/FileSaver.js#L69 notes that the auto_bom function "prepend[s] BOM for UTF-8 XML and text/* types (including HTML)", but UTF-8-encoded documents don't need a byte order mark. UTF-8 is not made of multi-byte code units whose byte order can vary, like UTF-16/32. It is a byte-oriented encoding whose byte sequence is the same on all systems.
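To make the byte-order point concrete, here is a small sketch, runnable in a modern browser or Node.js. It prints the bytes of the same two-character string in UTF-8 and in both UTF-16 byte orders. `TextEncoder` is standard and always emits UTF-8. The `encodeUtf16` and `toHex` helpers are hypothetical and exist only for this illustration.

```javascript
// Hypothetical helper: encode a string as UTF-16 code units in a chosen byte order.
function encodeUtf16(str, littleEndian) {
  const bytes = new Uint8Array(str.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < str.length; i++) {
    // charCodeAt() returns UTF-16 code units, so surrogate pairs pass through as-is.
    view.setUint16(i * 2, str.charCodeAt(i), littleEndian);
  }
  return bytes;
}

// Format bytes as space-separated lowercase hex for display.
const toHex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");

const text = "A\u20AC"; // "A€": U+0041 and U+20AC
const utf8 = new TextEncoder().encode(text); // TextEncoder always produces UTF-8

console.log(toHex(utf8));                     // 41 e2 82 ac (same on every platform)
console.log(toHex(encodeUtf16(text, true)));  // 41 00 ac 20 (UTF-16LE)
console.log(toHex(encodeUtf16(text, false))); // 00 41 20 ac (UTF-16BE)
console.log(toHex(new TextEncoder().encode("\uFEFF"))); // ef bb bf (U+FEFF in UTF-8)
```

Only the UTF-16 output depends on byte order, which is the ambiguity a BOM resolves. In UTF-8, U+FEFF just becomes the fixed three-byte prefix EF BB BF.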

wadjeroudi · 2018-05-04T05:30:41Z

True, there shouldn't be a BOM for UTF-8 documents, but that's not a big deal, because you can set the no_autobom flag when saving:

saveAs(blob, 'file.txt', true);
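For anyone unsure what that third argument toggles, here is a byte-level sketch of the auto-BOM decision. It works on a `Uint8Array` instead of a `Blob` so that it runs synchronously anywhere. `applyAutoBom` is a hypothetical name. The MIME-type pattern is modeled on the code around the linked line of FileSaver.js 1.x, and is not offered as the library's API, so check the source before relying on it.

```javascript
// Hypothetical helper modeled on FileSaver.js 1.x's auto_bom(); the regex mirrors
// the pattern in that source, but verify it against the file itself.
const AUTO_BOM_TYPES = /^\s*(?:text\/\S*|application\/xml|\S*\/\S*\+xml)\s*;.*charset\s*=\s*utf-8/i;
const UTF8_BOM = new Uint8Array([0xef, 0xbb, 0xbf]);

// Returns the bytes that would be written: prefixed with EF BB BF when the MIME
// type is a UTF-8 text/XML type and noAutoBom is falsy, otherwise unchanged.
function applyAutoBom(bytes, mimeType, noAutoBom = false) {
  if (noAutoBom || !AUTO_BOM_TYPES.test(mimeType)) {
    return bytes;
  }
  const out = new Uint8Array(UTF8_BOM.length + bytes.length);
  out.set(UTF8_BOM, 0);
  out.set(bytes, UTF8_BOM.length);
  return out;
}

const body = new TextEncoder().encode("hello");
const type = "text/plain;charset=utf-8";
console.log(applyAutoBom(body, type).length);         // 8: BOM + 5 bytes
console.log(applyAutoBom(body, type, true).length);   // 5: the `true` opt-out
console.log(applyAutoBom(body, "text/plain").length); // 5: no charset parameter
```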

Pomax · 2018-05-10T13:51:05Z

Flags should be reserved for overriding the expected default behaviour, so no BOM should be written unless explicitly requested, as per Unicode's recommendation.

jdhines · 2018-06-11T19:37:25Z

I just ran into this. We're designing a replacement system that creates a text file for a downstream process, and that process now fails on the file because it reads it as UTF-8 with BOM instead of plain UTF-8.
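Until the producer stops emitting the BOM, a consumer in this situation can tolerate it. Below is a minimal sketch, assuming the consumer can be modified and reads the file as raw bytes. `stripUtf8Bom` is a hypothetical helper and is not part of FileSaver.js.

```javascript
// Hypothetical consumer-side workaround: drop a leading UTF-8 BOM (EF BB BF)
// before handing the bytes to a parser that expects plain UTF-8.
function stripUtf8Bom(bytes) {
  const hasBom =
    bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf;
  // subarray() shares the underlying buffer, so no copy is made.
  return hasBom ? bytes.subarray(3) : bytes;
}

const saved = new Uint8Array([0xef, 0xbb, 0xbf, 0x69, 0x64, 0x2c, 0x6e]); // BOM + "id,n"
const clean = stripUtf8Bom(saved);
// ignoreBOM: true keeps any BOM in the output, so this shows the stripping worked.
console.log(new TextDecoder("utf-8", { ignoreBOM: true }).decode(clean)); // id,n
```

If the consumer happens to decode with `TextDecoder` using default settings, a leading BOM is already discarded. Byte-oriented consumers like the one described above are the ones that break.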

mvasilkov · 2018-06-17T10:29:31Z

I second this, adding BOM should be opt-in.

jimmywarting · 2018-09-21T00:09:21Z

as per Unicode's recommendation

On what grounds? Source?

Pomax · 2018-09-21T17:25:06Z

http://www.unicode.org/versions/Unicode10.0.0/ch02.pdf page 40,

[...] Use of a BOM is neither required nor recommended for UTF-8, but may be encountered [...]

That statement has been in effect since 2003, with the introduction of Unicode 4.0 (http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf, p. 33), because that's when UTF-8 was brought in line with UTF-16 through https://tools.ietf.org/html/rfc3629 and became an official encoding scheme for Unicode data.

http://unicode.org/faq/utf_bom.html#utf8-3 gives the less formal explanation that the byte order mark has no meaning for UTF-8, because UTF-8 isn't a byte-ordered encoding. Systems that read or write Unicode content using the UTF-8 scheme must all do so in exactly the same way, irrespective of their endianness. But of course, the FAQ is not the authority; the spec is.

As a BOM for UTF-8 is formally neither required nor recommended, writing one by default is essentially a bug, because it contravenes the spec. Thankfully, it's an easy bug to fix, too: flip the default, cut a new major release for that single change (because it's a breaking change), and everyone wins.

eligrey · 2018-09-21T17:56:45Z

It's for browser charset sniffing: no browser ever implemented support for the charset MIME parameter in blobs, so all text/plain;charset=UTF-8 blobs are saved as ASCII by the browser without a BOM on Windows.

If a blob being saved loads as a new tab instead of a download, it will not display properly unless the charset is sniffed through the BOM.
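Sniffing means the reader inspects the leading bytes instead of trusting a declared charset. The sketch below loosely follows the "BOM sniff" step of the WHATWG Encoding Standard. `sniffBom` is a hypothetical name. What a browser does when no BOM is found varies, and that is exactly the case described above.

```javascript
// Sketch of BOM sniffing in the spirit of the WHATWG Encoding Standard:
// look only at the first bytes and report the encoding they imply.
function sniffBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "utf-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "utf-16be";
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "utf-16le";
  }
  return null; // no BOM: the reader falls back to a default or other heuristics
}

const withBom = new Uint8Array([0xef, 0xbb, 0xbf, 0xe2, 0x91, 0xa0]); // BOM + "①"
console.log(sniffBom(withBom));                               // utf-8
console.log(sniffBom(new TextEncoder().encode("\u2460")));    // null
```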

eligrey · 2018-09-21T18:05:00Z

As a BOM for UTF-8 is formally both neither required nor recommended, writing one by default is essentially a bug because it contravenes the spec

@Pomax The problem here is that this isn't a BOM for UTF-8; it's a UTF-8 BOM for ASCII→UTF-8 coalescence, as the charset parameter is ignored on Windows.

Try the following code:

location.href = URL.createObjectURL(new Blob(["①"], {type:"text/plain;charset=UTF-8"}));

The auto-BOM code is a workaround for an OS bug.
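The failure can be shown without a browser. Read back under a legacy single-byte encoding, the UTF-8 bytes of "①" turn into three unrelated characters. This sketch assumes ISO-8859-1 as the fallback encoding for simplicity. A real Windows browser may use windows-1252 instead, which maps bytes 0x80 to 0x9F differently. `decodeLatin1` is a hypothetical helper.

```javascript
// Decode bytes as ISO-8859-1: each byte becomes the code point with the same value.
function decodeLatin1(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

const bytes = new TextEncoder().encode("\u2460"); // "①" is e2 91 a0 in UTF-8
const misread = decodeLatin1(bytes);
console.log(misread.length); // 3: one character became three
console.log(new TextDecoder("utf-8").decode(bytes) === "\u2460"); // true
```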

eligrey · 2018-09-21T18:13:29Z

I should probably change the behavior to apply this mitigation only on Windows user agents, and provide a global config option so that the behavior is disabled unless opted into.
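Below is a hypothetical sketch of that policy: the BOM is added only when the caller has opted in and the user agent looks like Windows. The `autoBom` option name and the user-agent test are illustrative assumptions, not FileSaver.js code. User-agent sniffing is fragile, so treat the check as a heuristic.

```javascript
// Hypothetical policy: off by default, opt-in via config, applied only on Windows UAs.
const defaultConfig = { autoBom: false };

function shouldAutoBom(userAgent, cfg = defaultConfig) {
  if (!cfg.autoBom) return false; // not opted in
  // Windows browsers conventionally include "Windows" in navigator.userAgent.
  return /Windows/.test(userAgent);
}

const winUA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";
const linuxUA = "Mozilla/5.0 (X11; Linux x86_64)";
console.log(shouldAutoBom(winUA));                      // false: not opted in
console.log(shouldAutoBom(winUA, { autoBom: true }));   // true
console.log(shouldAutoBom(linuxUA, { autoBom: true })); // false: not Windows
```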

Pomax · 2018-09-21T18:14:33Z

I don't see auto_bom() used outside of saveAs, in which case the location.href example you give seems like the wrong example. This function is invoked to save a file to the user's device, not to open documents in the browser, so for a download the content type can simply always be application/octet-stream, and the charset becomes irrelevant.

edit: @jimmywarting also makes a good point about the content change invalidating any digests the user might compute for their content in parallel.
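Here is a minimal sketch of the octet-stream suggestion, assuming the caller builds the Blob itself. `asOctetStream` is hypothetical. Whether a given browser then downloads the file rather than displaying it is still up to the browser. `Blob` is global in browsers and in Node.js 18+.

```javascript
// Hypothetical helper: re-wrap a Blob as application/octet-stream so that no
// charset is declared and only a byte-preserving download is requested.
function asOctetStream(blob) {
  // The Blob constructor accepts other Blobs as parts and copies their bytes.
  return new Blob([blob], { type: "application/octet-stream" });
}

const textBlob = new Blob(["\u2460"], { type: "text/plain;charset=UTF-8" });
const download = asOctetStream(textBlob);
console.log(download.type);                   // application/octet-stream
console.log(download.size === textBlob.size); // true (3 bytes each)
```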

jimmywarting · 2018-09-21T18:15:14Z

If we automatically add a BOM, then we are changing the content of the source they are trying to save. A hash of the downloaded file wouldn't match the hash of the original data.

Isn't the BOM only necessary when viewing the file in a new tab? If a[download] works properly and doesn't open a new tab, it shouldn't be needed.
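The digest point is easy to demonstrate. This sketch uses 32-bit FNV-1a only because it fits in a few lines without dependencies. It stands in for whatever digest (e.g. SHA-256) a user would actually compare, and any digest behaves the same way once three extra bytes are prepended.

```javascript
// 32-bit FNV-1a: a simple non-cryptographic hash, used here as a stand-in digest.
function fnv1a32(bytes) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const b of bytes) {
    hash ^= b;
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, mod 2^32
  }
  return hash;
}

const original = new TextEncoder().encode("report,2018\n");
const withBom = new Uint8Array([0xef, 0xbb, 0xbf, ...original]); // what auto-BOM saves
console.log(fnv1a32(original).toString(16));
console.log(fnv1a32(withBom).toString(16)); // differs: the saved file no longer matches
```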

eligrey · 2018-09-21T18:29:16Z

@jimmywarting Correct. Also thanks for the latest PR!

jimmywarting · 2018-09-21T18:43:15Z

I'm beginning to think that noAutoBom should be reversed too.
It kind of feels like it does some unexpected things.

People might wonder "what the heck is a BOM?", ignore it, and just use the first two arguments.

A change like this should be a major version update.
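Below is a hypothetical sketch of what a reversed default could look like while still accepting the old boolean argument. The function name and the `autoBom` property are illustrative assumptions, not taken from this thread.

```javascript
// Hypothetical option handling: no BOM unless explicitly requested.
// A legacy boolean third argument keeps its old meaning (true = "no auto BOM"),
// while an options object uses an explicit opt-in flag.
function normalizeBomOption(opts) {
  if (opts === undefined) return { autoBom: false };        // new default: opt-in
  if (typeof opts === "boolean") return { autoBom: !opts }; // legacy noAutoBom flag
  return { autoBom: Boolean(opts.autoBom) };
}

console.log(normalizeBomOption());                  // { autoBom: false }
console.log(normalizeBomOption(true));              // { autoBom: false }
console.log(normalizeBomOption(false));             // { autoBom: true }
console.log(normalizeBomOption({ autoBom: true })); // { autoBom: true }
```

Existing `saveAs(blob, name, true)` calls would still skip the BOM. Calls that omit the argument would change behavior, which is why this counts as a breaking change.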

jimmywarting added the enhancement, question, and idea labels Sep 21, 2018

jimmywarting mentioned this issue Sep 21, 2018: V2-rc1 #463 (Merged)

jimmywarting closed this as completed in #463 Sep 26, 2018

jimmywarting mentioned this issue Dec 5, 2018: Unicode hyphen character encoding broken/changed in 2.0.0 #504 (Closed)
