Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some servers seem to expect 'Accept-Encoding : identity' to serve Range requests #747

Closed
youennf opened this issue May 29, 2018 · 18 comments
Assignees

Comments

@youennf
Copy link
Collaborator

youennf commented May 29, 2018

As part of https://bugs.webkit.org/show_bug.cgi?id=184447, it seems https://test.shhnjk.com/media_test.html server is requiring 'Accept-Encoding : identity' to be in a request to serve range responses.
That might create some issues with multimedia loads intercepted by service workers.

@annevk
Copy link
Member

annevk commented May 30, 2018

cc @jakearchibald

@jakearchibald jakearchibald self-assigned this May 30, 2018
@jakearchibald
Copy link
Collaborator

Isn't Accept-Encoding added way after the service worker? As in step 2.15 of https://fetch.spec.whatwg.org/#http-network-or-cache-fetch

@annevk
Copy link
Member

annevk commented May 30, 2018

That's true, service workers shouldn't affect this. That would be a layering bug in WebKit and worth testing for if we don't test that already.

What exactly we have to define as far as Accept-Encoding goes is also interesting. Is identity always included by default?

@jakearchibald
Copy link
Collaborator

I'll do some research. It would be good to explicitly add Accept-Encoding around that step 2.15 in cases where we think it's needed.

@jakearchibald
Copy link
Collaborator

jakearchibald commented May 30, 2018

What browsers seem to send along with ranged requests:

  • Chrome: Accept-Encoding: identity;q=1, *;q=0.
  • Safari: Accept-Encoding: identity.
  • Firefox: No Accept-Encoding header.
  • Edge: Accept-Encoding: gzip, deflate, br.

In terms of "I don't want any encoding", Chrome's value seems redundant compared to Safari's. Firefox's lack of header seems risky. Edge's is the least unusual vs other requests. I haven't tested whether Edge & Firefox actually accept encoded data.

It seems really unusual that a server would require identity, since an uncompressed response is always allowed, unless explicitly disallowed by Accept-Encoding. If it is something that servers require, then we should see Edge (and maybe Firefox too) failing to get partial responses from these servers.

@youennf is this something you've seen a lot of servers require, or is it just the case in https://bugs.webkit.org/show_bug.cgi?id=184447?

@youennf
Copy link
Collaborator Author

youennf commented May 30, 2018

I did not make a lot of testing, SW+Media loading is currently broken in WebKit due to the stripping of the Range header.

This is very visible in Safari since media loading will fail if not receiving a 206 response. Firefox and Edge might accept 200 responses even if they sent range requests.

I guess that some let-s-try-to-serve-compressed-content server logic might kick in and drop the Range header information, even though in the end, the media content is served as is since already compressed.

@jakearchibald
Copy link
Collaborator

@youennf

I did not make a lot of testing, SW+Media loading is currently broken in WebKit due to the stripping of the Range header.

If we make spec changes due to this issue, it'd impact all range requests, not just ones going via a service worker.

Unless a notable number of servers require this header in order to serve ranged responses, I'm fine with the differences between browsers here (provided they support those encodings), although I'd encourage Chrome to align with Safari.

This is very visible in Safari since media loading will fail if not receiving a 206 response. Firefox and Edge might accept 200 responses even if they sent range requests.

Tangent: I'm planning to spec that range-requesting APIs should accept 200 responses, and somehow mark the resource as no-range-capable, so it doesn't try to make further range requests. This would apply for media and downloads. If you aren't happy with this, let me know and we can figure something out.

I guess that some let-s-try-to-serve-compressed-content server logic might kick in and drop the Range header information, even though in the end, the media content is served as is since already compressed.

Good theory. I'll see if there's a way for us to get a representative sample to test.

@youennf
Copy link
Collaborator Author

youennf commented May 30, 2018

Tangent: I'm planning to spec that range-requesting APIs should accept 200 responses, and somehow mark the resource as no-range-capable, so it doesn't try to make further range requests. This would apply for media and downloads. If you aren't happy with this, let me know and we can figure something out.

If WebKit is not aligned with this for downloads, I think we will be happy to align there.
Media loads is another story.
That said, as long as it is a SHOULD, this is probably ok.

@jakearchibald
Copy link
Collaborator

jakearchibald commented May 31, 2018

I'm going through some URLs, and yes, there are some that will only return a range if Accept-Encoding is absent, or indicates 'identity'.

However, there are some URLs that do the opposite. https://www.narita-airport.jp/files/bg.mp4 will send back a 200 unless Accept-Range: gzip, deflate, br, in which case it'll send back a range. 💩

@jakearchibald
Copy link
Collaborator

@youennf

Media loads is another story.

Hmm, I was hoping to try and align browsers when it comes to media. Is this because Safari uses a different stack it doesn't control?

@jakearchibald
Copy link
Collaborator

jakearchibald commented May 31, 2018

Ok, I have done research!

I took all the URLs from HTTP Archive's latest crawl that end in .mp3 or .mp4 and tested them with various Accept-Encodings.

Accuracy warning: A lot of these URLs seem to return different results each time. I think it can depend on whether the resource is cached by a CDN.

Repo: /~https://github.com/jakearchibald/accept-encoding-range-test.

Results: https://jakearchibald.github.io/accept-encoding-range-test/ (Chrome only due to transform streams).

3.21% of fetches fail to return a 206 response, specifically if there's a non-identity encoding specified.

0.36% of fetches fail to return a 206 response, specifically if the identity encoding is specified, or no encoding is provided. A few of these look like one-time errors, but some are reproducible.

Based on that, it seems like Safari, Firefox, & Chrome are doing something right. If we spec any of them, it should be Accept-Encoding: identity. It's the shortest, and most specific.

The repo includes a command to rerun the test for a particular URL if you want.

@annevk Should this kind of weirdness be written into the spec? 3% is a lot.

@annevk
Copy link
Member

annevk commented May 31, 2018

Yes, otherwise it's hard for Servo and similar such projects to compete. Ideally when you implement standards correctly, the web works.

@jakearchibald
Copy link
Collaborator

Ok, the spec change should be pretty small. If there's a range header, set Accept-Encoding to identity. Since it's added after service worker, it doesn't impact the privileged headers stuff.

Tests for this sort of stuff are a pain, but I should be able to reuse the stuff I created for the range tests I wrote recently.

@youennf
Copy link
Collaborator Author

youennf commented Jun 6, 2018

Reading the html version of the spec, we are currently appending 'identity' to Accept-Encoding header value.
If this header is already set, we might combine the header value, probably defeating the purpose of this PR.
Maybe there should be a note stating that there should be no Accept-Encoding header until that point or the header should be set and not appended.

@annevk
Copy link
Member

annevk commented Jun 6, 2018

@youennf there cannot be a header named Accept-Encoding until that point as that's a forbidden header name. I guess the only exception would be a specification explicitly setting it, but that'd be a bug in that specification.

@youennf
Copy link
Collaborator Author

youennf commented Jun 6, 2018

I understand this is a layering violation, and that theoretically, set and append in that case should give the exact same value.
In WebKit, our media engine will actually set this header beforehand. If we were to blindly implement the specification, we would probably end-up with 'identity, identity' as the header value for media loads. I do not feel strongly here though.

@annevk
Copy link
Member

annevk commented Jun 6, 2018

If you blindly implemented all specifications that would not happen, but I see how it can happen if you do it for one, but not the other. I think in that case it's fair that you're on your own though. You're always allowed to implement things in whatever way you want, as long as it's black box indistinguishable. And the tests will help you with such setups.

@youennf
Copy link
Collaborator Author

youennf commented Jun 6, 2018

Looking at the spec, 'set' is used in three places in the spec (preflight and Fetch API Response.redirect), where append could probably be used instead.
If append is the preferred way to modify a header list, we might want to change these places.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants