Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix webBlob test (use LFS file as example instead of Xet file) #1230

Merged
merged 4 commits into from
Feb 26, 2025

Conversation

Wauplin
Copy link
Contributor

@Wauplin Wauplin commented Feb 26, 2025

CI currently fails with:

packages/hub test:browser: ⎯⎯⎯⎯⎯⎯⎯ Failed Tests 1 ⎯⎯⎯⎯⎯⎯⎯
packages/hub test:browser:  FAIL  src/utils/WebBlob.spec.ts > WebBlob > should lazy load a LFS file hosted on Hugging Face
packages/hub test:browser: Error: Test timed out in 5000ms.
packages/hub test:browser: If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".
packages/hub test:browser: ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯[1/1]⎯

PR expectation: CI is green.

See #1230 (comment) and #1230 (comment) for explanation.

@Wauplin Wauplin requested a review from coyotte508 as a code owner February 26, 2025 15:40
Copy link
Member

@julien-c julien-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coyotte508 it's normal this test is that long?

@Wauplin
Copy link
Contributor Author

Wauplin commented Feb 26, 2025

I genuinely haven't changed the test (only added the timeout) but now expect(webBlob).toBeInstanceOf(WebBlob); fails with AssertionError: expected Blob{} to be an instance of WebBlob 😕 :

packages/hub test:browser: expected Blob{} to be an instance of WebBlob
packages/hub test:browser:  ❯ src/utils/WebBlob.spec.ts  (4 tests | 1 failed) 15982ms
packages/hub test:browser:    ❯ src/utils/WebBlob.spec.ts > WebBlob > should lazy load a LFS file hosted on Hugging Face
packages/hub test:browser:      → expected Blob{} to be an instance of WebBlob
packages/hub test:browser:  ✓ src/utils/eventToGenerator.spec.ts  (2 tests) 306ms
packages/hub test:browser:  ✓ src/utils/promisesQueue.spec.ts  (2 tests) 35ms
packages/hub test:browser:  ✓ src/utils/sha256.spec.ts  (6 tests) 227ms
packages/hub test:browser: ⎯⎯⎯⎯⎯⎯⎯ Failed Tests 1 ⎯⎯⎯⎯⎯⎯⎯
packages/hub test:browser:  FAIL  src/utils/WebBlob.spec.ts > WebBlob > should lazy load a LFS file hosted on Hugging Face
packages/hub test:browser: AssertionError: expected Blob{} to be an instance of WebBlob
packages/hub test:browser:  ❯ Proxy.methodWrapper ../../../../../../../node_modules/.vite/deps/vitest___chai.js:1014:29
packages/hub test:browser:  ❯ __vi_esm_0__.it.timeout src/utils/WebBlob.spec.ts:60:19
packages/hub test:browser:      58| 
packages/hub test:browser:      59|    expect(webBlob.size).toBe(5_135_1[49](/~https://github.com/huggingface/huggingface.js/actions/runs/13547703603/job/37863320046?pr=1230#step:7:50)_760);
packages/hub test:browser:      60|    expect(webBlob).toBeInstanceOf(WebBlob);
packages/hub test:browser:        |                   ^
packages/hub test:browser:      61|    expect(webBlob).toMatchObject({ url });
packages/hub test:browser:      62|    expect(await webBlob.slice(10, 22).text()).toBe("__metadata__");
packages/hub test:browser: ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯[1/1]⎯

@Wauplin
Copy link
Contributor Author

Wauplin commented Feb 26, 2025

Looking at

return await (await customFetch(url)).blob();
, it seems that the condition might have changed recently - certainly due to accept-ranges being changed. So we are now fetching the full file which is 4.8GB (hence the timeout) and returning a Blob instead of a WebBlob

=> certainly better to fix the root cause here rather than setting higher timeout 😞

@Wauplin Wauplin marked this pull request as draft February 26, 2025 15:57
@Wauplin
Copy link
Contributor Author

Wauplin commented Feb 26, 2025

I went down the rabbit hole and it is because content length is 1377 (instead of 4.8GB for linked size) so WebBlob downloads it fully. This was already the case for LFS files (so I don't know why it was working before) but now this is a Xet file (see CAS location). Can it be related to the xet migration?

curl --head -s https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/unet/diffusion_pytorch_model.fp16.safetensors | grep -E "accept|content-length|location" 
content-length: 1377
location: https://cas-bridge.xethub.hf.co/xet-bridge-us/64bfcd5ff462a99a04fd1ec8/9d7f42ce006fbf21a77799c28c6aa75fa411f4f8934b78f9b3aac51d737d6b29?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=cas%2F20250226%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250226T161236Z&X-Amz-Expires=3600&X-Amz-Signature=8e8d7cc511d8ec203516ab7a900cf131342f7bc798acd804a45279c422f61306&X-Amz-SignedHeaders=host&X-Xet-Cas-Uid=public&response-content-disposition=inline%3B+filename*%3DUTF-8%27%27diffusion_pytorch_model.fp16.safetensors%3B+filename%3D%22diffusion_pytorch_model.fp16.safetensors%22%3B&x-id=GetObject&Expires=1740589956&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0MDU4OTk1Nn19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2FzLWJyaWRnZS54ZXRodWIuaGYuY28veGV0LWJyaWRnZS11cy82NGJmY2Q1ZmY0NjJhOTlhMDRmZDFlYzgvOWQ3ZjQyY2UwMDZmYmYyMWE3Nzc5OWMyOGM2YWE3NWZhNDExZjRmODkzNGI3OGY5YjNhYWM1MWQ3MzdkNmIyOSoifV19&Signature=kKedUVHmBXX72EY25toyVg5ulpmyKpvnGEIqFzb-FIqP0cRODNZWb3yPkcJ%7EY3KSwE%7E1VMzTP%7E2dCF7jUTWqh5KSLwxQQPuRFTr8NRZCxhe47%7EMoY2rqm1jlQGmJzpwIN5ePrdNkYTopzlKsbwbP-ZTF90K0kLU%7E91t40Mi-NgPV9B45pQwA3ws1E%7ER8Ipme2lE8A-obaRKFMJTp0Uw3ark%7EFRhUY8d4nnPDAnYxTzJcKQzZvUOU8GmJJm5U8JMmAfSP56qiGeUTCtvGD%7Eol%7EdG2IosT0%7ECPDtWEggKmvDuuWrA35265IX32DNHZpYJFX5QobW8z8qn3kd7t-dQOPA__&Key-Pair-Id=K2L8F4GPSG1IFC
accept-ranges: bytes

@coyotte508
Copy link
Member

coyotte508 commented Feb 26, 2025

Keep in mind that in the browser it's supposed to automatically follow redirects (or in node, when not specifying redirect: "manual") so theorically the content-length header is good enough?

Maybe xet http bridge doesn't support content ranges, but that would be surprising

Edit:

curl --head -s 'https://cas-bridge.xethub.hf.co/xet-bridge-us/64bfcd5ff462a99a04fd1ec8/9d7f42ce006fbf21a77799c28c6aa75fa411f4f8934b78f9b3aac51d737d6b29?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=cas%2F20250226%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250226T163424Z&X-Amz-Expires=3600&X-Amz-Signature=739c22b27022acf25037841aec8e2c68618895d91388439c77cc911389f5c0e2&X-Amz-SignedHeaders=host&X-Xet-Cas-Uid=public&response-content-disposition=inline%3B+filename*%3DUTF-8%27%27diffusion_pytorch_model.fp16.safetensors%3B+filename%3D%22diffusion_pytorch_model.fp16.safetensors%22%3B&x-id=GetObject&Expires=1740591264&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0MDU5MTI2NH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2FzLWJyaWRnZS54ZXRodWIuaGYuY28veGV0LWJyaWRnZS11cy82NGJmY2Q1ZmY0NjJhOTlhMDRmZDFlYzgvOWQ3ZjQyY2UwMDZmYmYyMWE3Nzc5OWMyOGM2YWE3NWZhNDExZjRmODkzNGI3OGY5YjNhYWM1MWQ3MzdkNmIyOSoifV19&Signature=CuiHhjYRVq45-uzuI6s28zRp2Hh5SlELudKkGpGhRzVFwKmDuu1TPUNa3bxQatGMfsRBV1RDN%7EULjPt158KDNIY6BAsanuGfcodean6n1kpYqoNU1Hi%7EFWRqRpRtJzGZGT5J0JXIg4iw3JqcCuqDdp55J2LTwvFLrRU7wbdMQHUksPwf1T7-Nme8c%7EHtCosk%7E9bdW96-Vm1C7-d4beB0%7E1BA3DZZWxe0bUkHVH8pFQ5X3G0TtD2d5I8elwkHO-2M69IyQFppPgySWM7WyOWT-fUy1b42nBefVGOukLzasi5BHeaXwBdC9hxswUEkZ9z3btPt7c8Hd64lui7pbXIBzg__&Key-Pair-Id=K2L8F4GPSG1IFC' | grep -E "accept|content-length|location" 
content-length: 5135149760

No accept-range header in head response but it's actually supported

@Wauplin
Copy link
Contributor Author

Wauplin commented Feb 26, 2025

Yep, I arrived to the same conclusion thanks! So I've updated the test to use a file not served by Xet. That should fix the CI for now. I'll report on slack about Xet files not returning "accept-ranges: bytes" (I just checked and it does support it, just the header is not set).

@Wauplin Wauplin marked this pull request as ready for review February 26, 2025 16:48
@@ -21,7 +21,7 @@ export class WebBlob extends Blob {
const customFetch = opts?.fetch ?? fetch;
const response = await customFetch(url, { method: "HEAD" });

const size = Number(response.headers.get("content-length"));
const size = Number(response.headers.get("x-linked-size") ?? response.headers.get("content-length"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes indeed 😬 done in 922bfdf

@Wauplin Wauplin changed the title Fix timeout in CI (add higher timeout to webBlob test) Fix webBlob test (use LFS file as example instead of Xet file) Feb 26, 2025
@Wauplin Wauplin requested a review from coyotte508 February 26, 2025 16:52
@Wauplin Wauplin merged commit 51ef023 into main Feb 26, 2025
5 checks passed
@Wauplin Wauplin deleted the fix-timeout-in-ci branch February 26, 2025 16:56
@Wauplin
Copy link
Contributor Author

Wauplin commented Feb 26, 2025

(linking slack thread - private link)

@julien-c
Copy link
Member

@Pierrci @gary149 , see, an example why switch to Xet should be user-opt-in IMO

coyotte508 added a commit that referenced this pull request Feb 27, 2025
Follow up #1230 cc @Wauplin 

Since the header should be present now

Co-authored-by: Lucain <lucain@huggingface.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants