cp/mv Operations Fail for Large Objects with s3fs and fsspec #916

Closed
b23g5r42i opened this issue Nov 21, 2024 · 4 comments · Fixed by #921

@b23g5r42i

Problem Description

When copying or moving large objects (e.g., 5 GB) with the cp or mv operations of s3fs (via fsspec), the operation either fails silently or raises an error. This behavior is inconsistent and is a problem for workflows that need to handle large files in AWS S3.

Steps to Reproduce

Below is a minimal reproducible example to demonstrate the issue:

import os
from upath import UPath

def create_large_file(file_path, size_gb=5):
    """Creates a large file of specified size locally."""
    with open(file_path, "wb") as f:
        chunk_size = 1024 * 1024  # 1MB
        total_chunks = size_gb * 1024  # Total number of chunks to create the file
        for _ in range(total_chunks):
            f.write(os.urandom(chunk_size))  # Write random bytes to the file
    print(f"Created large file of size {size_gb}GB at {file_path}")

# Step 1: Create a large local file (e.g., 5GB)
local_file = "large_file_5gb.dat"
create_large_file(local_file, size_gb=5)

# Step 2: Define source and target paths using UPath with s3fs
source = UPath("s3://MY_BUCKET/source/large_file_5gb.dat")
target = UPath("s3://MY_BUCKET/target/large_file_5gb.dat")

# Step 3: Upload the local file to S3
source.write_bytes(UPath(local_file).read_bytes())  # This works for small files and large files alike.
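# (Note: read_bytes()/write_bytes() loads the whole 5 GB file into memory. If that is a
# concern, fsspec's put() can stream the local file up instead, e.g.
# source.fs.put(local_file, str(source)). Shown only as a sketch; the issue below
# reproduces with either upload path.)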

# Step 4: Attempt to copy the large file within S3
try:
    source.fs.cp(str(source), str(target), recursive=True)
    print("Copy operation succeeded.")
except Exception as e:
    print(f"Copy operation failed: {e}")

Expected Behavior

The cp operation should successfully copy the file from the source path to the target path in S3, regardless of the file size.

Observed Behavior

For large files (e.g., 5 GB or more), the cp or mv operation fails with the error: Read Timeout on endpoint URL: https://THE_OBJ_URL_S3
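
A possible stop-gap (a sketch only, assuming the read timeout is the proximate failure rather than the root cause; bucket and key names are placeholders): raise botocore's read timeout through S3FileSystem's config_kwargs.

import s3fs

fs = s3fs.S3FileSystem(
    config_kwargs={"read_timeout": 600},  # seconds; botocore's default is 60
)
fs.cp(
    "MY_BUCKET/source/large_file_5gb.dat",
    "MY_BUCKET/target/large_file_5gb.dat",
)

This only papers over the symptom; it does not explain why the server-side copy of a 5 GB object times out in the first place.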

@martindurant
Member

What version of s3fs do you have, please?
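
For reference, one quick way to report the installed versions (assuming each of these packages exposes __version__, which they normally do):

import s3fs, fsspec, upath
print(s3fs.__version__, fsspec.__version__, upath.__version__)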

@b23g5r42i
Author

b23g5r42i commented Nov 23, 2024 via email

@martindurant
Member

(sorry, I haven't got around to investigating this, but I have not forgotten)

@martindurant
Member

I can confirm the problem, working on this now.
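
For context, a sketch of the general mechanism (not necessarily what the eventual fix does): S3's CopyObject API caps a single-request copy at 5 GB, so larger objects have to be copied server-side in parts via UploadPartCopy. boto3's managed transfer does that automatically; bucket and key names below are placeholders.

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")
s3.copy(
    CopySource={"Bucket": "MY_BUCKET", "Key": "source/large_file_5gb.dat"},
    Bucket="MY_BUCKET",
    Key="target/large_file_5gb.dat",
    # split the server-side copy into 256 MB UploadPartCopy requests
    Config=TransferConfig(multipart_chunksize=256 * 1024 * 1024),
)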
