Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the file argument to UploadFile required #1413

Merged
merged 44 commits into from
Feb 4, 2023

Conversation

adriangb
Copy link
Member

@adriangb adriangb commented Jan 14, 2022

From #1406 (comment).
I'm sticking to a subset of the changes: just making the file parameter required and having the form parser construct the file. We can do headers, size or other changes separately.

This makes the form parser responsible for creating the file.
Since UploadFile no longer has the ability to set the spooled max size, I am also adding an argument to request.form() to set that.

file: typing.BinaryIO
headers: "Headers"

def __init__(
self,
filename: str,
file: typing.Optional[typing.BinaryIO] = None,
file: typing.BinaryIO,
Copy link
Member Author

@adriangb adriangb Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also make this strictly file: SpooledTemporaryFile since that's the only way it's ever used

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typing.BinaryIO makes more sense to me.
It's an implementation detail that we happen to use SpooledTemporaryFile.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typing-wise, isn't there a better type for this situation? Also... It's not an implementation detail, it's written on our docs that the file is a SpooledTemporaryFile.

Copy link
Member Author

@adriangb adriangb Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now there's three options:

  1. Use IO[bytes]
  2. Use SpooledTemporaryFile directly
  3. Write a protocol for just the methods we need

Source: python/typing#829 (comment)

I don't think we should choose (2) and I think we should change our docs. Let's not lock ourselves into concrete implementations if we don't need to.

I think (1) has issues (as mentioned in that discussion and others) but is widely used so it's pretty defensible if a user had to add a # type: ignore to that parameter if they pass in some less common file-like object.

(3) is the correct option and would look like:

class FileLike(Protocol):
    def write(self, __data: bytes) -> None:  ...
    def read(self, __n: int) -> bytes:  ...
    def seek(self, __pos: int) -> None:  ...
    def close(self) -> None:  ...

I think the main con here is not the LOC or complexity, the main con is that if we want to start using other methods/attributes of IO in the future it would technically be a breaking change. But maybe that's a good thing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should change our docs

That's enough for me. 👍

starlette/requests.py Outdated Show resolved Hide resolved
file: typing.BinaryIO
headers: "Headers"

def __init__(
self,
filename: str,
file: typing.Optional[typing.BinaryIO] = None,
file: typing.BinaryIO,
content_type: str = "",
*,
headers: "typing.Optional[Headers]" = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we're changing the (internal-ish) interface at this point, should we switch it around so that file is the only required argument?...

    def __init__(
         self,
         file: typing.Optional[typing.BinaryIO],
         *,
         filename: str = "",
         content_type: str = "",
         headers: "typing.Optional[Headers]" = None,
    ):
        self.file = file
        self.filename = filename or getattr(file, 'name', '')
        self.content_type = content_type
        self.headers = headers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep makes sense. How do you feel about:

    def __init__(
         self,
         file: typing.Optional[typing.BinaryIO],
         *,
         filename: typing.Optional[str] = None,
         content_type: typing.Optional[str] = None,
         headers: typing.Optional[Headers] = None,
    ):
        ...

That way you can differentiate between a filename or content-type that was an empty string or one that was missing altogether (I know "" may not be valid within the context of a file name on disk and such, but it's still possible in other contexts).

self.filename = filename or getattr(file, 'name', '')

Hmm about this. I feel like it could lead to confusion. I would rather UploadFile just blindly accept it as a parameter and assign it. I think users already have access to .file so if they want the filename of the actual file they should do UploadFile.file.name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UploadedFile.filename was sanitized, what guarantee we provide that file.name also is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think it would be best to just make it a constructor parameter and not fall back to file.name

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, do we want filename and content_type at all here?

They're derived from the headers, so why not switch to...

class UploadFile:
  def __init__(
         self,
         file: typing.Optional[typing.BinaryIO],
         *,
         headers: typing.Optional[Headers] = None,
    ):
        ...

Copy link
Member Author

@adriangb adriangb Jan 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote would be to make content_type a property and keep filename.

The main reason for this is a practical one: extracting the filename from the headers requires knowledge that the request was originally a multipart/form-data request (that is, as far as I know, the only way a client would include that in the content-disposition header). If we want UploadFile to be decoupled from form parsing, I don't think it should assume that the headers came from a multipart request.

Also not every implementer of BinaryIO has a .file attribute (e.g. BytesIO does not). So if we don't pass in that information at all then users might not be able to access it (I know in practice we always pass in a SpooledTemporaryFile, but the goal here is for that to be an implementation detail as much as possible).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay - that's not unreasonable.

The one other practical case where I could see UploadFile being used is for raw binary uploads.

Eg. PUT https://www.example.com/uploads/my-file-upload.txt

In that kinda use-case it's useful having filename be something that can be provided explicitly, so that it can be derived from the URL path, rather than the HTTP headers.

So yeah, sounds good to me. 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that is a use case I personally really want: /~https://github.com/adriangb/xpresso/blob/c4455b29d323df1c59808ccafb306b9e7bbd80d1/xpresso/binders/_extractors/body/file.py#L40

I'm glad you brought it up and not me, I didn't want to introduce my biases 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I already pushed these changes / implementation @tomchristie, so I think this is ready for one last round of review for any typos/nits and then maybe a ✅

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi all.

It would be nice if the release could include the removal of content_type from the UploadFile, as this is a breaking change.

@adriangb
Copy link
Member Author

@tomchristie might be nice to get this into 0.18.0 since it has some breaking functionality and seems to be ready to go

@adriangb
Copy link
Member Author

I checked the docs page that we have about UploadFile and I don't think there's a need to change anything there, can you confirm? This is the page: https://www.starlette.io/requests/#request-files

I think the only thing we'd need to change is the use of SpooledTemporaryFile pending discussion in /~https://github.com/encode/starlette/pull/1413/files#r785078607

@Kludex Kludex mentioned this pull request Oct 12, 2022
@Kludex Kludex modified the milestones: Version 0.22.0, Version 0.23.0 Nov 20, 2022
Copy link
Member

@Kludex Kludex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing missing here is change our docs, as per #1413 (comment).

@Kludex Kludex modified the milestones: Version 0.23.0, Version 0.24.0 Dec 8, 2022
@Kludex Kludex mentioned this pull request Dec 12, 2022
7 tasks
@adriangb
Copy link
Member Author

We need to check if this breaks FastAPI

@adriangb
Copy link
Member Author

I was able to run FastAPI's test suite with no errors on this branch as long as I ignored new deprecation warnings (which is unrelated to this PR).

@adriangb adriangb requested a review from Kludex January 23, 2023 20:12
@adriangb
Copy link
Member Author

@Kludex you'll need to re-approve to get this merged

adriangb and others added 2 commits February 4, 2023 02:06
@Kludex
Copy link
Member

Kludex commented Feb 4, 2023

Does this PR also solves the first point on here #1888 (comment) by any chance?

@adriangb
Copy link
Member Author

adriangb commented Feb 4, 2023

I don’t think so because nothing is a context manager still. #1903 does though

@Kludex Kludex enabled auto-merge (squash) February 4, 2023 17:54
@Kludex Kludex merged commit 3697c8d into encode:master Feb 4, 2023
@adriangb adriangb deleted the make-file-strict-argument branch February 4, 2023 17:59
aminalaee pushed a commit that referenced this pull request Feb 13, 2023
Co-authored-by: Marcelo Trylesinski <marcelotryle@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactor Refactor code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants