Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a way to trim whitespace characters #1391

Open
245967906 opened this issue Sep 11, 2019 · 17 comments
Open

Add a way to trim whitespace characters #1391

245967906 opened this issue Sep 11, 2019 · 17 comments

Comments

@245967906
Copy link

Like the trim_whitespace parameter in the django rest framework.

@sloria
Copy link
Member

sloria commented Sep 11, 2019

You could use a custom field.

from marshmallow import fields, Schema


class TrimmedString(fields.String):
    def _deserialize(self, value, *args, **kwargs):
        if hasattr(value, 'strip'):
            value = value.strip()
        return super()._deserialize(value, *args, **kwargs)


class ArtistSchema(Schema):
    name = TrimmedString(required=True)


print(ArtistSchema().load({"name": " David "}))
# {'name': 'David'}

You could also use field composition if you want to trim other field types.

from marshmallow import fields, Schema


class Trim(fields.Field):
    def __init__(self, inner, *args, **kwargs):
        self.inner = inner
        super().__init__(*args, **kwargs)

    def _bind_to_schema(self, field_name, parent):
        super()._bind_to_schema(field_name, parent)
        self.inner._bind_to_schema(field_name, parent)

    def _deserialize(self, value, *args, **kwargs):
        if hasattr(value, 'strip'):
            value = value.strip()
        return self.inner._deserialize(value, *args, **kwargs)

    def _serialize(self, *args, **kwargs):
        return self.inner._serialize(*args, **kwargs)


class ArtistSchema(Schema):
    name = Trim(fields.String(), required=True)
    email = Trim(fields.Email())


print(ArtistSchema().load({"name": " David ", "email": " david@hotmail.com "}))
# {'name': 'David', 'email': 'david@hotmail.com'}

That said, I'm going to keep this issue open because this may be a common enough use case to justify adding to marshmallow core. Feedback welcome.

@sloria sloria changed the title Is there any way to trim whitespace characters on specified field? Add a way to trim whitespace characters Sep 11, 2019
@sloria
Copy link
Member

sloria commented Sep 11, 2019

Should we add a strip_whitespace parameter to Field? @lafrech @deckar01

@245967906
Copy link
Author

@sloria thanks for your quick reply, the example you gave above can solve my problem, it's great.
but as you said, adding a strip_whitespace parameter might be a better way.

@sloria
Copy link
Member

sloria commented Sep 14, 2019

Copying #1397 (comment) here for discussion:

@sloria hi, I go to see the implementation of DRF and typesystem. summarized as follows:

* DRF's CharField and DatetimeField inherit from the same base class Field, and the trim_whitespace option only works in Charfield.

* In typesystem, Datetime inherits from String, so they can all be affected by trim_whitespace which define in String.

But in marshmallow, things is different from the two implementations above.

If you implement strip_whitespace in Field according to your idea, for the field like Number, providing trim_whitespace parameter is not semantics.

So now it is a reference to DRF that ignores DatetimeField, or let Datetime inherit from String like typesystem? i can't make a choice and I think this may require you to make a decision.

This is a more troublesome issue and may take a lot of your time. Anyway, if you have any decisions or ideas, please tell me, I also hope to help this feature complete as soon as possible.

thank you!

@sloria
Copy link
Member

sloria commented Sep 14, 2019

strip_whitespace could be applied to any field that can deserialize strings, including Number (" 1 " -> 1). So I still think adding it to Field rather than String makes sense.

@245967906
Copy link
Author

strip_whitespace could be applied to any field that can deserialize strings, including Number (" 1 " -> 1). So I still think adding it to Field rather than String makes sense.

I don't think this is a convincing reason, because for number type, whether or not we provide the strip_whitespace parameter doesn't make any difference to the result.

>>> float(" 123 ")
123.0
>>>
>>> int(" 123 ")
123

@sloria
Copy link
Member

sloria commented Sep 15, 2019

Sure, Python's float and int functions implicitly strip whitespace, so perhaps Number is a bad example. But it is still relevant for any field that deserializes strings, e.g. Boolean.

@245967906
Copy link
Author

Bool types do have this problem.
So is the preferred solution now to deal with them in Field rather than consider changing the base class of some types to String?

@sloria
Copy link
Member

sloria commented Sep 15, 2019

I'm not sure. Perhaps we could have a TrimmableMixin for fields that deserialize strings. Or we could use the composable Trim implementation I posted before.

@ddcatgg
Copy link

ddcatgg commented Nov 23, 2021

Decorator version:

def trim_before(wrapped):
    def wrapper(self, value, *args, **kwargs):
        if hasattr(value, 'strip'):
            value = value.strip()
        return wrapped(self, value, *args, **kwargs)

    return wrapper


from marshmallow import fields

fields.String._deserialize = trim_before(fields.String._deserialize)

@jonbesga
Copy link

jonbesga commented Mar 23, 2022

You could use a custom field.

from marshmallow import fields, Schema


class TrimmedString(fields.String):
    def _deserialize(self, value, *args, **kwargs):
        if hasattr(value, 'strip'):
            value = value.strip()
        return super()._deserialize(value, *args, **kwargs)


class ArtistSchema(Schema):
    name = TrimmedString(required=True)


print(ArtistSchema().load({"name": " David "}))
# {'name': 'David'}

You could also use field composition if you want to trim other field types.

from marshmallow import fields, Schema


class Trim(fields.Field):
    def __init__(self, inner, *args, **kwargs):
        self.inner = inner
        super().__init__(*args, **kwargs)

    def _bind_to_schema(self, field_name, parent):
        super()._bind_to_schema(field_name, parent)
        self.inner._bind_to_schema(field_name, parent)

    def _deserialize(self, value, *args, **kwargs):
        if hasattr(value, 'strip'):
            value = value.strip()
        return self.inner._deserialize(value, *args, **kwargs)

    def _serialize(self, *args, **kwargs):
        return self.inner._serialize(*args, **kwargs)


class ArtistSchema(Schema):
    name = Trim(fields.String(), required=True)
    email = Trim(fields.Email())


print(ArtistSchema().load({"name": " David ", "email": " david@hotmail.com "}))
# {'name': 'David', 'email': 'david@hotmail.com'}

That said, I'm going to keep this issue open because this may be a common enough use case to justify adding to marshmallow core. Feedback welcome.

Thanks for the example. Is there any reason to add the condition if hasattr(value, 'strip')? As any string will have the strip attribute

@JFDValente
Copy link

Haven't you had any definition about the implementation of this validation in the lib yet?

@llucax
Copy link

llucax commented Jan 16, 2025

Wouldn't it make sense to provide a generic Transform field? Then we can use something like Transform(String(), lambda value: value.strip() for stripping or Transform(String(), lambda value: value.lower() for lower-casing.

@sloria
Copy link
Member

sloria commented Jan 16, 2025

one issue with composable fields like Transform and Trim is that they interfere with libraries that introspect fields (like marshmallow-sqlalchemy, apispec).

perhaps a better way to solve this is to allow validators to return changed values, similar to pydantic.

In its simplest form, a field validator is a callable taking the value to be validated as an argument and returning the validated value. The callable can perform checks for specific conditions (see raising validation errors) and make changes to the validated value (coercion or mutation).

that way you could easily apply a pipeline of transformations to any field.

from marshmallow import Schema, fields

def strip_whitespace(value: str) -> str:
    return value.strip()

def uppercase(value: str) -> str:
    return value.upper()

class ArtistSchema(Schema):
    name = fields.Str(validate=(strip_whitespace, uppercase))

print(ArtistSchema().load({"name": "  foo  "}))
# {"name": "FOO"}

this would be a breaking change so would need to go in 4.0, but i think it's a decent way to solve this use case

update: #2786 implements this

@llucax
Copy link

llucax commented Jan 17, 2025

Even when practical, IMHO it is confusing that a validator can mutate a value, it sounds like it should only tell if it is valid or not. If a composable field is problematic and a API change is acceptable, maybe you can accept a new transform option for fields, that would even be backwards compatible, right?

class ArtistSchema(Schema):
    name = fields.Str(transform=(strip_whitespace, uppercase))

@lafrech
Copy link
Member

lafrech commented Jan 17, 2025

This is close to the field-level pre/post processors I'm suggesting in #2786 (review).

I tend to agree with the separation of concerns argument.

Likewise, if/when we go ahead with #2039 (comment), we could be tempted to say that from a practical perspective the getter could do the transform, but what I'm saying in the comment is precisely that confusing getter and transformer was wrong.

@sloria
Copy link
Member

sloria commented Jan 17, 2025

i do think #2787 is probably the way to achieve this. hat tip to @lafrech.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants