Support encoding UUIDs as bytes #493

Closed
Alviner opened this issue Jul 26, 2023 · 6 comments · Fixed by #499

Alviner commented Jul 26, 2023

Description

Hello! Thanks for your hard work. We have a Go package that uses a broken msgpack library (for example, it packs UUIDs as bytes).

We want to move to your library, but as far as I can see, all uuid subtypes are packed as strings: https://github.com/jcrist/msgspec/blob/main/msgspec/_core.c#L11748

Is there any chance of an option to customize this?

@Alviner Alviner changed the title Custom enc/dec hook for standard types Custom enc/dec hook for subtypes of standard types Jul 26, 2023
Owner

jcrist commented Jul 26, 2023

There currently isn't a way to override how msgspec will encode/decode types that are natively supported. Most types support overriding how subclasses are encoded/decoded, but for ease of integration with asyncpg (which uses a uuid subclass) you can't currently override how uuids are encoded.

A few questions:

  • Are UUIDs the only natively supported type that you want to override?
  • Do you only want to decode UUIDs from this Go msgpack library, or do you also need to encode UUIDs the same way?
  • How does the go library encode UUIDs? We might just add builtin support for decoding them from this alternative representation.
In [8]: u  # an example uuid
Out[8]: UUID('9b65a26c-d67e-445f-b29e-a3881734a701')

In [9]: u.hex.encode()  # does it encode the bytes in their hex format?
Out[9]: b'9b65a26cd67e445fb29ea3881734a701'

In [10]: u.bytes  # or does it encode the bytes directly?
Out[10]: b'\x9be\xa2l\xd6~D_\xb2\x9e\xa3\x88\x174\xa7\x01'

In [11]: u.bytes_le  # or perhaps using the little endian representation?
Out[11]: b'l\xa2e\x9b~\xd6_D\xb2\x9e\xa3\x88\x174\xa7\x01'
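
The three candidate representations above can each be turned back into the same UUID with the standard library alone. The following is a minimal sketch using only the `uuid` module (not msgspec); the variable names are illustrative:

```python
import uuid

u = uuid.UUID("9b65a26c-d67e-445f-b29e-a3881734a701")

# Each candidate wire representation, paired with the stdlib
# constructor argument that reverses it.
from_hex = uuid.UUID(hex=u.hex.encode().decode("ascii"))  # 32 ASCII hex characters
from_bytes = uuid.UUID(bytes=u.bytes)                     # 16 raw bytes, big-endian fields
from_bytes_le = uuid.UUID(bytes_le=u.bytes_le)            # 16 raw bytes, first three fields little-endian

assert from_hex == from_bytes == from_bytes_le == u
```

Note that `bytes` and `bytes_le` are both 16 bytes long, so a receiver cannot tell them apart by length; both sides have to agree on which one is used.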

Author

Alviner commented Jul 26, 2023

Thanks for the fast answer :)

  1. date (this works with subtypes), decimal as string (this is fine), set/frozenset as list (this is also fine), and uuids (the problem is here)
  2. Both encode and decode
  3. It uses UUID.bytes directly

Owner

jcrist commented Jul 26, 2023

Ok, I think we can fix this by:

  • adding support for decoding uuids from 16-byte binary inputs. This would be supported in msgspec.convert and msgspec.msgpack.decode only, since other protocols don't have a distinct binary type.
  • adding an option to the msgspec.msgpack.Encoder to optionally encode uuids as binary values instead. Something like msgspec.msgpack.Encoder(uuid_format="bytes"). We already have a similar pattern for decimals (e.g. decimal_format="number"), so there's some precedent here.

Given those changes the following would work:

import msgspec
import uuid

enc = msgspec.msgpack.Encoder(uuid_format="bytes")  # encode uuids as bytes

u = uuid.uuid4()
print(u)
#> f647ac40-3902-4cf8-b4f5-462a7e469fb0

msg = enc.encode(u)
print(msg)
#> b'\xc4\x10\xf6G\xac@9\x02L\xf8\xb4\xf5F*~F\x9f\xb0'

u2 = msgspec.msgpack.decode(msg, type=uuid.UUID)  # we might require passing `strict=False` here to decode from bytes, I'm undecided
assert u == u2
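
For reference, the `msg` bytes printed above are consistent with plain msgpack `bin 8` framing: a `0xc4` marker, a one-byte length (`0x10`, i.e. 16), then the raw `UUID.bytes`. The sketch below reproduces that framing by hand with the standard library only. The helpers `pack_uuid_bin8` and `unpack_uuid_bin8` are hypothetical names for illustration, not part of msgspec, and this is not how msgspec is implemented internally:

```python
import uuid

BIN8_MARKER = 0xC4  # msgpack "bin 8": marker byte, 1-byte length, then payload


def pack_uuid_bin8(u: uuid.UUID) -> bytes:
    """Hypothetical helper: frame the 16 raw UUID bytes as a msgpack bin 8 value."""
    payload = u.bytes
    return bytes([BIN8_MARKER, len(payload)]) + payload


def unpack_uuid_bin8(msg: bytes) -> uuid.UUID:
    """Hypothetical helper: accept only a bin 8 value holding exactly 16 bytes."""
    if len(msg) != 18 or msg[0] != BIN8_MARKER or msg[1] != 16:
        raise ValueError("expected a msgpack bin 8 value with a 16-byte payload")
    return uuid.UUID(bytes=msg[2:])


u = uuid.UUID("f647ac40-3902-4cf8-b4f5-462a7e469fb0")
msg = pack_uuid_bin8(u)

# Same bytes as the encoder output shown in the example above.
assert msg == b"\xc4\x10\xf6G\xac@9\x02L\xf8\xb4\xf5F*~F\x9f\xb0"
assert unpack_uuid_bin8(msg) == u
```

This is the layout a Go library that writes `UUID.bytes` as a msgpack binary value would be expected to produce, which is why decoding from 16-byte binary input covers the interop case described in this thread.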

Would that resolve your issue?

Author

Alviner commented Jul 26, 2023

Yep, it looks like what we need. Awesome.

EDITED: Damn, we use msgpack ExtType with a specific code. Sorry, that is for dates only. Everything will work fine, thank you.

@jcrist jcrist changed the title Custom enc/dec hook for subtypes of standard types Support encoding UUIDs as bytes Jul 28, 2023
Owner

jcrist commented Jul 28, 2023

This has been implemented in #499. If you have the time, I'd appreciate it if you could try it out before the next release to ensure it's working for you. See here for instructions for installing from GitHub.

Author

Alviner commented Jul 28, 2023

I have just checked. It's exactly what we need :)
Undying gratitude!
