Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove @dataclass, add slots #144

Closed
wants to merge 88 commits into from

Conversation

Gobot1234
Copy link
Collaborator

Adds a custom metaclass for Message to quickly create dataclasses for Messages whilst also slotting them. Internally should still be basically the same as a dataclass.

Benefits:

  • This will almost certainly be faster than the current approach, I will need to run speed tests on it before I am 100% sure.
  • Removes an import from all generated messages, and improves visual clarity.

This is incorporates 2 breaking changes:

  • Removing @dataclass from all generated outputs, since this is for V2 however I think it is fine, there are a couple of options if you want to give warning like monkey patching the dataclass decorator to do nothing for Message subclasses.
  • Fields cannot now be access outside of instances of themselves, pretty minor change, this shouldn't really affect anyone as they (I think) would always be betterproto.Placeholder, but if necessary it could be remedied by implementing MessageMeta.__getattribute__.

Closes #50

@abn
Copy link
Collaborator

abn commented Aug 27, 2020

I am not sure if optimising for performance over usability by default is a good thing. While, I can understand the motivation for this change, I feel that dropping dataclasses means you are loosing a lot of niceties that it brings to the table. I am also unclear on how much of a gain we are talking here. We might also want to talk about making it optional, ie. dataclasses vs slots at compile time.

As an additional note, the alternative is to move from dataclasses to something that already fascilitates slots. The attrs project is a good option for that. Obviously, everything has trade-offs. :)

@Gobot1234
Copy link
Collaborator Author

It isn't dropping anything internally, it just removes the decorator.

@abn
Copy link
Collaborator

abn commented Aug 27, 2020

It isn't dropping anything internally, it just removes the decorator.

Ah, I might have spoke too soon then. Will go through the changes. Ignore my reactionary comment. :)

@Gobot1234
Copy link
Collaborator Author

Gobot1234 commented Aug 27, 2020

Some tests

Results

Tabled results are for a Message of:

class Message(betterproto.Message):
    foo: int = betterproto.uint32_field(0)
    bar: int = betterproto.uint32_field(1)
    baz: int = betterproto.uint32_field(2)

unless specified otherwise.

Important results:

Test Old implementation New slotted implementation
Size of 576 bytes 256 bytes
Runtime Creation 741 usec 38.4 usec
Attribute Access 71.4 nsec 43.8 nsec
Instantiation 10 usec 11 usec

Size of Instance

Tested with 0, 3, 5, 10, 50 field large message instances. Using pympler.asizesof due to sys.getsizeof not being recursive.

Size 0 3 5 10 50
Old 152 576 728 1096 3848
New 208 256 272 312 632

Slots make messages significantly more linear in the amount of memory usage they end up using

Runtime overhead

Old raw times: 74.1 sec, 90.5 sec, 74.1 sec, 82 sec, 87.8 sec

100000 loops, best of 5: 741 usec per loop
Slotted raw times: 4.96 sec, 4.72 sec, 4.17 sec, 3.84 sec, 4.1 sec

100000 loops, best of 5: 38.4 usec per loop

This is ridiculously faster with MessageMeta at around 20x

Attribute Access

Old raw times: 13.2 msec, 8.81 msec, 7.48 msec, 7.28 msec, 7.14 msec

100000 loops, best of 5: 71.4 nsec per loop
Slotted raw times: 4.38 msec, 6.24 msec, 5.9 msec, 4.46 msec, 5.67 msec

100000 loops, best of 5: 43.8 nsec per loop

So it's about 40% faster with slots

Instantiation

Old raw times: 2.34 sec, 1.03 sec, 1 sec, 1.61 sec, 2.55 sec

100000 loops, best of 5: 10 usec per loop
Slotted raw times: 1.25 sec, 1.1 sec, 1.14 sec, 1.12 sec, 1.2 sec

100000 loops, best of 5: 11 usec per loop

Currently instantiation with the current implementation is faster, however only marginally and I think the instantiation can be improved on the new model, or if not we can just use the dataclasses implementation, if it doesn't hurt runtime overhead particularly.

@Gobot1234
Copy link
Collaborator Author

I've optimized init more and it's now faster (by a couple of usec) than the dataclasses approach.

@Gobot1234
Copy link
Collaborator Author

Tests again will fail until #130 is merged

src/betterproto/__init__.py Outdated Show resolved Hide resolved
@Gobot1234
Copy link
Collaborator Author

Gobot1234 commented Nov 7, 2020

poe bench currently returns

All benchmarks:

       before           after         ratio
     [f10bec47]       [222a105b]
     <master>         <dataslots>
      1.99±0.09μs      1.93±0.2μs      0.96 benchmarks.BenchMessage.time_attribute_access
      10.5±0.4μs       9.01±0.2μs      0.86 benchmarks.BenchMessage.time_attribute_setting
      17.3±1μs         13.1±0.2μs      0.76 benchmarks.BenchMessage.time_init_with_values
      17.4±0.5μs       7.96±0.6μs      0.46 benchmarks.BenchMessage.time_instantiation
      722±30μs          293±9μs        0.41 benchmarks.BenchMessage.time_overhead
      22.5±2μs          22.2±1μs       0.99 benchmarks.BenchMessage.time_serialize

@Gobot1234
Copy link
Collaborator Author

Not too sure why the getattr call isn't much of an improvement as it should be the same just __slot__ boosted?

@Gobot1234
Copy link
Collaborator Author

I'm going to close this for now. I'm sure this can be done better but it requires a lot of work.

@Gobot1234 Gobot1234 closed this Jan 3, 2021
@Gobot1234 Gobot1234 deleted the dataslots branch January 10, 2021 18:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request on hold
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize dataclasses with __slots__
6 participants