Commit

Merge branch 'release-0.2.1'
brandjon committed Dec 21, 2014
2 parents 22d4bd7 + b005101 commit 1268780
Showing 15 changed files with 766 additions and 256 deletions.
20 changes: 20 additions & 0 deletions CHANGES.md
@@ -0,0 +1,20 @@
# Release notes

## 0.2.1 (2014-12-20)

- changed type checking keyword argument names: `opt` -> `or_none`
  and `nodups` -> `unique`
- improved error messages for constructing Structs
- significant updates to readme and examples
- using `opt=True` on `TypedField` no longer implies that `None` is
  the default value
- made mixin version of `checktype()` and `checktype_seq()`
- added `check()` and `normalize()` hooks to `TypedField`
- accessing field descriptors from classes is now permissible
- added support for default values in general, and optional values
  for type-checked fields
- fixed `__repr__()` on recursive Structs

## 0.2.0 (2014-12-15)

- initial release
185 changes: 134 additions & 51 deletions README.md
@@ -2,16 +2,17 @@

*(Supports Python 3.3 and up)*

This is a small utility for making it easier to create "struct" classes
in Python without writing boilerplate code. Structs are similar to the
standard library's `collections.namedtuple` but are more flexible,
relying on an inheritance-based approach instead of `eval()`ing a code
template.
This small library makes it easier to create "struct" classes in Python
without writing boilerplate code. Structs are similar to the standard
library's [`collections.namedtuple`][1] but are more flexible, relying on an
inheritance-based approach instead of `eval()`ing a code template. If
you like using `namedtuple` classes but wish they were more composable
and extensible, this project is for you.

## Example

Writing struct classes by hand is tedious and error prone. Consider a
simple Point2D class. The bare minimum we can write is
simple point class. The bare minimum we can write in Python is

```python
class Point2D:
@@ -20,85 +21,167 @@ class Point2D:
        self.y = y
```

but for it to be of any use, we'll need structural equality semantics
and perhaps some pretty printing for debugging.
We'll likely want to compare points for equality and pretty-print them
for debugging.

```python
class Point2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        print('Point2D({}, {})'.format(self.x, self.y))
    __str__ = __repr__
        # Separate __str__() would be nice too
        return 'Point2D({!r}, {!r})'.format(self.x, self.y)
    def __eq__(self, other):
        # Nevermind type-checking and subtyping.
        # Should check other's type too
        return self.x == other.x and self.y == other.y
    def __hash__(self):
        # Required because we're overriding __eq__().
        return hash(self.x) ^ hash(self.y)
```

If you're the sort of heathen who likes to use dynamic type checks
in Python code, you'll want to add extra argument checking to the
constructor. And we'll probably want to disallow inadvertently
reassigning to x and y after construction, or else the hash value
could become inconsistent -- a big problem if the point is stored
in a hash-based collection.
Already the code is becoming pretty verbose for such a simple concept.
Worse, it violates the [DRY principle](http://en.wikipedia.org/wiki/Don%27t_repeat_yourself)
in that the `x` and `y` fields appear many times. This isn't very
robust. If we decide to turn this into a `Point3D` class, we'll have
to upgrade each method to accommodate a new z coordinate. We could be
in for an infuriating bug if we forget to update `__eq__()` or
`__hash__()`. Adding more utilities like a copy/replace method will
exacerbate the situation.
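
One way to see the alternative is to declare the fields once and derive every method from that declaration. The following is only a minimal sketch of that idea under stated assumptions: `FieldsBase` and its `_fields` tuple are hypothetical names made up for illustration, and this is not how SimpleStruct itself is implemented.

```python
class FieldsBase:
    """Hypothetical base class: all boilerplate is derived from _fields."""
    _fields = ()

    def __init__(self, *args):
        if len(args) != len(self._fields):
            raise TypeError('expected {} arguments, got {}'.format(
                len(self._fields), len(args)))
        for name, value in zip(self._fields, args):
            setattr(self, name, value)

    def _values(self):
        # Single source of truth for repr, equality, and hashing.
        return tuple(getattr(self, name) for name in self._fields)

    def __repr__(self):
        return '{}({})'.format(
            type(self).__name__, ', '.join(repr(v) for v in self._values()))

    def __eq__(self, other):
        if type(other) is not type(self):
            return NotImplemented
        return self._values() == other._values()

    def __hash__(self):
        return hash(self._values())


class Point2D(FieldsBase):
    _fields = ('x', 'y')


class Point3D(Point2D):
    # Adding a coordinate touches one line; no method needs updating.
    _fields = Point2D._fields + ('z',)


p = Point3D(1, 2, 3)
print(p)                      # Point3D(1, 2, 3)
print(p == Point3D(1, 2, 3))  # True
```

Because `__repr__()`, `__eq__()`, and `__hash__()` all read the same `_fields` tuple, there is no method left to forget when the class grows a new coordinate.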

Even if we do all that, the code isn't robust to change. If we decide
to make this a Point3D class, we'll have to update each method to
accommodate the new z coordinate. One oversight and we're in for a
potentially hard-to-find bug.
Then there's the added code for consistency checking. Maybe you're the
sort of heathen who prefers dynamic type checking over blindly trusting
Mama Ducktype. Or maybe you want to disallow overwriting `x` and `y` so
as to avoid changing its hash value. Either way you'd need to use
descriptors or properties to intercept writes.
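
For comparison, here is roughly what the hand-written version of write protection looks like using plain properties. It is an illustrative sketch only; the private `_x` and `_y` attribute names are an assumption of this example rather than anything the library prescribes.

```python
class Point2D:
    def __init__(self, x, y):
        # Values live under private names; public access goes through
        # read-only properties.
        self._x = x
        self._y = y

    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y

    def __eq__(self, other):
        if not isinstance(other, Point2D):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))


p = Point2D(1, 2)
seen = {p}

try:
    p.x = 10  # no setter is defined, so the write is rejected
except AttributeError:
    rejected = True
else:
    rejected = False

print(rejected)               # True
print(Point2D(1, 2) in seen)  # True
```

Since `x` and `y` cannot be reassigned, the hash of `p` stays valid for as long as it sits in the set. The cost is yet more per-field code to maintain.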

`namedtuple` takes care of many of these problems, but it's not
extensible. You can't easily derive a new class from a namedtuple
class without implementing much of this boilerplate. It also forces
immutability, which may be inappropriate for your use case.
SimpleStruct provides a simple alternative. Here is a `Point2D` class
that provides everything described above.

SimpleStruct provides a simple alternative. For the above case,
we just write
```python
from numbers import Number # standard library abstract base class
from simplestruct import Struct, Field, TypedField

class Point2D(Struct):
    # Note that field declaration order matters.
    x = TypedField(Number)
    y = TypedField(Number)
```

Of course, customizations are possible. Type checking is by no means
required, objects may be mutable so long as they are not hashed,
and you can add your own non-Field attributes and properties.

from simplestruct import Struct, Field
```python
class Point2D(Struct):
    _immutable = False
    x = Field
    y = Field

class Point2D(Struct):
    x = Field
    y = Field
    # magnitude won't be considered when hashing or testing equality
    @property
    def magnitude(self):
        return (self.x**2 + self.y**2) ** .5
```

For more usage examples, see the sample files:

File | Purpose
---|---
[point.py](examples/point.py) | introduction, basic use
[typed.py](examples/typed.py) | type-checked fields
[vector.py](examples/vector.py) | advanced features
[abstract.py](examples/abstract.py) | mixing structs and metaclasses
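
The metaclass issue covered by `abstract.py` can be reproduced with the standard library alone. The sketch below uses a hypothetical `MetaRecord` metaclass and `Record` base class as stand-ins for a library's metaclass and base class; they are assumptions for illustration and are not SimpleStruct's `MetaStruct` or `Struct`.

```python
from abc import ABCMeta, abstractmethod


class MetaRecord(type):
    """Hypothetical stand-in for a library-defined metaclass."""


class Record(metaclass=MetaRecord):
    """Hypothetical stand-in for a library base class."""


class Abstract(metaclass=ABCMeta):
    @abstractmethod
    def foo(self):
        pass


# Mixing two bases whose metaclasses are unrelated fails at class creation.
try:
    class Broken(Abstract, Record):
        def foo(self):
            return 1
except TypeError as e:
    conflict = str(e)
    print(conflict)


# A trivial metaclass deriving from both resolves the conflict.
class ABCMetaRecord(MetaRecord, ABCMeta):
    pass


class Concrete(Abstract, Record, metaclass=ABCMetaRecord):
    def foo(self):
        return 25


print(Concrete().foo())  # 25
```

The combined metaclass needs no body of its own; it only has to be a subclass of the metaclasses of every base.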

## Comparison and feature matrix

The most important problems mentioned above are solved by using
`namedtuple`, but this approach begins to break down when you
start to customize classes. To add a property to a `namedtuple`,
you must define a subclass:

```python
from collections import namedtuple

BasePerson = namedtuple('BasePerson', 'fname lname age')
class Person(BasePerson):
    @property
    def full_name(self):
        return self.fname + ' ' + self.lname
```

## Feature matrix
If on the other hand you want to extend an existing `namedtuple` with
new fields, it's a bit harder. You need to regenerate (not inherit)
the boilerplate methods so they recognize the new fields. This can be
done using multiple inheritance:

```python
# Person._fields supplies the existing field names; Employee does not exist yet.
BaseEmployee = namedtuple('BaseEmployee', Person._fields + ('salary',))
class Employee(BaseEmployee, Person):
    pass
```

Implementation-wise, `namedtuple` works by dynamically evaluating
a templated class definition based on the built-in `tuple` type.
This gives it a speed advantage, but is also the main reason why
it is less extensible (and unable to handle mutable values).
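
A short standard-library example makes the tuple basis concrete. It uses only `collections.namedtuple`; the `Point2D` name is reused here purely for illustration.

```python
from collections import namedtuple

Point2D = namedtuple('Point2D', 'x y')
p = Point2D(1, 2)

# Instances really are tuples, which is where the speed comes from.
print(isinstance(p, tuple))  # True
print(p == (1, 2))           # True

# The same tuple basis means fields cannot be reassigned.
try:
    p.x = 10
except AttributeError:
    read_only = True
else:
    read_only = False

print(read_only)  # True
```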

In contrast, SimpleStruct is based on metaclasses, descriptors, and
dynamic dispatch. The below matrix summarizes the feature comparison.

Feature | Avoids boilerplate for | Supported by `namedtuple`?
---|:---:|:---:
construction | `__init__()` | ✓
extra attributes on self | |
easy construction | `__init__()` | ✓
extra attributes on self | | subclasses only
pretty printing | `__str__()`, `__repr__()` | ✓
structural equality | `__eq__()` | ✓
inheritance | | ✗
easy inheritance | | ✗
optional mutability | | ✗
hashing (if immutable) | `__hash__()` | ✓
pickling / deep-copying | | ✓
tuple decomposition | `__len__`, `__iter__` | ✓
optional type checking | | ✗
optional type checking | `__init__()`, `@property` | ✗
`_asdict()` / `_replace()` | | ✓
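
For reference, the `namedtuple` column of the matrix can be checked directly with the standard library. This sketch exercises only documented `namedtuple` behavior; it does not demonstrate SimpleStruct's own API.

```python
from collections import namedtuple

Point2D = namedtuple('Point2D', 'x y')
p = Point2D(3, 4)

# Tuple decomposition relies on __len__ and __iter__.
x, y = p
print(len(p))  # 2
print(x, y)    # 3 4

# Structural equality and hashing come for free.
print(p == Point2D(3, 4))                # True
print(hash(p) == hash(Point2D(3, 4)))    # True

# _asdict() maps field names to values; _replace() returns a modified copy.
as_dict = dict(p._asdict())
print(as_dict == {'x': 3, 'y': 4})  # True
q = p._replace(y=0)
print(q)                            # Point2D(x=3, y=0)
print(p)                            # Point2D(x=3, y=4)
```

Note that `_replace()` leaves the original instance untouched, which is consistent with `namedtuple` not supporting mutability.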

[MacroPy][2]'s case classes provide similar functionality, but are
implemented in a very different way. Instead of metaclass hacking
or source code templating, it relies on syntactic transformation
of the module's AST. This allows for a syntax that's very different
from what we've seen above. So different, in fact, that we might view
MacroPy as an extension to the Python language rather than as just
a library. MacroPy case classes are subject to limitations on
inheritance and class members.

## Installation ##

The `_asdict()` and `_replace()` methods from `namedtuple` are also
provided.
As with most Python packages, SimpleStruct is available on PyPI:

One advantage that `namedtuple` does have is speed. It is based on
the built-in Python tuple type, whereas SimpleStruct has the added
overhead of descriptor function calls.
```
python -m pip install simplestruct
```

Or grab a development version if you're so inclined:

```
python -m pip install https://github.com/brandjon/simplestruct/tree/tarball/develop
```

Python 3.3 and 3.4 are supported. There are no additional dependencies.

## To use ###
## Developers ##

See the `examples/` directory.
Tests can be run with `python setup.py test`, or alternatively by
installing [Tox](http://testrun.org/tox/latest/) and running
`python -m tox` in the project root. Tox has the advantage of automatically
testing under both Python 3.3 and 3.4. Building a source distribution
(`python setup.py sdist`) requires the setuptools extension package
[setuptools-git](https://github.com/wichert/setuptools-git).

## References ##

## TODO ###
[1]: https://docs.python.org/3/library/collections.html#collections.namedtuple
[[1]] The standard library's `namedtuple` feature

Features TODO:
- add support for `__slots__`
- make exceptions appear to be raised from the stack frame of user code
  where the type error occurred, rather than inside this library (with
  a flag to disable, for debugging)
[2]: https://github.com/lihaoyi/macropy#case-classes
[[2]] Li Haoyi's case classes (part of MacroPy)

Packaging TODO:
- fix up setup.py, make installable
[3]: http://harts.net/reece/2013/06/02/using-namedtuples-with-method-and-instance-variable-inheritance/
[[3]] Reece Hart's blog post on inheriting from `namedtuple`
5 changes: 5 additions & 0 deletions TODO.md
@@ -0,0 +1,5 @@
# Wishlist #
- add support for `__slots__`
- make exceptions appear to be raised from the stack frame of user code
  where the type error occurred, rather than inside this library (with
  a flag to disable, for debugging)
33 changes: 17 additions & 16 deletions examples/abstract.py
@@ -3,25 +3,26 @@
from abc import ABCMeta, abstractmethod
from simplestruct import Struct, Field, MetaStruct


# A simple ABC. Subclasses must provide an override for foo().
class Abstract(metaclass=ABCMeta):
    @abstractmethod
    def foo(self):
        pass

# If we ran this code
#
# class Concrete(Abstract, Struct):
#     f = Field
#     def foo(self):
#         return self.f ** 2
#
# we would get the following error:
#
# TypeError: metaclass conflict: the metaclass of a derived class
# must be a (non-strict) subclass of the metaclasses of all its bases
#
# So let's make a trivial subclass of ABCMeta and MetaStruct.
# ABCs rely on a metaclass that conflicts with Struct's metaclass.
try:
    class Concrete(Abstract, Struct):
        f = Field
        def foo(self):
            return self.f ** 2

except TypeError as e:
    print(e)
    # metaclass conflict: the metaclass of a derived class must
    # be a (non-strict) subclass of the metaclasses of all its bases

# So let's make a trivial subclass of ABCMeta and MetaStruct.
class ABCMetaStruct(MetaStruct, ABCMeta):
    pass

@@ -33,13 +34,13 @@ def foo(self):
c = Concrete(5)
print(c.foo()) # 25

# For convenience we can also do

# For convenience we can make a version of Struct that
# incorporates the common metaclass.
class ABCStruct(Struct, metaclass=ABCMetaStruct):
    pass

# and then

# Now we only have to do:
class Concrete(Abstract, ABCStruct):
    f = Field
    def foo(self):