Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full ADT support with pyclass for complex enums #3582

Merged
merged 1 commit into from
Jan 19, 2024

Conversation

mkovaxx
Copy link
Contributor

@mkovaxx mkovaxx commented Nov 17, 2023

Design Overview

A complex enum is an enum with any non-unit variants. For now, tuple-like variants are out of scope.

For each variant of a complex enum, PyO3 generates a hidden Rust zero-sized type (empty struct). The PyClass for each variant is a subclass of the PyClass for the enum itself. Whenever a value of the enum is passed to the Python side, it is cast to the PyClass of the specific variant. Each variant PyClass exposes getters for its fields.

The PyClass of the enum type itself has no constructor, so it cannot be instantiated from Python. Instead of a constructor, it has a class attribute corresponding to each variant, which returns the PyType of the matching variant newtype. This allows instantiating each variant newtype from Python.

Example & Details

To make this more concrete, here's an example.

#[pyclass]
pub enum ComplexEnum {
    Int { i: i32 },
    Float { f: f64 },
    Str { s: String },
}

Under the hood, PyO3 generates these hidden ZSTs:

struct ComplexEnum_Int;
struct ComplexEnum_Float;
struct ComplexEnum_Str;

Each of these has a constructor and getter methods for its fields.

E.g. ComplexEnum_Int will have the following generated impl:

impl ComplexEnum_Int {
    fn __pymethod_constructor__(
        py: _pyo3::Python<'_>,
        i: i32,
    ) -> _pyo3::PyClassInitializer<ComplexEnum_Int> {
        let base_value = ComplexEnum::Int { i };
        _pyo3::PyClassInitializer::from(base_value).add_subclass(ComplexEnum_Int)
    }

    fn i(slf: _pyo3::PyRef<Self>) -> _pyo3::PyResult<i32> {
        match &*slf.into_super() {
            ComplexEnum::Int { i, .. } => Ok(i.clone()),
            _ => unreachable!("Wrong complex enum variant found in variant wrapper PyClass"),
        }
    }
}

(Omitting the generated PyClass impls for brevity.)

This allows the following operations on the Python side:

# instantiation
value = ComplexEnum.Int(42)

# instance checks against variants: this will return True
isinstance(ComplexEnum.Float(3.14), ComplexEnum.Float)

# instance checks against variants: this will return False
isinstance(ComplexEnum.Float(3.14), ComplexEnum.Int)

# instance checks against the enum: this will return True
isinstance(ComplexEnum.Float(3.14), ComplexEnum)

# extracting fields
print(value.i)

# the wrapper newtype only has getters for fields actually present on the variant
# this will fail with AttributeError: 'builtins.ComplexEnum_Int' object has no attribute 'f'
print(value.f)
# this will fail with AttributeError: 'builtins.ComplexEnum_Int' object has no attribute 's'
print(value.s)

# pattern matching
match value:
    case ComplexEnum.Int(i=x):
        print(f"Int({x})")
    case ComplexEnum.Float(f=x):
        print(f"Float({x})")
    case ComplexEnum.Str(s=x):
        print(f"Str({x})")
    case _:
        print("Invalid")

Passing variant values between Python and Rust is also fully supported, e.g.:

#[pyfunction]
pub fn do_stuff(thing: &ComplexEnum) -> ComplexEnum {
    match thing {
        ComplexEnum::Int { i } => ComplexEnum::Str { s: format!("{i}") },
        ComplexEnum::Float { f } => ComplexEnum::Float { f: f * f },
        ComplexEnum::Str { s } => ComplexEnum::Int { i: s.len() as i32 },
    }
}

Such a function can be called from Python and works as expected.

Copy link

codspeed-hq bot commented Nov 17, 2023

CodSpeed Performance Report

Merging #3582 will improve performances by 22.03%

Comparing mkovaxx:pyclass_complex_enum (3ed5ddb) with main (43504cd)

Summary

⚡ 1 improvements
✅ 77 untouched benchmarks

Benchmarks breakdown

Benchmark main mkovaxx:pyclass_complex_enum Change
not_a_list_via_downcast 153.9 ns 126.1 ns +22.03%

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for working on this! 🙏

I haven't reviewed the code in detail yet, ping me whenever you want that done.

Can you summarise the intended functionality you want to have for the full ADTs, maybe by pushing some tests (in the pytests/ subdirectory may be easiest). It would be worth checking everyone understands the target design is before we go into too much detail on reviewing the implementation.

.python-version Outdated Show resolved Hide resolved
@davidhewitt
Copy link
Member

Ah, I see your examples posted above 👍. Overall I think that looks sensible, and the ability to autogenerate constructors is consistent with what happens on the Rust side.

What are the next steps you have for this PR? Any questions you would like my input on?

@mkovaxx
Copy link
Contributor Author

mkovaxx commented Nov 18, 2023

@davidhewitt Thanks for taking a look! I've been using PyO3 since 2019 and am getting a ton of value out of it, so I'm really excited about the opportunity to contribute back! :)

TL;DR / TODO

Plan to get this PR ready for review:
[x] 1. Reduce scope to handling only struct-like (as opposed to tuple-like) variants for now, just to make my life easier. I think extension to tuple-like variants would be a relatively straightforward add-on in a follow-up PR.
[x] 2. Modify the generated impl IntoPy<PyObject> to wrap the native Rust enum into the matching variant PyClass.
[x] 3. Modify the generated impl FromPyObject to unwrap the native Rust enum from the variant PyClass.
[x] 4. Add pytests.

Any guidance on how best to approach TODOs 2 and 3 would be greatly appreciated! :)

Design Overview

Because I don't understand everything in PyO3, I try to reuse as much machinery as possible instead of rolling my own. My understanding is particularly spotty around PyClassPyO3Options, so any thoughts would be welcome on whether changes are needed in that area! :)

A complex enum is an enum with any non-unit variants. For now, tuple-like variants are out of scope.

For each variant of a complex enum, PyO3 generates a hidden Rust newtype. Each variant newtype wraps a value of the enum, and implements PyClass with an appropriate constructor and getter methods for each of its fields. A runtime invariant is that each variant newtype must contain a value of the matching variant.

The PyClass of the enum type itself has no constructor, so it cannot be instantiated from Python. Instead of a constructor, it has a class attribute corresponding to each variant, which returns the PyType of the matching variant newtype. This allows instantiating each variant newtype from Python.

Example & Details

To make this more concrete, here's an example.

#[pyclass]
pub enum ComplexEnum {
    Int { i: i32 },
    Float { f: f64 },
    Str { s: String },
}

Under the hood, PyO3 generates these hidden variant newtypes:

struct ComplexEnum_Int(ComplexEnum);
struct ComplexEnum_Float(ComplexEnum);
struct ComplexEnum_Str(ComplexEnum);

Each of these has a constructor and getter methods for its fields.

E.g. ComplexEnum_Int will have the following generated impl:

impl ComplexEnum_Int {
    fn __pymethod_constructor__(i: i32) -> Self {
        Self(ComplexEnum::Int { i })
    }
    fn i(&self) -> _pyo3::PyResult<i32> {
        match &self.0 {
            ComplexEnum::Int { i, .. } => Ok(i.clone()),
            _ => unreachable!("Wrong complex enum variant found in wrapper newtype"),
        }
    }
}

(Omitting the generated PyClass impls for brevity.)

This allows the following operations on the Python side:

# instantiation
value = ComplexEnum.Int(42)

# extracting fields
print(value.i)

# the wrapper newtype only has getters for fields actually present on the variant
# this will fail with AttributeError: 'builtins.ComplexEnum_Int' object has no attribute 'f'
print(value.f)
# this will fail with AttributeError: 'builtins.ComplexEnum_Int' object has no attribute 's'
print(value.s)

# pattern matching
match value:
    case ComplexEnum.Int(i=x):
        print(f"Int({x})")
    case ComplexEnum.Float(f=x):
        print(f"Float({x})")
    case ComplexEnum.Str(s=x):
        print(f"Str({x})")
    case _:
        print("Invalid")

One peculiarity, however, is that isinstance(ComplexEnum.Float(3.14), ComplexEnum) returns False. As far as I understand, this would be difficult to fix because of how the Python C API works. It also doesn't seem to be a huge usability problem.

It should also be possible to seamlessly pass values back-and-forth between Rust and Python. (This isn't done yet, see TODOs 2 and 3.)

In particular, something like this should work:

#[pyfunction]
pub fn do_stuff(thing: &ComplexEnum) -> ComplexEnum {
    match thing {
        ComplexEnum::Int { i } => ComplexEnum::Str { s: format!("{i}") },
        ComplexEnum::Float { f } => ComplexEnum::Float { f: f * f },
        ComplexEnum::Str { s } => ComplexEnum::Int { i: s.len() as i32 }
    }
}

As far as I understand, this requires impl FromPyObject for the variant wrapper newtypes, and impl IntoPy<PyObject> for the enum itself.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, tuple-like variants are out of scope.

Seems reasonable, I'm happy to have that solved in a follow-up PR, agreed.

One peculiarity, however, is that isinstance(ComplexEnum.Float(3.14), ComplexEnum) returns False. As far as I understand, this would be difficult to fix because of how the Python C API works. It also doesn't seem to be a huge usability problem.

For 3., things probably just work if we get this to work. I think having this return True is also desirable for users. I think this will also be important for #[pymethods] impl ComplexEnum to work properly on the variant subclasses.

I've added some suggestions to the code which might solve this, however this requires changing the variants from being newtypes to being zero-sized markers to build Python subclasses.

[ ] 2. Modify the generated impl IntoPy<PyObject> to wrap the native Rust enum into the matching variant PyClass.

For 2., I think a working approach with the zero-sized variant types would be to have the generated IntoPy implementation look like this:

(See impl_into_py in pyo3-macros-backend/src/pyclass.rs)

                impl _pyo3::IntoPy<_pyo3::PyObject> for #cls {
                    fn into_py(self, py: _pyo3::Python) -> _pyo3::PyObject {
                        let variant = match &self {
                            // deduce the newtype variant here
                        }
                        _pyo3::IntoPy::into_py(_pyo3::Py::new(py, PyClassInitializer::from(self).add_subclass(variant)).unwrap(), py)
                    }
                }

The downside of this is that IntoPy for the enum will diverge from Py::new, which I think is a bit of a footgun. I suggest we focus on getting the rest of the functionality agreed upon first, and then later can work on a refactoring to get these aligned again.

pyo3-macros-backend/src/pyclass.rs Outdated Show resolved Hide resolved
pyo3-macros-backend/src/pyclass.rs Outdated Show resolved Hide resolved
pyo3-macros-backend/src/pyclass.rs Outdated Show resolved Hide resolved
pyo3-macros-backend/src/pyclass.rs Outdated Show resolved Hide resolved
@davidhewitt
Copy link
Member

Just to add again: thank you for working on this, so many users will be joyful to see this land! I think this is a difficult blend of implementation and design, so this will probably have several rounds of review and maybe some painful challenges. With patience we can make a lot of people happy 🚀

@mkovaxx
Copy link
Contributor Author

mkovaxx commented Nov 19, 2023

@davidhewitt Thanks for adding a ton of suggestions! I understand that getting this PR landed might take a while, and I'm prepared to help make the necessary changes. 🚀

Using ZSTs for the variant PyClasses seems like an ingenious approach and not something I'd have thought of on my own! 😄 I'll spend some time today to see where that takes us.

@mkovaxx
Copy link
Contributor Author

mkovaxx commented Nov 20, 2023

@davidhewitt I need your help again for more high-level guidance, because I feel lost in the traits of PyO3. 😅

Here's my understanding of the error messages I'm getting.

For each variant PyClass, already existing machinery generates wrapper methods for both the constructor and the field getters, i.e. __pymethod___new____ and __pymethod_get_#field__, along with corresponding INTRINSIC_ITEMS on the PyClassImpl.

For example, for ComplexEnum_Int's field i, we get the following getter generated:

unsafe fn __pymethod_get_i__(
    py: _pyo3::Python<'_>,
    _slf: *mut _pyo3::ffi::PyObject,
) -> _pyo3::PyResult<*mut _pyo3::ffi::PyObject> {
    let arg = _pyo3::impl_::extract_argument::extract_pyclass_ref::<ComplexEnum_Int>(
        py.from_borrowed_ptr::<_pyo3::PyAny>(_slf),
        &mut { _pyo3::impl_::extract_argument::FunctionArgumentHolder::INIT },
    )?;
    _pyo3::callback::convert(py, ComplexEnum_Int::i(arg))
}

I've slightly modified the code here, creating the arg binding so we can talk about it. The problem here seems to be that the type of arg comes out to be &ComplexEnum_Int, but the Rust-side ComplexEnum_Int::i takes a PyRef, because it needs to get its super:

fn i(slf: _pyo3::PyRef<Self>) -> _pyo3::PyResult<i32> {
    match &*slf.into_super() {
        ComplexEnum::Int { i, .. } => Ok(i.clone()),
        _ => unreachable!("Wrong complex enum variant found in variant wrapper PyClass"),
    }
}

So it seems like inside __pymethod_get_i__ the &ComplexEnum_Int needs to be converted into a PyRef<ComplexEnum_Int>, but I'm not sure how to do that. Is that even what is supposed to happen here?

There's another problem inside the constructor wrapper, e.g. for ComplexEnum_Int:

unsafe fn __pymethod___new____(
    py: _pyo3::Python<'_>,
    subtype: *mut _pyo3::ffi::PyTypeObject,
    _args: *mut _pyo3::ffi::PyObject,
    _kwargs: *mut _pyo3::ffi::PyObject,
) -> _pyo3::PyResult<*mut _pyo3::ffi::PyObject> {
    use _pyo3::callback::IntoPyCallbackOutput;
    let function = ComplexEnum_Int::__pymethod_constructor__;
    const DESCRIPTION: _pyo3::impl_::extract_argument::FunctionDescription =
        _pyo3::impl_::extract_argument::FunctionDescription {
            cls_name: ::std::option::Option::Some(
                <ComplexEnum_Int as _pyo3::type_object::PyTypeInfo>::NAME,
            ),
            func_name: "__new__",
            positional_parameter_names: &["i"],
            positional_only_parameters: 0usize,
            required_positional_parameters: 1usize,
            keyword_only_parameters: &[],
        };
    let mut output = [::std::option::Option::None; 1usize];
    let (_args, _kwargs) =
            DESCRIPTION.extract_arguments_tuple_dict::<_pyo3::impl_::extract_argument::NoVarargs,
                    _pyo3::impl_::extract_argument::NoVarkeywords>(py, _args,
                    _kwargs, &mut output)?;
    let result = ComplexEnum_Int::__pymethod_constructor__(
        py,
        _pyo3::impl_::extract_argument::extract_argument(
            _pyo3::impl_::extract_argument::unwrap_required_argument(output[0usize]),
            &mut { _pyo3::impl_::extract_argument::FunctionArgumentHolder::INIT },
            "i",
        )?,
    );
    let initializer: _pyo3::PyClassInitializer<ComplexEnum_Int> = result.convert(py)?;
    let cell = initializer.create_cell_from_subtype(py, subtype)?;
    ::std::result::Result::Ok(cell as *mut _pyo3::ffi::PyObject)
}

Here result comes out of ComplexEnum_Int::__pymethod_constructor__, which looks like this:

fn __pymethod_constructor__(py: _pyo3::Python<'_>, i: i32) -> _pyo3::PyObject {
    use _pyo3::IntoPy;
    ComplexEnum::Int { i }.into_py(py)
}

So result is a PyObject aka Py<PyAny>, and the surrounding code attempts to convert that into a PyClassInitializer<ComplexEnum_Int>, which causes an error about trait impls missing.

I can easily change the code generated for ComplexEnum_Int::__pymethod_constructor__ and ComplexEnum_Int::i, so if there's a way to solve these problems doing that, that would be the easiest.

But I realize that generating some trait impls might also solve these issues without changing anything else, so I figured I'd ask for your insight here.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two suggestions which should hopefully point you in the right direction.

Just a note - all of the macros code is fairly organically grown to suit the functionality we needed; a lot of it is not beautiful. If you feel a need to refactor any of it, that's welcome. (Although maybe best to isolate refactoring into separate PRs, for the sake of review.)

pyo3-macros-backend/src/pyclass.rs Outdated Show resolved Hide resolved
pyo3-macros-backend/src/pyclass.rs Outdated Show resolved Hide resolved
@mkovaxx
Copy link
Contributor Author

mkovaxx commented Nov 22, 2023

@davidhewitt I've added more pytests. The remaining failures are due to the variant PyClasses being cast into the enum PyClass when returned from a Rust function. I think your comment #3582 (review) describes the fix for this, so I'll try that.

@mkovaxx mkovaxx force-pushed the pyclass_complex_enum branch from 51ffb07 to 9b03800 Compare November 23, 2023 01:11
@mkovaxx mkovaxx marked this pull request as ready for review November 23, 2023 01:41
@mkovaxx mkovaxx requested a review from davidhewitt November 23, 2023 01:41
@davidhewitt
Copy link
Member

Exciting to see a green CI run! I'm dealing with a sick family at the moment so I may struggle to review this in depth for a few days, please bear with me.

@mkovaxx
Copy link
Contributor Author

mkovaxx commented Nov 23, 2023

Exciting to see a green CI run! I'm dealing with a sick family at the moment so I may struggle to review this in depth for a few days, please bear with me.

@davidhewitt Thanks for the heads-up, and no rush! I'll do some cleanup and other chores in the meantime. I hope your family gets well soon! :)

Copy link
Member

@mejrs mejrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the implementation in detail but I have played around with your branch and I'd like to point out the following footgun:

x = enums.ComplexEnum.Int(42)

match x:
    case enums.ComplexEnum.Int(n = 42):
        pass
    case _:
        raise AssertionError    

which raises the AssertionError.

Perhaps it is easier to only support single tuple enums like

#[pyclass]
pub enum Value {
    Int(i32),
    Str(String),
}

for now, instead of enums with named fields?

@davidhewitt
Copy link
Member

If I read right, the problem in that example is the wrong name for the attribute was used, n = 42 when i = 42 would have been correct.

Isn't that just an inevitable weakness of python match statements though? I'm unsure we can ever do better than just documenting this sort of looseness of the Python type system.

@mkovaxx
Copy link
Contributor Author

mkovaxx commented Nov 24, 2023

@mejrs Thanks for taking the time to try it out!

As David says, this is just how Python pattern matching syntax works, unfortunately.

Even if we had written that entirely in Python, something like this:

from dataclasses import dataclass

@dataclass
class IntVariant:
    i: int

It would have the same behavior.

x = IntVariant(42)

match x:
    case IntVariant(n=42):
        print(f"Field n is 42")
    case IntVariant(i=42):
        print(f"Field i is 42")
    case _:
        raise AssertionError

I.e. this prints "Field i is 42".

The goal here is to match the behavior of Python, so I don't think this is a problem.

@davidhewitt
Copy link
Member

davidhewitt commented Nov 24, 2023

Hmm, here's another problem case though, which makes subclassing a little more awkward.

Imagine we have a method which mutates the enum to change the variant, like this:

#[pymethods]
impl ComplexEnum {
    fn set_to_string(&mut self, s: String) {
        *self = ComplexEnum::Str { s }
    }
}

By changing the variant, we completely break our current type assumptions:

>>> from pyo3_pytests.enums import ComplexEnum
>>> i = ComplexEnum.Int(32)
>>> isinstance(i, ComplexEnum.Int)
True
>>> i.set_to_string("foo")
>>> isinstance(i, ComplexEnum.Int)
True
>>> isinstance(i, ComplexEnum.Str)
False
>>> i.i
thread '<unnamed>' panicked at pytests/src/enums.rs:36:1:
internal error: entered unreachable code: Wrong complex enum variant found in variant wrapper PyClass
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pyo3_runtime.PanicException: internal error: entered unreachable code: Wrong complex enum variant found in variant wrapper PyClass

I'm not entirely sure what the fix is here. A simple solution would be to require all enums to be #[pyclass(frozen)]. That might be attractive for other reasons related to nogil anyway. It might even be backwards-compatible to remove that constraint later.


But I wonder, could we change the design to support mutating the variant? I suppose that the subclasses don't work as they are in that case. What happens if we make the subclasses have a custom __instancecheck__?

e.g. something like this pure-python code:

from typing import Any


class FooVariantMeta(type):
    def __instancecheck__(self, instance: Any) -> bool:
        if not isinstance(instance, Foo):
            return False

        match self:
            case Foo.Int:
                return isinstance(instance.value, int)
            case Foo.Str:
                return isinstance(instance.value, str)


class Foo:
    value: int | str

    def __init__(self, value) -> None:
        self.value = value

    class Int(metaclass=FooVariantMeta):
        def __new__(cls, value: int) -> "Foo":
            return Foo(value)

    class Str(metaclass=FooVariantMeta):
        def __new__(cls, value: str) -> "Foo":
            return Foo(value)


i = Foo.Int(1)

print(isinstance(i, Foo))  # True
print(isinstance(i, Foo.Int))  # True
print(isinstance(i, Foo.Str))  # False

match i:
    case Foo.Int(value=x):
        print("int", x)  # 1
    case Foo.Str(value=x):
        print("str", x)  # Never reached

NB metaclasses aren't something we've attempted to support yet in PyO3, I think they only work with abi3 on 3.12, so this would limit the version support I think.

... opinions welcome on what seems like the better route to go down.

@mkovaxx
Copy link
Contributor Author

mkovaxx commented Nov 24, 2023

@davidhewitt That's indeed an issue I haven't considered, thanks for flagging!

Here are the different perspectives that I can think of here.

(1) Python Semantics

I think the idea of an in-place operation that changes the type/class of a value is rather startling from a Python perspective. If we go with the design choice that the variants correspond to different types/classes in Python, then the function in your example does exactly that. Because of that, the ability to create such functions and call them from Python seems like a footgun. On the other hand, to enable Python pattern matching for the variants, they need to behave like different types/classes, either by virtue of being or by a custom __instancecheck__.

(2) Customizing __instancecheck__

A quick search brings up an open issue and a discussion that suggest it might currently be infeasible.

(3) Rust in Practice

The need for a &mut self method on an enum feels rather contrived to me. Alternatives seem to exist for all cases I can think of.

First the example your mentioned:

#[pymethods]
impl ComplexEnum {
    fn set_to_string(&mut self, s: String) {
        *self = ComplexEnum::Str { s }
    }
}

I imagine this would be used from Python like so:

x: ComplexEnum = ComplexEnum.Int(42)
x.set_to_string("hello")

Because the result doesn't depend on self, the method could be replaced with a function like this:

#[pymethods]
impl ComplexEnum {
    #[staticmethod]
    fn from_string(s: String) -> Self {
        ComplexEnum::Str { s }
    }
}

And used like this from Python:

x: ComplexEnum = ComplexEnum.Int(42)
x = ComplexEnum.from_string("hello")

I've also tried to imagine cases where the result does depend on self, like this:

#[pymethods]
impl ComplexEnum {
    fn do_stuff_in_place(&mut self) {
        *self = match self {
            ComplexEnum::Int { i } => ComplexEnum::Str { s: i.to_string() },
            ComplexEnum::Float { f } => ComplexEnum::Float { f: f * f },
            ComplexEnum::Str { s } => ComplexEnum::Int { i: s.len() as i32 },
        }
    }
}

Again, used like this from Python:

x: ComplexEnum = ComplexEnum.Int(42)
x.do_stuff_in_place()

It seems like it is always a viable alternative to construct and return a new value instead of in-place mutation:

#[pymethods]
impl ComplexEnum {
    fn do_stuff(&self) -> Self {
        match self {
            ComplexEnum::Int { i } => ComplexEnum::Str { s: i.to_string() },
            ComplexEnum::Float { f } => ComplexEnum::Float { f: f * f },
            ComplexEnum::Str { s } => ComplexEnum::Int { i: s.len() as i32 },
        }
    }
}

Then use it from Python like this:

x: ComplexEnum = ComplexEnum.Int(42)
x = x.do_stuff()

In a nutshell, replacing x.do_stuff_in_place() with x = x.do_stuff() works both in Python and Rust. The Haskeller part of my brain insists that it is a more readable style as well. ;)

Based on the considerations above, I'm in favor of requiring enums to be #[pyclass(frozen)] for now.

Do you see any glaring downsides with this approach?

@davidhewitt
Copy link
Member

davidhewitt commented Nov 25, 2023

1 - I somewhat agree; while the Python snippet I wrote in #3582 (comment) would change variant "type" when assigning to Foo.value, I do think it's a bit strange.

2 - I think we could make __instancecheck__ work, it would just take a lot of effort. There are a couple appealing things about making the variants not actually be real subclasses:

  • We can then mark the enum type as final (e.g. not subclassable) - at the moment we force #[pyclass(subclass)]. But if someone creates another subclass of ComplexEnum (e.g. in pure Python) then it will never match any of the variants. The __instancecheck__ variants wouldn't require ComplexEnum to be subclassable, and I think even if someone did subclass it the __instancecheck__ tests could probably be compatible with a subclass.
  • In Rust the enum variants aren't real types (at least not until an RFC comes along to change that), I think these variants that are only really useful for pattern matching are potentially a closer reproduction of that semantic.

3 - I am mostly ok with requiring the enums to be #[pyclass(frozen)], but here's a couple of caveats:

  • I think we want to understand how we can lift this restriction in the future backwards-compatibly. E.g. I think if we had to change to the __instancecheck__ design to lift the restriction, then it's potentially not backwards-compatible.
  • I mostly agree that constructing a new value rather than mutating in-place is a better design pattern, but there might be performance drawbacks to this (creating Python objects is relatively expensive, plus it might force copying of other data too).

@mkovaxx
Copy link
Contributor Author

mkovaxx commented Nov 26, 2023

@davidhewitt Thanks for taking the time to think about this issue and write your thoughts down!

To ensure that we will be able to backwards-compatibly move from the current "base class" approach to the "final" approach, I tried to identify the key design constraints that must be satisfied regardless of the approach.

The following key design constraints derive from the semantics of Python:

  1. Pattern matching in Python desugars to a sequence of if-elif branches with isinstance checks. Therefore, each Rust enum variant must have a corresponding PyType (which may or may not be a PyClass), so that it may be used in patterns. E.g. for ComplexEnum::Int on the Rust side, there must be a corresponding PyType, so that a case ComplexEnum.Int pattern may be written.
  2. It must be possible to construct a value of a variant using the corresponding PyType. E.g. to construct a ComplexEnum::Int, whatever the ComplexEnum.Int PyType may be, it must have either __init__ or __new__.
  3. Given a value of a variant, isinstance must return True with (1) the PyType of the variant and (2) the PyType of the enum, and must return False with any other PyType.

Please try to think if there are more key constraints we should include here!

The main approaches we've identified so far, and how they each satisfy the constraints above, are as follows.

The "Base Class" Approach

The approach this PR currently explores.

The PyType of the enum is a base class, and the PyType of each variant is a subclass of it. Each actual value of the enum is an instance of a variant's PyClass.

Constraint (1) is trivially satisfied.
Constraint (2) is trivially satisfied.
Constraint (3) is satisfied because of subclassing.

To protect runtime invariants, we need to disallow in-place mutation of enum values, the enum's PyClass is made frozen.

The "Final" Approach

The PyType of the enum is a final (i.e. non-subclassable) PyClass, and the PyType of each variant is unrelated. Each value of the enum is an instance of the enum's PyClass; the variant PyTypes are never instantiated.

Constraint (1) is trivially satisfied.
Constraint (2) is satisfied by each variant PyType implementing __new__, returning a value of the enum's PyClass. Constraint (3) is satisfied by setting the metaclass of the enum's PyClass with a custom __instancecheck__.

This approach would allow for in-place mutation of enum values. Because we have no examples of using the metaclass machinery in PyO3, more time and effort will be needed to work out the details of this approach.

Backwards Compatibility

It seems reasonable to start with the base class approach, and eventually move to the final approach, while ensuring that the constraints above are satisfied.

The main concern is that the base class approach would make it possible for a user to subclass the enum's PyClass, which would break backwards compatibility. I agree that ideally we'd have a mechanism to prevent that. I think it would be a reasonable start to simply warn users in the documentation against doing that.

@davidhewitt
Copy link
Member

@mkovaxx thanks for writing that up. Here's some further thoughts:

possible for a user to subclass the enum's PyClass

I don't think it's a problem when switching from the "base class" approach to the "final" approach, because it's possible for users to add #[pyclass(subclass)] to continue to allow this.


For backwards compatibility, I'm more concerned about what properties change between the two approaches which we cannot mitigate.

I can only think of one at the moment, which is that with the "base class" approach then type(my_enum) will be one of the variants, e.g. ComplexEnum.Int, but with the "final" approach then type(my_enum) will always just be ComplexEnum.

Is this a problem? In a corner case somewhere, maybe someone depends on this property. If we think this (or any other similar properties matter) then it means we have to be more careful about changing the approach later.

@mkovaxx
Copy link
Contributor Author

mkovaxx commented Nov 29, 2023

@davidhewitt That's another corner case indeed, good point!

The general wisdom in Python circles is to rely on isinstance instead of type. I can't imagine a use case where type must be used over isinstance, apart from observing private internals of the enum representation. In fact, heavy use of isinstance is itself discouraged, but pattern matches desugar to it so that's a given of the design here.

To bring back an earlier concern, I think the larger picture that emerges here is that making &mut self methods accessible from Python is likely not worth the trouble. There are two separate aspects here: strange surface behavior that breaks Python semantics, and increased complexity not only under the hood, but also leaking to the surface.

First, the semantics-breaking aspect.

The Python documentation says:

An object’s type determines the operations that the object supports (e.g., “does it have a length?”) and also defines the possible values for objects of that type. The type() function returns an object’s type (which is an object itself). Like its identity, an object’s type is also unchangeable. 1

Then the fine print attached says:

It is possible in some cases to change an object’s type, under certain controlled conditions. It generally isn’t a good idea though, since it can lead to some very strange behaviour if it is handled incorrectly.

(Emphasis mine.)

My interpretation is that having a value change its type due to an operation is strange behavior, and the documentation advises against it. I think if I were to write a Python library where that happens, community feedback on it would be "please don't do that". Exposing &mut self methods to Python enables creating such behavior, so I think we should not allow it.

Second, added complexity that also leaks to the surface.

Subclassing satisfies the key design constraints without any additional machinery.

The approaches without subclassing, on the other hand, need additional machinery:

  1. Custom __instancecheck__ to make isinstance and pattern matching work. This causes unintuitive behavior in relation to type, which cannot be customized. (This is the problem you identified above.)
  2. Custom __new__ on each variant, so that instances may be constructed. Again, this causes unintuitive behavior, because the variants themselves are never actually instantiated; every value is of the enum type.

This seems like a lot of complexity and induced strange behavior to alleviate what AFAIU is a concern about performance. I think we should not put a performance concern above general usability.

Let's get the data!

I think we don't really understand how much of a need there is to expose &mut self methods. So it seems reasonable to go with the approach that disallows them, and wait for issues to be filed. In other words, let's collect the data!

@mkovaxx mkovaxx force-pushed the pyclass_complex_enum branch from 348459a to eb00987 Compare November 29, 2023 07:51
Comment on lines +771 to +812
// Needs to be frozen to disallow `&mut self` methods, which could break a runtime invariant
rigged_args.options.frozen = parse_quote!(frozen);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidhewitt

I've put this in to see if it works.
It does, but the error message might be a little difficult for a user to understand.

error[E0271]: type mismatch resolving `<ComplexEnum as PyClass>::Frozen == False`
  --> lib.rs:48:30
   |
48 |     pub fn do_stuff_in_place(&mut self) {
   |                              ^ expected `False`, found `True`
   |
note: required by a bound in `extract_pyclass_ref_mut`
  --> /home/mate/prog/pyo3/src/impl_/extract_argument.rs:57:56
   |
57 | pub fn extract_pyclass_ref_mut<'a, 'py: 'a, T: PyClass<Frozen = False>>(
   |                                                        ^^^^^^^^^^^^^^ required by this bound in `extract_pyclass_ref_mut`

Do you see a way to improve this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is totally fine; it's the same as what we already have for the rest of frozen pyclass UX (regardless of enums). It's purely a documentation problem at this point I think, outside the scope of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a test case using test_compile_error to document that &mut self methods are not supported.

@davidhewitt
Copy link
Member

Sorry for the delay in review, I got a bit swamped by a few things going on in code and irl. Overall I'm thinking it makes sense to move ahead with this design, I'll try to give a full in depth review by the end of the weekend.

@mkovaxx
Copy link
Contributor Author

mkovaxx commented Dec 8, 2023

@davidhewitt Hey David, no worries, thanks for coming back to this PR! :)

Please take your time with the review!

pytests/tests/test_enums.py Outdated Show resolved Hide resolved
pytests/tests/test_enums.py Outdated Show resolved Hide resolved
@mkovaxx mkovaxx force-pushed the pyclass_complex_enum branch 5 times, most recently from 00fffa1 to 96d099f Compare January 18, 2024 03:58
@davidhewitt davidhewitt added the CI-no-fail-fast If one job fails, allow the rest to keep testing label Jan 18, 2024
@mkovaxx mkovaxx force-pushed the pyclass_complex_enum branch from 96d099f to 3ed5ddb Compare January 18, 2024 13:05
Comment on lines +1135 to +1157
let def_count_vertices = if py.version_info() >= (3, 10) { r#"
def count_vertices(cls, shape):
match shape:
case cls.Circle():
return 0
case cls.Rectangle():
return 4
case cls.RegularPolygon(side_count=n):
return n
case cls.Nothing():
return 0
"# } else { r#"
def count_vertices(cls, shape):
if isinstance(shape, cls.Circle):
return 0
elif isinstance(shape, cls.Rectangle):
return 4
elif isinstance(shape, cls.RegularPolygon):
n = shape.side_count
return n
elif isinstance(shape, cls.Nothing):
return 0
"# };
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solves the version issue, but it makes the example ugly...
A more readable but less correct alternative would be to simply skip this test on Python < 3.10.

Copy link
Member

@davidhewitt davidhewitt Jan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could #![cfg(Py_3_10)] at the top of the example, out of sight of users even.

Given I've just set to merge, let's clean this up in a follow up PR.

Comment on lines +18 to +20
if sys.version_info < (3, 10):
# Match syntax is only available in Python >= 3.10
ignored_paths.append("tests/test_enums_match.py")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the version-checking magic happens to only run match-based tests on Python >= 3.10.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great solution, nicely done

@mkovaxx
Copy link
Contributor Author

mkovaxx commented Jan 18, 2024

@davidhewitt Some tests now seem to fail on a download not coming through, and something else I don't understand.

Maybe this is just network flakiness?

Caused by:
    0: failed to make network request
    1: error sending request for url (https://static.rust-lang.org/dist/channel-rust-stable.toml.sha256): error trying to connect: connection closed via error
    2: error trying to connect: connection closed via error
    3: connection closed via error
Error: Process completed with exit code 1.

I'm clueless about this one...

maturin failed
      Caused by: Cargo didn't build a cdylib. Did you miss crate-type = ["cdylib"] in the lib section of your Cargo.toml?
    Error: command ['maturin', 'pep517', 'build-wheel', '-i', '/home/runner/work/pyo3/pyo3/pytests/.nox/test/bin/python', '--compatibility', 'off', '--profile=dev'] returned non-zero exit status 1
    error: subprocess-exited-with-error

@davidhewitt
Copy link
Member

Rust nightly had some odd issues yesterday, see PyO3/setuptools-rust#398

I'll rerun the failures. The rest does sound like network failures, yes.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the full build looks good, this should succeed to merge this time...

@davidhewitt davidhewitt enabled auto-merge January 19, 2024 14:09
@davidhewitt davidhewitt added this pull request to the merge queue Jan 19, 2024
Merged via the queue into PyO3:main with commit d1b0722 Jan 19, 2024
67 checks passed
@mkovaxx
Copy link
Contributor Author

mkovaxx commented Jan 20, 2024

@davidhewitt Thank you for landing this! :)

I've added these follow-up issues:
#3747
#3748
#3749
#3750

amorenoz added a commit to amorenoz/retis that referenced this pull request Aug 30, 2024
OVS event is built incrementally when more than one bpf event is
received. For that reason, it implements Default and each event adds a
bit of information to it.

However, the way Default is implemented is by having a unit enum
value that reprsents "undefined" value. Adding unit values in an
otherwise complex is not supported by pyo3 [1].

This patch uses Option<> instead of an undefined value.

[1] Upstream discussions:
PyO3/pyo3#3582 (comment)
PyO3/pyo3#3749

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
amorenoz added a commit to amorenoz/retis that referenced this pull request Sep 18, 2024
OVS event is built incrementally when more than one bpf event is
received. For that reason, it implements Default and each event adds a
bit of information to it.

However, the way Default is implemented is by having a unit enum
value that reprsents "undefined" value. Adding unit values in an
otherwise complex is not supported by pyo3 [1].

This patch uses Option<> instead of an undefined value.

[1] Upstream discussions:
PyO3/pyo3#3582 (comment)
PyO3/pyo3#3749

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
amorenoz added a commit to amorenoz/retis that referenced this pull request Sep 19, 2024
OVS event is built incrementally when more than one bpf event is
received. For that reason, it implements Default and each event adds a
bit of information to it.

However, the way Default is implemented is by having a unit enum
value that represents "undefined" value. Adding unit values in an
otherwise complex is not supported by pyo3 [1].

This patch uses Option<> instead of an undefined value.

[1] Upstream discussions:
PyO3/pyo3#3582 (comment)
PyO3/pyo3#3749

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
amorenoz added a commit to amorenoz/retis that referenced this pull request Oct 4, 2024
OVS module currently makes no assumptions on the order of the event
chunks and tries to build the event even if out-of-order pieces are
received. This makes little sense in practice as these chuncks are sent
in the same hook.

Removing that unneeded requirement and assuming the base action event
(OVS_DP_ACTION) will be received before specific action argument events
(e.g: OVS_DP_ACTION_OUTPUT) makes decoding simpler.

This also it avoids requiring a Default version of the event which is
currently implemented using an "undefined" value for enums. This
mechanism is not supported by pyo3.

Also, create a dummy object to avoid having mixed complex unit variants
in enums.

[1] Upstream discussions:
PyO3/pyo3#3582 (comment)
PyO3/pyo3#3749

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
amorenoz added a commit to amorenoz/retis that referenced this pull request Oct 4, 2024
OVS module currently makes no assumptions on the order of the event
chunks and tries to build the event even if out-of-order pieces are
received. This makes little sense in practice as these chuncks are sent
in the same hook.

Removing that unneeded requirement and assuming the base action event
(OVS_DP_ACTION) will be received before specific action argument events
(e.g: OVS_DP_ACTION_OUTPUT) makes decoding simpler.

This also it avoids requiring a Default version of the event which is
currently implemented using an "undefined" value for enums. This
mechanism is not supported by pyo3.

Also, create a dummy object to avoid having mixed complex unit variants
in enums.

[1] Upstream discussions:
PyO3/pyo3#3582 (comment)
PyO3/pyo3#3749

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
amorenoz added a commit to amorenoz/retis that referenced this pull request Oct 4, 2024
OVS module currently makes no assumptions on the order of the event
chunks and tries to build the event even if out-of-order pieces are
received. This makes little sense in practice as these chuncks are sent
in the same hook.

Removing that unneeded requirement and assuming the base action event
(OVS_DP_ACTION) will be received before specific action argument events
(e.g: OVS_DP_ACTION_OUTPUT) makes decoding simpler.

This also it avoids requiring a Default version of the event which is
currently implemented using an "undefined" value for enums. This
mechanism is not supported by pyo3.

Also, create a dummy object to avoid having mixed complex unit variants
in enums.

[1] Upstream discussions:
PyO3/pyo3#3582 (comment)
PyO3/pyo3#3749

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
amorenoz added a commit to amorenoz/retis that referenced this pull request Oct 16, 2024
OVS module currently makes no assumptions on the order of the event
chunks and tries to build the event even if out-of-order pieces are
received. This makes little sense in practice as these chuncks are sent
in the same hook.

Removing that unneeded requirement and assuming the base action event
(OVS_DP_ACTION) will be received before specific action argument events
(e.g: OVS_DP_ACTION_OUTPUT) makes decoding simpler.

This also it avoids requiring a Default version of the event which is
currently implemented using an "undefined" value for enums. This
mechanism is not supported by pyo3.

Also, create a dummy object to avoid having mixed complex unit variants
in enums.

[1] Upstream discussions:
PyO3/pyo3#3582 (comment)
PyO3/pyo3#3749

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
amorenoz added a commit to amorenoz/retis that referenced this pull request Oct 17, 2024
OVS module currently makes no assumptions on the order of the event
chunks and tries to build the event even if out-of-order pieces are
received. This makes little sense in practice as these chuncks are sent
in the same hook.

Removing that unneeded requirement and assuming the base action event
(OVS_DP_ACTION) will be received before specific action argument events
(e.g: OVS_DP_ACTION_OUTPUT) makes decoding simpler.

This also it avoids requiring a Default version of the event which is
currently implemented using an "undefined" value for enums. This
mechanism is not supported by pyo3.

Also, create a dummy object to avoid having mixed complex unit variants
in enums.

[1] Upstream discussions:
PyO3/pyo3#3582 (comment)
PyO3/pyo3#3749

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
amorenoz added a commit to amorenoz/retis that referenced this pull request Oct 21, 2024
OVS module currently makes no assumptions on the order of the event
chunks and tries to build the event even if out-of-order pieces are
received. This makes little sense in practice as these chuncks are sent
in the same hook.

Removing that unneeded requirement and assuming the base action event
(OVS_DP_ACTION) will be received before specific action argument events
(e.g: OVS_DP_ACTION_OUTPUT) makes decoding simpler.

This also it avoids requiring a Default version of the event which is
currently implemented using an "undefined" value for enums. This
mechanism is not supported by pyo3.

Also, create a dummy object to avoid having mixed complex unit variants
in enums.

[1] Upstream discussions:
PyO3/pyo3#3582 (comment)
PyO3/pyo3#3749

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI-build-full CI-no-fail-fast If one job fails, allow the rest to keep testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants