Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support pyo3-0.23.2 #119

Open
piotryordanov opened this issue Nov 28, 2024 · 2 comments
Open

Support pyo3-0.23.2 #119

piotryordanov opened this issue Nov 28, 2024 · 2 comments

Comments

@piotryordanov
Copy link

Any plan on supporting the latest stable version of pyo3-0.23.2 ?

@bionicles
Copy link

bionicles commented Dec 8, 2024

#112 seems related
#37 is a PR but could be out of date given the pace of pyo3

i'd suggest pyo3 likely moves faster than polars because pyo3 has a bigger community; therefore it could be good if the pyo3 CI would automatically bump the version

Unfortunately, bumping the version leads to compile errors because pyo3-polars::types::PyDataType (around line 357) implements ToPyObject (deprecated) instead of IntoPyObject, which has a different main method name and signature which returns a Result instead of being infallible

Therefore, we need somebody with deep understanding of the pyo3 changes and pyo3-polars internals to refactor the impl IntoPyObject for PyDataType and various other deprecated parts of the pyo3 API inside the types module ... looks pretty complicated

I hit a blocker here:

mismatched types
std::result::Result<pyo3::Bound<'py, pyo3::Py<pyo3::PyAny>>, _>`
   found enum `std::result::Result<pyo3::Bound<'_, pyo3::PyAny>, _>

TLDR: besides needing to return Result<Self::Output, Self::Error> instead of straight up returning a "PyAny" -- we need to return a Py and the Py generic is actually something we're ditching earlier in the stack or panicking

All these functions are using this:

    fn getattr<N>(&self, attr_name: N) -> PyResult<Bound<'py, PyAny>>
    where
        N: IntoPyObject<'py, Target = PyString>,
    {
        fn inner<'py>(
            any: &Bound<'py, PyAny>,
            attr_name: Borrowed<'_, '_, PyString>,
        ) -> PyResult<Bound<'py, PyAny>> {
            unsafe {
                ffi::PyObject_GetAttr(any.as_ptr(), attr_name.as_ptr())
                    .assume_owned_or_err(any.py())
            }
        }

        inner(
            self,
            attr_name
                .into_pyobject(self.py())
                .map_err(Into::into)?
                .as_borrowed(),
        )
    }

ah, after some clutch help from llama 3.3 (thanks zuck!), i discovered the Target isn't really used, and I wasn't able to update the Output to match what we had, until I changed "Target" to match the inside of the Bound in Output

impl<'py> IntoPyObject<'py> for PyDataType {
    type Target = PyAny; // I don't know what "Target" is
    type Output = Bound<'py, PyAny>;
    type Error = pyo3::PyErr;

    fn into_pyobject(self, py: Python) -> Result<Self::Output, Self::Error> {
        let pl: &Bound<'py, PyAny> = POLARS.bind(py);

        match &self.0 {
            DataType::Int8 => {
                pl.getattr(intern!(py, "Int8"))
            }

and we can replace a bunch of "unwrap" with "?", but it seems likely we can get this to compile now

@bionicles
Copy link

bionicles commented Dec 8, 2024

ok, got it to compile, here's how:

  1. updated all the pyo3 versions in all the crates in the repo to 23
  2. fixed the compile error in pyo3-polars::src::types

There are still a LOT of deprecations but this fixes the compile error and it did build
image

// pyo3-polars::types line ~357 
impl<'py> IntoPyObject<'py> for PyDataType {
    type Target = PyAny; // I don't know what "Target" is
    type Output = Bound<'py, PyAny>;
    type Error = pyo3::PyErr;

    fn into_pyobject(self, py: Python<'py>) -> Result<Self::Output, Self::Error> {
        let pl: &Bound<'py, PyAny> = POLARS.bind(py);

        match &self.0 {
            DataType::Int8 => {
                let class = pl.getattr(intern!(py, "Int8"))?;
                class.call0() // need to mass translate these unwraps into Err
            }
            DataType::Int16 => {
                let class = pl.getattr(intern!(py, "Int16"))?;
                class.call0()
            }
            DataType::Int32 => {
                let class = pl.getattr(intern!(py, "Int32"))?;
                class.call0()
            }
            DataType::Int64 => {
                let class = pl.getattr(intern!(py, "Int64"))?;
                class.call0()
            }
            DataType::UInt8 => {
                let class = pl.getattr(intern!(py, "UInt8"))?;
                class.call0()
            }
            DataType::UInt16 => {
                let class = pl.getattr(intern!(py, "UInt16"))?;
                class.call0()
            }
            DataType::UInt32 => {
                let class = pl.getattr(intern!(py, "UInt32"))?;
                class.call0()
            }
            DataType::UInt64 => {
                let class = pl.getattr(intern!(py, "UInt64"))?;
                class.call0()
            }
            DataType::Float32 => {
                let class = pl.getattr(intern!(py, "Float32"))?;
                class.call0()
            }
            DataType::Float64 | DataType::Unknown(UnknownKind::Float) => {
                let class = pl.getattr(intern!(py, "Float64"))?;
                class.call0()
            }
            #[cfg(feature = "dtype-decimal")]
            DataType::Decimal(precision, scale) => {
                let class = pl.getattr(intern!(py, "Decimal"))?;
                let args = (*precision, *scale);
                class.call1(args)
            }
            DataType::Boolean => {
                let class = pl.getattr(intern!(py, "Boolean"))?;
                class.call0()
            }
            DataType::String | DataType::Unknown(UnknownKind::Str) => {
                let class = pl.getattr(intern!(py, "String"))?;
                class.call0()
            }
            DataType::Binary => {
                let class = pl.getattr(intern!(py, "Binary"))?;
                class.call0()
            }
            #[cfg(feature = "dtype-array")]
            DataType::Array(inner, size) => {
                let class = pl.getattr(intern!(py, "Array"))?;
                let inner = PyDataType(*inner.clone()).to_object(py);
                let args = (inner, *size);
                class.call1(args)
            }
            DataType::List(inner) => {
                let class = pl.getattr(intern!(py, "List")).unwrap();
                let inner = PyDataType(*inner.clone()).into_pyobject(py)?;
                class.call1((inner,))
            }
            DataType::Date => {
                let class = pl.getattr(intern!(py, "Date"))?;
                class.call0()
            }
            DataType::Datetime(tu, tz) => {
                let datetime_class = pl.getattr(intern!(py, "Datetime"))?;
                datetime_class.call1((tu.to_ascii(), tz.as_ref().map(|s| s.as_str())))
            }
            DataType::Duration(tu) => {
                let duration_class = pl.getattr(intern!(py, "Duration"))?;
                duration_class.call1((tu.to_ascii(),))
            }
            #[cfg(feature = "object")]
            DataType::Object(_, _) => {
                let class = pl.getattr(intern!(py, "Object"))?;
                class.call0()
            }
            #[cfg(feature = "dtype-categorical")]
            DataType::Categorical(_, ordering) => {
                let class = pl.getattr(intern!(py, "Categorical"))?;
                let ordering = match ordering {
                    CategoricalOrdering::Physical => "physical",
                    CategoricalOrdering::Lexical => "lexical",
                };
                class.call1((ordering,))
            }
            #[cfg(feature = "dtype-categorical")]
            DataType::Enum(rev_map, _) => {
                // we should always have an initialized rev_map coming from rust
                let categories = rev_map.as_ref()?.get_categories();
                let class = pl.getattr(intern!(py, "Enum"))?;
                let s = Series::from_arrow("category".into(), categories.clone().boxed())?;
                let series = to_series(py, PySeries(s));
                return class.call1((series,));
            }
            DataType::Time => pl.getattr(intern!(py, "Time")),
            #[cfg(feature = "dtype-struct")]
            DataType::Struct(fields) => {
                let field_class = pl.getattr(intern!(py, "Field"))?;
                let iter = fields.iter().map(|fld| {
                    let name = fld.name().as_str();
                    let dtype = PyDataType(fld.dtype().clone()).to_object(py);
                    field_class.call1((name, dtype))?
                });
                let fields = PyList::new_bound(py, iter);
                let struct_class = pl.getattr(intern!(py, "Struct"))?;
                struct_class.call1((fields,))
            }
            DataType::Null => {
                match pl.getattr(intern!(py, "Null")) {
                    Ok(obj) => Ok(obj),
                    Err(e) => Err(e),
                }
                // class.call0()
            }
            DataType::Unknown(UnknownKind::Int(v)) => {
                let x = materialize_dyn_int(*v);
                let x_dtype = x.dtype();
                let y: Result<Bound<'py, PyAny>, PyErr> = PyDataType(x_dtype).into_pyobject(py);
                y
            }
            DataType::Unknown(_) => {
                let class = pl.getattr(intern!(py, "Unknown"))?;
                class.call0()
            }
            DataType::BinaryOffset => {
                panic!("this type isn't exposed to python")
            }
            #[allow(unreachable_patterns)]
            _ => panic!("activate dtype"),
        }
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants