-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Docs] Update extension type examples to not use UUID #43809
Comments
I'm a long time fan of the arrow project and I was hoping to get a bit more involved in the project. I figured this might be a good first issue. Please let me know if it is not or the following is a bad idea. Instead of something like a YAML or XML type, would a "rational" type make sense? Something like: import pyarrow as pa
import pyarrow.types as pt
class RationalType(pa.ExtensionType):
"""
A rational number represented as a struct of an integer `numer` (the numerator)
and an integer `denom` (the denominator)
"""
def __init__(self, data_type: pa.DataType):
if not pt.is_integer(data_type):
raise TypeError(f"data_type must be an integer type not {data_type}")
super().__init__(
pa.struct(
[
("numer", data_type),
("denom", data_type),
],
),
# N.B. This name does _not_ reference `data_type` so deserialization
# will work for _any_ integer `data_type` after registration
"my_package.rational",
)
def __arrow_ext_serialize__(self) -> bytes:
# No serialized metadata necessary
return b""
@classmethod
def __arrow_ext_deserialize__(self, storage_type, serialized):
# return an instance of this subclass given the serialized
# metadata
return RationalType(storage_type[0].type) This shows off a few more of the parameters that are passed around than the current UUID example. |
Rational seems like a good example! Complex was discussed in the past too, but it will probably be proposed as a canonical type candidate (@sjperkins?). So if we're sure rational won't be a canonical type I think rational is a good candidate. It also feels like an easier type to give pedagogical examples on then YAML/XML. On the other hand some one could nicely show how string kernels work on string storage. We don't really need to pick one - we can mix it up. |
Thanks for the ping @rok -- I really should re-propose a Complex number. I'm now thinking along the lines of ComplexFloat = FixedSizeBinary(64) and ComplexDouble = FixedSizeBinary(128), rather than the original FixedSizeListArray(float32(), 2) and FixedSizeListArray(float64(), 2) approach. I think the former will work better with FixedShapeTensor and VariableShapeTensor. I'm currently focused in other areas at the moment, but would like to revisit Complex numbers at some point. |
Oh interesting approach. Is there other systems that do this? Would this approach be better fitted for vectorization?
Feel free to ping me when you do! |
Will do!
…On Mon, Aug 26, 2024 at 9:23 AM Ian Cook ***@***.***> wrote:
@khwilson <https://github.com/khwilson> please tag me and @rok
<https://github.com/rok> to review when you have PR open. Thanks!
—
Reply to this email directly, view it on GitHub
<#43809 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AALU5ETHGLVPIQA2CF33WTLZTNJBDAVCNFSM6AAAAABNA6GESCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMJQGU4TENBUGA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
### Rationale for this change UUID extension types were made canonical in #41299 and are getting native support in C++ and Python in #37298. As such, it makes sense to provide an alternative user-defined extension type as an example that is unlikely to become a canonical extension type anytime soon. After discussion in #43809, we determined a `RationalType` would make sense. Please note that this is a redo of #43849 as I made a blunder and accidentally pushed a branch that was in a wonky state. ### What changes are included in this PR? A change in several doc locations which reference a `UuidType` extension type have been changed to a `RationalType`. For consistency, this PR also changes single quotes (`''`) to double quotes (`""`) throughout the Python examples that it modifies. Also, seemingly unrelated to this change, some doctests began failing as numpy changed the `repr` of `float16`'s between 1.x and 2.x. We have updated the failing doctest so that it supports both styles. ### Are these changes tested? These are documentation changes and `archery docker run conda-python-docs` succeeds locally. ### Are there any user-facing changes? No. cc @ianmcook @rok * GitHub Issue: #43809 --------- Co-authored-by: Ian Cook <[email protected]>
Resolved by #44120 |
Describe the bug, including details regarding any error messages, version, and platform.
In the Format docs and Python docs, there are several examples of user-defined extension types and sample code showing how to implement them (by subclassing). These all use a UUID extension type as the example:
Now that UUID is a canonical extension type (#41299) and will have native support in C++ and Python (#37298), we should replace these with examples based on some other user-defined extension type—ideally one that is not likely to become a canonical extension type anytime soon. Maybe an XML or YAML extension type (with UTF8 storage type)?
Component(s)
Documentation
The text was updated successfully, but these errors were encountered: