Add jiff support #243

Merged 28 commits on Oct 5, 2024
Changes from 25 commits
Commits
09e583b Simplify usage for transmute (chmp, Oct 1, 2024)
6d09746 Add jiff as a dev-dep (chmp, Oct 1, 2024)
8c8bf3f Test various reprs between Chrono and Jiff (chmp, Oct 1, 2024)
7c936f6 reformat (chmp, Oct 1, 2024)
7afd9ff Test more datetime formats (chmp, Oct 1, 2024)
05df305 Add support for jiff::Time (chmp, Oct 1, 2024)
669a85c Implement jiff::Date for positive years (chmp, Oct 1, 2024)
cc0016b Test transmutation chrono::DateTime<Utc> <-> jiff::Timestamp (chmp, Oct 1, 2024)
8c446ec Implement support dates with negative years across chrono and jiff (chmp, Oct 2, 2024)
b81f10b regroup chrono tests (chmp, Oct 2, 2024)
bed5835 Regroup jiff tests (chmp, Oct 2, 2024)
57faa15 Address clippy (chmp, Oct 2, 2024)
3f7c300 Add support for jiff::DateTime (chmp, Oct 2, 2024)
9d7d8fd Add support jiff::Timestamp (chmp, Oct 3, 2024)
24c9825 Move chrono parsers to top-level and start to return data from parsers (chmp, Oct 3, 2024)
3ef1634 Modify all parsers to return the matched data (chmp, Oct 3, 2024)
e78c01a Implement span parsing (chmp, Oct 3, 2024)
0796cc7 Initial impl of Span support (chmp, Oct 3, 2024)
8ea3927 Implement negative duration support (chmp, Oct 3, 2024)
24dba78 Implement subsecond support (chmp, Oct 3, 2024)
e2dbaf5 Increase robustness (chmp, Oct 3, 2024)
d2b849a Add SignedDuration support (chmp, Oct 3, 2024)
b301895 Update the changelog and the overview (chmp, Oct 4, 2024)
a4817ef Update tests (chmp, Oct 4, 2024)
587ecd5 Reorg status page to allow deep linking (chmp, Oct 4, 2024)
e37e3b0 Fix clippy (chmp, Oct 4, 2024)
c3a470a Small code improvements (chmp, Oct 4, 2024)
c5698bf Remove outdated todo (chmp, Oct 5, 2024)
27 changes: 27 additions & 0 deletions Cargo.lock

2 changes: 2 additions & 0 deletions Changes.md
@@ -8,6 +8,8 @@ New features
to `Time64(Nanosecond))`) in `from_samples`
- Improved error messages for non self describing types (`chrono::*`, `uuid::Uuid`,
`std::net::IpAddr`)
- Add support for various `jiff` types (`jiff::Date`, `jiff::Time`, `jiff::DateTime`,
`jiff::Timestamp`, `jiff::Span`, `jiff::SignedDuration`)

## 0.12.0

1 change: 1 addition & 0 deletions serde_arrow/Cargo.toml
@@ -138,6 +138,7 @@ serde_bytes = "0.11"
rand = "0.8"
bigdecimal = {version = "0.4", features = ["serde"] }
uuid = { version = "1.10.0", features = ["serde", "v4"] }
jiff = { version = "0.1", features = ["serde"] }

# for benchmarks
# arrow-version:replace: arrow-json-{version} = {{ package = "arrow-json", version = "{version}" }}
175 changes: 125 additions & 50 deletions serde_arrow/Status.md
@@ -1,6 +1,15 @@
# Status

Supported arrow data types:
This page documents the supported types from both an Arrow and a Rust perspective.

- [Arrow data types](#arrow-data-types)
- [Rust types](#rust-types)
- [Native / standard types](#native--standard-types)
- [`chrono` types](#chrono-types)
- [`jiff` types](#jiff-types)
- [`rust_decimal` and `bigdecimal` types](#rust_decimal-and-bigdecimal-types)

## Arrow data types

- [x] [`Null`](https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Null)
- [x] [`Boolean`](https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Boolean)
@@ -49,7 +58,9 @@ Supported arrow data types:
serialization error.
- [ ] [`Decimal256(precision, scale)`](https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Decimal256)

Native / standard Rust types:
## Rust types

### Native / standard types

- [x] `bool`
- [x] `i8`, `i16`, `i32`, `i64`
@@ -72,54 +83,118 @@ Native / standard Rust types:
supported
- [x] `struct S(T)`: newtype structs are supported, if `T` is supported

Non-standard Rust types

- [x] `chrono::DateTime<Utc>`:
- is serialized / deserialized as strings
- can be mapped to `Utf8`, `LargeUtf8`, `Timestamp(.., Some("Utc"))`, `Date64` with strategy `UtcStrAsDate64`
- `from_samples` detects the type `LargeUtf8` without configuration, the type `Date64` with
strategy `UtcStrAsDate64` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing
- [x] `chrono::DateTime<Utc>` using [`chrono::serde::ts_microseconds`][chrono-ts-microseconds]:
- is serialized / deserialized as `i64`
- can be mapped to `Utf8`, `LargeUtf8`, `Timestamp(.., Some("Utc"))`, `Date64` without Strategy,
`Date64` with strategy `UtcStrAsDate64`
- `from_samples` and `from_type` detect the type `Int64`
- [x] `chrono::NaiveDateTime`:
- is serialized / deserialized as strings
- can be mapped to `Utf8`, `LargeUtf8`, `Timestamp(.., None)`, `Date64` with strategy `NaiveStrAsDate64`
- `from_samples` detects the type `LargeUtf8` without configuration, the type `Date64` with
strategy `NaiveStrAsDate64` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing
- [x] `chrono::NaiveTime`:
- serialized / deserialized as strings
- can be mapped to `Utf8`, `LargeUtf8`, `Time32(..)` and `Time64` arrays
- `from_samples` detects the type `LargeUtf8` without configuration, the type `Time64(Nanosecond)`
when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing
- [x] `chrono::NaiveDate`:
- is serialized as Serde strings
- can be mapped to `Utf8`, `LargeUtf8`, `Date32` arrays
- `from_samples` detects the type `LargeUtf8` without configuration, to `Date32` when setting
`guess_dates = true`
- `from_type` is not supported, as the type is not self-describing
- [ ] `chrono::Duration`: does not support Serde and is therefore not supported
- [x] [`rust_decimal::Decimal`][rust_decimal::Decimal] for the `float` and `str`
(de)serialization options when using the `Decimal128(..)` data type
- [x] [`bigdecimal::BigDecimal`][bigdecimal::BigDecimal] when using the
`Decimal128(..)` data type


[crate::base::Event]: https://docs.rs/serde_arrow/latest/serde_arrow/event/enum.Event.html
[crate::to_record_batch]: https://docs.rs/serde_arrow/latest/serde_arrow/fn.to_record_batch.html
[crate::trace_schema]: https://docs.rs/serde_arrow/latest/serde_arrow/fn.trace_schema.html
[serde::Serialize]: https://docs.serde.rs/serde/trait.Serialize.html
[serde::Deserialize]: https://docs.serde.rs/serde/trait.Deserialize.html
[crate::Schema::from_records]: https://docs.rs/serde_arrow/latest/serde_arrow/struct.Schema.html#method.from_records
[chrono]: https://docs.rs/chrono/latest/chrono/

[crate::base::EventSource]: https://docs.rs/serde_arrow
[crate::base::EventSink]: https://docs.rs/serde_arrow
### `chrono` types

#### `chrono::DateTime<Utc>`

- is serialized / deserialized as strings
- can be mapped to `Utf8`, `LargeUtf8`, `Timestamp(.., Some("Utc"))`, `Date64` with strategy `UtcStrAsDate64`
- `from_samples` detects
- `LargeUtf8` without configuration
- `Date64` with strategy `UtcStrAsDate64` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing

With [`chrono::serde::ts_microseconds`][chrono-ts-microseconds]:

- is serialized / deserialized as `i64`
- can be mapped to `Utf8`, `LargeUtf8`, `Timestamp(.., Some("Utc"))`, `Date64` without Strategy,
`Date64` with strategy `UtcStrAsDate64`
- `from_samples` and `from_type` detect `Int64`

#### `chrono::NaiveDateTime`

- is serialized / deserialized as strings
- can be mapped to `Utf8`, `LargeUtf8`, `Timestamp(.., None)`, `Date64` with strategy `NaiveStrAsDate64`
- `from_samples` detects
- `LargeUtf8` without configuration
- `Date64` with strategy `NaiveStrAsDate64` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing

#### `chrono::NaiveTime`

- is serialized / deserialized as strings
- can be mapped to `Utf8`, `LargeUtf8`, `Time32(..)` and `Time64` arrays
- `from_samples` detects
- `LargeUtf8` without configuration
- `Time64(Nanosecond)` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing

#### `chrono::NaiveDate`

- is serialized as Serde strings
- can be mapped to `Utf8`, `LargeUtf8`, `Date32` arrays
- `from_samples` detects
- `LargeUtf8` without configuration
- `Date32` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing

`chrono::Duration` does not support Serde and is therefore not supported.

### `jiff` types

#### `jiff::Date`

- is serialized as Serde strings
- can be mapped to `Utf8`, `LargeUtf8`, `Date32`
- `from_samples` detects
- `LargeUtf8` without configuration
- `Date32` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing
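
To make the `Date32` mapping concrete: Arrow's `Date32` stores the number of days since 1970-01-01. The sketch below is not the code from this PR; it is a minimal, std-only illustration of the day-count arithmetic (the well-known "days from civil" algorithm for the proleptic Gregorian calendar), and the function name `days_since_epoch` is hypothetical. It assumes a valid date is passed in and does no range checking.

```rust
/// Days between 1970-01-01 and the given proleptic Gregorian date.
/// Hypothetical helper for illustration; assumes `month` is in 1..=12
/// and `day` is valid for that month.
fn days_since_epoch(year: i64, month: u32, day: u32) -> i32 {
    // Treat January and February as months 11 and 12 of the previous year,
    // so that a leap day is always the last day of the shifted year.
    let y = if month <= 2 { year - 1 } else { year };
    // Euclidean division keeps the arithmetic correct for negative years.
    let era = y.div_euclid(400);
    let year_of_era = y.rem_euclid(400); // 0..=399
    let shifted_month = (i64::from(month) + 9) % 12; // March = 0, February = 11
    let day_of_year = (153 * shifted_month + 2) / 5 + i64::from(day) - 1; // 0..=365
    let day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    // 719_468 is the number of days from 0000-03-01 to 1970-01-01.
    (era * 146_097 + day_of_era - 719_468) as i32
}
```

For example, `days_since_epoch(2000, 3, 1)` evaluates to `11017`, and dates before the epoch, including years at or below zero, yield negative values.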

#### `jiff::Time`

- is serialized as Serde strings
- can be mapped to `Utf8`, `LargeUtf8`, `Time32(..)`, `Time64(..)`
- `from_samples` detects
- `LargeUtf8` without configuration
- `Time64(Nanosecond)` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing
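
As an illustration of the `Time64(Nanosecond)` mapping, which stores nanoseconds since midnight, the following std-only sketch converts a time string into that representation. It assumes the string has the form `HH:MM:SS` with an optional fraction of up to nine digits; the function name `time_to_nanos` is hypothetical, and this is not the parser added in this PR.

```rust
/// Parse `HH:MM:SS[.fraction]` into nanoseconds since midnight.
/// Hypothetical helper for illustration; returns `None` on malformed input.
fn time_to_nanos(text: &str) -> Option<i64> {
    // Split off the optional fractional seconds; a trailing dot is rejected.
    let (hms, fraction) = match text.split_once('.') {
        Some((_, "")) => return None,
        Some((hms, fraction)) => (hms, fraction),
        None => (text, ""),
    };
    let mut fields = hms.split(':');
    let hour: i64 = fields.next()?.parse().ok()?;
    let minute: i64 = fields.next()?.parse().ok()?;
    let second: i64 = fields.next()?.parse().ok()?;
    if fields.next().is_some()
        || !(0..24).contains(&hour)
        || !(0..60).contains(&minute)
        || !(0..60).contains(&second)
    {
        return None;
    }
    if fraction.len() > 9 || !fraction.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Read the fraction digits, then right-pad to nine digits (nanoseconds).
    let mut nanos: i64 = 0;
    for digit in fraction.bytes() {
        nanos = nanos * 10 + i64::from(digit - b'0');
    }
    for _ in fraction.len()..9 {
        nanos *= 10;
    }
    Some(((hour * 60 + minute) * 60 + second) * 1_000_000_000 + nanos)
}
```

With this sketch, `time_to_nanos("00:00:00.5")` yields `Some(500_000_000)`, while an out-of-range input such as `"24:00:00"` yields `None`.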

#### `jiff::DateTime`

- is serialized as Serde strings
- can be mapped to `Utf8`, `LargeUtf8`, `Timestamp(.., None)`, `Date64` with strategy
`NaiveStrAsDate64`
- `from_samples` detects
- `LargeUtf8` without configuration
- `Date64` with strategy `NaiveStrAsDate64` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing

#### `jiff::Timestamp`

- is serialized as Serde strings
- can be mapped to `Utf8`, `LargeUtf8`, `Timestamp(.., Some("UTC"))`, `Date64` with strategy
`UtcStrAsDate64`
- `from_samples` detects
- `LargeUtf8` without configuration
- `Date64` with strategy `UtcStrAsDate64` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing

#### `jiff::Span`

- is serialized as Serde strings
- can be mapped to `Utf8`, `LargeUtf8`, `Duration(..)`
- `from_samples` detects `LargeUtf8`
- `from_type` is not supported, as the type is not self-describing
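
The commit list mentions span parsing with negative-duration and subsecond support. As a rough illustration of what mapping a span string to a `Duration(Nanosecond)` value involves, the std-only sketch below parses an ISO 8601-style time-only duration such as `PT1H30M` or `-PT1.5S`. Several things here are assumptions rather than facts about this PR or about `jiff`: the uppercase designators, the rejection of calendar units (days and larger, which have no fixed length), and the function name `span_to_nanos`, which is hypothetical.

```rust
/// Parse a time-only ISO 8601-style duration (`[-]PT[nH][nM][n[.f]S]`)
/// into signed nanoseconds. Hypothetical helper for illustration;
/// returns `None` on malformed input, calendar units, or overflow.
fn span_to_nanos(text: &str) -> Option<i64> {
    let (negative, unsigned) = match text.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, text),
    };
    // Only the time part is handled; `P1D` and similar are rejected here.
    let mut rest = unsigned.strip_prefix("PT")?;
    if rest.is_empty() {
        return None;
    }
    let units = [
        ('H', 3_600_000_000_000_i64),
        ('M', 60_000_000_000),
        ('S', 1_000_000_000),
    ];
    let digits_only = |s: &str| s.bytes().all(|b| b.is_ascii_digit());
    let mut total: i64 = 0;
    // Units must appear in the order hours, minutes, seconds.
    for (designator, nanos_per_unit) in units {
        let Some(pos) = rest.find(designator) else { continue };
        let (number, tail) = rest.split_at(pos);
        rest = &tail[1..];
        let (whole, fraction) = number.split_once('.').unwrap_or((number, ""));
        if whole.is_empty() || !digits_only(whole) || !digits_only(fraction) || fraction.len() > 9 {
            return None;
        }
        // In this sketch, fractions are only accepted on the seconds field.
        if designator != 'S' && number.contains('.') {
            return None;
        }
        let mut fraction_nanos: i64 = 0;
        for digit in fraction.bytes() {
            fraction_nanos = fraction_nanos * 10 + i64::from(digit - b'0');
        }
        for _ in fraction.len()..9 {
            fraction_nanos *= 10;
        }
        let whole_nanos = whole.parse::<i64>().ok()?.checked_mul(nanos_per_unit)?;
        total = total.checked_add(whole_nanos)?.checked_add(fraction_nanos)?;
    }
    if !rest.is_empty() {
        return None;
    }
    Some(if negative { -total } else { total })
}
```

Under these assumptions, `span_to_nanos("PT1H30M")` gives `Some(5_400_000_000_000)` and `span_to_nanos("-PT1.5S")` gives `Some(-1_500_000_000)`.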

#### `jiff::SignedDuration`

Same as `jiff::Span`

#### `jiff::Zoned`

Not supported, as there is no clear way to implement it.

### `rust_decimal` and `bigdecimal` types

#### [`rust_decimal::Decimal`][rust_decimal::Decimal]

- is supported for the `float` and `str` (de)serialization options when using the `Decimal128(..)` data type

#### [`bigdecimal::BigDecimal`][bigdecimal::BigDecimal]

- is supported when using the `Decimal128(..)` data type

[chrono-ts-microseconds]: https://docs.rs/chrono/latest/chrono/serde/ts_microseconds/
[rust_decimal::Decimal]: https://docs.rs/rust_decimal/latest/rust_decimal/struct.Decimal.html
[bigdecimal::BigDecimal]: https://docs.rs/bigdecimal/0.4.2/bigdecimal/struct.BigDecimal.html
4 changes: 2 additions & 2 deletions serde_arrow/src/_impl/docs/defs.rs
@@ -28,7 +28,7 @@ pub fn example_arrow_arrays() -> (Vec<crate::_impl::arrow::datatypes::FieldRef>,
     let items = example_records();

     let fields = Vec::<crate::_impl::arrow::datatypes::FieldRef>::from_type::<Record>(TracingOptions::default()).unwrap();
-    let arrays = crate::to_arrow(&fields, &items).unwrap();
+    let arrays = crate::to_arrow(&fields, items).unwrap();

     (fields, arrays)
 }
@@ -40,7 +40,7 @@ pub fn example_arrow2_arrays() -> (Vec<crate::_impl::arrow2::datatypes::Field>,
     let items = example_records();

     let fields = Vec::<crate::_impl::arrow2::datatypes::Field>::from_type::<Record>(TracingOptions::default()).unwrap();
-    let arrays = crate::to_arrow2(&fields, &items).unwrap();
+    let arrays = crate::to_arrow2(&fields, items).unwrap();

     (fields, arrays)
 }
2 changes: 1 addition & 1 deletion serde_arrow/src/arrow2_impl/api.rs
@@ -107,7 +107,7 @@ impl crate::internal::array_builder::ArrayBuilder {
     /// Construct `arrow2` arrays and reset the builder (*requires one of the
     /// `arrow2-*` features*)
     pub fn to_arrow2(&mut self) -> Result<Vec<Box<dyn Array>>> {
-        self.to_arrays()?
+        self.build_arrays()?
             .into_iter()
             .map(Box::<dyn Array>::try_from)
             .collect()
2 changes: 1 addition & 1 deletion serde_arrow/src/arrow_impl/api.rs
@@ -186,7 +186,7 @@ impl crate::internal::array_builder::ArrayBuilder {
     /// Construct `arrow` arrays and reset the builder (*requires one of the
     /// `arrow-*` features*)
     pub fn to_arrow(&mut self) -> Result<Vec<ArrayRef>> {
-        self.to_arrays()?
+        self.build_arrays()?
             .into_iter()
             .map(ArrayRef::try_from)
             .collect()
2 changes: 1 addition & 1 deletion serde_arrow/src/internal/array_builder.rs
@@ -83,7 +83,7 @@ impl ArrayBuilder {
         self.builder.extend(items)
     }

-    pub(crate) fn to_arrays(&mut self) -> Result<Vec<Array>> {
+    pub(crate) fn build_arrays(&mut self) -> Result<Vec<Array>> {
         let mut arrays = Vec::new();
         for field in self.builder.take_records()? {
             arrays.push(field.into_array()?);