Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix checks and merge with master #3

Merged
merged 18 commits into from
Aug 5, 2024

Conversation

etseidl
Copy link

@etseidl etseidl commented Aug 1, 2024

Includes changes from #2, but also merges in master branch.

alamb and others added 18 commits July 26, 2024 06:11
* bump `tonic` to 0.12 and `prost` to 0.13 for `arrow-flight` (apache#6041)

* bump `tonic` to 0.12 and `prost` to 0.13 for `arrow-flight`

Signed-off-by: Bugen Zhao <[email protected]>

* fix example tests

Signed-off-by: Bugen Zhao <[email protected]>

---------

Signed-off-by: Bugen Zhao <[email protected]>

* Remove `impl<T: AsRef<[u8]>> From<T> for Buffer`  that easily accidentally copies data (apache#6043)

* deprecate auto copy, ask explicit reference

* update comments

* make cargo doc happy

* Make display of interval types more pretty (apache#6006)

* improve dispaly for interval.

* update test in pretty, and fix display problem.

* tmp

* fix tests in arrow-cast.

* fix tests in pretty.

* fix style.

* Update snafu (apache#5930)

* Update Parquet thrift generated structures (apache#6045)

* update to latest thrift (as of 11 Jul 2024) from parquet-format

* pass None for optional size statistics

* escape HTML tags

* don't need to escape brackets in arrays

* Revert "Revert "Write Bloom filters between row groups instead of the end  (#…" (apache#5933)

This reverts commit 22e0b44.

* Revert "Update snafu (apache#5930)" (apache#6069)

This reverts commit 756b1fb.

* Update pyo3 requirement from 0.21.1 to 0.22.1 (fixed) (apache#6075)

* Update pyo3 requirement from 0.21.1 to 0.22.1

Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
- [Release notes](https://github.com/pyo3/pyo3/releases)
- [Changelog](https://github.com/PyO3/pyo3/blob/main/CHANGELOG.md)
- [Commits](PyO3/pyo3@v0.21.1...v0.22.1)

---
updated-dependencies:
- dependency-name: pyo3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* refactor: remove deprecated `FromPyArrow::from_pyarrow`

"GIL Refs" are being phased out.

* chore: update `pyo3` in integration tests

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* remove repeated codes to make the codes more concise. (apache#6080)

* Add `unencoded_byte_array_data_bytes` to `ParquetMetaData` (apache#6068)

* update to latest thrift (as of 11 Jul 2024) from parquet-format

* pass None for optional size statistics

* escape HTML tags

* don't need to escape brackets in arrays

* add support for unencoded_byte_array_data_bytes

* add comments

* change sig of ColumnMetrics::update_variable_length_bytes()

* rename ParquetOffsetIndex to OffsetSizeIndex

* rename some functions

* suggestion from review

Co-authored-by: Andrew Lamb <[email protected]>

* add Default trait to ColumnMetrics as suggested in review

* rename OffsetSizeIndex to OffsetIndexMetaData

---------

Co-authored-by: Andrew Lamb <[email protected]>

* Update pyo3 requirement from 0.21.1 to 0.22.2 (apache#6085)

Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
- [Release notes](https://github.com/pyo3/pyo3/releases)
- [Changelog](https://github.com/PyO3/pyo3/blob/v0.22.2/CHANGELOG.md)
- [Commits](PyO3/pyo3@v0.21.1...v0.22.2)

---
updated-dependencies:
- dependency-name: pyo3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Deprecate read_page_locations() and simplify offset index in `ParquetMetaData` (apache#6095)

* deprecate read_page_locations

* add to_thrift() to OffsetIndexMetaData

* Update parquet/src/column/writer/mod.rs

Co-authored-by: Ed Seidl <[email protected]>

---------

Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Bugen Zhao <[email protected]>
Co-authored-by: Xiangpeng Hao <[email protected]>
Co-authored-by: kamille <[email protected]>
Co-authored-by: Jesse <[email protected]>
Co-authored-by: Ed Seidl <[email protected]>
Co-authored-by: Marco Neumann <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…aData` (apache#6105)

* bump `tonic` to 0.12 and `prost` to 0.13 for `arrow-flight` (apache#6041)

* bump `tonic` to 0.12 and `prost` to 0.13 for `arrow-flight`

Signed-off-by: Bugen Zhao <[email protected]>

* fix example tests

Signed-off-by: Bugen Zhao <[email protected]>

---------

Signed-off-by: Bugen Zhao <[email protected]>

* Remove `impl<T: AsRef<[u8]>> From<T> for Buffer`  that easily accidentally copies data (apache#6043)

* deprecate auto copy, ask explicit reference

* update comments

* make cargo doc happy

* Make display of interval types more pretty (apache#6006)

* improve dispaly for interval.

* update test in pretty, and fix display problem.

* tmp

* fix tests in arrow-cast.

* fix tests in pretty.

* fix style.

* Update snafu (apache#5930)

* Update Parquet thrift generated structures (apache#6045)

* update to latest thrift (as of 11 Jul 2024) from parquet-format

* pass None for optional size statistics

* escape HTML tags

* don't need to escape brackets in arrays

* Revert "Revert "Write Bloom filters between row groups instead of the end  (#…" (apache#5933)

This reverts commit 22e0b44.

* Revert "Update snafu (apache#5930)" (apache#6069)

This reverts commit 756b1fb.

* Update pyo3 requirement from 0.21.1 to 0.22.1 (fixed) (apache#6075)

* Update pyo3 requirement from 0.21.1 to 0.22.1

Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
- [Release notes](https://github.com/pyo3/pyo3/releases)
- [Changelog](https://github.com/PyO3/pyo3/blob/main/CHANGELOG.md)
- [Commits](PyO3/pyo3@v0.21.1...v0.22.1)

---
updated-dependencies:
- dependency-name: pyo3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* refactor: remove deprecated `FromPyArrow::from_pyarrow`

"GIL Refs" are being phased out.

* chore: update `pyo3` in integration tests

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* remove repeated codes to make the codes more concise. (apache#6080)

* Add `unencoded_byte_array_data_bytes` to `ParquetMetaData` (apache#6068)

* update to latest thrift (as of 11 Jul 2024) from parquet-format

* pass None for optional size statistics

* escape HTML tags

* don't need to escape brackets in arrays

* add support for unencoded_byte_array_data_bytes

* add comments

* change sig of ColumnMetrics::update_variable_length_bytes()

* rename ParquetOffsetIndex to OffsetSizeIndex

* rename some functions

* suggestion from review

Co-authored-by: Andrew Lamb <[email protected]>

* add Default trait to ColumnMetrics as suggested in review

* rename OffsetSizeIndex to OffsetIndexMetaData

---------

Co-authored-by: Andrew Lamb <[email protected]>

* deprecate read_page_locations

* add level histograms to metadata

* add to_thrift() to OffsetIndexMetaData

* Update pyo3 requirement from 0.21.1 to 0.22.2 (apache#6085)

Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
- [Release notes](https://github.com/pyo3/pyo3/releases)
- [Changelog](https://github.com/PyO3/pyo3/blob/v0.22.2/CHANGELOG.md)
- [Commits](PyO3/pyo3@v0.21.1...v0.22.2)

---
updated-dependencies:
- dependency-name: pyo3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Deprecate read_page_locations() and simplify offset index in `ParquetMetaData` (apache#6095)

* deprecate read_page_locations

* add to_thrift() to OffsetIndexMetaData

* move valid test into ColumnIndexBuilder::append_histograms

* move update_histogram() inside ColumnMetrics

* Update parquet/src/column/writer/mod.rs

Co-authored-by: Ed Seidl <[email protected]>

* Implement LevelHistograms as a struct

* formatting

* fix error in docs

---------

Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Bugen Zhao <[email protected]>
Co-authored-by: Xiangpeng Hao <[email protected]>
Co-authored-by: kamille <[email protected]>
Co-authored-by: Jesse <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Marco Neumann <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Nick Cameron <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
…stent buffering (apache#6132)

* change ipc::reader and writer APIs for consistent buffering

Current writer API automatically wraps the supplied std::io::Writer
impl into a BufWriter.
It is cleaner and more idiomatic to have the default be using the
supplied impl directly, as the user might already have a BufWriter
or an impl that doesn't actually benefit from buffering at all.

StreamReader does a similar thing, but it also exposes a `try_new_unbuffered`
that bypasses the internal wrap.

Here we propose a consistent and non-buffered by default API:
- `try_new` does not wrap the passed reader/writer,
- `try_new_buffered` is a convenience function that does wrap
  the reader/writer into a BufReader/BufWriter,
- all four publicly exposed IPC reader/writers follow the above consistently,
  i.e. `StreamReader`, `FileReader`, `StreamWriter`, `FileWriter`.

Those are breaking changes.

An additional tweak: removed the generic type bounds from struct definitions
on the four types, as that is the idiomatic Rust approach (see e.g. stdlib's
HashMap that has no bounds on the struct definition, only the impl requires
Hash + Eq).

See apache#6099 for the discussion.

* improvements to docs in `arrow::ipc::reader` and `writer`

Applied a few suggestions, made `Error` sections more consistent.
* use LevelHistogram in PageIndex and ColumnIndexBuilder

* revert changes to OffsetIndexBuilder
* fix comparison kernel benchmarks

* add comment as suggested by @alamb
…der` (apache#6136)

* new block size growing strategy

* Update arrow-array/src/builder/generic_bytes_view_builder.rs

Co-authored-by: Andrew Lamb <[email protected]>

* update function name, deprecate old function

* update comments

---------

Co-authored-by: Andrew Lamb <[email protected]>
* improve "contains" performance

* add tests

* cargo fmt 😞

---------

Co-authored-by: Andrew Lamb <[email protected]>
…he#6118)

* improvements to "starts_with" and "ends_with"

* add tests and refactor slightly

* add comments
…pache#6127)

* Support construct BooleanArray from &[u8]

* fix doc

* add new_from_packed and new_from_u8; delete impl From<&[u8]> for BooleanArray and BooleanBuffer
* Update MSRV to 1.64

* Revert "clippy ignore"

This reverts commit 7a4b760.
…6169)

* Update FlightSql.proto to version 17.0

Adds new message CommandStatementIngest and removes `experimental` from
other messages.

* Regenerate flight sql protocol

This upgrades the file to version 17.0 of the protobuf definition.

Co-authored-by: Douglas Anderson <[email protected]>
@adriangb adriangb merged commit 4d7158f into adriangb:add-encode_metadata Aug 5, 2024
33 checks passed
@etseidl etseidl deleted the fix_checks_and_merge branch August 8, 2024 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants