Improve repository readme #5752
Merged
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,31 +17,38 @@ | |
under the License. | ||
--> | ||
|
||
# Native Rust implementation of Apache Arrow and Parquet | ||
# Native Rust implementation of Apache Arrow and Apache Parquet | ||
|
||
[![Coverage Status](https://codecov.io/gh/apache/arrow-rs/rust/branch/master/graph/badge.svg)](https://codecov.io/gh/apache/arrow-rs?branch=master) | ||
|
||
Welcome to the implementation of Arrow, the popular in-memory columnar format, in [Rust][rust]. | ||
Welcome to the [Rust][rust] implementation of [Apache Arrow], the popular in-memory columnar format. | ||
|
||
This repo contains the following main components: | ||
|
||
| Crate | Description | Latest API Docs | README | | ||
| ------------ | ------------------------------------------------------------------------- | ---------------------------------------------- | ------------------------------ | | ||
| arrow | Core functionality (memory layout, arrays, low level computations) | [docs.rs](https://docs.rs/arrow/latest) | [(README)][arrow-readme] | | ||
| parquet | Support for Parquet columnar file format | [docs.rs](https://docs.rs/parquet/latest) | [(README)][parquet-readme] | | ||
| arrow-flight | Support for Arrow-Flight IPC protocol | [docs.rs](https://docs.rs/arrow-flight/latest) | [(README)][flight-readme] | | ||
| object-store | Support for object store interactions (aws, azure, gcp, local, in-memory) | [docs.rs](https://docs.rs/object_store/latest) | [(README)][objectstore-readme] | | ||
| Crate | Description | Latest API Docs | README | | ||
| ---------------- | --------------------------------------------------------- | ---------------------------------------------- | ------------------------------ | | ||
| [`arrow`] | Core Arrow functionality (memory layout, arrays, kernels) | [docs.rs](https://docs.rs/arrow/latest) | [(README)][arrow-readme] | | ||
| [`parquet`] | Parquet columnar file format | [docs.rs](https://docs.rs/parquet/latest) | [(README)][parquet-readme] | | ||
| [`arrow-flight`] | Arrow-Flight IPC protocol | [docs.rs](https://docs.rs/arrow-flight/latest) | [(README)][flight-readme] | | ||
| [`object-store`] | object store (aws, azure, gcp, local, in-memory) | [docs.rs](https://docs.rs/object_store/latest) | [(README)][objectstore-readme] | | ||
|
||
The current development version the API documentation in this repo can be found [here](https://arrow.apache.org/rust). | ||
|
||
[apache arrow]: https://arrow.apache.org/ | ||
[`arrow`]: https://crates.io/crates/arrow | ||
[`parquet`]: https://crates.io/crates/parquet | ||
[`parquet-derive`]: https://crates.io/crates/parquet-derive | ||
[`arrow-flight`]: https://crates.io/crates/arrow-flight | ||
[`object-store`]: https://crates.io/crates/object-store | ||
|
||
## Release Versioning and Schedule | ||
|
||
The Arrow Rust project releases approximately monthly and follows [Semantic | ||
Versioning](https://semver.org/). | ||
|
||
Due to available maintainer and testing bandwidth, `arrow` crates (`arrow`, | ||
`arrow-flight`, etc.) are released on the same schedule with the same versions | ||
as the `parquet` and `parquet-derive` crates. | ||
Due to available maintainer and testing bandwidth, [`arrow`] crates ([`arrow`], | ||
[`arrow-flight`], etc.) are released on the same schedule with the same versions | ||
as the [`parquet`] and [`parquet-derive`] crates. | ||
|
||
Starting June 2024, we plan to release new major versions with potentially | ||
breaking API changes at most once a quarter, and release incremental minor versions in | ||
|
@@ -62,22 +69,30 @@ For example: | |
|
||
There are two related crates in different repositories | ||
|
||
| Crate | Description | Documentation | | ||
| ---------- | --------------------------------------- | ----------------------------- | | ||
| DataFusion | In-memory query engine with SQL support | [(README)][datafusion-readme] | | ||
| Ballista | Distributed query execution | [(README)][ballista-readme] | | ||
| Crate | Description | Documentation | | ||
| -------------- | --------------------------------------- | ----------------------------- | | ||
| [`datafusion`] | In-memory query engine with SQL support | [(README)][datafusion-readme] | | ||
| [`ballista`] | Distributed query execution | [(README)][ballista-readme] | | ||
|
||
[`datafusion`]: https://crates.io/crates/datafusion | ||
[`ballista`]: https://crates.io/crates/datafusion-ballista | ||
|
||
Collectively, these crates support a vast array of functionality for analytic computations in Rust. | ||
Collectively, these crates support a wider array of functionality for analytic computations in Rust. | ||
|
||
For example, you can write an SQL query or a `DataFrame` (using the `datafusion` crate), run it against a parquet file (using the `parquet` crate), evaluate it in-memory using Arrow's columnar format (using the `arrow` crate), and send to another process (using the `arrow-flight` crate). | ||
For example, you can write SQL queries or a `DataFrame` (using the | ||
[`datafusion`] crate) to read a parquet file (using the [`parquet`] crate), | ||
evaluate it in-memory using Arrow's columnar format (using the [`arrow`] crate), | ||
and send to another process (using the [`arrow-flight`] crate). | ||
|
||
Generally speaking, the `arrow` crate offers functionality for using Arrow arrays, and `datafusion` offers most operations typically found in SQL, including `join`s and window functions. | ||
Generally speaking, the [`arrow`] crate offers functionality for using Arrow | ||
arrays, and [`datafusion`] offers most operations typically found in SQL, | ||
including `join`s and window functions. | ||
|
||
You can find more details about each crate in their respective READMEs. | ||
|
||
## Arrow Rust Community | ||
|
||
The `dev@arrow.apache.org` mailing list serves as the core communication channel for the Arrow community. Instructions for signing up and links to the archives can be found at the [Arrow Community](https://arrow.apache.org/community/) page. All major announcements and communications happen there. | ||
The `dev@arrow.apache.org` mailing list serves as the core communication channel for the Arrow community. Instructions for signing up and links to the archives can be found on the [Arrow Community](https://arrow.apache.org/community/) page. All major announcements and communications happen there. | ||
|
||
The Rust Arrow community also uses the official [ASF Slack](https://s.apache.org/slack-invite) for informal discussions and coordination. This is | ||
a great place to meet other contributors and get guidance on where to contribute. Join us in the `#arrow-rust` channel and feel free to ask for an invite via: | ||
|
@@ -98,8 +113,8 @@ There is more information in the [contributing] guide. | |
[contributing]: CONTRIBUTING.md | ||
[parquet-readme]: parquet/README.md | ||
[flight-readme]: arrow-flight/README.md | ||
[datafusion-readme]: https://github.com/apache/arrow-datafusion/blob/main/README.md | ||
[ballista-readme]: https://github.com/apache/arrow-ballista/blob/main/README.md | ||
[datafusion-readme]: https://github.com/apache/datafusion/blob/main/README.md | ||
[ballista-readme]: https://github.com/apache/datafusion-ballista/blob/main/README.md | ||
[objectstore-readme]: object_store/README.md | ||
[issues]: https://github.com/apache/arrow-rs/issues | ||
[discussions]: https://github.com/apache/arrow-rs/discussions |
Hi, do you think it's a good idea to add `object_store_opendal` here as an integration to `object_store`?

Sounds like a good idea to me -- is it mature enough to steer people to? The crates.io page makes it unclear:
https://crates.io/crates/object_store_opendal

Nice catch. I will improve this part and raise a PR later.

Sounds good, thank you.