From 97ad0ad77d89a66e24a435667d2d43f19bb8794d Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Thu, 12 Sep 2024 11:54:02 -0400
Subject: [PATCH] Improve comments on target user and unify summaries (#12418)

---
 README.md                  | 25 ++++++++++++++++++++++---
 datafusion/core/src/lib.rs | 24 ++++++++++++++----------
 docs/source/index.rst      | 25 +++++++++++++++++--------
 3 files changed, 53 insertions(+), 21 deletions(-)

diff --git a/README.md b/README.md
index 816dc77714d2..bb8526c24e2c 100644
--- a/README.md
+++ b/README.md
@@ -41,9 +41,28 @@ logo
-Apache DataFusion is a very fast, extensible query engine for building high-quality data-centric systems in
-[Rust](http://rustlang.org), using the [Apache Arrow](https://arrow.apache.org)
-in-memory format. [Python Bindings](https://github.com/apache/datafusion-python) are also available. DataFusion offers SQL and Dataframe APIs, excellent [performance](https://benchmark.clickhouse.com/), built-in support for CSV, Parquet, JSON, and Avro, extensive customization, and a great community.
+DataFusion is an extensible query engine written in [Rust] that
+uses [Apache Arrow] as its in-memory format. DataFusion's target users are
+developers building fast and feature rich database and analytic systems,
+customized to particular workloads. See [use cases] for examples.
+
+"Out of the box," DataFusion offers [SQL] and [`Dataframe`] APIs,
+excellent [performance], built-in support for CSV, Parquet, JSON, and Avro,
+extensive customization, and a great community.
+[Python Bindings] are also available.
+
+DataFusion features a full query planner, a columnar, streaming, multi-threaded,
+vectorized execution engine, and partitioned data sources. You can
+customize DataFusion at almost all points including additional data sources,
+query languages, functions, custom operators and more.
+See the [Architecture] section for more details.
+
+[rust]: http://rustlang.org
+[apache arrow]: https://arrow.apache.org
+[use cases]: https://datafusion.apache.org/user-guide/introduction.html#use-cases
+[python bindings]: https://github.com/apache/datafusion-python
+[performance]: https://benchmark.clickhouse.com/
+[architecture]: https://datafusion.apache.org/contributor-guide/architecture.html
 Here are links to some important information
diff --git a/datafusion/core/src/lib.rs b/datafusion/core/src/lib.rs
index 9c368415bb05..63d4fbc0bba5 100644
--- a/datafusion/core/src/lib.rs
+++ b/datafusion/core/src/lib.rs
@@ -17,24 +17,28 @@
 #![warn(missing_docs, clippy::needless_borrow)]
 
 //! [DataFusion] is an extensible query engine written in Rust that
-//! uses [Apache Arrow] as its in-memory format. DataFusion help developers
-//! build fast and feature rich database and analytic systems, customized to
-//! particular workloads. See [use cases] for examples
+//! uses [Apache Arrow] as its in-memory format. DataFusion's target users are
+//! developers building fast and feature rich database and analytic systems,
+//! customized to particular workloads. See [use cases] for examples.
 //!
-//! "Out of the box," DataFusion quickly runs complex [SQL] and
-//! [`DataFrame`] queries using a full-featured query planner, a columnar,
-//! streaming, multi-threaded, vectorized execution engine, and partitioned data
-//! sources (Parquet, CSV, JSON, and Avro).
+//! "Out of the box," DataFusion offers [SQL] and [`Dataframe`] APIs,
+//! excellent [performance], built-in support for CSV, Parquet, JSON, and Avro,
+//! extensive customization, and a great community.
+//! [Python Bindings] are also available.
 //!
-//! DataFusion is designed for easy customization such as
-//! additional data sources, query languages, functions, custom
-//! operators and more. See the [Architecture] section for more details.
+//! DataFusion features a full query planner, a columnar, streaming, multi-threaded,
+//! vectorized execution engine, and partitioned data sources. You can
+//! customize DataFusion at almost all points including additional data sources,
+//! query languages, functions, custom operators and more.
+//! See the [Architecture] section below for more details.
 //!
 //! [DataFusion]: https://datafusion.apache.org/
 //! [Apache Arrow]: https://arrow.apache.org
 //! [use cases]: https://datafusion.apache.org/user-guide/introduction.html#use-cases
 //! [SQL]: https://datafusion.apache.org/user-guide/sql/index.html
 //! [`DataFrame`]: dataframe::DataFrame
+//! [performance]: https://benchmark.clickhouse.com/
+//! [Python Bindings]: https://github.com/apache/datafusion-python
 //! [Architecture]: #architecture
 //!
 //! # Examples
diff --git a/docs/source/index.rst b/docs/source/index.rst
index bb5ea430a321..4c67e808a4dd 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -32,14 +32,23 @@ Apache DataFusion Fork

-DataFusion is a very fast, extensible query engine for building high-quality data-centric systems in
-`Rust `_, using the `Apache Arrow `_
-in-memory format.
-
-DataFusion offers SQL and Dataframe APIs, excellent
-`performance `_, built-in support for
-CSV, Parquet, JSON, and Avro, extensive customization, and a great
-community.
+
+DataFusion is an extensible query engine written in `Rust `_ that
+uses `Apache Arrow `_ as its in-memory format. DataFusion's target users are
+developers building fast and feature rich database and analytic systems,
+customized to particular workloads. See `use cases `_ for examples.
+
+"Out of the box," DataFusion offers `SQL `_
+and `Dataframe `_ APIs,
+excellent `performance `_, built-in support for CSV, Parquet, JSON, and Avro,
+extensive customization, and a great community.
+`Python Bindings `_ are also available.
+
+DataFusion features a full query planner, a columnar, streaming, multi-threaded,
+vectorized execution engine, and partitioned data sources. You can
+customize DataFusion at almost all points including additional data sources,
+query languages, functions, custom operators and more.
+See the `Architecture `_ section for more details.
 To get started, see