diff --git a/crates/polars/src/docs/eager.rs b/crates/polars/src/docs/eager.rs index 6d3a6e90ea4c..d9b00886d5af 100644 --- a/crates/polars/src/docs/eager.rs +++ b/crates/polars/src/docs/eager.rs @@ -1,7 +1,7 @@ //! //! # Polars Eager cookbook //! -//! This page should serve a cookbook to quickly get you started with most fundamental operations +//! This page should serve as a cookbook to quickly get you started with the most fundamental operations //! executed on a [`ChunkedArray`], [`Series`] or [`DataFrame`]. //! //! [`ChunkedArray`]: crate::chunked_array::ChunkedArray @@ -23,7 +23,7 @@ //! * [Sort](#sort) //! * [Joins](#joins) //! * [GroupBy](#group_by) -//! - [pivot](#pivot) +//! * [pivot](#pivot) //! * [Unpivot](#unpivot) //! * [Explode](#explode) //! * [IO](#io) @@ -37,7 +37,7 @@ //! - [Replace NaN with Missing](#replace-nan-with-missing) //! - [Extracting data](#extracting-data) //! -//! ## Creation of Data structures +//! ## Creation of data structures //! //! ### ChunkedArray //! @@ -134,8 +134,8 @@ //! # } //! ``` //! -//! Because Rusts Orphan Rule doesn't allow use to implement left side operations, we need to call -//! such operation directly. +//! Because Rust's Orphan Rule doesn't allow us to implement left-hand side operations, we need to call +//! such operations directly. //! //! ```rust //! # use polars::prelude::*; //! @@ -148,7 +148,7 @@ //! let subtract_one_by_s = 1.sub(&series); //! ``` //! -//! For [`ChunkedArray`] this left hand side operations can be done with the [`apply_values`] method. +//! For [`ChunkedArray`], left-hand side operations can be done with the [`apply_values`] method. //! //! [`apply_values`]: crate::chunked_array::ops::ChunkApply::apply_values //! @@ -286,7 +286,7 @@ //! .zip(b.into_iter()) //! .map(|(opt_a, opt_b)| match (opt_a, opt_b) { //! (Some(a), Some(b)) => Some(my_black_box_function(a, b)), -//! // if any of the two value is `None` we propagate that null +//! // if either value is `None` we propagate that null //!
_ => None, //! }) //! .collect() @@ -575,7 +575,7 @@ //! //! # fn example(df: &DataFrame) -> PolarsResult<()> { //! // read from path -//! let mut file = std::fs::File::open("iris_csv")?; +//! let mut file = std::fs::File::open("iris.csv")?; //! let df = CsvReader::new(file).finish()?; //! # Ok(()) //! # } @@ -697,9 +697,8 @@ //! //! ## Extracting data //! -//! To be able to extract data out of [`Series`], either by iterating over them or converting them -//! to other datatypes like a [`Vec`], we first need to downcast them to a [`ChunkedArray`]. This -//! is needed because we don't know the data type that is hold by the [`Series`]. +//! To iterate over the values of a [`Series`], or to convert the [`Series`] into another structure +//! such as a [`Vec`], we must first downcast to a data-type-aware [`ChunkedArray`]. //! //! [`ChunkedArray`]: crate::chunked_array::ChunkedArray //! diff --git a/crates/polars/src/docs/lazy.rs b/crates/polars/src/docs/lazy.rs index bfaa6ebd2569..161aafc29eeb 100644 --- a/crates/polars/src/docs/lazy.rs +++ b/crates/polars/src/docs/lazy.rs @@ -1,7 +1,7 @@ //! //! # Polars Lazy cookbook //! -//! This page should serve a cookbook to quickly get you started with polars' query engine. +//! This page should serve as a cookbook to quickly get you started with Polars' query engine. //! The lazy API allows you to create complex well performing queries on top of Polars eager. //! //! ## Tree Of Contents diff --git a/crates/polars/src/lib.rs b/crates/polars/src/lib.rs index dba9bb39d46d..b04ce28bb056 100644 --- a/crates/polars/src/lib.rs +++ b/crates/polars/src/lib.rs @@ -1,13 +1,13 @@ //! # Polars: *DataFrames in Rust* //! //! Polars is a DataFrame library for Rust. It is based on [Apache Arrow](https://arrow.apache.org/)'s memory model. -//! Apache arrow provides very cache efficient columnar data structures and is becoming the defacto -//! standard for columnar data. +//!
Apache Arrow provides very cache-efficient columnar data structures and is becoming the de facto +//! standard for columnar data. //! //! ## Quickstart -//! We recommend to build your queries directly with [polars-lazy]. This allows you to combine -//! expression into powerful aggregations and column selections. All expressions are evaluated -//! in parallel and your queries are optimized just in time. +//! We recommend building queries directly with [polars-lazy]. This allows you to combine +//! expressions into powerful aggregations and column selections. All expressions are evaluated +//! in parallel and queries are optimized just in time. //! //! [polars-lazy]: polars_lazy //! @@ -74,19 +74,17 @@ //! [`ChunkedArray`]: crate::chunked_array::ChunkedArray //! //! ### DataFrame -//! A [`DataFrame`] is a 2 dimensional data structure that is backed by a [`Series`], and it could be -//! seen as an abstraction on [`Vec`]. Operations that can be executed on [`DataFrame`] are very +//! A [`DataFrame`] is a two-dimensional data structure backed by [`Series`] and can be +//! seen as an abstraction on [`Vec`]. Operations that can be executed on a [`DataFrame`] are //! similar to what is done in a `SQL` like query. You can `GROUP`, `JOIN`, `PIVOT` etc. //! //! [`Vec`]: std::vec::Vec //! //! ### Series -//! [`Series`] are the type agnostic columnar data representation of Polars. They provide many -//! operations out of the box, many via the [`Series`] series and -//! [`SeriesTrait`] trait. Whether or not an operation is provided -//! by a [`Series`] is determined by the operation. If the operation can be done without knowing the -//! underlying columnar type, this operation probably is provided by the [`Series`]. If not, you must -//! downcast to the typed data structure that is wrapped by the [`Series`]. That is the [`ChunkedArray`]. +//! [`Series`] are the type-agnostic columnar data representation of Polars. The [`Series`] struct and +//!
[`SeriesTrait`] trait provide many operations out of the box. Most type-agnostic operations are provided +//! by [`Series`]. Type-aware operations require downcasting to the typed data structure that is wrapped +//! by the [`Series`]. The underlying typed data structure is a [`ChunkedArray`]. //! //! [`SeriesTrait`]: crate::series::SeriesTrait //! @@ -123,7 +121,7 @@ //! //! `col("foo").sort().head(2)` //! -//! The snippet above says select column `"foo"` then sort this column and then take first 2 values +//! The snippet above says: select column `"foo"`, then sort this column and take the first 2 values //! of the sorted output. //! The power of expressions is that every expression produces a new expression and that they can //! be piped together. @@ -143,10 +141,10 @@ //! # Ok(()) //! # } //! ``` -//! All expressions are ran in parallel, meaning that separate polars expressions are embarrassingly parallel. +//! All expressions are run in parallel, meaning that separate Polars expressions are embarrassingly parallel. //! (Note that within an expression there may be more parallelization going on). //! -//! Understanding polars expressions is most important when starting with the polars library. Read more +//! Understanding Polars expressions is most important when starting with the Polars library. Read more //! about them in the [user guide](https://docs.pola.rs/user-guide/concepts/expressions). //! //! ### Eager //! @@ -171,30 +169,30 @@ //! * A lot of datatypes //! //! Both of these really put strain on compile times. To keep Polars lean, we make both **opt-in**, -//! meaning that you only pay the compilation cost, if you need it. +//! meaning that you only pay the compilation cost if you need it. //! //! ## Compile times and opt-in features //! The opt-in features are (not including dtype features): //! -//! * `performant` - Longer compile times more fast paths. //! * `lazy` - Lazy API //! - `regex` - Use regexes in [column selection] //!
- `dot_diagram` - Create dot diagrams from lazy logical plans. -//! * `sql` - Pass SQL queries to polars. -//! * `streaming` - Be able to process datasets that are larger than RAM. +//! * `sql` - Pass SQL queries to Polars. +//! * `streaming` - Process datasets larger than RAM. //! * `random` - Generate arrays with randomly sampled values //! * `ndarray`- Convert from [`DataFrame`] to [ndarray](https://docs.rs/ndarray/) //! * `temporal` - Conversions between [Chrono](https://docs.rs/chrono/) and Polars for temporal data types //! * `timezones` - Activate timezone support. -//! * `strings` - Extra string utilities for [`StringChunked`] //! - `string_pad` - `zfill`, `ljust`, `rjust` +//! * `strings` - Extra string utilities for [`StringChunked`] +//! - `string_pad` - `zfill`, `ljust`, `rjust` //! - `string_to_integer` - `parse_int` //! * `object` - Support for generic ChunkedArrays called [`ObjectChunked`] (generic over `T`). //! These are downcastable from Series through the [Any](https://doc.rust-lang.org/std/any/index.html) trait. //! * Performance related: //! - `nightly` - Several nightly only features such as SIMD and specialization. //! - `performant` - more fast paths, slower compile times. -//! - `bigidx` - Activate this feature if you expect >> 2^32 rows. This has not been needed by anyone. -//! This allows polars to scale up way beyond that by using `u64` as an index. +//! - `bigidx` - Activate this feature if you expect >> 2^32 rows. This is rarely needed. +//! This allows Polars to scale up beyond 2^32 rows by using an index with a `u64` data type. //! Polars will be a bit slower with this feature activated as many data structures //! are less cache efficient. //! - `cse` - Activate common subplan elimination optimization @@ -208,8 +206,8 @@ //! - `ipc` - Arrow's IPC format serialization //! - `decompress` - Automatically infer compression of csvs and decompress them. //! Supported compressions: -//! * zip -//! * gzip +//! - zip +//! - gzip //! //! 
[`StringChunked`]: crate::datatypes::StringChunked //! [column selection]: polars_lazy::dsl::col @@ -221,7 +219,7 @@ //! Also activates rolling window group by operations. //! - `sort_multiple` - Allow sorting a [`DataFrame`] on multiple columns //! - `rows` - Create [`DataFrame`] from rows and extract rows from [`DataFrame`]s. -//! And activates `pivot` and `transpose` operations +//! Also activates `pivot` and `transpose` operations //! - `asof_join` - Join ASOF, to join on nearest keys instead of exact equality match. //! - `cross_join` - Create the Cartesian product of two [`DataFrame`]s. //! - `semi_anti_join` - SEMI and ANTI joins. @@ -232,8 +230,8 @@ //! * [`Series`]/[`Expr`] operations: //! - `is_in` - Check for membership in [`Series`]. //! - `zip_with` - [Zip two Series/ ChunkedArrays](crate::chunked_array::ops::ChunkZip). -//! - `round_series` - round underlying float types of [`Series`]. -//! - `repeat_by` - [Repeat element in an Array N times, where N is given by another array. +//! - `round_series` - Round underlying float types of [`Series`]. +//! - `repeat_by` - Repeat element in an Array N times, where N is given by another array. //! - `is_first_distinct` - Check if element is first unique value. //! - `is_last_distinct` - Check if element is last unique value. //! - `is_between` - Check if this expression is between the given lower and upper bounds. @@ -245,12 +243,12 @@ //! - `mode` - [Return the most occurring value(s)](polars_ops::chunked_array::mode) //! - `cum_agg` - [`cum_sum`], [`cum_min`], [`cum_max`] aggregation. //! - `rolling_window` - rolling window functions, like [`rolling_mean`] -//! - `interpolate` [interpolate None values](polars_ops::series::interpolate()) +//! - `interpolate` - [interpolate None values](polars_ops::series::interpolate()) //! - `extract_jsonpath` - [Run jsonpath queries on StringChunked](https://goessner.net/articles/JsonPath/) //! - `list` - List utils. //! - `list_gather` take sublist by multiple indices //! 
- `rank` - Ranking algorithms. -//! - `moment` - kurtosis and skew statistics +//! - `moment` - Kurtosis and skew statistics //! - `ewma` - Exponential moving average windows //! - `abs` - Get absolute values of [`Series`]. //! - `arange` - Range operation on [`Series`]. @@ -288,7 +286,7 @@ //! ## Compile times and opt-in data types //! As mentioned above, Polars [`Series`] are wrappers around //! [`ChunkedArray`] without the generic parameter `T`. -//! To get rid of the generic parameter, all the possible value of `T` are compiled +//! To get rid of the generic parameter, all the possible values of `T` are compiled //! for [`Series`]. This gets more expensive the more types you want for a [`Series`]. In order to reduce //! the compile times, we have decided to default to a minimal set of types and make more [`Series`] types //! opt-in. @@ -310,17 +308,17 @@ //! | Struct | dtype-struct | //! //! -//! Or you can choose on of the preconfigured pre-sets. +//! Or you can choose one of the preconfigured presets. //! //! * `dtype-full` - all opt-in dtypes. //! * `dtype-slim` - slim preset of opt-in dtypes. //! //! ## Performance -//! To gains most performance out of Polars we recommend compiling on a nightly compiler +//! To get the best performance out of Polars we recommend compiling on a nightly compiler //! with the features `simd` and `performant` activated. The activated cpu features also influence //! the amount of simd acceleration we can use. //! -//! See this the features we activate for our python builds, or if you just run locally and want to +//! See the features we activate for our Python builds, or if you just run locally and want to //! use all available features on your cpu, set `RUSTFLAGS='-C target-cpu=native'`. //! //! ### Custom allocator
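The null-propagation pattern rewritten in the eager cookbook hunk above can be sketched without any Polars dependency, using plain `Option` values from the standard library. This is only an illustrative sketch: `my_black_box_function` is the placeholder name from the docs, and `zip_and_apply` is a hypothetical helper, not a Polars API.

```rust
// Stand-in for any scalar kernel applied element-wise (placeholder name from the docs).
fn my_black_box_function(a: i32, b: i32) -> i32 {
    a + b
}

// Zip two columns of optional values and apply the kernel,
// propagating a null whenever either side is missing.
fn zip_and_apply(a: &[Option<i32>], b: &[Option<i32>]) -> Vec<Option<i32>> {
    a.iter()
        .zip(b.iter())
        .map(|(opt_a, opt_b)| match (opt_a, opt_b) {
            // only apply the kernel when both values are present
            (Some(a), Some(b)) => Some(my_black_box_function(*a, *b)),
            // if either value is `None` we propagate that null
            _ => None,
        })
        .collect()
}

fn main() {
    let a = vec![Some(1), None, Some(3)];
    let b = vec![Some(10), Some(20), None];
    let out = zip_and_apply(&a, &b);
    println!("{:?}", out); // [Some(11), None, None]
}
```

The same shape of code works against two `ChunkedArray`s, since their iterators also yield `Option<T>` items.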