docs(rust): Minor doc fixes and cleanup (pola-rs#19935)
lukemanley authored Nov 23, 2024
1 parent 5eeb369 commit 54a112d
Showing 3 changed files with 43 additions and 46 deletions.
21 changes: 10 additions & 11 deletions crates/polars/src/docs/eager.rs
@@ -1,7 +1,7 @@
//!
//! # Polars Eager cookbook
//!
-//! This page should serve a cookbook to quickly get you started with most fundamental operations
+//! This page should serve as a cookbook to quickly get you started with most fundamental operations
//! executed on a [`ChunkedArray`], [`Series`] or [`DataFrame`].
//!
//! [`ChunkedArray`]: crate::chunked_array::ChunkedArray
@@ -23,7 +23,7 @@
//! * [Sort](#sort)
//! * [Joins](#joins)
//! * [GroupBy](#group_by)
-//! - [pivot](#pivot)
+//! * [pivot](#pivot)
//! * [Unpivot](#unpivot)
//! * [Explode](#explode)
//! * [IO](#io)
@@ -37,7 +37,7 @@
//! - [Replace NaN with Missing](#replace-nan-with-missing)
//! - [Extracting data](#extracting-data)
//!
-//! ## Creation of Data structures
+//! ## Creation of data structures
//!
//! ### ChunkedArray
//!
@@ -134,8 +134,8 @@
//! # }
//! ```
//!
-//! Because Rusts Orphan Rule doesn't allow use to implement left side operations, we need to call
-//! such operation directly.
+//! Because Rust's Orphan Rule doesn't allow us to implement left side operations, we need to call
+//! such operations directly.
//!
//! ```rust
//! # use polars::prelude::*;
@@ -148,7 +148,7 @@
//! let subtract_one_by_s = 1.sub(&series);
//! ```
//!
-//! For [`ChunkedArray`] this left hand side operations can be done with the [`apply_values`] method.
+//! For [`ChunkedArray`] left hand side operations can be done with the [`apply_values`] method.
//!
//! [`apply_values`]: crate::chunked_array::ops::ChunkApply::apply_values
//!
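An aside on the `1.sub(&series)` pattern above: it is nothing Polars-specific, just the `Sub` operator trait called as an ordinary method, which is why the left-hand-side form works even though `1 - series` does not compile. A std-only sketch with plain integers (no Polars types involved):

```rust
// `a - b` desugars to `Sub::sub(a, b)`, so the trait method can be called
// directly -- including with a reference on the right-hand side, since std
// provides the matching reference impls for the integer types.
use std::ops::Sub;

fn main() {
    let by_value = 1i32.sub(3);
    let by_ref = 1i32.sub(&3);
    assert_eq!(by_value, -2);
    assert_eq!(by_ref, -2);
}
```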
@@ -286,7 +286,7 @@
//! .zip(b.into_iter())
//! .map(|(opt_a, opt_b)| match (opt_a, opt_b) {
//! (Some(a), Some(b)) => Some(my_black_box_function(a, b)),
-//! // if any of the two value is `None` we propagate that null
+//! // if either value is `None` we propagate that null
//! _ => None,
//! })
//! .collect()
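The null-propagation idiom in the snippet above stands on its own with `std` types. A minimal sketch, where the `zip_add` helper and the plain slices are hypothetical stand-ins for `ChunkedArray`s:

```rust
// Apply a function element-wise over two optional columns; a `None` on
// either side yields `None` in the output, mirroring the match above.
fn zip_add(a: &[Option<i32>], b: &[Option<i32>]) -> Vec<Option<i32>> {
    a.iter()
        .zip(b.iter())
        .map(|(left, right)| match (left, right) {
            // both values present: apply the function
            (Some(l), Some(r)) => Some(l + r),
            // either value missing: propagate the null
            _ => None,
        })
        .collect()
}

fn main() {
    let a = [Some(1), None, Some(4)];
    let b = [Some(2), Some(3), None];
    assert_eq!(zip_add(&a, &b), vec![Some(3), None, None]);
}
```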
@@ -575,7 +575,7 @@
//!
//! # fn example(df: &DataFrame) -> PolarsResult<()> {
//! // read from path
-//! let mut file = std::fs::File::open("iris_csv")?;
+//! let mut file = std::fs::File::open("iris.csv")?;
//! let df = CsvReader::new(file).finish()?;
//! # Ok(())
//! # }
@@ -697,9 +697,8 @@
//!
//! ## Extracting data
//!
-//! To be able to extract data out of [`Series`], either by iterating over them or converting them
-//! to other datatypes like a [`Vec<T>`], we first need to downcast them to a [`ChunkedArray<T>`]. This
-//! is needed because we don't know the data type that is hold by the [`Series`].
+//! To iterate over the values of a [`Series`], or to convert the [`Series`] into another structure
+//! such as a [`Vec<T>`], we must first downcast to a data type aware [`ChunkedArray<T>`].
//!
//! [`ChunkedArray<T>`]: crate::chunked_array::ChunkedArray
//!
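The downcast requirement described above can be sketched with std's `Any` trait alone: a type-erased column must first be downcast to its concrete element type before its values can be used. `sum_i32_column` and the `Vec<i32>` column are hypothetical stand-ins, not Polars API:

```rust
// A type-erased "column" is only usable after downcasting it to its
// concrete element type -- the same shape as Series -> ChunkedArray<T>.
use std::any::Any;

fn sum_i32_column(column: &dyn Any) -> Option<i32> {
    // Succeeds only if the erased value really is a Vec<i32>.
    column.downcast_ref::<Vec<i32>>().map(|v| v.iter().sum())
}

fn main() {
    let column: Box<dyn Any> = Box::new(vec![1i32, 2, 3]);
    assert_eq!(sum_i32_column(column.as_ref()), Some(6));

    // A column holding a different element type fails the downcast.
    let wrong: Box<dyn Any> = Box::new(vec![1.0f64]);
    assert_eq!(sum_i32_column(wrong.as_ref()), None);
}
```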
2 changes: 1 addition & 1 deletion crates/polars/src/docs/lazy.rs
@@ -1,7 +1,7 @@
//!
//! # Polars Lazy cookbook
//!
-//! This page should serve a cookbook to quickly get you started with polars' query engine.
+//! This page should serve as a cookbook to quickly get you started with Polars' query engine.
//! The lazy API allows you to create complex, well-performing queries on top of Polars eager.
//!
//! ## Tree Of Contents
66 changes: 32 additions & 34 deletions crates/polars/src/lib.rs
@@ -1,13 +1,13 @@
//! # Polars: *<small>DataFrames in Rust</small>*
//!
//! Polars is a DataFrame library for Rust. It is based on [Apache Arrow](https://arrow.apache.org/)'s memory model.
-//! Apache arrow provides very cache efficient columnar data structures and is becoming the defacto
-//! standard for columnar data.
+//! Apache Arrow provides very cache efficient columnar data structures and is becoming the defacto
+//! standard for columnar data.
//!
//! ## Quickstart
-//! We recommend to build your queries directly with [polars-lazy]. This allows you to combine
-//! expression into powerful aggregations and column selections. All expressions are evaluated
-//! in parallel and your queries are optimized just in time.
+//! We recommend building queries directly with [polars-lazy]. This allows you to combine
+//! expressions into powerful aggregations and column selections. All expressions are evaluated
+//! in parallel and queries are optimized just in time.
//!
//! [polars-lazy]: polars_lazy
//!
@@ -74,19 +74,17 @@
//! [`ChunkedArray<T>`]: crate::chunked_array::ChunkedArray
//!
//! ### DataFrame
-//! A [`DataFrame`] is a 2 dimensional data structure that is backed by a [`Series`], and it could be
-//! seen as an abstraction on [`Vec<Series>`]. Operations that can be executed on [`DataFrame`] are very
+//! A [`DataFrame`] is a two-dimensional data structure backed by a [`Series`] and can be
+//! seen as an abstraction on [`Vec<Series>`]. Operations that can be executed on a [`DataFrame`] are
//! similar to what is done in a `SQL` like query. You can `GROUP`, `JOIN`, `PIVOT` etc.
//!
//! [`Vec<Series>`]: std::vec::Vec
//!
//! ### Series
-//! [`Series`] are the type agnostic columnar data representation of Polars. They provide many
-//! operations out of the box, many via the [`Series`] series and
-//! [`SeriesTrait`] trait. Whether or not an operation is provided
-//! by a [`Series`] is determined by the operation. If the operation can be done without knowing the
-//! underlying columnar type, this operation probably is provided by the [`Series`]. If not, you must
-//! downcast to the typed data structure that is wrapped by the [`Series`]. That is the [`ChunkedArray<T>`].
+//! [`Series`] are the type-agnostic columnar data representation of Polars. The [`Series`] struct and
+//! [`SeriesTrait`] trait provide many operations out of the box. Most type-agnostic operations are provided
+//! by [`Series`]. Type-aware operations require downcasting to the typed data structure that is wrapped
+//! by the [`Series`]. The underlying typed data structure is a [`ChunkedArray<T>`].
//!
//! [`SeriesTrait`]: crate::series::SeriesTrait
//!
@@ -123,7 +121,7 @@
//!
//! `col("foo").sort().head(2)`
//!
-//! The snippet above says select column `"foo"` then sort this column and then take first 2 values
+//! The snippet above says select column `"foo"` then sort this column and then take the first 2 values
//! of the sorted output.
//! The power of expressions is that every expression produces a new expression and that they can
//! be piped together.
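The "every expression produces a new expression" idea can be sketched as a toy builder, where each method consumes `self` and returns a new node so calls pipe together exactly like `col("foo").sort().head(2)`. `Expr`, `col`, `sort`, and `head` here are illustrative stand-ins, not the polars-lazy API:

```rust
// Each combinator wraps the previous expression in a new node, building a
// nested plan that a query engine could later optimize and execute.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Col(String),
    Sort(Box<Expr>),
    Head(Box<Expr>, usize),
}

fn col(name: &str) -> Expr {
    Expr::Col(name.to_string())
}

impl Expr {
    fn sort(self) -> Expr {
        Expr::Sort(Box::new(self))
    }
    fn head(self, n: usize) -> Expr {
        Expr::Head(Box::new(self), n)
    }
}

fn main() {
    let expr = col("foo").sort().head(2);
    assert_eq!(
        expr,
        Expr::Head(Box::new(Expr::Sort(Box::new(Expr::Col("foo".to_string())))), 2)
    );
}
```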
@@ -143,10 +141,10 @@
//! # Ok(())
//! # }
//! ```
-//! All expressions are ran in parallel, meaning that separate polars expressions are embarrassingly parallel.
+//! All expressions are run in parallel, meaning that separate polars expressions are embarrassingly parallel.
//! (Note that within an expression there may be more parallelization going on).
//!
-//! Understanding polars expressions is most important when starting with the polars library. Read more
+//! Understanding Polars expressions is most important when starting with the Polars library. Read more
//! about them in the [user guide](https://docs.pola.rs/user-guide/concepts/expressions).
//!
//! ### Eager
@@ -171,30 +169,30 @@
//! * A lot of datatypes
//!
//! Both of these really put strain on compile times. To keep Polars lean, we make both **opt-in**,
-//! meaning that you only pay the compilation cost, if you need it.
+//! meaning that you only pay the compilation cost if you need it.
//!
//! ## Compile times and opt-in features
//! The opt-in features are (not including dtype features):
//!
//! * `performant` - Longer compile times, more fast paths.
//! * `lazy` - Lazy API
//! - `regex` - Use regexes in [column selection]
//! - `dot_diagram` - Create dot diagrams from lazy logical plans.
-//! * `sql` - Pass SQL queries to polars.
-//! * `streaming` - Be able to process datasets that are larger than RAM.
+//! * `sql` - Pass SQL queries to Polars.
+//! * `streaming` - Process datasets larger than RAM.
//! * `random` - Generate arrays with randomly sampled values
//! * `ndarray`- Convert from [`DataFrame`] to [ndarray](https://docs.rs/ndarray/)
//! * `temporal` - Conversions between [Chrono](https://docs.rs/chrono/) and Polars for temporal data types
//! * `timezones` - Activate timezone support.
-//! * `strings` - Extra string utilities for [`StringChunked`] //! - `string_pad` - `zfill`, `ljust`, `rjust`
+//! * `strings` - Extra string utilities for [`StringChunked`]
+//! - `string_pad` - `zfill`, `ljust`, `rjust`
//! - `string_to_integer` - `parse_int`
//! * `object` - Support for generic ChunkedArrays called [`ObjectChunked<T>`] (generic over `T`).
//! These are downcastable from Series through the [Any](https://doc.rust-lang.org/std/any/index.html) trait.
//! * Performance related:
//! - `nightly` - Several nightly only features such as SIMD and specialization.
//! - `performant` - more fast paths, slower compile times.
-//! - `bigidx` - Activate this feature if you expect >> 2^32 rows. This has not been needed by anyone.
-//! This allows polars to scale up way beyond that by using `u64` as an index.
+//! - `bigidx` - Activate this feature if you expect >> 2^32 rows. This is rarely needed.
+//! This allows Polars to scale up beyond 2^32 rows by using an index with a `u64` data type.
//! Polars will be a bit slower with this feature activated as many data structures
//! are less cache efficient.
//! - `cse` - Activate common subplan elimination optimization
@@ -208,8 +206,8 @@
//! - `ipc` - Arrow's IPC format serialization
//! - `decompress` - Automatically infer compression of csvs and decompress them.
//! Supported compressions:
-//! * zip
-//! * gzip
+//! - zip
+//! - gzip
//!
//! [`StringChunked`]: crate::datatypes::StringChunked
//! [column selection]: polars_lazy::dsl::col
@@ -221,7 +219,7 @@
//! Also activates rolling window group by operations.
//! - `sort_multiple` - Allow sorting a [`DataFrame`] on multiple columns
//! - `rows` - Create [`DataFrame`] from rows and extract rows from [`DataFrame`]s.
-//! And activates `pivot` and `transpose` operations
+//! Also activates `pivot` and `transpose` operations
//! - `asof_join` - Join ASOF, to join on nearest keys instead of exact equality match.
//! - `cross_join` - Create the Cartesian product of two [`DataFrame`]s.
//! - `semi_anti_join` - SEMI and ANTI joins.
@@ -232,8 +230,8 @@
//! * [`Series`]/[`Expr`] operations:
//! - `is_in` - Check for membership in [`Series`].
//! - `zip_with` - [Zip two Series/ ChunkedArrays](crate::chunked_array::ops::ChunkZip).
-//! - `round_series` - round underlying float types of [`Series`].
-//! - `repeat_by` - [Repeat element in an Array N times, where N is given by another array.
+//! - `round_series` - Round underlying float types of [`Series`].
+//! - `repeat_by` - Repeat element in an Array N times, where N is given by another array.
//! - `is_first_distinct` - Check if element is first unique value.
//! - `is_last_distinct` - Check if element is last unique value.
//! - `is_between` - Check if this expression is between the given lower and upper bounds.
@@ -245,12 +243,12 @@
//! - `mode` - [Return the most occurring value(s)](polars_ops::chunked_array::mode)
//! - `cum_agg` - [`cum_sum`], [`cum_min`], [`cum_max`] aggregation.
//! - `rolling_window` - rolling window functions, like [`rolling_mean`]
-//! - `interpolate` [interpolate None values](polars_ops::series::interpolate())
+//! - `interpolate` - [interpolate None values](polars_ops::series::interpolate())
//! - `extract_jsonpath` - [Run jsonpath queries on StringChunked](https://goessner.net/articles/JsonPath/)
//! - `list` - List utils.
//! - `list_gather` take sublist by multiple indices
//! - `rank` - Ranking algorithms.
-//! - `moment` - kurtosis and skew statistics
+//! - `moment` - Kurtosis and skew statistics
//! - `ewma` - Exponential moving average windows
//! - `abs` - Get absolute values of [`Series`].
//! - `arange` - Range operation on [`Series`].
@@ -288,7 +286,7 @@
//! ## Compile times and opt-in data types
//! As mentioned above, Polars [`Series`] are wrappers around
//! [`ChunkedArray<T>`] without the generic parameter `T`.
-//! To get rid of the generic parameter, all the possible value of `T` are compiled
+//! To get rid of the generic parameter, all the possible values of `T` are compiled
//! for [`Series`]. This gets more expensive the more types you want for a [`Series`]. In order to reduce
//! the compile times, we have decided to default to a minimal set of types and make more [`Series`] types
//! opt-in.
@@ -310,17 +308,17 @@
//! | Struct | dtype-struct |
//!
//!
-//! Or you can choose on of the preconfigured pre-sets.
+//! Or you can choose one of the preconfigured pre-sets.
//!
//! * `dtype-full` - all opt-in dtypes.
//! * `dtype-slim` - slim preset of opt-in dtypes.
//!
//! ## Performance
-//! To gains most performance out of Polars we recommend compiling on a nightly compiler
+//! To get the best performance out of Polars we recommend compiling on a nightly compiler
//! with the features `simd` and `performant` activated. The activated cpu features also influence
//! the amount of simd acceleration we can use.
//!
-//! See this the features we activate for our python builds, or if you just run locally and want to
+//! See the features we activate for our python builds, or if you just run locally and want to
//! use all available features on your cpu, set `RUSTFLAGS='-C target-cpu=native'`.
//!
//! ### Custom allocator