Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
refactor(pandas): remove the pandas backend
BREAKING CHANGE: The `pandas` backend is removed. Note that **pandas DataFrames are STILL VALID INPUTS AND OUTPUTS** and will remain so for the foreseeable future. Please use one of the other local backends like DuckDB, Polars, or DataFusion to perform operations directly on pandas DataFrames.
Showing 58 changed files with 264 additions and 7,503 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,213 +1,7 @@ | ||
# pandas | ||
|
||
[https://pandas.pydata.org/](https://pandas.pydata.org/) | ||
|
||
![](https://img.shields.io/badge/memtables-native-green?style=flat-square) ![](https://img.shields.io/badge/inputs-CSV | Parquet-blue?style=flat-square) ![](https://img.shields.io/badge/outputs-CSV | pandas | Parquet | PyArrow-orange?style=flat-square) | ||
|
||
::: {.callout-warning} | ||
## The Pandas backend is slated for removal in Ibis 10.0 | ||
We recommend using one of our other backends. | ||
|
||
Many workloads work well on the DuckDB and Polars backends, for example. | ||
::: | ||
|
||
|
||
## Install | ||
|
||
Install Ibis and dependencies for the pandas backend: | ||
|
||
::: {.panel-tabset} | ||
|
||
## `pip` | ||
|
||
Install with the `pandas` extra: | ||
|
||
```{.bash} | ||
pip install 'ibis-framework[pandas]' | ||
``` | ||
|
||
And connect: | ||
|
||
```{.python} | ||
import ibis | ||
con = ibis.pandas.connect() # <1> | ||
``` | ||
|
||
1. Adjust connection parameters as needed. | ||
|
||
## `conda` | ||
|
||
Install for pandas: | ||
|
||
```{.bash} | ||
conda install -c conda-forge ibis-pandas | ||
``` | ||
|
||
And connect: | ||
|
||
```{.python} | ||
import ibis | ||
con = ibis.pandas.connect() # <1> | ||
``` | ||
|
||
1. Adjust connection parameters as needed. | ||
|
||
## `mamba` | ||
|
||
Install for pandas: | ||
|
||
```{.bash} | ||
mamba install -c conda-forge ibis-pandas | ||
``` | ||
|
||
And connect: | ||
|
||
```{.python} | ||
import ibis | ||
con = ibis.pandas.connect() # <1> | ||
``` | ||
|
||
1. Adjust connection parameters as needed. | ||
::: {.callout-note} | ||
## The pandas backend was removed in Ibis version 10.0 | ||
|
||
See [our blog post](../posts/farewell-pandas/index.qmd) on the topic for more information. | ||
::: | ||
|
||
|
||
|
||
## User-defined functions (UDFs) | ||
|
||
Ibis supports defining three kinds of user-defined functions for operations on | ||
expressions targeting the pandas backend: **element-wise**, **reduction**, and | ||
**analytic**. | ||
|
||
### Element-wise Functions | ||
|
||
An **element-wise** function is a function that takes N rows as input and | ||
produces N rows of output. `log`, `exp`, and `floor` are examples of | ||
element-wise functions. | ||
|
||
Here's how to define an element-wise function: | ||
|
||
```python | ||
import ibis.expr.datatypes as dt | ||
from ibis.backends.pandas.udf import udf | ||
|
||
@udf.elementwise(input_type=[dt.int64], output_type=dt.double) | ||
def add_one(x): | ||
    return x + 1.0 | ||
``` | ||
|
||
### Reduction Functions | ||
|
||
A **reduction** is a function that takes N rows as input and produces 1 row | ||
as output. `sum`, `mean` and `count` are examples of reductions. In | ||
the context of a `GROUP BY`, reductions produce 1 row of output _per | ||
group_. | ||
|
||
Here's how to define a reduction function: | ||
|
||
```python | ||
import ibis.expr.datatypes as dt | ||
from ibis.backends.pandas.udf import udf | ||
|
||
@udf.reduction(input_type=[dt.double], output_type=dt.double) | ||
def double_mean(series): | ||
    return 2 * series.mean() | ||
``` | ||
|
||
### Analytic Functions | ||
|
||
An **analytic** function is like an **element-wise** function in that it takes | ||
N rows as input and produces N rows of output. The key difference is that | ||
analytic functions can be applied _per group_ using window functions. Z-score | ||
is an example of an analytic function. | ||
|
||
Here's how to define an analytic function: | ||
|
||
```python | ||
import ibis.expr.datatypes as dt | ||
from ibis.backends.pandas.udf import udf | ||
|
||
@udf.analytic(input_type=[dt.double], output_type=dt.double) | ||
def zscore(series): | ||
    return (series - series.mean()) / series.std() | ||
``` | ||
|
||
### Details of pandas UDFs | ||
|
||
- Element-wise functions support applying your UDF to any combination of | ||
scalar values and columns. | ||
- Reductions support whole-column aggregations, grouped aggregations, and | ||
application of your function over a window. | ||
- Analytic functions work in both grouped and non-grouped settings. | ||
- The objects you receive as input arguments are either `pandas.Series` or | ||
Python/NumPy scalars. | ||
|
||
::: {.callout-warning} | ||
## Keyword arguments must be given a default | ||
|
||
Any keyword arguments must be given a default value or the function **will | ||
not work**. | ||
::: | ||
|
||
A common Python convention is to set the default value to `None` and | ||
handle setting it to something not `None` in the body of the function. | ||
|
||
Using `add_one` from above as an example, the following call will receive a | ||
`pandas.Series` for the `x` argument: | ||
|
||
```python | ||
import ibis | ||
import pandas as pd | ||
df = pd.DataFrame({'a': [1, 2, 3]}) | ||
con = ibis.pandas.connect({'df': df}) | ||
t = con.table('df') | ||
expr = add_one(t.a) | ||
expr | ||
``` | ||
|
||
And this will receive the `int` 1: | ||
|
||
```python | ||
expr = add_one(1) | ||
expr | ||
``` | ||
|
||
Since the pandas backend passes around `**kwargs`, you can accept `**kwargs` | ||
in your function: | ||
|
||
```python | ||
import ibis.expr.datatypes as dt | ||
from ibis.backends.pandas.udf import udf | ||
|
||
@udf.elementwise([dt.int64], dt.double) | ||
def add_two(x, **kwargs): # do stuff with kwargs | ||
    return x + 2.0 | ||
``` | ||
|
||
Or you can leave them out as we did in the example above. You can also | ||
optionally accept specific keyword arguments. | ||
|
||
For example: | ||
|
||
```python | ||
import ibis.expr.datatypes as dt | ||
from ibis.backends.pandas.udf import udf | ||
|
||
@udf.elementwise([dt.int64], dt.double) | ||
def add_two_with_none(x, y=None): | ||
    if y is None: | ||
        y = 2.0 | ||
    return x + y | ||
``` | ||
|
||
```{python} | ||
#| echo: false | ||
BACKEND = "Pandas" | ||
``` | ||
|
||
{{< include ./_templates/api.qmd >}} |
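For readers comparing the three UDF kinds described in the removed documentation, the following sketch shows the same row semantics expressed directly in pandas, without any Ibis API. The DataFrame and its column names `g` and `v` are hypothetical, and `zscore` simply mirrors the body of the removed example. This is an illustration of what each kind computes, not a replacement for the removed `udf` decorators.

```python
import pandas as pd

# Hypothetical data: two groups of two rows each.
df = pd.DataFrame({"g": ["x", "x", "y", "y"], "v": [1.0, 3.0, 10.0, 14.0]})

# Element-wise: N rows in, N rows out.
add_one = df["v"] + 1.0

# Reduction: N rows in, one value out.
double_mean = 2 * df["v"].mean()

# Under a GROUP BY, a reduction yields one value per group.
double_mean_by_group = 2 * df.groupby("g")["v"].mean()


# Analytic: N rows in, N rows out, computed within each group.
def zscore(series):
    return (series - series.mean()) / series.std()


z = df.groupby("g")["v"].transform(zscore)

print(add_one.tolist())
print(double_mean)
print(double_mean_by_group.to_dict())
print(z.tolist())
```

Note that `Series.std()` uses the sample standard deviation (`ddof=1`) by default, so the z-scores differ slightly from a population-based calculation.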