From a510ad1c4c53ec31f4a790fd50be4aeda30ce1be Mon Sep 17 00:00:00 2001
From: deanm0000 <powertrading121@gmail.com>
Date: Tue, 2 Jan 2024 18:26:34 +0000
Subject: [PATCH 1/9] strong warning about map_batches

---
 docs/user-guide/expressions/user-defined-functions.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/user-guide/expressions/user-defined-functions.md b/docs/user-guide/expressions/user-defined-functions.md
index 882cc11c6ac1..3387be994cb0 100644
--- a/docs/user-guide/expressions/user-defined-functions.md
+++ b/docs/user-guide/expressions/user-defined-functions.md
@@ -18,8 +18,7 @@ These functions have an important distinction in how they operate and consequent
 A `map_batches` passes the `Series` backed by the `expression` as is.
 
 `map_batches` follows the same rules in both the `select` and the `group_by` context, this will
-mean that the `Series` represents a column in a `DataFrame`. Note that in the `group_by` context, that column is not yet
-aggregated!
+mean that the `Series` represents a column in a `DataFrame`. To be clear, **using `group_by` or `over` with `map_batches` will return results as though there were no grouping at all.**
 
 Use cases for `map_batches` are for instance passing the `Series` in an expression to a third party library. Below we show how
 we could use `map_batches` to pass an expression column to a neural network model.

From 904a99938c2a98e9abe2efe46899a333ee7997f5 Mon Sep 17 00:00:00 2001
From: deanm0000 <powertrading121@gmail.com>
Date: Tue, 2 Jan 2024 19:48:01 +0000
Subject: [PATCH 2/9] docs(python): add numba info/example

---
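Reviewer note: to make the behaviour of the new `cum_sum_reset` kernel easy to check without Numba installed, here is a minimal pure-Python sketch of the same loop. It is only a reference for reading the diff, not part of the patch, and the list-based signature is an assumption made for illustration. With the input used in `numba-example.py` (`[10, 9, 8, 7]`, threshold `5`) every running total already meets the threshold, so the output equals the input.

```python
def cum_sum_reset(values, threshold):
    """Pure-Python reference for the guvectorize kernel in numba-example.py."""
    if not values:
        return []
    # The first element is copied as-is, without a threshold check.
    res = [values[0]]
    for x in values[1:]:
        total = x + res[-1]
        # Restart from the current value once the running total reaches the threshold.
        res.append(x if total >= threshold else total)
    return res


print(cum_sum_reset([1, 1, 1, 1, 1, 1], 3))  # [1, 2, 1, 2, 1, 2]
print(cum_sum_reset([10, 9, 8, 7], 5))  # [10, 9, 8, 7]
```

An input such as the first one, where the total actually accumulates before resetting, may make the documented example more illustrative.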
 .../user-guide/expressions/numba-example.py   | 17 ++++++++++++++++
 docs/user-guide/expressions/numpy.md          | 20 +++++++++++++++++--
 py-polars/docs/requirements-docs.txt          |  1 +
 3 files changed, 36 insertions(+), 2 deletions(-)
 create mode 100644 docs/src/python/user-guide/expressions/numba-example.py

diff --git a/docs/src/python/user-guide/expressions/numba-example.py b/docs/src/python/user-guide/expressions/numba-example.py
new file mode 100644
index 000000000000..acd6c10c2b3b
--- /dev/null
+++ b/docs/src/python/user-guide/expressions/numba-example.py
@@ -0,0 +1,17 @@
+import polars as pl
+import numba as nb
+
+df = pl.DataFrame({"a": [10, 9, 8, 7]})
+
+
+@nb.guvectorize([(nb.int64[:], nb.int64, nb.int64[:])], "(n),()->(n)")
+def cum_sum_reset(x, y, res):
+    res[0] = x[0]
+    for i in range(1, x.shape[0]):
+        res[i] = x[i] + res[i - 1]
+        if res[i] >= y:
+            res[i] = x[i]
+
+
+out = df.select(cum_sum_reset(pl.all(), 5))
+print(out)
diff --git a/docs/user-guide/expressions/numpy.md b/docs/user-guide/expressions/numpy.md
index 6500e87b5207..3b0e92f9e1d0 100644
--- a/docs/user-guide/expressions/numpy.md
+++ b/docs/user-guide/expressions/numpy.md
@@ -1,7 +1,7 @@
-# Numpy
+# Numpy ufuncs
 
 Polars expressions support NumPy [ufuncs](https://numpy.org/doc/stable/reference/ufuncs.html). See [here](https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs)
-for a list on all supported numpy functions.
+for a list on all supported numpy functions. Additionally, SciPy offers a wide host of ufuncs. Specifically, the [scipy.special](https://docs.scipy.org/doc/scipy/reference/special.html#module-scipy.special) namespace has ufunc versions of many (possibly most) of what is available under stats. 
 
 This means that if a function is not provided by Polars, we can use NumPy and we still have fast columnar operation through the NumPy API.
 
@@ -13,6 +13,18 @@ This means that if a function is not provided by Polars, we can use NumPy and we
 --8<-- "python/user-guide/expressions/numpy-example.py"
 ```
 
+## Numba
+
+[NumBa](https://numba.pydata.org/) is an open source JIT compiler that allows you to create your own ufuncs entirely within python. The key is to use the [@guvectorize](https://numba.readthedocs.io/en/stable/user/vectorize.html#the-guvectorize-decorator) decorator. One popular use case is conditional cumulative functions. For example, suppose you want to take a cumulative sum but have it reset whenever it gets to a threshold.
+
+### Example
+
+{{code_block('user-guide/expressions/numba-example',api_functions=['DataFrame'])}}
+
+```python exec="on" result="text" session="user-guide/numpy"
+--8<-- "python/user-guide/expressions/numba-example.py"
+```
+
 ### Interoperability
 
 Polars `Series` have support for NumPy universal functions (ufuncs). Element-wise functions such as `np.exp()`, `np.cos()`, `np.div()`, etc. all work with almost zero overhead.
@@ -20,3 +32,7 @@ Polars `Series` have support for NumPy universal functions (ufuncs). Element-wis
 However, as a Polars-specific remark: missing values are a separate bitmask and are not visible by NumPy. This can lead to a window function or a `np.convolve()` giving flawed or incomplete results.
 
 Convert a Polars `Series` to a NumPy array with the `.to_numpy()` method. Missing values will be replaced by `np.nan` during the conversion.
+
+### Note on Performance
+
+The speed of ufuncs comes from being vectorized, compiled, and their ability to automatically use and return a pl.Series. That said, there's no inherent benefit in avoiding the use of `map_batches`. In fact, when polars sees an object that is a ufunc, it conveniently calls `map_batches`. In other words, even if you're trying to avoid calling `map_batches`, it's being called under the hood anyways.
\ No newline at end of file
diff --git a/py-polars/docs/requirements-docs.txt b/py-polars/docs/requirements-docs.txt
index dfc9cb34f0b0..a8a389802f42 100644
--- a/py-polars/docs/requirements-docs.txt
+++ b/py-polars/docs/requirements-docs.txt
@@ -3,6 +3,7 @@
 numpy
 pandas
 pyarrow
+numba
 
 hypothesis==6.92.1
 

From 1812f7fb994394317644b0b5edb1564de4cef378 Mon Sep 17 00:00:00 2001
From: Dean MacGregor <powertrading121@gmail.com>
Date: Tue, 13 Feb 2024 15:30:12 -0500
Subject: [PATCH 3/9] ufunc update

---
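Reviewer note: the hook described in this patch can be pictured with a toy model. The sketch below is pure Python and deliberately simplified. `ToyExpr` and `toy_ufunc` are hypothetical names, and NumPy's real override rules are more involved. It only shows the shape of the protocol: the function checks its inputs for `__array_ufunc__` and, if one is found, hands control to that method with the `(ufunc, method, *inputs, **kwargs)` arguments.

```python
class ToyExpr:
    """Hypothetical stand-in for a lazy expression; not the Polars Expr class."""

    def __init__(self, description):
        self.description = description

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Instead of computing anything now, record the call for later.
        args = ", ".join(
            i.description if isinstance(i, ToyExpr) else repr(i) for i in inputs
        )
        return ToyExpr(f"map_batches({ufunc.__name__}, {args})")


def toy_ufunc(func):
    """Wrap func so it defers to any input that defines __array_ufunc__."""

    def wrapper(*inputs, **kwargs):
        for arg in inputs:
            hook = getattr(type(arg), "__array_ufunc__", None)
            if hook is not None:
                # Divert execution to the input's own hook.
                return hook(arg, wrapper, "__call__", *inputs, **kwargs)
        return func(*inputs, **kwargs)

    wrapper.__name__ = func.__name__
    return wrapper


@toy_ufunc
def add(a, b):
    return a + b


print(add(1, 2))  # 3
print(add(ToyExpr("col('a')"), 2).description)  # map_batches(add, col('a'), 2)
```

The second call also shows the multiple-input case: the expression and the plain scalar are both forwarded to the hook.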
 docs/user-guide/expressions/numpy.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/user-guide/expressions/numpy.md b/docs/user-guide/expressions/numpy.md
index 3b0e92f9e1d0..cef23d544e69 100644
--- a/docs/user-guide/expressions/numpy.md
+++ b/docs/user-guide/expressions/numpy.md
@@ -3,7 +3,7 @@
 Polars expressions support NumPy [ufuncs](https://numpy.org/doc/stable/reference/ufuncs.html). See [here](https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs)
 for a list on all supported numpy functions. Additionally, SciPy offers a wide host of ufuncs. Specifically, the [scipy.special](https://docs.scipy.org/doc/scipy/reference/special.html#module-scipy.special) namespace has ufunc versions of many (possibly most) of what is available under stats. 
 
-This means that if a function is not provided by Polars, we can use NumPy and we still have fast columnar operation through the NumPy API.
+This means that if a function is not provided by Polars, we can use NumPy and we still have fast columnar operation through the NumPy API. ufuncs have a hook that diverts their own execution when one of its inputs is a class with the [__array_ufunc__](https://numpy.org/doc/stable/reference/arrays.classes.html#special-attributes-and-methods) method. Polars Expr class has this method which allows ufuncs to be input directly in a context (`select`, `with_columns`, `agg`) with relevant expressions as the input. This syntax extends even to multiple input functions. 
 
 ### Example
 
@@ -35,4 +35,4 @@ Convert a Polars `Series` to a NumPy array with the `.to_numpy()` method. Missin
 
 ### Note on Performance
 
-The speed of ufuncs comes from being vectorized, compiled, and their ability to automatically use and return a pl.Series. That said, there's no inherent benefit in avoiding the use of `map_batches`. In fact, when polars sees an object that is a ufunc, it conveniently calls `map_batches`. In other words, even if you're trying to avoid calling `map_batches`, it's being called under the hood anyways.
\ No newline at end of file
+The speed of ufuncs comes from being vectorized, and compiled. That said, there's no inherent benefit in using ufuncs just to avoid the use of `map_batches`. As mentioned above, ufuncs use a hook which gives polars the opportunity to run its own code before the ufunc is executed. In that way polars is still executing the ufunc with `map_batches`. 
\ No newline at end of file

From 8a29f2502852de75015ad50ab367032ae265dac8 Mon Sep 17 00:00:00 2001
From: Dean MacGregor <powertrading121@gmail.com>
Date: Tue, 13 Feb 2024 15:34:37 -0500
Subject: [PATCH 4/9] fmt

---
 docs/user-guide/expressions/numpy.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/user-guide/expressions/numpy.md b/docs/user-guide/expressions/numpy.md
index cef23d544e69..e0b91b1fa8c0 100644
--- a/docs/user-guide/expressions/numpy.md
+++ b/docs/user-guide/expressions/numpy.md
@@ -3,7 +3,7 @@
 Polars expressions support NumPy [ufuncs](https://numpy.org/doc/stable/reference/ufuncs.html). See [here](https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs)
 for a list on all supported numpy functions. Additionally, SciPy offers a wide host of ufuncs. Specifically, the [scipy.special](https://docs.scipy.org/doc/scipy/reference/special.html#module-scipy.special) namespace has ufunc versions of many (possibly most) of what is available under stats. 
 
-This means that if a function is not provided by Polars, we can use NumPy and we still have fast columnar operation through the NumPy API. ufuncs have a hook that diverts their own execution when one of its inputs is a class with the [__array_ufunc__](https://numpy.org/doc/stable/reference/arrays.classes.html#special-attributes-and-methods) method. Polars Expr class has this method which allows ufuncs to be input directly in a context (`select`, `with_columns`, `agg`) with relevant expressions as the input. This syntax extends even to multiple input functions. 
+This means that if a function is not provided by Polars, we can use NumPy and we still have fast columnar operation through the NumPy API. ufuncs have a hook that diverts their own execution when one of its inputs is a class with the [`__array_ufunc__`](https://numpy.org/doc/stable/reference/arrays.classes.html#special-attributes-and-methods) method. Polars Expr class has this method which allows ufuncs to be input directly in a context (`select`, `with_columns`, `agg`) with relevant expressions as the input. This syntax extends even to multiple input functions. 
 
 ### Example
 
@@ -35,4 +35,4 @@ Convert a Polars `Series` to a NumPy array with the `.to_numpy()` method. Missin
 
 ### Note on Performance
 
-The speed of ufuncs comes from being vectorized, and compiled. That said, there's no inherent benefit in using ufuncs just to avoid the use of `map_batches`. As mentioned above, ufuncs use a hook which gives polars the opportunity to run its own code before the ufunc is executed. In that way polars is still executing the ufunc with `map_batches`. 
\ No newline at end of file
+The speed of ufuncs comes from being vectorized, and compiled. That said, there's no inherent benefit in using ufuncs just to avoid the use of `map_batches`. As mentioned above, ufuncs use a hook which gives polars the opportunity to run its own code before the ufunc is executed. In that way polars is still executing the ufunc with `map_batches`.
\ No newline at end of file

From ce36292c676b7f9fe67e4aee5a30046fc45a43f6 Mon Sep 17 00:00:00 2001
From: Dean MacGregor <powertrading121@gmail.com>
Date: Tue, 13 Feb 2024 15:45:19 -0500
Subject: [PATCH 5/9] more formatting

---
 docs/user-guide/expressions/numpy.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/user-guide/expressions/numpy.md b/docs/user-guide/expressions/numpy.md
index e0b91b1fa8c0..2a16cab7e917 100644
--- a/docs/user-guide/expressions/numpy.md
+++ b/docs/user-guide/expressions/numpy.md
@@ -1,9 +1,9 @@
 # Numpy ufuncs
 
 Polars expressions support NumPy [ufuncs](https://numpy.org/doc/stable/reference/ufuncs.html). See [here](https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs)
-for a list on all supported numpy functions. Additionally, SciPy offers a wide host of ufuncs. Specifically, the [scipy.special](https://docs.scipy.org/doc/scipy/reference/special.html#module-scipy.special) namespace has ufunc versions of many (possibly most) of what is available under stats. 
+for a list of all supported NumPy functions. Additionally, SciPy offers a wide range of ufuncs. Specifically, the [scipy.special](https://docs.scipy.org/doc/scipy/reference/special.html#module-scipy.special) namespace has ufunc versions of many (possibly most) of the functions available under `scipy.stats`.
 
-This means that if a function is not provided by Polars, we can use NumPy and we still have fast columnar operation through the NumPy API. ufuncs have a hook that diverts their own execution when one of its inputs is a class with the [`__array_ufunc__`](https://numpy.org/doc/stable/reference/arrays.classes.html#special-attributes-and-methods) method. Polars Expr class has this method which allows ufuncs to be input directly in a context (`select`, `with_columns`, `agg`) with relevant expressions as the input. This syntax extends even to multiple input functions. 
+This means that if a function is not provided by Polars, we can use NumPy and still have fast columnar operations through the NumPy API. Ufuncs have a hook that diverts their own execution when one of their inputs is an instance of a class with the [`__array_ufunc__`](https://numpy.org/doc/stable/reference/arrays.classes.html#special-attributes-and-methods) method. The Polars `Expr` class has this method, which allows ufuncs to be used directly in a context (`select`, `with_columns`, `agg`) with the relevant expressions as input. This syntax extends even to functions with multiple inputs.
 
 ### Example
 
@@ -35,4 +35,4 @@ Convert a Polars `Series` to a NumPy array with the `.to_numpy()` method. Missin
 
 ### Note on Performance
 
-The speed of ufuncs comes from being vectorized, and compiled. That said, there's no inherent benefit in using ufuncs just to avoid the use of `map_batches`. As mentioned above, ufuncs use a hook which gives polars the opportunity to run its own code before the ufunc is executed. In that way polars is still executing the ufunc with `map_batches`.
\ No newline at end of file
+The speed of ufuncs comes from being vectorized and compiled. That said, there's no inherent benefit in using ufuncs just to avoid the use of `map_batches`. As mentioned above, ufuncs use a hook which gives Polars the opportunity to run its own code before the ufunc is executed. In that way, Polars is still executing the ufunc with `map_batches`.

From 583bd83bea6fdf718831b39fd2898cc20b4a6256 Mon Sep 17 00:00:00 2001
From: Dean MacGregor <powertrading121@gmail.com>
Date: Tue, 13 Feb 2024 16:00:33 -0500
Subject: [PATCH 6/9] requirements

---
 docs/requirements.txt                | 2 +-
 py-polars/docs/requirements-docs.txt | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/requirements.txt b/docs/requirements.txt
index e0416d67440b..e24c3641198c 100644
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -2,7 +2,7 @@ pandas
 pyarrow
 graphviz
 matplotlib
-
+numba
 mkdocs-material==9.5.2
 mkdocs-macros-plugin==1.0.4
 material-plausible-plugin==0.2.0
diff --git a/py-polars/docs/requirements-docs.txt b/py-polars/docs/requirements-docs.txt
index a8a389802f42..dfc9cb34f0b0 100644
--- a/py-polars/docs/requirements-docs.txt
+++ b/py-polars/docs/requirements-docs.txt
@@ -3,7 +3,6 @@
 numpy
 pandas
 pyarrow
-numba
 
 hypothesis==6.92.1
 

From b5ff69200d8fdf67b470426b9607bc3816b9c187 Mon Sep 17 00:00:00 2001
From: Dean MacGregor <powertrading121@gmail.com>
Date: Tue, 13 Feb 2024 16:12:59 -0500
Subject: [PATCH 7/9] both req

---
 py-polars/docs/requirements-docs.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/py-polars/docs/requirements-docs.txt b/py-polars/docs/requirements-docs.txt
index dfc9cb34f0b0..a8a389802f42 100644
--- a/py-polars/docs/requirements-docs.txt
+++ b/py-polars/docs/requirements-docs.txt
@@ -3,6 +3,7 @@
 numpy
 pandas
 pyarrow
+numba
 
 hypothesis==6.92.1
 

From 67141a986a337f6cee81b64d95d1141abb498dda Mon Sep 17 00:00:00 2001
From: Dean MacGregor <powertrading121@gmail.com>
Date: Tue, 13 Feb 2024 16:24:02 -0500
Subject: [PATCH 8/9] lowercase

---
 docs/user-guide/expressions/numpy.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/user-guide/expressions/numpy.md b/docs/user-guide/expressions/numpy.md
index 2a16cab7e917..97b8a1b241e6 100644
--- a/docs/user-guide/expressions/numpy.md
+++ b/docs/user-guide/expressions/numpy.md
@@ -15,7 +15,7 @@ This means that if a function is not provided by Polars, we can use NumPy and we
 
 ## Numba
 
-[NumBa](https://numba.pydata.org/) is an open source JIT compiler that allows you to create your own ufuncs entirely within python. The key is to use the [@guvectorize](https://numba.readthedocs.io/en/stable/user/vectorize.html#the-guvectorize-decorator) decorator. One popular use case is conditional cumulative functions. For example, suppose you want to take a cumulative sum but have it reset whenever it gets to a threshold.
+[Numba](https://numba.pydata.org/) is an open-source JIT compiler that allows you to create your own ufuncs entirely within Python. The key is to use the [@guvectorize](https://numba.readthedocs.io/en/stable/user/vectorize.html#the-guvectorize-decorator) decorator. One popular use case is conditional cumulative functions. For example, suppose you want to take a cumulative sum but have it reset whenever it reaches a threshold.
 
 ### Example
 

From 56e443cd5e5454a606d7c9fdd36f175047b29674 Mon Sep 17 00:00:00 2001
From: Dean MacGregor <powertrading121@gmail.com>
Date: Tue, 13 Feb 2024 16:49:21 -0500
Subject: [PATCH 9/9] more_req

---
 py-polars/requirements-dev.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/py-polars/requirements-dev.txt b/py-polars/requirements-dev.txt
index aeb9d3be53bd..be8c4ac15c59 100644
--- a/py-polars/requirements-dev.txt
+++ b/py-polars/requirements-dev.txt
@@ -60,6 +60,7 @@ hypothesis==6.97.4
 pytest==8.0.0
 pytest-cov==4.1.0
 pytest-xdist==3.5.0
+numba
 
 # Need moto.server to mock s3fs - see: https://github.com/aio-libs/aiobotocore/issues/755
 moto[s3]==5.0.0