diff --git a/docs/requirements.txt b/docs/requirements.txt
index dccf92dd62d1..d64ab525bedd 100644
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -2,6 +2,7 @@ pandas
 pyarrow
 graphviz
 matplotlib
+numba
 seaborn
 plotly
 altair
diff --git a/docs/src/python/user-guide/expressions/numba-example.py b/docs/src/python/user-guide/expressions/numba-example.py
new file mode 100644
index 000000000000..acd6c10c2b3b
--- /dev/null
+++ b/docs/src/python/user-guide/expressions/numba-example.py
@@ -0,0 +1,17 @@
+import polars as pl
+import numba as nb
+
+df = pl.DataFrame({"a": [10, 9, 8, 7]})
+
+
+@nb.guvectorize([(nb.int64[:], nb.int64, nb.int64[:])], "(n),()->(n)")
+def cum_sum_reset(x, y, res):
+    res[0] = x[0]
+    for i in range(1, x.shape[0]):
+        res[i] = x[i] + res[i - 1]
+        if res[i] >= y:
+            res[i] = x[i]
+
+
+out = df.select(cum_sum_reset(pl.all(), 5))
+print(out)
diff --git a/docs/user-guide/expressions/numpy.md b/docs/user-guide/expressions/numpy.md
index 6500e87b5207..97b8a1b241e6 100644
--- a/docs/user-guide/expressions/numpy.md
+++ b/docs/user-guide/expressions/numpy.md
@@ -1,9 +1,9 @@
-# Numpy
+# NumPy ufuncs
 
 Polars expressions support NumPy [ufuncs](https://numpy.org/doc/stable/reference/ufuncs.html). See [here](https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs)
-for a list on all supported numpy functions.
+for a list of all supported NumPy functions. Additionally, SciPy offers a wide range of ufuncs. Specifically, the [scipy.special](https://docs.scipy.org/doc/scipy/reference/special.html#module-scipy.special) namespace has ufunc versions of many (possibly most) of the functions available under `scipy.stats`.
 
-This means that if a function is not provided by Polars, we can use NumPy and we still have fast columnar operation through the NumPy API.
+This means that if a function is not provided by Polars, we can use NumPy and still get fast columnar operations through the NumPy API. A ufunc has a hook that diverts its own execution when one of its inputs is an instance of a class with the [`__array_ufunc__`](https://numpy.org/doc/stable/reference/arrays.classes.html#special-attributes-and-methods) method. The Polars `Expr` class has this method, which allows ufuncs to be called directly in a context (`select`, `with_columns`, `agg`) with the relevant expressions as input. This syntax extends to ufuncs that take multiple inputs.
 
 ### Example
 
@@ -13,6 +13,18 @@ This means that if a function is not provided by Polars, we can use NumPy and we
 --8<-- "python/user-guide/expressions/numpy-example.py"
 ```
 
+## Numba
+
+[Numba](https://numba.pydata.org/) is an open source JIT compiler that allows you to create your own ufuncs entirely within Python. The key is to use the [`@guvectorize`](https://numba.readthedocs.io/en/stable/user/vectorize.html#the-guvectorize-decorator) decorator. One popular use case is conditional cumulative functions. For example, suppose you want to take a cumulative sum but have it reset whenever it reaches a threshold.
+
+### Example
+
+{{code_block('user-guide/expressions/numba-example',api_functions=['DataFrame'])}}
+
+```python exec="on" result="text" session="user-guide/numpy"
+--8<-- "python/user-guide/expressions/numba-example.py"
+```
+
 ### Interoperability
 
 Polars `Series` have support for NumPy universal functions (ufuncs). Element-wise functions such as `np.exp()`, `np.cos()`, `np.div()`, etc. all work with almost zero overhead.
@@ -20,3 +32,7 @@ Polars `Series` have support for NumPy universal functions (ufuncs). Element-wis
 However, as a Polars-specific remark: missing values are a separate bitmask and are not visible by NumPy. This can lead to a window function or a `np.convolve()` giving flawed or incomplete results.
 
 Convert a Polars `Series` to a NumPy array with the `.to_numpy()` method. Missing values will be replaced by `np.nan` during the conversion.
+
+### Note on Performance
+
+The speed of ufuncs comes from being vectorized and compiled. That said, there is no inherent benefit in using ufuncs just to avoid `map_batches`. As mentioned above, ufuncs use a hook that gives Polars the opportunity to run its own code before the ufunc is executed. In that way, Polars is still executing the ufunc with `map_batches`.
diff --git a/docs/user-guide/expressions/user-defined-functions.md b/docs/user-guide/expressions/user-defined-functions.md
index 882cc11c6ac1..3387be994cb0 100644
--- a/docs/user-guide/expressions/user-defined-functions.md
+++ b/docs/user-guide/expressions/user-defined-functions.md
@@ -18,8 +18,7 @@ These functions have an important distinction in how they operate and consequent
 A `map_batches` passes the `Series` backed by the `expression` as is.
 
 `map_batches` follows the same rules in both the `select` and the `group_by` context, this will
-mean that the `Series` represents a column in a `DataFrame`. Note that in the `group_by` context, that column is not yet
-aggregated!
+mean that the `Series` represents a column in a `DataFrame`. To be clear, **using a `group_by` or `over` with `map_batches` will return results as though there were no groups at all.**
 
 Use cases for `map_batches` are for instance passing the `Series` in an expression to a third party library. Below we show how
 we could use `map_batches` to pass an expression column to a neural network model.
diff --git a/py-polars/docs/requirements-docs.txt b/py-polars/docs/requirements-docs.txt
index f1f88d7e2940..3efb78f29f87 100644
--- a/py-polars/docs/requirements-docs.txt
+++ b/py-polars/docs/requirements-docs.txt
@@ -3,6 +3,7 @@
 numpy
 pandas
 pyarrow
+numba
 
 hypothesis==6.97.4
 
diff --git a/py-polars/requirements-dev.txt b/py-polars/requirements-dev.txt
index aeb9d3be53bd..be8c4ac15c59 100644
--- a/py-polars/requirements-dev.txt
+++ b/py-polars/requirements-dev.txt
@@ -60,6 +60,7 @@ hypothesis==6.97.4
 pytest==8.0.0
 pytest-cov==4.1.0
 pytest-xdist==3.5.0
+numba
 
 # Need moto.server to mock s3fs - see: https://github.com/aio-libs/aiobotocore/issues/755
 moto[s3]==5.0.0