Plotting Wrappers: Occupancy Histogram #403

willGraham01 · 2025-02-04T13:04:18Z

Description

What is this PR

Bug fix
Addition of a new feature
Other

Why is this PR needed?

See #388 and the related #5 (which this PR also closes).

What does this PR do?

Adds the plot_occupancy function to the movement.plots module. This function takes in (time, space [x, y])-data and produces a histogram showing the distribution of positions across all time-points.

By default, any additional axes in the input da (DataArray) are collapsed onto the 0th-index, to provide the expected 2D data input. The selection argument can be used by the user to specify alternative coordinates along non-spacetime dimensions to collapse onto instead.
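The default collapse and the selection override described above can be sketched roughly as follows. This is a minimal illustration using plain xarray indexing, not the code in this PR: the helper name `collapse_extra_dims` and the toy data are assumptions made for the example.

```python
import numpy as np
import xarray as xr

# Toy (time, space, individuals) array; names and values are illustrative only.
da = xr.DataArray(
    np.arange(12, dtype=float).reshape(3, 2, 2),
    dims=("time", "space", "individuals"),
    coords={"space": ["x", "y"], "individuals": ["Alpha", "Bravo"]},
)


def collapse_extra_dims(da, selection=None):
    """Hypothetical helper: reduce `da` to (time, space) dimensions."""
    selection = selection or {}
    extra_dims = [d for d in da.dims if d not in ("time", "space")]
    for dim in extra_dims:
        if dim in selection:
            # Use the coordinate label requested by the user
            da = da.sel({dim: selection[dim]})
        else:
            # Default: collapse onto the 0th index of the extra dimension
            da = da.isel({dim: 0})
    return da


default = collapse_extra_dims(da)  # picks "Alpha"
bravo = collapse_extra_dims(da, {"individuals": "Bravo"})
```

Both results are two-dimensional (time, space) arrays, which is the shape a 2D histogram expects.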

plot_occupancy returns the usual figure and axes objects, but also returns information from the plotted histogram as its third value, mainly because this information is difficult to re-extract from the returned figure and axes. The raw counts in particular would otherwise be lost, since the QuadMesh objects that store histograms retain only the colour-mapped values (which may blur across bins with similar but distinct counts), not the raw count in each bin.
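For context, this is what matplotlib's own hist2d hands back when called directly. The sketch below uses random toy data and calls `Axes.hist2d` itself; it makes no claim about how plot_occupancy packages its third return value.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Random toy positions, purely illustrative
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=500)
y = rng.uniform(0, 100, size=500)

fig, ax = plt.subplots()

# hist2d returns the raw counts and bin edges alongside the QuadMesh artist
counts, xedges, yedges, mesh = ax.hist2d(x, y, bins=(10, 5))

print(counts.shape)          # one row per x-bin, one column per y-bin
print(len(xedges), len(yedges))  # one more edge than bins on each axis
print(type(mesh).__name__)   # the artist drawn on the axes
```

Keeping `counts`, `xedges` and `yedges` from this call avoids having to recover them from the drawn artist afterwards.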

References

Closes #388. ~~Additionally, this hopefully goes some way towards addressing #5, since we are returning the histogram data as the 3rd return value.~~
Closes #5 too.

How has this PR been tested?

Addition of tests to cover expected functionality, and possible edge cases.

Is this a breaking change?

No

Does this PR require an update to the documentation?

#410

Checklist:

The code has been tested locally
Tests have been added to cover all new functionality
The documentation has been updated to reflect any changes
The code has been formatted with pre-commit

willGraham01 · 2025-02-11T11:27:41Z

@sfmig your comment on #5 indicates that it would be useful to have certain bits of information from the plot as outputs from this kind of function. Currently I'm just exposing the other hist2d outputs (which the wrapper would otherwise suppress) to the user here; I'm not sure if you had more detailed outputs in mind when writing your comment.

If this covers it, we can close #5 with this PR too.

sfmig · 2025-02-11T14:40:08Z

thanks for checking @willGraham01 !

The point of that comment was that often you not only want the figure, but also the 2D array with the bin counts. From your comment ...

> Currently I'm just exposing the other hist2d outputs

seems like that is covered? So I think we can close #5 yay 😄 🚀

(Just fyi I vaguely remember this was something Sepi requested but not sure)

willGraham01 · 2025-02-11T14:47:20Z

> (Just fyi I vaguely remember this was something Sepi requested but not sure)

I hope it is b/c otherwise I've just wasted 5 hours of Niko's grant 🤭 😂 But will mark #5 as closable by this 🥳

niksirbi · 2025-02-12T18:34:30Z

I will finish reviewing this tomorrow, but I can already do some cool things with this!

See source code for this figure

import numpy as np
from matplotlib import pyplot as plt

from movement import sample_data
from movement.plots import plot_occupancy

# Load the sample dataset 
ds = sample_data.fetch_dataset("DLC_two-mice.predictions.csv")

# Compute the centroid of all keypoints
centroid_position = ds.position.mean("keypoints")

image = plt.imread(ds.attrs["frame_path"])

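# Construct square bins of side bin_pix pixels that cover the entire image.
# hist2d expects [x_edges, y_edges]: x spans the image width (shape[1]),
# y spans the image height (shape[0]).
bin_pix = 30
bins = [
    np.arange(0, image.shape[1] + bin_pix, bin_pix),
    np.arange(0, image.shape[0] + bin_pix, bin_pix),
]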

# Initialize the figure and axis
fig, ax = plt.subplots()

# Show the image
ax.imshow(image)

# Plot the occupancy 2D histogram for one individual ("individual1")
_, _, hist_data = plot_occupancy(
    da=centroid_position,
    selection={"individuals": "individual1"},
    ax=ax,
    cmap="viridis",
    alpha=0.5,
    bins=bins,
    cmin=3,      # Set the minimum shown count
    norm="log"
)

# Set the axis limits to match the image
ax.set_xlim(0, image.shape[1])
ax.set_ylim(image.shape[0], 0)

niksirbi

Thanks @willGraham01!

I’ve added some comments, mostly about aligning the function signature (and default behavior) with that of plot_trajectory().

Regarding your discussion with Sofía:
Yes, this approach technically meets the requirement of also obtaining the occupancy data as a 2D array, which is excellent. However, it can be slightly awkward to always rely on the plotting function when all you need is the occupancy array. There may be scenarios where the user only wants the 2D occupancy array—without the plot—for comparisons with neural data. From that perspective, it might be more intuitive to have a dedicated compute_occupancy function that returns both the 2D array and the bin edges. We could discuss the best data structure to return—whether that’s an xr.DataArray or multiple NumPy arrays, similar to hist2d.

In any case, I suggest merging this PR with just plot_occupancy (after addressing my comments) and leaving compute_occupancy for a future PR. We just need to ensure that both functions produce consistent histogram data, i.e. compute_occupancy should use the same underlying method as hist2d.
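One possible shape for such a function is sketched below, purely as an assumption to anchor the discussion. matplotlib's hist2d delegates its binning to numpy.histogram2d, so calling that directly would keep the two functions consistent. The signature, the NumPy-tuple return type and the NaN filtering are all hypothetical choices, not decisions made in this PR.

```python
import numpy as np


def compute_occupancy(x, y, bins=10):
    """Hypothetical helper: return 2D bin counts and bin edges, without plotting."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assumption: time points with a missing coordinate are dropped
    valid = ~(np.isnan(x) | np.isnan(y))
    counts, xedges, yedges = np.histogram2d(x[valid], y[valid], bins=bins)
    return counts, xedges, yedges


# Toy positions; the fourth time point has a missing x coordinate
x = [0.5, 1.5, 1.5, np.nan, 2.5]
y = [0.5, 0.5, 1.5, 1.0, 2.5]

edges = np.arange(0, 4)  # unit-width bins covering 0..3 on both axes
counts, xedges, yedges = compute_occupancy(x, y, bins=[edges, edges])
```

Returning an xr.DataArray with the bin centres as coordinates would be the alternative mentioned above; the tuple form here simply mirrors what hist2d already returns.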

movement/plots/__init__.py

niksirbi · 2025-02-13T09:38:08Z

movement/plots/occupancy.py

+    selection : dict[str, Hashable], optional
+        Mapping of dimension identifiers to the coordinate along that dimension
+        to plot. "time" and "space" dimensions are ignored. For example,
+        ``selection = {"individuals": "Bravo"}`` will create the occupancy
+        histogram for the individual "Bravo", instead of the occupancy
+        histogram for the 0-indexed entry on the ``"individuals"`` dimension.


I propose that we use individual, keypoints as arguments instead of selection, to keep this aligned with plot_trajectory.

Handling Keypoints

I also suggest adjusting the default behaviour for keypoints to match that in plot_trajectory. Specifically:

If no keypoint is explicitly specified (keypoints=None, the default), plot occupancy for the centroid of all available keypoints. If there is only one keypoint in the data array (either no keypoints dimension, or a keypoints dimension of size 1), then plot that single keypoint.

If multiple keypoints are specified by label (e.g. keypoints=['left_ear', 'right_ear']), plot occupancy for the centroid of those selected keypoints. I expect this to be a common use case—for instance, when users want to plot occupancy of the head, they might only include the relevant head keypoints.

If a single keypoint is specified by label, plot occupancy for that keypoint alone.
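The three keypoint cases above could be handled by one small selection step, sketched here with xarray. The helper name `select_keypoints` and the toy data are assumptions for illustration; plot_trajectory's actual implementation is not reproduced.

```python
import numpy as np
import xarray as xr

# Toy (time, space, keypoints) array; labels and values are illustrative only.
da = xr.DataArray(
    np.array(
        [
            [[0.0, 2.0, 10.0], [0.0, 4.0, 20.0]],
            [[1.0, 3.0, 11.0], [1.0, 5.0, 21.0]],
        ]
    ),
    dims=("time", "space", "keypoints"),
    coords={"space": ["x", "y"], "keypoints": ["left_ear", "right_ear", "tail"]},
)


def select_keypoints(da, keypoints=None):
    """Hypothetical helper: centroid of the requested keypoints."""
    if "keypoints" not in da.dims:
        return da  # nothing to collapse
    if keypoints is not None:
        # Accept a single label or a list of labels
        labels = [keypoints] if isinstance(keypoints, str) else list(keypoints)
        da = da.sel(keypoints=labels)
    # The mean over one keypoint is that keypoint itself, so a single
    # selected keypoint and a size-1 dimension need no special casing
    return da.mean("keypoints")


all_centroid = select_keypoints(da)                          # all keypoints
head = select_keypoints(da, ["left_ear", "right_ear"])       # subset centroid
tail = select_keypoints(da, "tail")                          # single keypoint
```

Because a mean over one element returns that element, the single-keypoint case falls out of the same code path as the centroid cases.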

Handling Individuals

I am less certain about how best to handle individual. There are at least two potentially sensible options:

Option A

We could adopt the same behaviour as plot_trajectory:

If no individual is explicitly specified (individual=None, default), plot the first individual.

If one individual is specified by label, plot that individual.

Disallow specifying more than one individual.

Option B

Alternatively, consider users who want occupancy plots for multi-individual datasets. They may expect a sum of all individual-level counts, representing occupancy for the entire group. This might be particularly relevant for large groups (e.g. flocking behaviour). Under this scenario, we would indeed use individuals, keypoints (both plural) as follows:

If no individual is explicitly specified (individuals=None, default), plot the occupancy of the entire group, summing counts from all individuals. The bin extents would likely need to encompass the group's overall range.

If multiple individuals are specified (e.g. individuals=['Alpha', 'Bravo']), again sum counts only for those selected individuals.

If a single individual is specified, plot occupancy only for that individual.
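Option B could look roughly like the sketch below: histogram each individual on shared bin edges that span the whole group, then sum the counts. The helper name `group_occupancy`, the dictionary input and the unit-width bins are assumptions chosen to keep the example small.

```python
import numpy as np

# Toy positions per individual, shape (time, 2) with columns (x, y)
positions = {
    "Alpha": np.array([[0.5, 0.5], [1.5, 0.5], [1.5, 1.5]]),
    "Bravo": np.array([[1.5, 1.5], [3.5, 2.5]]),
}

# Shared unit-width bin edges that encompass the whole group's range
all_xy = np.concatenate(list(positions.values()))
xedges = np.arange(np.floor(all_xy[:, 0].min()), np.ceil(all_xy[:, 0].max()) + 1)
yedges = np.arange(np.floor(all_xy[:, 1].min()), np.ceil(all_xy[:, 1].max()) + 1)


def group_occupancy(positions, individuals=None):
    """Hypothetical helper: sum per-individual counts on shared bins."""
    names = list(positions) if individuals is None else list(individuals)
    total = np.zeros((len(xedges) - 1, len(yedges) - 1))
    for name in names:
        xy = positions[name]
        counts, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=[xedges, yedges])
        total += counts
    return total


group = group_occupancy(positions)                # default: everyone
alpha_only = group_occupancy(positions, ["Alpha"])  # a single individual
```

Fixing the edges before the loop is what makes the per-individual counts addable bin by bin.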

I would be interested in hearing the intuition of others, including @stellaprins and @willGraham01, regarding Option A vs Option B. If we are undecided, I suggest we start with Option A for consistency and revisit other behaviours later once we have user feedback.

stellaprins · 2025-02-13

I would go for Option B and allow multiple individuals for occupancy, with summing counts across all individuals as the default. If there is multi-individual data, it seems likely to me that users will want a quick overview that includes the occupancy of all animals (or groups of them). If the default turns out to be generally uninformative, it can always be changed to the first individual, while still allowing multiple individuals to be specified.

niksirbi · 2025-02-13T09:44:33Z

movement/plots/occupancy.py

+    kwargs : Any
+        Keyword arguments passed to ``matplotlib.pyplot.hist2d``


I'm completely on board with forwarding all kwargs to hist2d. However, I think it would be helpful to illustrate some of the most commonly used kwargs in one or two examples in this docstring. While experimenting with this function, I found the following particularly useful:

bins (since users will want full control over the bin sizes)

cmin (especially useful when overlaying the trajectory on an image, to mask areas with low occupancy counts)

norm (particularly norm="log")

I don't believe we need to show all of these in the docstring example, as we have more space to explore them in a proper Sphinx Gallery example (see issue #410). However, we should at least demonstrate a typical usage of bins, for example bins=(30, 30).
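As a starting point for such an example, the sketch below forwards bins, cmin and norm through to hist2d. The `occupancy_sketch` function is a hypothetical stand-in for the wrapper, and the data is random; it is meant only to show the three kwargs in action.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm


def occupancy_sketch(ax, x, y, **kwargs):
    """Hypothetical stand-in for a wrapper that forwards kwargs to hist2d."""
    return ax.hist2d(x, y, **kwargs)


# Random toy positions clustered around the centre of a 100x100 arena
rng = np.random.default_rng(1)
x = rng.normal(50, 15, size=5000)
y = rng.normal(50, 15, size=5000)

fig, ax = plt.subplots()
counts, xedges, yedges, mesh = occupancy_sketch(
    ax,
    x,
    y,
    bins=(30, 30),   # 30 bins along x and 30 along y
    cmin=2,          # bins with fewer than 2 counts are hidden (returned as NaN)
    norm=LogNorm(),  # logarithmic colour scale
)
```

Note that cmin also affects the returned counts, not only the drawing: bins below the threshold come back as NaN.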

willGraham01 force-pushed the wgraham-388-occupancy-histogram branch from 3fe51e8 to 2cc312a Compare February 4, 2025 13:05

willGraham01 changed the title ~~Plot wrapper for occupancy histogram~~ Plotting Wrappers: Occupancy Histogram Feb 4, 2025

willGraham01 linked an issue Feb 4, 2025 that may be closed by this pull request

Plotting wrappers: Occupancy Heatmap #388

Open

willGraham01 mentioned this pull request Feb 5, 2025

Collapse dimensions common functionality for plot wrappers #405

Open

7 tasks

willGraham01 added 12 commits February 11, 2025 11:13

Basic histogram plot created

85f6d9c

Allow kwargs to go to underlying function

21ff9bb

Remove manual debugging from package module

fb03d41

Write test, but it fails. But can't figure out why it fails...

5798379

Additional return values to help extract histogram information

460a87b

Test missing dims and entirely NAN values

2330909

Check that new / existing axes are respected

22a22bf

Default units to pixels

82d48e3

SonarQube recommendations

1a84728

Comply with new plot wrapper standards

56e4c6c

Add test for default selection case

d44d19d

Add check for incorrect dims after squeezing

e412849

willGraham01 force-pushed the wgraham-388-occupancy-histogram branch from 415f620 to e412849 Compare February 11, 2025 11:13

willGraham01 marked this pull request as ready for review February 11, 2025 11:14

willGraham01 mentioned this pull request Feb 11, 2025

Example for "quick plot" functions #410

Open

willGraham01 requested a review from niksirbi February 11, 2025 11:24

niksirbi requested changes Feb 13, 2025
