
Plotting Wrappers: Occupancy Histogram #403

Open: wants to merge 12 commits into main


@willGraham01 (Contributor) commented Feb 4, 2025

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?

See #388 and related, #5 (which is actually also closed)

What does this PR do?

Adds the plot_occupancy function to the movement.plots module. This function takes (time, space) data, with x and y coordinates along the space dimension, and produces a 2D histogram showing the distribution of positions across all time points.

By default, any additional dimensions of the input da (an xarray DataArray) are collapsed onto their 0th index, so that the function receives the expected 2D (time, space) input. The selection argument lets the user specify alternative coordinates along these non-spacetime dimensions to collapse onto instead.
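The default collapse and the selection override can be sketched in plain NumPy. collapse_to_time_space below is a hypothetical helper, not the PR's implementation (which works on an xarray DataArray); it assumes the dimension names and the coordinate labels of the non-spacetime dimensions are passed in explicitly.

```python
import numpy as np


def collapse_to_time_space(data, dims, coords, selection=None):
    """Reduce an N-D array to shape (time, space).

    Hypothetical sketch: every dimension other than "time" and "space" is
    indexed down to one entry, namely the label given in ``selection`` if
    present, otherwise index 0. "time" and "space" in ``selection`` are ignored.
    """
    selection = selection or {}
    index = []
    for dim in dims:
        if dim in ("time", "space"):
            index.append(slice(None))  # keep the whole axis
        elif dim in selection:
            # Translate the requested coordinate label into a position
            index.append(list(coords[dim]).index(selection[dim]))
        else:
            index.append(0)  # default: collapse onto the 0th entry
    out = data[tuple(index)]
    # Integer indexing drops the collapsed axes; ensure time comes first
    kept = [d for d in dims if d in ("time", "space")]
    return out if kept[0] == "time" else out.T


# (time=4, individuals=3, space=2), laid out like a movement position array
data = np.arange(24).reshape(4, 3, 2)
dims = ("time", "individuals", "space")
coords = {"individuals": ["Alpha", "Bravo", "Charlie"]}

default = collapse_to_time_space(data, dims, coords)
bravo = collapse_to_time_space(data, dims, coords, {"individuals": "Bravo"})
print(default.shape, bravo[0])
```

With xarray, the same idea reduces to indexing the extra dimensions with isel (position 0) or sel (the requested label).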

plot_occupancy returns the usual figure and axes objects, but also returns information about the plotted histogram as its third value. This is mainly because this information is difficult to re-extract from the returned axes. The raw counts in particular would otherwise be lost, since the QuadMesh objects that store histograms only retain the colour-mapped values (which may blur together bins with similar but distinct counts), and not the raw counts in each bin.
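To illustrate what the raw counts and edges make possible, the sketch below assumes the third return value is a dict holding hist2d's counts and edges; the keys "counts", "xedges" and "yedges" are assumptions for illustration. It uses np.histogram2d as a stand-in, since matplotlib's hist2d computes its counts with that function, and looks up the raw count of the bin containing a given point.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 100, size=1_000)
y = rng.uniform(0, 50, size=1_000)

# The same three arrays hist2d returns before its QuadMesh
counts, xedges, yedges = np.histogram2d(x, y, bins=(10, 5))
hist_info = {"counts": counts, "xedges": xedges, "yedges": yedges}  # assumed keys


def count_at(hist_info, px, py):
    """Return the raw count of the bin containing the point (px, py)."""
    # searchsorted(..., side="right") - 1 gives the index of the bin's left edge
    i = np.searchsorted(hist_info["xedges"], px, side="right") - 1
    j = np.searchsorted(hist_info["yedges"], py, side="right") - 1
    # counts is indexed [x_bin, y_bin], matching hist2d's convention
    return hist_info["counts"][i, j]


print(count_at(hist_info, 55.0, 12.0))
```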

References

Closes #388. Additionally, this hopefully goes some way towards addressing #5, since we are returning the histogram data as the 3rd return value.
Closes #5 too.

How has this PR been tested?

Added tests covering the expected functionality and possible edge cases.

Is this a breaking change?

No

Does this PR require an update to the documentation?

See #410.

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

@willGraham01 force-pushed the wgraham-388-occupancy-histogram branch from 3fe51e8 to 2cc312a, February 4, 2025 13:05

This comment was marked as resolved.

@willGraham01 changed the title Plot wrapper for occupancy histogram to Plotting Wrappers: Occupancy Histogram, Feb 4, 2025
@willGraham01 linked an issue Feb 4, 2025 that may be closed by this pull request
@willGraham01 force-pushed the wgraham-388-occupancy-histogram branch from 415f620 to e412849, February 11, 2025 11:13

This comment was marked as resolved.

@willGraham01 marked this pull request as ready for review, February 11, 2025 11:14
@willGraham01 (Contributor, Author)

@sfmig your comment on #5 indicates that it would be useful for this kind of function to return certain pieces of information from the plot. Currently I'm just exposing the other hist2d outputs (which the wrapper would otherwise suppress) to the user; I'm not sure whether you had more detailed outputs in mind when writing your comment.

If this covers it, we can close #5 with this PR too.

@sfmig (Contributor) commented Feb 11, 2025

Thanks for checking, @willGraham01!

The point of that comment was that often you not only want the figure, but also the 2D array with the bin counts. From your comment ...

Currently I'm just exposing the other hist2d outputs

seems like that is covered? So I think we can close #5 yay 😄 🚀

(Just fyi I vaguely remember this was something Sepi requested but not sure)

@willGraham01 (Contributor, Author) commented Feb 11, 2025

(Just fyi I vaguely remember this was something Sepi requested but not sure)

I hope it is b/c otherwise I've just wasted 5 hours of Niko's grant 🤭 😂 But will mark #5 as closable by this 🥳

@niksirbi (Member)

I will finish reviewing this tomorrow, but I can already do some cool things with this!

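[Image: occupancy histogram of individual1's keypoint centroid, overlaid on the sample dataset's video frame]

Source code for this figure: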
import numpy as np
from matplotlib import pyplot as plt

from movement import sample_data
from movement.plots import plot_occupancy

# Load the sample dataset 
ds = sample_data.fetch_dataset("DLC_two-mice.predictions.csv")

# Compute the centroid of all keypoints
centroid_position = ds.position.mean("keypoints")

image = plt.imread(ds.attrs["frame_path"])

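# Construct 30x30-pixel bins that cover the entire image.
# hist2d takes bins as [x_edges, y_edges], so x spans the image
# width (shape[1]) and y spans the image height (shape[0])
bin_pix = 30
bins = [
    np.arange(0, image.shape[1] + bin_pix, bin_pix),
    np.arange(0, image.shape[0] + bin_pix, bin_pix),
]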

# Initialize the figure and axis
fig, ax = plt.subplots()

# Show the image
ax.imshow(image)

# Plot the 2D occupancy histogram for individual1
_, _, hist_data = plot_occupancy(
    da=centroid_position,
    selection={"individuals": "individual1"},
    ax=ax,
    cmap="viridis",
    alpha=0.5,
    bins=bins,
    cmin=3,      # Set the minimum shown count
    norm="log"
)

# Set the axis limits to match the image
ax.set_xlim(0, image.shape[1])
ax.set_ylim(image.shape[0], 0)

@niksirbi (Member) left a comment

Thanks @willGraham01!

I’ve added some comments, mostly about aligning the function signature (and default behavior) with that of plot_trajectory().

Regarding your discussion with Sofía:
Yes, this approach technically meets the requirement of also obtaining the occupancy data as a 2D array, which is excellent. However, it can be slightly awkward to always go through the plotting function when all you need is the occupancy array. There may be scenarios where the user only wants the 2D occupancy array, without the plot, for example for comparisons with neural data. From that perspective, it might be more intuitive to have a dedicated compute_occupancy function that returns both the 2D array and the bin edges. We could discuss the best data structure to return: an xr.DataArray, or multiple NumPy arrays similar to those returned by hist2d.

In any case, I suggest merging this PR with just plot_occupancy (after addressing my comments) and leaving compute_occupancy for a future PR. We just need to ensure that both functions produce consistent histogram data, i.e. compute_occupancy should use the same underlying method as hist2d.
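A minimal sketch of such a function, assuming positions arrive as a (time, 2) NumPy array of x and y. compute_occupancy here is hypothetical (no such function exists yet). It calls np.histogram2d, which is what matplotlib's hist2d uses internally, and applies cmin/cmax the way hist2d does, by setting counts outside the limits to NaN. Dropping NaN frames first is an extra assumption, needed because np.histogram2d cannot autodetect a range from NaN values.

```python
import numpy as np


def compute_occupancy(positions, bins=10, hist_range=None, cmin=None, cmax=None):
    """Hypothetical sketch: return (counts, xedges, yedges) for (time, 2) x, y data.

    ``bins`` and ``hist_range`` are forwarded to np.histogram2d (as ``bins``
    and ``range``); frames with a NaN coordinate are dropped before binning.
    """
    positions = np.asarray(positions, dtype=float)
    valid = ~np.isnan(positions).any(axis=1)
    x, y = positions[valid, 0], positions[valid, 1]
    counts, xedges, yedges = np.histogram2d(x, y, bins=bins, range=hist_range)
    # Mirror hist2d: bins outside [cmin, cmax] become NaN (i.e. not drawn)
    if cmin is not None:
        counts[counts < cmin] = np.nan
    if cmax is not None:
        counts[counts > cmax] = np.nan
    return counts, xedges, yedges


positions = np.array([[0.5, 0.5], [0.6, 0.4], [2.5, 2.5], [np.nan, 1.0]])
counts, xedges, yedges = compute_occupancy(
    positions, bins=(3, 3), hist_range=[[0, 3], [0, 3]], cmin=2
)
```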

movement/plots/__init__.py (resolved)
Comment on lines +36 to +41
selection : dict[str, Hashable], optional
Mapping of dimension identifiers to the coordinate along that dimension
to plot. "time" and "space" dimensions are ignored. For example,
``selection = {"individuals": "Bravo"}`` will create the occupancy
histogram for the individual "Bravo", instead of the occupancy
histogram for the 0-indexed entry on the ``"individuals"`` dimension.
@niksirbi (Member) commented Feb 13, 2025

I propose that we use individual and keypoints as arguments instead of selection, to keep this aligned with plot_trajectory.

Handling Keypoints

I also suggest adjusting the default behaviour for keypoints to match that in plot_trajectory. Specifically:

  • If no keypoint is explicitly specified (keypoints=None, the default), plot occupancy for the centroid of all available keypoints. If there is only one keypoint in the data array (either no keypoints dimension, or a keypoints dimension of size 1), then plot that single keypoint.
  • If multiple keypoints are specified by label (e.g. keypoints=['left_ear', 'right_ear']), plot occupancy for the centroid of those selected keypoints. I expect this to be a common use case—for instance, when users want to plot occupancy of the head, they might only include the relevant head keypoints.
  • If a single keypoint is specified by label, plot occupancy for that keypoint alone.
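The proposed keypoint defaults could be sketched as follows. select_keypoints is a hypothetical helper working on a (time, keypoints, space) NumPy array, with keypoint names passed separately; the centroid is taken as a NaN-ignoring mean, which is an assumption and may differ in detail from plot_trajectory.

```python
import numpy as np


def select_keypoints(data, keypoint_names, keypoints=None):
    """Hypothetical sketch: reduce (time, keypoints, space) data to (time, space).

    keypoints=None  -> centroid of all keypoints (a lone keypoint is its own centroid)
    "name"          -> that keypoint alone
    ["a", "b", ...] -> centroid of the listed keypoints
    """
    if data.ndim == 2:
        return data  # no keypoints dimension: nothing to reduce
    names = list(keypoint_names)
    if keypoints is None:
        idx = list(range(len(names)))
    elif isinstance(keypoints, str):
        return data[:, names.index(keypoints), :]
    else:
        idx = [names.index(k) for k in keypoints]
    # Centroid over the chosen keypoints, ignoring missing (NaN) values
    return np.nanmean(data[:, idx, :], axis=1)


positions = np.arange(12, dtype=float).reshape(2, 3, 2)  # (time, keypoints, space)
names = ["nose", "left_ear", "right_ear"]
head = select_keypoints(positions, names, ["left_ear", "right_ear"])
```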

Handling Individuals

I am less certain about how best to handle individual. There are at least two potentially sensible options:

Option A

We could adopt the same behaviour as plot_trajectory:

  • If no individual is explicitly specified (individual=None, default), plot the first individual.
  • If one individual is specified by label, plot that individual.
  • Disallow specifying more than one individual.
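Option A amounts to a simple selection rule. A hypothetical NumPy sketch (select_individual is not an existing function), assuming a (time, individuals, space) array and a separate list of individual labels:

```python
import numpy as np


def select_individual(data, individual_names, individual=None):
    """Hypothetical Option A sketch: reduce (time, individuals, space) to one individual."""
    names = list(individual_names)
    if individual is None:
        return data[:, 0, :]  # default: the first individual
    if isinstance(individual, (list, tuple, np.ndarray)):
        raise ValueError("Only one individual can be selected under Option A.")
    return data[:, names.index(individual), :]


positions = np.arange(12).reshape(2, 3, 2)  # (time, individuals, space)
names = ["Alpha", "Bravo", "Charlie"]
first = select_individual(positions, names)
bravo = select_individual(positions, names, "Bravo")
```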

Option B

Alternatively, consider users who want occupancy plots for multi-individual datasets. They may expect a sum of all individual-level counts, representing occupancy for the entire group. This might be particularly relevant for large groups (e.g. flocking behaviour). Under this scenario, we would indeed use individuals, keypoints (both plural) as follows:

  • If no individual is explicitly specified (individuals=None, default), plot the occupancy of the entire group, summing counts from all individuals. The bin extents would likely need to encompass the group's overall range.
  • If multiple individuals are specified (e.g. individuals=['Alpha', 'Bravo']), again sum counts only for those selected individuals.
  • If a single individual is specified, plot occupancy only for that individual.
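For Option B, one way to make the bins cover the whole group is to derive shared edges from the pooled positions and then sum per-individual histograms on that grid. With identical edges, the sum equals a single histogram of all pooled positions. group_occupancy is a hypothetical NumPy-only helper assuming a (time, individuals, space) array; frames with NaN coordinates are dropped.

```python
import numpy as np


def group_occupancy(data, bins=10):
    """Hypothetical Option B sketch: sum per-individual 2D histograms.

    Edges are computed once from the pooled, NaN-free positions, so every
    individual is binned on the same grid covering the group's full range.
    """
    pooled = data.reshape(-1, data.shape[-1])
    pooled = pooled[~np.isnan(pooled).any(axis=1)]
    # Shared edges spanning the whole group's range
    _, xedges, yedges = np.histogram2d(pooled[:, 0], pooled[:, 1], bins=bins)
    total = np.zeros((len(xedges) - 1, len(yedges) - 1))
    for ind in range(data.shape[1]):
        pos = data[:, ind, :]
        pos = pos[~np.isnan(pos).any(axis=1)]
        counts, _, _ = np.histogram2d(pos[:, 0], pos[:, 1], bins=[xedges, yedges])
        total += counts
    return total, xedges, yedges


rng = np.random.default_rng(seed=1)
positions = rng.uniform(0, 10, size=(50, 3, 2))  # (time, individuals, space)
positions[5, 1, :] = np.nan  # one missing frame for one individual
total, xedges, yedges = group_occupancy(positions, bins=(4, 4))
```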

I would be interested in hearing the intuition of others, including @stellaprins and @willGraham01, regarding Option A vs Option B. If we are undecided, I suggest we start with Option A for consistency and revisit other behaviours later once we have user feedback.

Contributor

I would go for Option B: allow multiple individuals for occupancy, and by default sum the counts over all individuals. With multi-individual data, it seems likely to me that users will want a quick overview of the occupancy of all animals (or groups of them). If the default turns out to be generally uninformative, it can always be changed to the first individual while still allowing multiple individuals to be specified.

Comment on lines +45 to +46
kwargs : Any
Keyword arguments passed to ``matplotlib.pyplot.hist2d``
Member

I'm completely on board with forwarding all kwargs to hist2d. However, I think it would be helpful to illustrate some of the most commonly used kwargs in one or two examples in this docstring. While experimenting with this function, I found the following particularly useful:

  • bins (since users will want full control over the bin sizes)
  • cmin (especially useful when overlaying the occupancy histogram on an image, to mask areas with low occupancy counts)
  • norm (particularly norm="log")

I don't believe we need to show all of these in the docstring example, as we have more space to explore them in a proper Sphinx Gallery example (see issue #410). However, we should at least demonstrate a typical usage of bins, for example bins=(30, 30).
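Both forms of bins can be checked without plotting, since np.histogram2d accepts the same bins argument as hist2d. In this sketch the 640x480 frame size and 20-pixel bin size are arbitrary assumptions: an integer pair such as bins=(30, 30) spans each axis's data range, while explicit edges are given in [x_edges, y_edges] order.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
height, width = 480, 640  # assumed video frame size in pixels
x = rng.uniform(0, width, size=5_000)
y = rng.uniform(0, height, size=5_000)

# 1) A fixed number of bins per axis, spanning the data's own range
counts_n, xe_n, ye_n = np.histogram2d(x, y, bins=(30, 30))

# 2) Fixed-size 20-pixel bins covering the whole frame. Edges are listed as
#    [x_edges, y_edges]: x spans the width, y spans the height.
bin_px = 20
edges = [
    np.arange(0, width + bin_px, bin_px),
    np.arange(0, height + bin_px, bin_px),
]
counts_px, xe_px, ye_px = np.histogram2d(x, y, bins=edges)
print(counts_n.shape, counts_px.shape)
```

Since plot_occupancy forwards its keyword arguments to hist2d, either form should be passable as bins=... there as well.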

movement/plots/occupancy.py (resolved)
movement/plots/occupancy.py (resolved)
Development

Successfully merging this pull request may close these issues:
  • Plotting wrappers: Occupancy Heatmap
  • Plot occupancy heatmap

4 participants