Plotting Wrappers: Occupancy Histogram #403
@sfmig your comment on #5 indicates that it would be useful to have certain bits of information from the plot as outputs from this kind of function. Currently I'm just exposing the other But if so, we can also close #5 with this PR too.
thanks for checking @willGraham01 ! The point of that comment was that often you not only want the figure, but also the 2D array with the bin counts. From your comment ...
seems like that is covered? So I think we can close #5 yay 😄 🚀 (Just fyi I vaguely remember this was something Sepi requested but not sure)
I hope it is b/c otherwise I've just wasted 5 hours of Niko's grant 🤭 😂 But will mark #5 as closable by this 🥳
I will finish reviewing this tomorrow, but I can already do some cool things with this!
See source code for this figure
import numpy as np
from matplotlib import pyplot as plt

from movement import sample_data
from movement.plots import plot_occupancy

# Load the sample dataset
ds = sample_data.fetch_dataset("DLC_two-mice.predictions.csv")
# Compute the centroid of all keypoints
centroid_position = ds.position.mean("keypoints")
image = plt.imread(ds.attrs["frame_path"])

# Construct bins of size 30x30 pixels that cover the entire image.
# hist2d expects [x_edges, y_edges]; image.shape is (height, width, ...),
# so the x edges span the width and the y edges span the height.
bin_pix = 30
bins = [
    np.arange(0, image.shape[1] + bin_pix, bin_pix),
    np.arange(0, image.shape[0] + bin_pix, bin_pix),
]

# Initialize the figure and axis
fig, ax = plt.subplots()
# Show the image
ax.imshow(image)

# Plot the occupancy 2D histogram for one individual
_, _, hist_data = plot_occupancy(
    da=centroid_position,
    selection={"individuals": "individual1"},
    ax=ax,
    cmap="viridis",
    alpha=0.5,
    bins=bins,
    cmin=3,  # Set the minimum shown count
    norm="log",
)

# Set the axis limits to match the image
ax.set_xlim(0, image.shape[1])
ax.set_ylim(image.shape[0], 0)
Thanks @willGraham01!
I’ve added some comments, mostly about aligning the function signature (and default behavior) with that of plot_trajectory().
Regarding your discussion with Sofía:
Yes, this approach technically meets the requirement of also obtaining the occupancy data as a 2D array, which is excellent. However, it can be slightly awkward to always rely on the plotting function when all you need is the occupancy array. There may be scenarios where the user only wants the 2D occupancy array, without the plot, for comparisons with neural data. From that perspective, it might be more intuitive to have a dedicated compute_occupancy function that returns both the 2D array and the bin edges. We could discuss the best data structure to return: an xr.DataArray, or multiple NumPy arrays, similar to hist2d.
In any case, I suggest merging this PR with just plot_occupancy (after addressing my comments) and leaving compute_occupancy for a future PR. We just need to ensure that both functions produce consistent histogram data, i.e. compute_occupancy should use the same underlying method as hist2d.
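To make the idea concrete, here is a rough sketch of what such a function could look like. The name compute_occupancy, its signature, and the NaN handling are all assumptions for illustration; nothing like this exists in the PR. It calls np.histogram2d, which is the routine Matplotlib's hist2d uses internally, so the counts should agree with the plotted histogram for the same bins.

```python
import numpy as np


def compute_occupancy(x, y, bins=10):
    """Hypothetical helper: 2D occupancy counts without any plotting.

    Relies on np.histogram2d, the routine that matplotlib's hist2d
    calls internally, so counts match the plot for identical bins.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Drop time points where either coordinate is missing (NaN)
    valid = ~(np.isnan(x) | np.isnan(y))
    counts, x_edges, y_edges = np.histogram2d(x[valid], y[valid], bins=bins)
    return counts, x_edges, y_edges


# Synthetic positions standing in for real tracking data
rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 100, size=500)
y = rng.uniform(0, 50, size=500)
x[:5] = np.nan  # simulate missing tracking data

counts, x_edges, y_edges = compute_occupancy(x, y, bins=(10, 5))
```

Returning plain NumPy arrays mirrors hist2d; wrapping them in an xr.DataArray with the bin centres as coordinates would be the other option mentioned above.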
selection : dict[str, Hashable], optional
    Mapping of dimension identifiers to the coordinate along that dimension
    to plot. "time" and "space" dimensions are ignored. For example,
    ``selection = {"individuals": "Bravo"}`` will create the occupancy
    histogram for the individual "Bravo", instead of the occupancy
    histogram for the 0-indexed entry on the ``"individuals"`` dimension.
I propose that we use individual, keypoints as arguments instead of selection, to keep this aligned with plot_trajectory.
Handling Keypoints
I also suggest adjusting the default behaviour for keypoints to match that in plot_trajectory. Specifically:
- If no keypoint is explicitly specified (keypoints=None, the default), plot occupancy for the centroid of all available keypoints. If there is only one keypoint in the data array (either no keypoints dimension, or a keypoints dimension of size 1), then plot that single keypoint.
- If multiple keypoints are specified by label (e.g. keypoints=['left_ear', 'right_ear']), plot occupancy for the centroid of those selected keypoints. I expect this to be a common use case: for instance, when users want to plot occupancy of the head, they might only include the relevant head keypoints.
- If a single keypoint is specified by label, plot occupancy for that keypoint alone.
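The three keypoint cases above could be sketched roughly as follows. This uses plain NumPy arrays standing in for the DataArray, and the helper name keypoint_centroid, its arguments, and the example labels are hypothetical; the real function would select by label on the keypoints dimension instead.

```python
import numpy as np


def keypoint_centroid(position, keypoint_names, keypoints=None):
    """Hypothetical helper: reduce (time, keypoints, space) to (time, space).

    keypoints=None averages over all keypoints; a single label or a
    list of labels averages over the chosen keypoints only.
    """
    if keypoints is None:
        chosen = list(keypoint_names)
    elif isinstance(keypoints, str):
        chosen = [keypoints]
    else:
        chosen = list(keypoints)
    idx = [keypoint_names.index(name) for name in chosen]
    # nanmean skips keypoints that are missing at a given time point
    return np.nanmean(position[:, idx, :], axis=1)


names = ["snout", "left_ear", "right_ear"]
# Two time points, three keypoints, space ordered as [x, y]
position = np.array(
    [
        [[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]],
        [[0.0, 3.0], [2.0, 3.0], [4.0, 6.0]],
    ]
)

all_kpts = keypoint_centroid(position, names)
head = keypoint_centroid(position, names, ["left_ear", "right_ear"])
snout = keypoint_centroid(position, names, "snout")
```

A single keypoint falls out of the same code path, since the centroid of one point is the point itself.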
Handling Individuals
I am less certain about how best to handle individual. There are at least two potentially sensible options:
Option A
We could adopt the same behaviour as plot_trajectory:
- If no individual is explicitly specified (individual=None, default), plot the first individual.
- If one individual is specified by label, plot that individual.
- Disallow specifying more than one individual.
Option B
Alternatively, consider users who want occupancy plots for multi-individual datasets. They may expect a sum of all individual-level counts, representing occupancy for the entire group. This might be particularly relevant for large groups (e.g. flocking behaviour). Under this scenario, we would indeed use individuals, keypoints (both plural) as follows:
- If no individual is explicitly specified (individuals=None, default), plot the occupancy of the entire group, summing counts from all individuals. The bin extents would likely need to encompass the group's overall range.
- If multiple individuals are specified (e.g. individuals=['Alpha', 'Bravo']), again sum counts only for those selected individuals.
- If a single individual is specified, plot occupancy only for that individual.
I would be interested in hearing the intuition of others, including @stellaprins and @willGraham01, regarding Option A vs Option B. If we are undecided, I suggest we start with Option A for consistency and revisit other behaviours later once we have user feedback.
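For reference, Option B could look something like the sketch below. It uses synthetic NumPy data in place of a real dataset, and group_occupancy, the array layout, and the individual names are assumptions for illustration. The key point is that the bin edges are computed once over the whole group, so the per-individual counts line up and can be added.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
# Hypothetical data: (time, individuals, space) with space ordered as [x, y]
position = rng.uniform(0, 100, size=(200, 3, 2))
individual_names = ["Alpha", "Bravo", "Charlie"]

# Shared edges spanning the whole group's range, so counts align across individuals
x_edges = np.linspace(position[..., 0].min(), position[..., 0].max(), 11)
y_edges = np.linspace(position[..., 1].min(), position[..., 1].max(), 11)


def group_occupancy(position, names, individuals=None):
    """Sum per-individual bin counts for the selected individuals."""
    if individuals is None:
        selected = list(names)  # default: the entire group
    elif isinstance(individuals, str):
        selected = [individuals]
    else:
        selected = list(individuals)
    total = np.zeros((len(x_edges) - 1, len(y_edges) - 1))
    for name in selected:
        i = names.index(name)
        counts, _, _ = np.histogram2d(
            position[:, i, 0], position[:, i, 1], bins=[x_edges, y_edges]
        )
        total += counts
    return total


everyone = group_occupancy(position, individual_names)
pair = group_occupancy(position, individual_names, ["Alpha", "Bravo"])
single = group_occupancy(position, individual_names, "Charlie")
```

Option A would correspond to always taking the single-individual path, with the first individual as the default.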
I would go for Option B and allow multiple individuals for occupancy, with the default being to sum counts for all individuals. With multi-individual data, it seems likely to me that users will want a quick overview that includes the occupancy of all animals (or groups of them). If the default turns out to be generally uninformative, it can always be changed to the first individual while still allowing multiple individuals to be specified.
kwargs : Any
    Keyword arguments passed to ``matplotlib.pyplot.hist2d``
I'm completely on board with forwarding all kwargs to hist2d. However, I think it would be helpful to illustrate some of the most commonly used kwargs in one or two examples in this docstring. While experimenting with this function, I found the following particularly useful:
- bins (since users will want full control over the bin sizes)
- cmin (especially useful when overlaying the trajectory on an image, to mask areas with low occupancy counts)
- norm (particularly norm="log")
I don't believe we need to show all of these in the docstring example, as we have more space to explore them in a proper Sphinx Gallery example (see issue #410). However, we should at least demonstrate a typical usage of bins, for example bins=(30, 30).
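As a starting point, the effect of these three kwargs can be seen by calling Matplotlib's hist2d directly on synthetic data, as in the sketch below. It does not call plot_occupancy itself, and the data is made up. LogNorm() is used as the explicit form of norm="log", which should also work on Matplotlib versions that do not accept the string shorthand.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headlessly
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.colors import LogNorm

# Synthetic positions clustered around the centre of a 100x100 arena
rng = np.random.default_rng(seed=2)
x = rng.normal(loc=50, scale=10, size=2000)
y = rng.normal(loc=50, scale=10, size=2000)

fig, ax = plt.subplots()
counts, x_edges, y_edges, mesh = ax.hist2d(
    x,
    y,
    bins=(30, 30),  # 30 bins along x and 30 along y
    cmin=3,  # bins with fewer than 3 counts become NaN and are not drawn
    norm=LogNorm(),  # logarithmic colour scale
)
fig.colorbar(mesh, ax=ax, label="count")
```

Note that cmin also affects the returned counts array: masked bins come back as NaN, which matters if the histogram data is reused downstream.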
Description
What is this PR
Why is this PR needed?
See #388 and related, #5 (which is actually also closed)
What does this PR do?
Adds the plot_occupancy function to the movement.plots module. This function takes in (time, space [x, y])-data and produces a histogram showing the distribution of positions across all time-points.
By default, any additional axes in the input da (DataArray) are collapsed onto the 0th-index, to provide the expected 2D data input. The selection argument can be used by the user to specify alternative coordinates along non-spacetime dimensions to collapse onto instead.
plot_occupancy returns the usual figure and axes objects, however also returns information from the plotted histogram as its third value. This is mainly because this information is difficult to re-extract from the returned axes figure. The counts information in particular would technically otherwise be lost since QuadMesh objects (that store histograms) only retain the colour-mapped values (which may blur across bins with similar, but distinct counts), and not the raw counts in each bin.
References
Closes #388.
Additionally, this hopefully goes some way towards addressing #5, since we are returning the histogram data as the 3rd return value. Closes #5 too.
How has this PR been tested?
Addition of tests to cover expected functionality, and possible edge cases.
Is this a breaking change?
No
Does this PR require an update to the documentation?
#410
Checklist: