Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support datashade points hover #1430

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from
Draft

Conversation

ahuang11
Copy link
Collaborator

@ahuang11 ahuang11 commented Oct 2, 2024

Adds support for hovering over datashaded points

Closes #1428

import pandas as pd
import hvplot.pandas
import datashader as ds

df = pd.DataFrame(
    {
        "lon": [-86.75, -86.75, -86.25, -86.25],
        "lat": [33.75, 34.0, 34.49, 34.5],
        "population": [100, 200, 300, 400],
        "c": ["A", "B", "A", "B"],
    }
)

p = df.hvplot.points(
    "lon",
    "lat",
    hover_cols="all",
    datashade=True,
    dynspread=True,
    aggregator=ds.count_cat("c"),
    hover_tooltips=["lat", "lon", ("Population", "@population"), ("Alpha", "@A"), ("Beta", "@B")],  # not necessary
    padding=0.2
)
p
Area.mp4

@ahuang11 ahuang11 marked this pull request as ready for review October 3, 2024 00:20
@ahuang11
Copy link
Collaborator Author

ahuang11 commented Oct 3, 2024

Tap instead of PointerXY for big datasets

Screen.Recording.2024-10-02.at.5.21.53.PM.mov

@ahuang11 ahuang11 added the type: enhancement New feature or request label Oct 3, 2024
@ahuang11 ahuang11 requested review from maximlt and hoxbro October 3, 2024 00:30
Copy link

codecov bot commented Oct 3, 2024

Codecov Report

Attention: Patch coverage is 46.15385% with 35 lines in your changes missing coverage. Please review.

Project coverage is 88.57%. Comparing base (b36e3a1) to head (c792ae2).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
hvplot/converter.py 36.36% 35 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1430      +/-   ##
==========================================
- Coverage   88.94%   88.57%   -0.37%     
==========================================
  Files          52       52              
  Lines        7751     7808      +57     
==========================================
+ Hits         6894     6916      +22     
- Misses        857      892      +35     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@hoxbro hoxbro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments:

  1. Does this work with other data backends like dask.dataframe?
  2. This is a (small) breaking change as the output of the DynamicMap generated by hvplot (with datashade=True, hover=True) is no longer an RGB but an Overlay. This could lead to problems in the future.
  3. We could add inspect as an option to hover to enable this feature for now. Then, we can always make it the default in the next minor release.

)

stream = PointerXY
if len(self.data) > 10000:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a fan of these magic numbers, with no way to change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes allowing changing that value is a good improvement. I wouldn't add that to the signature though, maybe just as a very explicitly named kwarg that is caught there if defined, and defaults to 10000 if not.

Comment on lines +1890 to +1894
for i in range(1, 10):
if key in df.columns:
key = f'Count_{i}'
else:
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks weird to me

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's weird about it? It's deduplicating columns; do you have a suggestion for an alternative?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 10?

I can't wrap my head around what the code does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just about any scalar value other than 0 or 1 should be documented very clearly, usually by making a Parameter or at least attribute declaration with a meaningful name and associated docstring.

break
agg_value = pd.Series([len(df)], index=[key])

# take the mean of numeric columns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the rest of the method generated with AI? If so, please clearly state so.

Otherwise, please explain why you take the mean. In general, comments should not explain what is done but why it is done.

Copy link
Collaborator Author

@ahuang11 ahuang11 Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it's not AI. I decided to add comments since the code was getting chunky, and easier to navigate English, especially since I use the exclude keyword for object columns instead of include keyword (had to do a double take that when I was rereading my code), and I wanted to highlight that.

Comments should not explain what is done but why it is done.

Sometimes it's easier to read English than code and guidelines don't necessarily have to be followed all the time

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that. It just had the structure of an AI output.

... guidelines don't necessarily have to be followed all the time.

I agree, but the use of mean need to be explained, either in the code or in the original comment.

This can return an output that does not exist in the original DataFrame when more points are in the input DataFrame. For example, the two close points in your example will return 350 for the population when they are close to each other. Is this okay? Maybe, but you need to state that you have made this decision.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahuang11 , @hoxbro thinks you are so intelligent that it must be artificial! :-)

@@ -842,13 +842,22 @@ def __init__(
if kind == 'errorbars':
hover = False
elif hover is None:
hover = not self.datashade
hover = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
hover = True
hover = (self.datashade and self.kind == 'points') or not self.datashade

I'm not sure if hover should always be True.

agg_series_map[agg_col] = agg_value

# concat all series into a single dataframe which has one row
df_hover = pd.concat(agg_series_map.values()).to_frame().transpose()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This especially, without the comment, would be really confusing of what all the methods are doing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. But why is this needed? Why is a pd.Series not good enough? We could add an extra line explaining this:

Suggested change
df_hover = pd.concat(agg_series_map.values()).to_frame().transpose()
# This is needed as the function used in `inspect_points.transform` must return a DataFrame.
df_hover = pd.concat(agg_series_map.values()).to_frame().transpose()

At some point, a person could look at this line and remove .to_frame().transpose(), and if no tests fails people could think it was not a needed conversion.

@philippjfr
Copy link
Member

This is a (small) breaking change as the output of the DynamicMap generated by hvplot (with datashade=True, hover=True) is no longer an RGB but an Overlay. This could lead to problems in the future.

Yes, I haven't reviewed in detail but if this is the approach taken we should wait for https://github.com/orgs/holoviz/projects/9 to be addressed properly rather than "hacking" it with a fake layer at the hvplot layer.

)

stream = PointerXY
if len(self.data) > 10000:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes allowing changing that value is a good improvement. I wouldn't add that to the signature though, maybe just as a very explicitly named kwarg that is caught there if defined, and defaults to 10000 if not.


# show at least the x and y columns
cols = self.hover_cols.copy()
if self.x not in cols:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is self.x always defined or can it be None at this stage?

@@ -324,6 +319,23 @@ def test_downsample_resample_when(self, kind, eltype):
assert isinstance(element, eltype)
assert len(element) == 0

@parameterized.expand([(None,), (True,), ('vline',), ('hline',)])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think there'd be a way to avoid calling get_plot? It's pretty expensive isn't it?

@@ -324,6 +319,23 @@ def test_downsample_resample_when(self, kind, eltype):
assert isinstance(element, eltype)
assert len(element) == 0

@parameterized.expand([(None,), (True,), ('vline',), ('hline',)])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many more tests will be needed to cover all the branches of the code introduced in the converter, these first two are a good start but they're very basic and just check whether the code doesn't error and the HoloViews type is the expected one.

@ahuang11
Copy link
Collaborator Author

ahuang11 commented Oct 8, 2024

this is the approach taken we should wait for https://github.com/orgs/holoviz/projects/9 to be addressed properly rather than "hacking" it with a fake layer at the hvplot layer.

To clarify, which tasks in https://github.com/orgs/holoviz/projects/9 should be completed first? Should we ditch this PR?

@maximlt maximlt marked this pull request as draft December 12, 2024 14:29
@maximlt
Copy link
Member

maximlt commented Dec 12, 2024

Marked this PR as Draft per the comments before. @hoxbro let us know if you can share the progress of https://github.com/orgs/holoviz/projects/9 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for hover when datashade points
5 participants