-
-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support datashade points hover #1430
base: main
Are you sure you want to change the base?
Conversation
Tap instead of PointerXY for big datasets Screen.Recording.2024-10-02.at.5.21.53.PM.mov |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1430 +/- ##
==========================================
- Coverage 88.94% 88.57% -0.37%
==========================================
Files 52 52
Lines 7751 7808 +57
==========================================
+ Hits 6894 6916 +22
- Misses 857 892 +35 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments:
- Does this work with other data backends like
dask.dataframe
? - This is a (small) breaking change as the output of the
DynamicMap
generated by hvplot (withdatashade=True, hover=True
) is no longer anRGB
but anOverlay
. This could lead to problems in the future. - We could add
inspect
as an option tohover
to enable this feature for now. Then, we can always make it the default in the next minor release.
) | ||
|
||
stream = PointerXY | ||
if len(self.data) > 10000: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a fan of these magic numbers, with no way to change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes allowing changing that value is a good improvement. I wouldn't add that to the signature though, maybe just as a very explicitly named kwarg that is caught there if defined, and defaults to 10000 if not.
for i in range(1, 10): | ||
if key in df.columns: | ||
key = f'Count_{i}' | ||
else: | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks weird to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's weird about it? It's deduplicating columns; do you have a suggestion for an alternative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 10?
I can't wrap my head around what the code does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just about any scalar value other than 0 or 1 should be documented very clearly, usually by making a Parameter or at least attribute declaration with a meaningful name and associated docstring.
break | ||
agg_value = pd.Series([len(df)], index=[key]) | ||
|
||
# take the mean of numeric columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the rest of the method generated with AI? If so, please clearly state so.
Otherwise, please explain why you take the mean. In general, comments should not explain what is done but why it is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it's not AI. I decided to add comments since the code was getting chunky, and easier to navigate English, especially since I use the exclude keyword for object columns instead of include keyword (had to do a double take that when I was rereading my code), and I wanted to highlight that.
Comments should not explain what is done but why it is done.
Sometimes it's easier to read English than code and guidelines don't necessarily have to be followed all the time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about that. It just had the structure of an AI output.
... guidelines don't necessarily have to be followed all the time.
I agree, but the use of mean
need to be explained, either in the code or in the original comment.
This can return an output that does not exist in the original DataFrame when more points are in the input DataFrame. For example, the two close points in your example will return 350 for the population when they are close to each other. Is this okay? Maybe, but you need to state that you have made this decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -842,13 +842,22 @@ def __init__( | |||
if kind == 'errorbars': | |||
hover = False | |||
elif hover is None: | |||
hover = not self.datashade | |||
hover = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hover = True | |
hover = (self.datashade and self.kind == 'points') or not self.datashade |
I'm not sure if hover should always be True.
agg_series_map[agg_col] = agg_value | ||
|
||
# concat all series into a single dataframe which has one row | ||
df_hover = pd.concat(agg_series_map.values()).to_frame().transpose() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This especially, without the comment, would be really confusing of what all the methods are doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. But why is this needed? Why is a pd.Series
not good enough? We could add an extra line explaining this:
df_hover = pd.concat(agg_series_map.values()).to_frame().transpose() | |
# This is needed as the function used in `inspect_points.transform` must return a DataFrame. | |
df_hover = pd.concat(agg_series_map.values()).to_frame().transpose() |
At some point, a person could look at this line and remove .to_frame().transpose()
, and if no tests fails people could think it was not a needed conversion.
Yes, I haven't reviewed in detail but if this is the approach taken we should wait for https://github.com/orgs/holoviz/projects/9 to be addressed properly rather than "hacking" it with a fake layer at the hvplot layer. |
) | ||
|
||
stream = PointerXY | ||
if len(self.data) > 10000: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes allowing changing that value is a good improvement. I wouldn't add that to the signature though, maybe just as a very explicitly named kwarg that is caught there if defined, and defaults to 10000 if not.
|
||
# show at least the x and y columns | ||
cols = self.hover_cols.copy() | ||
if self.x not in cols: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is self.x
always defined or can it be None
at this stage?
@@ -324,6 +319,23 @@ def test_downsample_resample_when(self, kind, eltype): | |||
assert isinstance(element, eltype) | |||
assert len(element) == 0 | |||
|
|||
@parameterized.expand([(None,), (True,), ('vline',), ('hline',)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think there'd be a way to avoid calling get_plot
? It's pretty expensive isn't it?
@@ -324,6 +319,23 @@ def test_downsample_resample_when(self, kind, eltype): | |||
assert isinstance(element, eltype) | |||
assert len(element) == 0 | |||
|
|||
@parameterized.expand([(None,), (True,), ('vline',), ('hline',)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many more tests will be needed to cover all the branches of the code introduced in the converter, these first two are a good start but they're very basic and just check whether the code doesn't error and the HoloViews type is the expected one.
To clarify, which tasks in https://github.com/orgs/holoviz/projects/9 should be completed first? Should we ditch this PR? |
Marked this PR as Draft per the comments before. @hoxbro let us know if you can share the progress of https://github.com/orgs/holoviz/projects/9 . |
Adds support for hovering over datashaded points
Closes #1428
Area.mp4