Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datashader speedup and bugfixes #309

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

Sonja-Stockhaus
Copy link
Collaborator

This speeds everything up for me locally, but for the benchmark (#296), I see an effect (aka datashader faster than matplotlib) for e.g. 10k points/shapes etc.

@Sonja-Stockhaus Sonja-Stockhaus linked an issue Jul 17, 2024 that may be closed by this pull request
@codecov-commenter
Copy link

codecov-commenter commented Jul 17, 2024

Codecov Report

Attention: Patch coverage is 95.45455% with 6 lines in your changes missing coverage. Please review.

Project coverage is 84.53%. Comparing base (41e66a8) to head (febd424).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/spatialdata_plot/pl/utils.py 91.54% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #309      +/-   ##
==========================================
+ Coverage   83.76%   84.53%   +0.76%     
==========================================
  Files           8        8              
  Lines        1540     1629      +89     
==========================================
+ Hits         1290     1377      +87     
- Misses        250      252       +2     
Files with missing lines Coverage Δ
src/spatialdata_plot/pl/basic.py 90.86% <ø> (ø)
src/spatialdata_plot/pl/render.py 96.17% <100.00%> (+1.45%) ⬆️
src/spatialdata_plot/pl/render_params.py 100.00% <100.00%> (ø)
src/spatialdata_plot/pl/utils.py 77.24% <91.54%> (+1.07%) ⬆️

trans = mtransforms.Affine2D(matrix=affine_trans)
trans_data = trans + ax.transData

rgba_image = np.transpose(rgba_image.data.compute(), (1, 2, 0)) # type: ignore[attr-defined]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here (and in render_shapes), we access the image as numpy array from the SpatialImage. mypy doesn't believe that compute() exists...

@Sonja-Stockhaus Sonja-Stockhaus marked this pull request as ready for review July 17, 2024 18:57
Comment on lines 452 to 460

# compute canvas size in pixels close to the actual image size to speed up computation
plot_width = x_ext[1] - x_ext[0]
plot_height = y_ext[1] - y_ext[0]
plot_width_px = int(round(fig_params.fig.get_size_inches()[0] * fig_params.fig.dpi))
plot_height_px = int(round(fig_params.fig.get_size_inches()[1] * fig_params.fig.dpi))
factor = np.min([plot_width / plot_width_px, plot_height / plot_height_px])
plot_width = int(np.round(plot_width / factor))
plot_height = int(np.round(plot_height / factor))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd consider bundling this code in a private function since it's duplicate from above.

@@ -456,8 +495,25 @@ def _render_points(
cmap=render_params.cmap_params.cmap,
)

rbga_image = np.transpose(ds_result.to_numpy().base, (0, 1, 2))
cax = ax.imshow(rbga_image, zorder=render_params.zorder, alpha=render_params.alpha)
# create SpatialImage to get it back to original size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this code is a duplicate, I'd consider it refactoring it into a private function.

@Sonja-Stockhaus Sonja-Stockhaus linked an issue Jul 19, 2024 that may be closed by this pull request
@Sonja-Stockhaus Sonja-Stockhaus marked this pull request as draft July 19, 2024 15:48
@Sonja-Stockhaus Sonja-Stockhaus linked an issue Jul 30, 2024 that may be closed by this pull request
3 tasks
@Sonja-Stockhaus Sonja-Stockhaus linked an issue Jul 31, 2024 that may be closed by this pull request
@LucaMarconato
Copy link
Member

LucaMarconato commented Aug 1, 2024

Thanks @Sonja-Stockhaus ! I have tried the PR on the Visium HD data and the speed up is significant! I switched back to using datashader instead of matplotlib in the notebook 😊

@LucaMarconato
Copy link
Member

LucaMarconato commented Aug 9, 2024

Sonja, I found a bug with this PR when the rasterized object needs to be aligned with the rest #291. The solution seems straightforward, I think one just needs to initialize the image element from datashader using the old coordinate transformations.

@timtreis
Copy link
Member

Will look into it!

@timtreis timtreis marked this pull request as ready for review August 14, 2024 23:59
@Sonja-Stockhaus Sonja-Stockhaus changed the title speed up datashader by using canvas size equal to image size datashader speedup and bugfixes Sep 12, 2024
@Sonja-Stockhaus Sonja-Stockhaus marked this pull request as ready for review September 12, 2024 13:20
@Sonja-Stockhaus
Copy link
Collaborator Author

@timtreis ready for review

@timtreis timtreis marked this pull request as draft September 14, 2024 16:39
@LucaMarconato LucaMarconato added the bug Something isn't working label Sep 15, 2024
@timtreis timtreis added priority: high points 🧮 Anything related to Points shapes 🫧 Anything related to Shapes labels Sep 16, 2024
@LucaMarconato
Copy link
Member

Hi @timtreis, checking the status of this PR. There are still some open tasks on this PR right? Can you list them here as tickable tasks please?

@Sonja-Stockhaus
Copy link
Collaborator Author

Sonja-Stockhaus commented Sep 27, 2024

Todo:

  • behavior of alpha when rendering shapes should be more similar to mpl
  • outline of polygons need to be implemented in datashader
  • look into aggregation methods that aren't working rn (mode, any, std, var)
  • fix failing tests

@LucaMarconato
Copy link
Member

LucaMarconato commented Sep 27, 2024

Thanks for the update!

outline of polygons need to be implemented in datashader

This task here seems tricky. I would consider raising a warning saying that the outline is currently not supported and workin on this on a separate PR (with low priority).

@Sonja-Stockhaus
Copy link
Collaborator Author

outline of polygons need to be implemented in datashader

This task here seems tricky. I would consider raising a warning saying that the outline is currently not supported and workin on this on a separate PR (with low priority).

If you use cvs.line instead of cvs.polygons you get only the outline, so you should be able to overlay the two. Depending on how fast we want to merge this PR, we can of course move that to a new one

@LucaMarconato
Copy link
Member

Ah ok, then it should be a fast addition, thanks for the update.

@timtreis timtreis assigned Sonja-Stockhaus and unassigned timtreis Sep 30, 2024
@Sonja-Stockhaus Sonja-Stockhaus marked this pull request as ready for review October 2, 2024 07:20
@Sonja-Stockhaus
Copy link
Collaborator Author

General problem of datashader so far: when you render elements, coloring them by a value and a color map, you wouldn't see a single element if all of them have the same value X. This is due to the scaling of the data for the colormap that happens in datashader (scaled_data = (data - span[0])/(span[1] - span[0])) which leads to all elements being assigned alpha=0.

Fix in this PR: internally, not the whole colormap is passed to datashader, but just the color given by cmap(0.0). It is used to color all elements. Still, a colorbar of the cmap is drawn with the range [X, X+1].

@LucaMarconato
Copy link
Member

Thanks for the explanation, I think the approach that you implemented is a good one!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working points 🧮 Anything related to Points priority: high shapes 🫧 Anything related to Shapes
Projects
None yet
4 participants