Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-40359: Implement Butler.get for matplotlib generated png files #877

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

fred3m
Copy link
Contributor

@fred3m fred3m commented Aug 11, 2023

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@codecov
Copy link

codecov bot commented Aug 11, 2023

Codecov Report

Patch coverage: 93.75% and no project coverage change.

Comparison is base (42f2c38) 87.72% compared to head (0078c8a) 87.72%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #877   +/-   ##
=======================================
  Coverage   87.72%   87.72%           
=======================================
  Files         274      274           
  Lines       36096    36110   +14     
  Branches     7549     7550    +1     
=======================================
+ Hits        31665    31678   +13     
- Misses       3259     3260    +1     
  Partials     1172     1172           
Files Changed Coverage Δ
python/lsst/daf/butler/formatters/matplotlib.py 95.45% <90.90%> (-4.55%) ⬇️
tests/test_matplotlibFormatter.py 95.23% <100.00%> (+0.36%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. Is it impossible for the Figure->numpy conversion to work?

I need to think about how easy it would be to add in-memory datastore tests like we do in test_parquet.py. As things stand I think that will break because of the dummy converter not being real.

@@ -27,6 +27,8 @@
import unittest
from random import Random

import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be a protected import since numpy is not butler requirement (although effectively it is since astropy won't work without it).


fig = butler.get(ref)
# Ensure that the result is a figure
self.assertTrue(isinstance(fig, pyplot.Figure))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.assertTrue(isinstance(fig, pyplot.Figure))
self.assertIsinstance(fig, pyplot.Figure)

# Ensure that the result is a figure
self.assertTrue(isinstance(fig, pyplot.Figure))
image = butler.get(ref, storageClass="NumpyArray")
self.assertTrue(isinstance(image, np.ndarray))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.assertTrue(isinstance(image, np.ndarray))
self.assertIsinstance(image, np.ndarray)

return fig

@staticmethod
def dummyCovnerter(cls: np.ndarray) -> np.ndarray:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name has a typo in it. Also see comment about naming I make above.

I am confused by the signature. Surely this should take a plt.Figure and return an np.ndarray? Is there no way to do that conversion? It is going to be problematic in some contexts (eg in-memory datastore) if this method is not implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is just what the name says, a dummy converter. This is what I was trying to get across in slack: In some cases we want to just load the png file directly, although a numpy array is fine. Yes, it is possible to convert a Figure to a numpy array, but it's messy. You basically have to create a dummy file, save the figure to the dummy file, then read the dummy file as a numpy array. 👎

I did the cursory stack overflow search, and that was the recommended solution. I also looked at the matplotlib codebase a bit and it does indeed look like it would be non-trivial to try and convert the figure to a numpy array directly. This is probably why the method wasn't implemented in the first place.

The other solution that I thought of is that we could define a PNG store class, which would be an IoBytes instance, and convert plots and arrays to and from the PNG class. In other words

@staticmethod
def fig_to_png(fig):
    buf = IoBytes()
    fig.savefig(buf)
    return buf

@staticmethod
def png_to_array(png):
    return plt.imread(png)

@staticmethod
def png_to_fig(png):
    arr = png_to_array(png)
    fig = plt.figure()
    plt.imshow(arr)
    return fig

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completely untested but can something like this work?

def fig_to_np(fig: pyplot.figure.Figure) -> np.ndarray:
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    return plt.imread(buf, format="png")

?


def _writeFile(self, inMemoryDataset: Any) -> None:
# docstring inherited from FileFormatter._writeFile
inMemoryDataset.savefig(self.fileDescriptor.location.path)

@staticmethod
def fromArray(cls: np.ndarray) -> plt.Figure:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not need to be a static method in the formatter class. It's okay for this to be a private function in the file that is not exported and only referenced from the yaml file. The first parameter should not be named cls, it should be array or something.

@@ -38,8 +41,22 @@ class MatplotlibFormatter(FileFormatter):

def _readFile(self, path: str, pytype: type[Any] | None = None) -> Any:
# docstring inherited from FileFormatter._readFile
raise NotImplementedError(f"matplotlib figures cannot be read by the butler; path is {path}")
return plt.imread(path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is returning something unexpected (and effectively backwards from expectations) please add a comment saying that it's not returning a figure.

@fred3m
Copy link
Contributor Author

fred3m commented Aug 11, 2023

The other thing that I'll add is that in general you don't want to load a saved figure as a figure, as what you're actually loading is a png file that doesn't have information about the axes, size of the figure, etc. So the dummy figure that is loaded is rarely (if ever) what the user actually desires.

@timj
Copy link
Member

timj commented Aug 11, 2023

Yes, I understand that what you get back isn't what you put in, and that is a bit of a problem in butler in general. The problem here though is that if an in-memory datastore is involved behind the scenes everything is going to break because you put a Figure and then you do a butler.get() and all it has is a Figure -- you get back what you put in but if you ask for the NumpyArray version by specifying a different storage class it won't work and will fall over because there is no functioning converter to even attempt it. In the general case you can't tell what the datastore is doing. If the converter has to use a BytesIO buffer to pretend to write a file to disk and then call imread on that buffer with format="png" then that would help a lot.

@timj
Copy link
Member

timj commented Oct 31, 2023

There is code in the wild that does:

ref = butler.registry.findDataset(plot_name, detector=detector)
print ("Plot number", plot_name)
uri = butler.getURI(ref)
display(Image(data=uri.read()))

so we should consider adding a IPythonImage StorageClass definition and adding a simply converter that does this internally so people would only need to do butler.get(ref, storageClass="IPythonImage") instead.

@TallJimbo
Copy link
Member

I like the idea of using storage class conversions to deal with this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants