
Create a RasterData from in-memory 2D array #127

Open
kilroy68 opened this issue Jul 6, 2017 · 4 comments

Comments


kilroy68 commented Jul 6, 2017

The RasterData class should be instantiable from an in-memory 2D array. When doing analysis, it is common to have in-memory results that you don't want to write to a file before viewing. For instance, consider this code:

from osgeo import gdal
import numpy as np

# Create an in-memory (MEM driver) dataset: 100 columns x 200 rows, 1 band
driver = gdal.GetDriverByName('MEM')
src_ds = driver.Create('', 100, 200, 1)
band = src_ds.GetRasterBand(1)

# numpy arrays are (rows, cols), so the shape is (200, 100);
# cast to uint8 to match the band's default GDT_Byte type
ar = np.random.randint(0, 255, (200, 100)).astype(np.uint8)
band.WriteArray(ar)

At this point, I should be able to instantiate a RasterData from the band I just created.
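To make the requested interface concrete, here is a minimal stand-in sketch of what such a constructor might look like. The class name and methods are illustrative assumptions, not geonotebook's actual API; the point is only that the object would be built from an in-memory array rather than a file path.

```python
# Hypothetical sketch: a RasterData-like object backed by an in-memory
# 2D array instead of a file on disk. Names are illustrative only.

class InMemoryRasterData:
    """Minimal stand-in for a RasterData wrapping an in-memory 2D array."""

    def __init__(self, array):
        # array: a 2D sequence in (rows, cols) order, matching numpy
        self.array = array
        self.height = len(array)
        self.width = len(array[0]) if array else 0

    def shape(self):
        # (rows, cols), matching the (200, 100) convention above
        return (self.height, self.width)

# Usage mirroring the snippet above: 200 rows x 100 columns
rd = InMemoryRasterData([[0] * 100 for _ in range(200)])
```

With a constructor like this, the band written above could be handed straight to the viewer without a round trip through the filesystem.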

kilroy68 changed the title from "Create a RasterData from in-memory matrix" to "Create a RasterData from in-memory 2D array" on Jul 6, 2017
@aashish24
Member

Thank you, @kilroy68, for posting this issue. We have talked about supporting in-memory raster data, but my understanding is that it is not a trivial task. I am going to ask @kotfic to provide more detail, but it is something of high interest to us as well.

Contributor

kotfic commented Jul 7, 2017

@kilroy68 The primary issue here is which process owns the memory allocated by GDAL. Jupyter's architecture includes two separate process spaces: one for the Python execution environment (the kernel) and one for serving web assets (tornado). Cell execution takes place in the kernel process, while tile requests are handled by the tornado process. Producing tiles with tornado from in-memory kernel results will require some form of inter-process communication (ideally using shared memory between the processes).
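The shared-memory idea can be sketched with the standard library, assuming Python >= 3.8 where `multiprocessing.shared_memory` is available (an assumption: geonotebook predates this module, so this is only an illustration of the mechanism, not its implementation). The "kernel" side creates a named segment and writes raster bytes into it; the "tornado" side attaches to the same segment by name, so no copy crosses the process boundary.

```python
from multiprocessing import shared_memory

# "Kernel" side: allocate a named shared segment and write band data into it.
raster_bytes = bytes(range(256)) * 4           # stand-in for band.ReadRaster()
shm = shared_memory.SharedMemory(create=True, size=len(raster_bytes))
shm.buf[:len(raster_bytes)] = raster_bytes

# "Tornado" side: attach to the same segment by its name (no data copy);
# in the real architecture the name would travel over the zeromq channel.
view = shared_memory.SharedMemory(name=shm.name)
tile = bytes(view.buf[:16])                    # read a slice to render a tile

# Cleanup: close both handles, then unlink the segment.
view.close()
shm.close()
shm.unlink()
```

Here both "sides" run in one process for brevity; the attach-by-name step is what would let a separate tornado process read the kernel's raster without serializing it.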

The feature you're suggesting is a high priority for us, but we need to do more research to identify the best approach given the Jupyter architecture. If you (or others) have solved similar problems we'd love to hear about your approach!

Author

kilroy68 commented Jul 7, 2017

I don't have any experience in this space, but I would look at matplotlib's implementation. It has a similar problem: the data I'm plotting lives in the Python kernel, but it needs to produce interactive web visuals through tornado.

Contributor

kotfic commented Jul 7, 2017

matplotlib generates image data in the kernel process space, and Jupyter wraps a rendering backend to push those images to the client via a kernel <--- zeromq ---> tornado <--- websocket ---> client bridge. This is why you need to run %matplotlib inline (to wrap the renderer and let Jupyter know what to do with image-based return types). Creating a custom zeromq bridge from the kernel to the tornado server for tile serving is a possibility, but each of the two ways to do it has a critical problem: either

  1. the data needs to be copied into the tornado process's addressable memory, which prevents large amounts of data from being rendered effectively (one thing we've considered here is setting up an in-memory file system, which would also resolve some other potentially show-stopping issues with mapnik); or
  2. the tiles need to be generated in the kernel, which will either block cell execution while tiles are being rendered or require running a separate threaded tile server inside the kernel (ugly, but maybe possible?). Even if that were feasible, we would still need to use mapnik for downsampling data to render tiles at different resolutions. While mapnik has a way of reading data from memory, it assumes you've allocated an empty memory container and are pushing data into that container. I don't believe it has mechanisms for wrapping already-in-memory data for downsampling and styling operations (but maybe this is something we could extend?).
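Option 2 above, a threaded tile server inside the kernel process, can be sketched with the standard library. This is only a toy illustrating the idea (the endpoint layout and placeholder tile bytes are assumptions, and real tiles would be rendered from kernel-resident arrays): because the server thread shares the kernel's address space, the handler can read in-memory data directly, at the cost of competing with cell execution for the process.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

TILE = b"\x89PNG-stand-in"   # placeholder for a tile rendered from kernel data

class TileHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A real handler would parse /z/x/y from self.path and render
        # the matching tile from an array living in kernel memory.
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.end_headers()
        self.wfile.write(TILE)

    def log_message(self, *args):
        pass                  # keep the kernel's stdout quiet

# Port 0 asks the OS for a free port; the server runs on a daemon thread
# so it does not block cell execution in the main (kernel) thread.
server = HTTPServer(("127.0.0.1", 0), TileHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = "http://127.0.0.1:%d/tile/0/0/0" % server.server_port
body = urlopen(url).read()
server.shutdown()
```

The sketch sidesteps the harder issues the comment raises (contention with cell execution, and mapnik's need to own its data for downsampling), which is why it is "ugly, but maybe possible" rather than a solution.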

This is basically as far as we've gotten. There are still avenues to explore, and I'm hopeful we'll be able to come up with a solid solution so we can deliver this feature (you're not the only one who has asked!). But as I hope I've illustrated, it's a non-trivial effort to implement.
