Adding coastlines, state outlines to map rapidly increases file size. #780

Open
1 task done
needham-michael opened this issue Feb 11, 2025 · 7 comments

Comments

@needham-michael

This issue is related to a post I made on the HoloViz Discourse forum last week. In that post, I described how adding coastlines, country boundaries, and other map features to an interactive Bokeh layout rapidly increased the output file size (from 2.2 MB to 11.0 MB in the example) and led to slow performance. Following a suggestion by @ahuang11, I used geopandas to clip the features to my expected window (North America, based on the xarray air_temperature tutorial dataset), which improved things enough for my immediate needs (smaller file size, better performance).

However, this still looks like a significant inefficiency which could be addressed either through improvements in the library, or through user documentation (if it turns out this is just improper use on my part).

Software Version Info

bokeh==3.6.2
Cartopy==0.24.1
geoviews==1.14.0
holoviews==1.20.0
ipython==8.32.0
jupyterlab==4.3.5
panel==1.6.0
python==3.13.0
xarray==2025.1.2

Description of expected behavior and the observed behavior

Expected Behavior: Adding coastlines, state outlines, etc. only marginally impacts final file size and performance

Observed Behavior: Adding coastlines, state outlines, etc. dramatically increases final file size and requires careful tuning to avoid performance issues

Complete, minimal, self-contained example code that reproduces the issue

The script below generates three files to demonstrate this issue. The only difference between the first and third files is the inclusion of a couple of paths showing coastlines and US/CAN state outlines, but the resulting file is double the size. I would expect the third file to only be marginally larger than the first file.

# Size of output files after running the script
"gv_mwe_1.html" : 2.2 MB
"gv_mwe_2.html" : 355 KB
"gv_mwe_3.html" : 4.4 MB # <--- This file is much larger than expected

Note that this Python script is identical to the one in my second comment in the HoloViz Discourse thread.

# ============================================================================
# Import Statements
# ============================================================================

import geoviews as gv
import xarray as xr
import cartopy.crs as ccrs
from cartopy import feature as cf
import geopandas as gpd

gv.extension('bokeh')
gv.output(widget_location='bottom')

# ============================================================================
# Prep sample data
# ============================================================================

# Load xarray sample air temperature data
ds = xr.tutorial.load_dataset("air_temperature")

# Select the first 24 timesteps of 6-hourly data
ds = ds.isel(time=slice(None,24))

# wrap the xarray dataset as a geoviews Dataset and specify the native CRS
ds_gv = gv.Dataset(ds,crs=ccrs.PlateCarree())

# ============================================================================
# Create layout and map feature
# ============================================================================

proj = ccrs.PlateCarree()

# ----------------------------------------------------------------------------
# QuadMesh to display xarray / netcdf data
# ----------------------------------------------------------------------------
qmesh = ds_gv.to(gv.QuadMesh,kdims=['lon','lat'],vdims=['air','lat','lon'])

qmesh_layout = qmesh.opts(backend='bokeh',colorbar=True,projection=proj,)

# ----------------------------------------------------------------------------
# cartopy.features for coastlines, state boundaries
# ----------------------------------------------------------------------------

# Get data bounding box based on the dataset (convert longitudes
# from [0,360] to [-180,180] by subtracting 360)
bbox = [
    float(x)
    for x in [ds.lon.min()-360, ds.lat.min(), ds.lon.max()-360, ds.lat.max()]
]

# Specify the shapefile scale, one of 110m, 50m, 10m
scale = "110m"

states = gpd.clip(
    gpd.GeoDataFrame(
        geometry=gpd.GeoSeries(cf.STATES.with_scale(scale).geometries())
    ),
    bbox,
)

coastline = gpd.clip(
    gpd.GeoDataFrame(
        geometry=gpd.GeoSeries(cf.COASTLINE.with_scale(scale).geometries())
    ),
    bbox,
)

map_features = gv.Path(coastline).opts(color='black') * gv.Path(states).opts(color='black')


# ============================================================================
# Write Output Files
# ============================================================================

# ----------------------------------------------------------------------------
# File 1: Qmesh Only
# ----------------------------------------------------------------------------
f1 = "./gv_mwe_1.html"
output1 = qmesh_layout
gv.save(output1,f1)

# ----------------------------------------------------------------------------
# File 2: Map Features Only
# ----------------------------------------------------------------------------
f2 = "./gv_mwe_2.html"
output2 = map_features
gv.save(output2,f2)

# ----------------------------------------------------------------------------
# File 3: Qmesh and Map Features
# ----------------------------------------------------------------------------
f3 = "./gv_mwe_3.html"
output3 = (qmesh_layout * map_features).opts(global_extent=False)
gv.save(output3,f3)
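
For anyone reproducing the numbers above, here is a minimal sketch of how the output sizes could be collected after running the script. The helper names `file_sizes` and `human` are my own (hypothetical, not part of GeoViews), and the formatting assumes decimal units (1 KB = 1000 bytes), which may differ slightly from what a file manager reports.

```python
import os

def file_sizes(paths):
    """Return {path: size in bytes} for each path that exists on disk."""
    return {p: os.path.getsize(p) for p in paths if os.path.exists(p)}

def human(nbytes):
    """Format a byte count using decimal units (assumes 1 KB = 1000 bytes)."""
    for unit in ("B", "KB", "MB"):
        if nbytes < 1000 or unit == "MB":
            return f"{nbytes:.1f} {unit}"
        nbytes /= 1000

# Files written by the script above; missing files are skipped silently.
outputs = ["gv_mwe_1.html", "gv_mwe_2.html", "gv_mwe_3.html"]
for path, size in file_sizes(outputs).items():
    print(f'"{path}" : {human(size)}')
```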
  • I may be interested in making a pull request to address this if it only requires an improvement to the documentation
@holovizbot

This issue has been mentioned on HoloViz Discourse. There might be relevant details there:

https://discourse.holoviz.org/t/adding-features-to-gridded-dataset-map-balloons-output-file-size-and-slows-performance/8564/4

@ahuang11
Collaborator

Do you know what the file size of the shapefiles for those are?

If they are in the megabytes, I doubt there's a way to reduce it.

Are raster tiles an alternative? https://geoviews.org/gallery/bokeh/tile_sources.html

@needham-michael
Author

Exporting the states and coastlines GeoDataFrames from the example (after clipping to the bbox) gives shapefiles of 37.1 KB and 19.3 KB respectively, which is why I'm surprised that the final file ends up so large.

The example netCDF data I am using is sliced down to 24 timesteps. My guess is that combining these paths with the netCDF data causes the issue, perhaps by saving a new copy of the paths for each of the 24 timesteps. Ideally, I would draw the paths once and use the slider to change the netCDF timestep without redrawing the states and coastlines.

Raster tiles could work in some cases, but I prefer to draw the actual paths for a couple of reasons:

  • The ability to use custom shapefiles (e.g., for the boundaries of wildfires or specific geographical areas) which are not included in the standard map tiles
  • More flexibility when using different map projections (although there are some ways to reproject map tiles)

@ahuang11
Collaborator

Thanks. Could you help me with one more experiment: use only one timestep and check the file size?

As you mentioned, it's likely duplicating the features multiple times. However, I don't know Bokeh's internals well enough to say, for example, whether it can simply reference the features from the first time slice.

@needham-michael
Author

Sure. Here are the output file sizes:

  • Single timestep, without coastlines/states: 106.2 KB
  • Single timestep, with coastlines/states: 194.8 KB

And because why not, here are the output sizes with just two timesteps.

  • Two timesteps, without coastlines/states: 288.4 KB
  • Two timesteps, with coastlines/states: 556.9 KB
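
To make the comparison easier to read, here is a small sketch that turns the four measurements above into ratios. It only does arithmetic on the reported sizes; the function names `ratios` and `growth` are hypothetical helpers, and nothing here inspects the HTML files themselves.

```python
# Sizes in KB, copied from the measurements listed in this comment.
sizes_kb = {
    1: {"without": 106.2, "with": 194.8},
    2: {"without": 288.4, "with": 556.9},
}

def ratios(sizes):
    """Return the with/without size ratio for each timestep count."""
    return {n: s["with"] / s["without"] for n, s in sizes.items()}

def growth(sizes, key):
    """Return how much the file grows going from 1 to 2 timesteps."""
    return sizes[2][key] / sizes[1][key]

for n, r in ratios(sizes_kb).items():
    print(f"{n} timestep(s): features multiply file size by {r:.2f}")
print(f"1 -> 2 timesteps without features: {growth(sizes_kb, 'without'):.2f}x")
print(f"1 -> 2 timesteps with features:    {growth(sizes_kb, 'with'):.2f}x")
```

In both cases the features nearly double the file, and going from one to two timesteps more than doubles it with or without the features.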

@ahuang11
Collaborator

Thanks for the valuable info!

Seems like two timesteps without features is not just 2x the single-timestep size but roughly 2.7x (288.4 KB vs. 106.2 KB), likely because of the added widget. I presume it will scale linearly after that, e.g. with three timesteps.

Will ask on Discord whether anyone has thoughts about this.

@needham-michael
Author

Thanks! And yes, it scales linearly after adding the widget with $\geq$ 2 timesteps
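
As a rough cross-check using only numbers already posted in this thread, the sketch below estimates the per-timestep growth between the 2-timestep and 24-timestep files. It assumes 1 MB = 1000 KB and uses the rounded 2.2 MB and 4.4 MB figures from the original report, so the result is approximate; `slope_kb_per_step` is a hypothetical helper name.

```python
def slope_kb_per_step(n0, size0_kb, n1, size1_kb):
    """Average growth in KB per added timestep between two measurements."""
    return (size1_kb - size0_kb) / (n1 - n0)

# (timesteps, size in KB); the 24-timestep sizes are 2.2 MB and 4.4 MB,
# converted assuming 1 MB = 1000 KB.
without_features = slope_kb_per_step(2, 288.4, 24, 2200.0)
with_features = slope_kb_per_step(2, 556.9, 24, 4400.0)

# Extra KB per timestep attributable to the coastline/state paths.
feature_cost = with_features - without_features

# Overhead measured directly with a single timestep (194.8 KB vs. 106.2 KB).
single_step_overhead = 194.8 - 106.2

print(f"without features: {without_features:.1f} KB per timestep")
print(f"with features:    {with_features:.1f} KB per timestep")
print(f"feature cost:     {feature_cost:.1f} KB per timestep")
print(f"single-timestep overhead: {single_step_overhead:.1f} KB")
```

The per-timestep feature cost comes out close to the overhead measured with a single timestep, which is at least consistent with the guess that the paths are stored again for every timestep.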

