
add quantile maps #30

Open
perrette opened this issue Oct 14, 2024 · 6 comments
Labels: enhancement, rimeX

Comments

@perrette (Collaborator)

Quoting @NiklasSchwind here

We should add quantile maps to the package. Maybe the old CIE methodology is enough (the quantile is taken from the year at which the median warming level for the scenario and year is first reached in each simulation), or we could calculate a grid-point-wise weighted quantile; I am not sure whether the latter works without overloading the RAM, though.

@perrette added the enhancement and rimeX labels on Oct 14, 2024
@NiklasSchwind (Collaborator) commented on Oct 15, 2024

To go into further detail on the two methodologies proposed:

Old CIE methodology for the maps
"quantile taken from the year at which the median warming level in the scenario and year is first reached in each simulation"

This is how the maps in the current CIE are calculated. The maps show the median value of the indicator per grid point for a given scenario and year, and they are produced by:
(1) calculating the median warming level for the emulated scenario and year from the MAGICC output,
(2) pooling the maps from all input ISIMIP simulations at the time each simulation first reaches the median warming level calculated in (1),
(3) calculating the median of the pooled maps per grid point.

This could be generalized to any percentile by taking that percentile of the warming-level distribution from MAGICC in (1) (instead of the median by default) and then that same percentile of the pooled maps in (3) (instead of the median by default). A minimal sketch of these steps is given after the advantages and disadvantages below.

Advantages: can easily be preprocessed for the CIE (e.g. by providing median and 95th-percentile maps for all available warming levels and loading the map for the matching warming level whenever someone looks at a map).
Disadvantages: not consistent with the rimeX emulator for time series, does not use all available information, and is not a mathematically correct grid-point quantile given our simulation-democracy and GMT-dependency assumptions.
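
A minimal sketch of steps (1)-(3), assuming hypothetical inputs (none of these names are actual rimeX API): maps is an xarray DataArray with dims (simulation, year, lat, lon) holding the indicator, gmt_sim has dims (simulation, year) with each simulation's warming level, and wl_median is the median warming level from MAGICC for the scenario and year:

import xarray as xr

def cie_quantile_map(maps, gmt_sim, wl_median, q=0.5):
    pooled = []
    for sim in maps.simulation.values:
        # (2) first year this simulation reaches the median warming level
        reached = gmt_sim.sel(simulation=sim) >= wl_median
        if not reached.any():
            continue  # this simulation never reaches that level: skip it
        first_year = reached.idxmax("year")  # first True along "year"
        pooled.append(maps.sel(simulation=sim, year=first_year, drop=True))
    # (3) grid-point-wise quantile (median by default) over the pooled maps
    return xr.concat(pooled, dim="pooled").quantile(q, dim="pooled")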

Methodology consistent with the rimeX emulator for time series
"calculating a grid point-wise weighted quantile"

In this method, we would calculate the quantile for each grid point the same way as for the aggregated time series: a grid-point-wise weighted quantile, with each map weighted by the probability of its warming level in the scenario and year.

Advantages: consistent with the rimeX emulator for time series, uses all available information, and is a mathematically correct grid-point quantile.
Disadvantages: cannot easily be preprocessed for the CIE (at least I can't think of a way) and would take a long time to calculate. It can therefore probably not be computed on the fly, and we would need to redo the preprocessing for all indicators at every scenario update (which would be painful). A sketch of the weighted quantile is given below.
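
A minimal sketch of such a grid-point-wise weighted quantile, assuming hypothetical inputs (not actual rimeX code): values is a numpy array of shape (level, lat, lon) with one map per warming level, weights has shape (level,) and gives the probability of each warming level in the scenario and year, and q is the target quantile. The per-grid-point loop is written for clarity; a vectorized or chunked version would be needed to keep RAM and runtime in check:

import numpy as np

def weighted_quantile_map(values, weights, q):
    # sort each grid point's values along the warming-level axis
    order = np.argsort(values, axis=0)
    sorted_vals = np.take_along_axis(values, order, axis=0)
    sorted_w = weights[order]  # weights follow the sort
    # midpoint estimate of the weighted CDF at each grid point
    cdf = (np.cumsum(sorted_w, axis=0) - 0.5 * sorted_w) / np.sum(weights)
    # interpolate the q-th quantile independently at every grid point
    out = np.empty(values.shape[1:])
    for idx in np.ndindex(*values.shape[1:]):
        cell = (slice(None),) + idx
        out[idx] = np.interp(q, cdf[cell], sorted_vals[cell])
    return out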

@perrette (Collaborator, Author)

How useful are these quantile maps? (In my mind they are not very useful...) I guess we could provide an approximation anyway. Here is what I would do for a reasonably accurate calculation:

  • The information can be stored as impact(quantile, warming, lat, lon), where quantile and warming would only span a few points (something like min, 0.05, 0.1, 0.17, 0.25, 0.5, 0.75, 0.83, 0.95, max and 1, 1.5, 2, 2.5, 3, 4, say; I don't remember exactly what Mia used, but only a few warming levels).
  • On the "driver" side, split the warming range into a number of quantile intervals (like 10 or 100).
  • Randomly match each temperature quantile with one impact quantile (and interpolate the impact according to the sampled temperature). This gives 10 or 100 (temperature, impact) sample pairs, in the form of a (sample, lat, lon) array.

Now if we use a relatively coarse grid, so that we stay under 1e6 points overall, that could be reasonably fast. Whether it would be fast enough for a server-side calculation, I don't know; can you specify some kind of requirement in terms of memory usage and execution time?

A rougher approximation would be to compute the quantile due solely to the temperature forcing (using the median map) and another due solely to the model uncertainty (using a quantile map like you describe above), and combine them by summing the squares of their deviations from the median. So this requires computing the median map as well, obviously. Three maps, and the result: approx(q) = median ± ((approx_temp(q) - median)**2 + (approx_model(q) - median)**2)**0.5. That would be much faster and probably more than sufficient to display on a screen (a sketch follows below).
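
A minimal sketch of that combination, assuming three precomputed numpy arrays with hypothetical names: median_map, q_temp (the quantile map from temperature forcing alone) and q_model (the quantile map from model uncertainty alone). The sign handling for quantiles below the median is my own assumption:

import numpy as np

def combined_quantile_map(median_map, q_temp, q_model, upper=True):
    # add the two deviations from the median in quadrature
    deviation = np.sqrt((q_temp - median_map) ** 2 + (q_model - median_map) ** 2)
    # assumption: subtract the deviation for quantiles below the median
    return median_map + deviation if upper else median_map - deviation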

perrette added a commit that referenced this issue Jan 8, 2025
@perrette (Collaborator, Author) commented on Jan 8, 2025

I have added a new rime-pre-quantilemap script that prepares a quantile map netCDF file.
For now it uses 10 quantile "bins" by default (quantile = 0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95) and writes all warming levels into one file. It takes about one hour to run for one variable, and the resulting file weighs about 500 MB (using zlib compression). I have not yet tested how fast it is to load and use at runtime, but the next iterations will probably try to reduce the number of warming levels. For comparison, Mia stores individual netCDF files for the warming levels 1.2, 1.5, 2, 2.5, 3 and 3.5. Is that something we want to take inspiration from, @NiklasSchwind, or do you prefer to stick with the bulky, high-definition (in terms of warming levels) file? (At runtime the data is interpolated between warming levels anyway.) I'd tend to slim it down. A sketch of how such a file could be written with compression is given at the end of this comment.

The first iteration is on the server: /mnt/PROVIDE/climate_impact_explorer/isimip3/running-21-years/quantilemaps/wet_bulb_temperature_annual_abs.nc

A new iteration is coming using data relative to the projection baseline. (Mia provides several versions: "abs", "rel", "diff", ...; we provide only the default version for each indicator.)
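
For illustration, this is roughly how such a quantile-map file could be written with xarray and zlib compression (a sketch only, not the actual rime-pre-quantilemap code; the variable and dimension names are assumptions):

import xarray as xr

def write_quantilemap(qmap, path):
    # qmap: DataArray with dims (warming_level, quantile, lat, lon)
    ds = qmap.to_dataset(name="quantilemap")
    encoding = {"quantilemap": {"zlib": True, "complevel": 4, "dtype": "float32"}}
    ds.to_netcdf(path, encoding=encoding)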

@perrette (Collaborator, Author) commented on Jan 8, 2025

I just reduced the reference warming levels to [1.2, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0] (11 levels), which makes the file over 4x smaller (presumably 100-150 MB, i.e. also faster to load and interpolate from).
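
For example, interpolating between the stored warming levels at runtime could look like this (assuming the file holds a single DataArray with a warming_level dimension; the exact layout may differ):

import xarray as xr

qmap = xr.open_dataarray("wet_bulb_temperature_annual_abs.nc")  # (warming_level, quantile, lat, lon)
map_at_1p8 = qmap.interp(warming_level=1.8)  # linear interpolation between the 1.5 and 2.0 levels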

@perrette (Collaborator, Author) commented on Jan 8, 2025

Preliminary tests indicate that the number of warming levels does not really matter for the computation itself. What did matter was the very inefficient quantile method of xarray.DataArray, which resulted in 20+ seconds instead of about 1 second for a single year and 100 samples. By leveraging numpy.percentile instead, we get a runtime of about 1 second:

import numpy as np
import xarray as xa
from scipy.interpolate import RegularGridInterpolator

def fast_quantile(a, quantiles, dim=None):
    if np.isscalar(quantiles):
        # scalar quantile: reduce along dim without adding a "quantile" dimension
        return a.reduce(np.percentile, dim=dim, q=quantiles*100)
    quantiles = np.asarray(quantiles)
    # "percentile" is orders of magnitude faster than "quantile"
    a_np = np.percentile(a.values, quantiles*100, axis=a.dims.index(dim))
    return xa.DataArray(a_np,
                        coords=[quantiles] + [a.coords[c] for c in a.dims if c != dim],
                        dims=["quantile"] + [c for c in a.dims if c != dim])


def make_prediction(a, gmt, years, samples, seed=42, quantiles=[0.5, .05, .95]):
    # a: quantile map DataArray with dims (warming_level, quantile, lat, lon)
    # gmt: DataFrame of GMT trajectories (years as index, ensemble members as columns)
    interp = RegularGridInterpolator((a.warming_level.values, a.coords["quantile"].values), a.values, bounds_error=False)
    rng = np.random.default_rng(seed=seed)
    # resample GMT ensemble members with replacement ...
    igmt = rng.integers(0, gmt.columns.size, size=samples)
    gmt_decadal_resampled = gmt.loc[years].iloc[:, igmt]
    # ... and pair each sampled trajectory with a random impact quantile
    iquantiles = rng.integers(0, a.coords["quantile"].size, size=gmt_decadal_resampled.shape)
    sampled_maps = interp((gmt_decadal_resampled.values, a.coords["quantile"].values[iquantiles]))
    sampled_maps = xa.DataArray(sampled_maps, coords=[
        gmt_decadal_resampled.index, np.arange(samples), a.lat, a.lon], dims=["year", "sample", "lat", "lon"])
    if quantiles is not None:
        sampled_maps = fast_quantile(sampled_maps, quantiles, dim="sample")
    return sampled_maps

sampled_maps = make_prediction(a, gmt, years=[2080], samples=100)

@perrette (Collaborator, Author) commented on Jan 8, 2025

I noticed that some models have horizontal bands in the upper latitudes (canesm5, ipsl, uk), probably a result of the cdo remapbil interpolation from a non-regular grid (below: SSP5-8.5 in 2080).

[image: input map for SSP5-8.5 in 2080, showing horizontal bands at high latitudes]

This also shows up in the resulting maps:

[image: resulting quantile map with the same banding]

I guess I'll leave it for now; it's something one might want to solve at a later point.

perrette added a commit that referenced this issue Jan 8, 2025