MAPL History quantization metadata should echo NCO (aka CF!) #2926
Hi @mathomp4 You raise good questions and I would be excited to help answer what I can once the dust on CF settles (ping me then). For the time being...

No, not yet. But you can propose a new algorithm name/definition to CF. If your BitShave simply sets NSB trailing bits to 0 (rather than IEEE-rounding them to zero like BitRound), then I agree BitShave is the most sensible name for that algorithm, and I suggest simply following the draft conventions and using "BitShave" as the algorithm name in your output. If your BitShave actually sets enough trailing bits to 0 to achieve NSD digits of precision, then I suggest that DigitShave might be a better name for the algorithm.

Yes, exactly. Charlie
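For concreteness, here is a minimal sketch of the shave-versus-round distinction (illustrative only; `bit_shave`, its convention that `nsb` counts *kept* bits, and the program name are mine, not MAPL or NCO code):

```fortran
! Minimal illustration (not MAPL/NCO code): plain bit shaving zeroes the
! trailing mantissa bits outright, whereas BitRound IEEE-rounds to the
! nearest value representable with the kept bits.
program shave_demo
   use, intrinsic :: iso_fortran_env, only: REAL32, INT32
   implicit none
   real(REAL32) :: x
   x = 3.14159265_REAL32
   print '(a,f12.8)', 'original: ', x
   print '(a,f12.8)', 'shaved:   ', bit_shave(x, 4)   ! keeps 4 mantissa bits -> 3.125
contains
   function bit_shave(val, nsb) result(shaved)
      real(REAL32), intent(in) :: val
      integer, intent(in)      :: nsb   ! number of significant mantissa bits to keep
      real(REAL32)   :: shaved
      integer(INT32) :: bits
      ! A float32 has 23 explicit mantissa bits; zero the low 23-nsb of them.
      bits = transfer(val, bits)
      bits = iand(bits, not(2_INT32**(23 - nsb) - 1_INT32))
      shaved = transfer(bits, shaved)
   end function bit_shave
end program shave_demo
```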
@czender Thanks! I'll try my best to get things partly there. If nothing else, I can get the netCDF quantize types in and looking right. And if we get close now, as the PR evolves, we can make changes more easily. And talking with @tclune, it looks like our bit shaving is best described as "LevelBitShave". It's a bit more complex than basic bit shaving, as it looks at each level separately. Maybe "LevelMeanBitShave"? We probably need a little discussion internally and, once happy, can work with that. (At the moment our bit shaving is controlled in a different way than the netCDF quantize. Maybe in MAPL3 we can join the two.) Oh, and have you figured out a nice formula for
BitRound has a maximum relative error (MRE) of 2^(-(NSB+1)). The base-10 quantization methods like Granular BitGroom do not have an MRE, AFAICT. They do have a maximum absolute error (MAE). The MAE formula is shown in Delaunay, X., A. Courtois, and F. Gouillon (2019), Evaluation of lossless and lossy algorithms for the compression of scientific datasets in netCDF-4 or HDF5 files, Geosci. Model Dev., 12(9), 4099-4113, doi:10.5194/gmd-12-4099-2019.
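That MRE bound is simple enough to tabulate directly (a throwaway sketch; the program name and loop range are mine):

```fortran
! Tabulate the BitRound maximum relative error bound, MRE = 2^-(NSB+1):
! worst case, the kept value is off by half a unit in the last kept bit.
program mre_table
   use, intrinsic :: iso_fortran_env, only: REAL64
   implicit none
   integer :: nsb
   do nsb = 1, 8
      print '(a,i2,a,es12.5)', 'NSB = ', nsb, '  MRE = ', 2.0_REAL64**(-(nsb + 1))
   end do
end program mre_table
```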
Oooh. Okay. Well, I fooled around with Claude (took a bit of prodding, as it kept doing things oddly) and, well, I can duplicate the MAE column in Table 3 of that paper as a function of nsd. It's a weird formula... but it does work. So good on Claude, I guess. That was sort of fun. Probably not the right formula, but interesting.
@czender Just for fun expressed in Fortran:

```fortran
function calculate_mae(nsd) result(mae)
   use, intrinsic :: iso_fortran_env, only: REAL32
   implicit none
   integer, intent(in) :: nsd
   real(kind=REAL32) :: mae
   real(kind=REAL32) :: mae_base
   integer :: correction

   mae_base = 4.0 * (1.0/16.0)**floor(real(nsd)/2.0) * (1.0/8.0)**ceiling(real(nsd)/2.0)
   if (nsd > 2 .and. mod(nsd, 2) == 0) then
      correction = 2
   else if (nsd == 7) then
      correction = 2
   else
      correction = 1
   end if
   mae = mae_base * correction
end function calculate_mae
```

Run this from
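(The actual driver isn't shown in the thread; a self-contained sketch along those lines, with `mae_driver` as an illustrative name and my guess at the loop bounds, would be:)

```fortran
! Hypothetical driver (not from the thread) that tabulates calculate_mae
! for a range of nsd values, as one would to compare against Table 3.
program mae_driver
   use, intrinsic :: iso_fortran_env, only: REAL32
   implicit none
   integer :: nsd
   do nsd = 1, 7
      print '(a,i1,a,es12.5)', 'nsd = ', nsd, '  MAE = ', calculate_mae(nsd)
   end do
contains
   function calculate_mae(nsd) result(mae)
      integer, intent(in) :: nsd
      real(kind=REAL32) :: mae, mae_base
      integer :: correction
      mae_base = 4.0 * (1.0/16.0)**floor(real(nsd)/2.0) * (1.0/8.0)**ceiling(real(nsd)/2.0)
      ! Same branches as above, merged: even nsd > 2, or the odd special case 7
      if ((nsd > 2 .and. mod(nsd, 2) == 0) .or. nsd == 7) then
         correction = 2
      else
         correction = 1
      end if
      mae = mae_base * correction
   end function calculate_mae
end program mae_driver
```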
Yay! Table 3! No idea if this actually means anything in a real data set, but it was fun to duplicate it. It had such a weird nice pattern with the even-odd, but then 7 just sort of messed it up 😄
Linked pull request: "…uantize" (Fixes #2926, update quantization properties)
@czender recently released NCO 5.2.7. From the announcement:
A quick look around the CF site leads to:
as being the discussions about this. Reading time!
Since something like this will be in the CF Conventions "soon" (on the scale of CF Convention changes), MAPL should work toward it as well.
Now, at the moment, if we set, say:
in History, we get out what I think are the "what comes from netCDF" metadata:
So not quite the same. I mean, some info is there, but we should issue our own metadata writes to match the CF Draft, including a `quantization_info` ...container? (Not sure.)

Some questions:
- `nsd` is per-variable metadata, but if a collection uses different types of quantization[^1], I assume one just makes `quantization_info1`, `quantization_info2`, etc.? We'll need to read the draft conventions to figure it out.

[^1]: This might be moot, as using different quantization algorithms per file might be something we just don't want to support. It does seem odd that one might use BitGroom and GranularBR in the same file for different variables.