MAPL History quantization metadata should echo NCO (aka CF!) #2926
Hi @mathomp4 You raise good questions and I would be excited to help answer what I can once the dust on CF settles (ping me then). For the time being...

No, not yet. But you can propose a new algorithm name/definition to CF. If your BitShave simply sets NSB trailing bits to 0 (rather than IEEE-rounding them to zero like BitRound), then I agree BitShave is the most sensible name for that algorithm, and I suggest simply following the draft conventions and using "BitShave" as the algorithm name in your output. If your BitShave actually sets enough trailing bits to 0 to achieve NSD digits of precision, then I suggest that DigitShave might be a better name for the algorithm.

Yes, exactly. Charlie
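For concreteness, here is a minimal sketch of the shave-versus-round distinction (illustrative only; `bit_shave`, its convention that `nsb` counts *kept* bits, and the program name are mine, not MAPL or NCO code):

```fortran
! Minimal illustration (not MAPL/NCO code): plain bit shaving zeroes the
! trailing mantissa bits outright, whereas BitRound IEEE-rounds to the
! nearest value representable with the kept bits.
program shave_demo
   use, intrinsic :: iso_fortran_env, only: REAL32, INT32
   implicit none
   real(REAL32) :: x
   x = 3.14159265_REAL32
   print '(a,f12.8)', 'original: ', x
   print '(a,f12.8)', 'shaved:   ', bit_shave(x, 4)   ! keeps 4 mantissa bits -> 3.125
contains
   function bit_shave(val, nsb) result(shaved)
      real(REAL32), intent(in) :: val
      integer, intent(in)      :: nsb   ! number of significant mantissa bits to keep
      real(REAL32)   :: shaved
      integer(INT32) :: bits
      ! A float32 has 23 explicit mantissa bits; zero the low 23-nsb of them.
      bits = transfer(val, bits)
      bits = iand(bits, not(2_INT32**(23 - nsb) - 1_INT32))
      shaved = transfer(bits, shaved)
   end function bit_shave
end program shave_demo
```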
@czender Thanks! I'll try my best to get things partly there. If nothing else, I can get the netCDF quantize types in and looking right. And if we get close now, as the PR evolves, we can make changes more easily. And talking with @tclune, it looks like our bit shaving is best described as "LevelBitShave". It's a bit more complex than basic bit shaving, as it looks at each level separately. Maybe "LevelMeanBitShave"? We probably need a little discussion internally and, once happy, can work with that. (At the moment our bit shaving is controlled in a different way than the netCDF quantize. Maybe in MAPL3 we can join the two.) Oh, and have you figured out a nice formula for
BitRound has a maximum relative error (MRE) of 2^(-(NSB+1)). The base-10 quantization methods like Granular BitGroom do not have an MRE, AFAICT. They do have a maximum absolute error (MAE). The MAE formula is shown in Delaunay, X., A. Courtois, and F. Gouillon (2019), Evaluation of lossless and lossy algorithms for the compression of scientific datasets in netCDF-4 or HDF5 files, Geosci. Model Dev., 12(9), 4099-4113, doi:10.5194/gmd-12-4099-2019.
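That MRE bound is simple enough to tabulate directly (a throwaway sketch; the program name and loop range are mine):

```fortran
! Tabulate the BitRound maximum relative error bound, MRE = 2^-(NSB+1):
! worst case, the kept value is off by half a unit in the last kept bit.
program mre_table
   use, intrinsic :: iso_fortran_env, only: REAL64
   implicit none
   integer :: nsb
   do nsb = 1, 8
      print '(a,i2,a,es12.5)', 'NSB = ', nsb, '  MRE = ', 2.0_REAL64**(-(nsb + 1))
   end do
end program mre_table
```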
Oooh. Okay. Well, I fooled around with Claude (took a bit of prodding, as it kept doing things oddly) and, well, I can duplicate the MAE column in Table 3 of that paper as a function of nsd. It's a weird formula... but it does work. So good on Claude, I guess. That was sort of fun. Probably not the right formula, but interesting.
@czender Just for fun expressed in Fortran:

```fortran
function calculate_mae(nsd) result(mae)
   use, intrinsic :: iso_fortran_env, only: REAL32
   implicit none
   integer, intent(in) :: nsd
   real(kind=REAL32) :: mae
   real(kind=REAL32) :: mae_base
   integer :: correction

   mae_base = 4.0 * (1.0/16.0)**floor(real(nsd)/2.0) * (1.0/8.0)**ceiling(real(nsd)/2.0)
   if (nsd > 2 .and. mod(nsd, 2) == 0) then
      correction = 2
   else if (nsd == 7) then
      correction = 2
   else
      correction = 1
   end if
   mae = mae_base * correction
end function calculate_mae
```

Run this from
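(The actual driver isn't shown in the thread; a self-contained sketch along those lines, with `mae_driver` as an illustrative name and my guess at the loop bounds, would be:)

```fortran
! Hypothetical driver (not from the thread) that tabulates calculate_mae
! for a range of nsd values, as one would to compare against Table 3.
program mae_driver
   use, intrinsic :: iso_fortran_env, only: REAL32
   implicit none
   integer :: nsd
   do nsd = 1, 7
      print '(a,i1,a,es12.5)', 'nsd = ', nsd, '  MAE = ', calculate_mae(nsd)
   end do
contains
   function calculate_mae(nsd) result(mae)
      integer, intent(in) :: nsd
      real(kind=REAL32) :: mae, mae_base
      integer :: correction
      mae_base = 4.0 * (1.0/16.0)**floor(real(nsd)/2.0) * (1.0/8.0)**ceiling(real(nsd)/2.0)
      ! Same branches as above, merged: even nsd > 2, or the odd special case 7
      if ((nsd > 2 .and. mod(nsd, 2) == 0) .or. nsd == 7) then
         correction = 2
      else
         correction = 1
      end if
      mae = mae_base * correction
   end function calculate_mae
end program mae_driver
```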
Yay! Table 3! No idea if this actually means anything in a real data set, but it was fun to duplicate it. It had such a weird nice pattern with the even-odd, but then 7 just sort of messed it up 😄
Linked pull request: "…uantize" (Fixes #2926, update quantization properties)
@czender recently released NCO 5.2.7. From the announcement:
A quick look around the CF site leads to:
as being the discussions about this. Reading time!
Since something like this will be in the CF Conventions "soon" (on the scale of CF Convention changes), MAPL should work toward it as well.
Now, at the moment, if we set, say:
in History, we get out what I think are the "what comes from netCDF" metadata:
So not quite the same. I mean, some info is there, but we should issue our own metadata writes to match the CF Draft, including a `quantization_info` ...container? (Not sure.)

Some questions:
- `nsd` is per-variable metadata, but if a collection uses different types of quantization[^1], I assume one just makes `quantization_info1`, `quantization_info2`, etc.? We'll need to read the draft conventions to figure it out.

[^1]: This might be moot, as using different quantization algorithms per file might be something we just don't want to support. It does seem odd that one might use BitGroom and GranularBR in the same file for different variables.