
[WISE FR]: More stat's in multi-band TIF files #131

Open · 2 of 4 tasks
RobBryce opened this issue Sep 19, 2022 · 18 comments
Labels: Approved (Approved for Action), Attribution Required, feature (This is a feature request), Outside Contribution

@RobBryce (Collaborator) commented Sep 19, 2022:

Contact Details

[email protected]

What is needed?

Currently, multi-band TIF files contain:

  • accumulated band
  • count band
  • mean band (accumulated / count)
  • minimum value band
  • maximum value band

The request here is to add bands for standard deviation and standard error, to improve post-processing analysis.
There will be no cost to the WISE team for this work; it will be an outside contribution.
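
As a rough sketch of how a post-processing tool might consume these bands once the new ones exist (the band order, file name, and the use of rasterio are assumptions for illustration, not part of the WISE export specification):

```python
import numpy as np
import rasterio  # assumed reader; GDAL would work equally well

with rasterio.open("scenario_stats.tif") as ds:   # hypothetical file name
    accumulated = ds.read(1).astype(float)        # accumulated band
    count       = ds.read(2).astype(float)        # count band
    mean        = ds.read(3).astype(float)        # mean band (accumulated / count)
    minimum     = ds.read(4)                      # minimum value band
    maximum     = ds.read(5)                      # maximum value band

# Re-derive the mean wherever at least one sub-scenario contributed a value.
derived_mean = np.divide(accumulated, count,
                         out=np.full_like(accumulated, np.nan),
                         where=count > 0)

# With a standard-deviation band added, the standard error follows directly:
# std_error = std_dev / np.sqrt(count)
```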

How will this improve the project or tool?

More complete statistics available to post-analysis tools.

TODO

@RobBryce added the Triage (Issue needs triage) and feature (This is a feature request) labels Sep 19, 2022
@BadgerOnABike (Collaborator):

I'll continue to rail that we should be including the median. As with many of these variables, the mean is relatively meaningless because the data are not normally distributed.

@RobBryce (Collaborator, Author):

@BadgerOnABike Summary of our private discussion:
The concern with providing the median from within the WISE executable is the memory required to calculate it. Specifically, if there are 100 sub-scenarios to combine, we need to hold 100 sub-scenario layers in memory for each output we want a median for, which can mean tens of GB of RAM.

In addition, we have tools to combine these multi-band grids later on, so that scenarios (not just sub-scenarios) can be combined in meaningful ways. At present, these multi-band TIFs do not store any individual sub-scenario outputs, just the results of the combinations. To combine many multi-band grids and then compute a true median (not a median of medians), every sub-scenario output would need to be stored for later re-analysis. That would produce very large multi-band TIF files to recombine, although it would also provide the data needed to examine each sub-scenario individually (which somewhat defeats the purpose of a sub-scenario).
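
For a rough sense of scale (the grid dimensions below are an assumed example, not figures from this project):

```python
# Hypothetical 10,000 x 10,000 cell grid stored as 32-bit floats.
cells = 10_000 * 10_000
bytes_per_layer = cells * 4                  # ~0.4 GB per sub-scenario layer
layers_for_median = 100                      # every sub-scenario kept per statistic
print(bytes_per_layer * layers_for_median / 1e9, "GB")   # ~40 GB for one output statistic
```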

@BadgerOnABike (Collaborator):

Perhaps we need to pore over the Burn-P3 code, because it allows us to acquire any percentile we desire and doesn't consume all available RAM until many hundreds of thousands of runs are being completed. Additionally, if we aren't providing a full suite of statistics with the sub-scenarios, their utility arguably goes down, since we cannot determine anything about the distribution from the mean when the normality assumption behind that statistic isn't met. That is, of course, assuming that BP3 is calculating it correctly and isn't acquiring it some other way.

@RobBryce (Collaborator, Author):

From memory (I audited that code in the past, but it may have been updated since):

BurnP3 uses a variable-sized (auto-sized) array per grid cell, so it loses track of which fire provided which value (which isn't important for median calculations). This potentially reduces overall memory usage, but at the expense of more overhead per cell, slower data insertion, and a lot of memory fragmentation. It is a viable in-memory approach, though, particularly if fires are relatively small with respect to the overall dimensions of your plot. I also recall reporting an issue where the median for RAZ was not being calculated correctly (it was treated as linear data rather than circular, though I don't recall whether you care about median RAZ).
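
As a toy illustration of the RAZ point (invented values, not code from either tool): a linear median of bearings that straddle north picks a misleading value, while a circular median, here defined as the sample angle minimizing total angular distance, does not.

```python
import numpy as np

def circular_distance_deg(a, b):
    # Smallest absolute angular difference between bearings, in degrees.
    d = np.abs(a - b) % 360.0
    return np.minimum(d, 360.0 - d)

def circular_median_deg(angles):
    # The sample angle that minimizes the total circular distance to all samples.
    angles = np.asarray(angles, dtype=float) % 360.0
    costs = [circular_distance_deg(a, angles).sum() for a in angles]
    return angles[int(np.argmin(costs))]

raz = [350.0, 355.0, 5.0, 10.0, 20.0]       # bearings clustered around north
print(np.median(raz))                       # 20.0 -- linear median, misleading
print(circular_median_deg(raz))             # 5.0  -- respects the 0/360 wrap
```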

WISE doesn't limit the stats to the closed set that BurnP3 does. The BurnP3 approach would also need a non-standard file format to export this data in order to calculate medians of combined datasets.

You don't need to retain the complete dataset to calculate any of mean, standard deviation, or standard error.

@RobBryce (Collaborator, Author) commented Nov 30, 2022:

I'm ready to merge this work (standard deviation and standard error) back in, ready for evaluation. It is an Alberta Parks contribution.

@RobBryce added the Needs Approval (Needs approval to proceed with work, after review of plan/estimate/quotation) and Outside Contribution labels Nov 30, 2022
@spydmobile added the Attribution Required label and removed the Triage (Issue needs triage) label Nov 30, 2022
@spydmobile added the Approved (Approved for Action) label and removed the Needs Approval (Needs approval to proceed with work, after review of plan/estimate/quotation) label Dec 9, 2022
@spydmobile (Collaborator):

@RobBryce what is the status of this work? Is it complete?

@RobBryce assigned spydmobile and unassigned RobBryce Feb 13, 2023
@RobBryce (Collaborator, Author):

Standard deviation and standard error statistics were added and received a lot of testing. Outside validation wouldn't hurt once others are generating sub-scenarios.
No work on median values has been performed, since that wasn't part of the original ticket text and the budget for this work was limited to standard deviation and standard error.

@BadgerOnABike (Collaborator):

Am I correct in thinking this applies when fires are burned iteratively and we are calculating the mean by pixel across a range of weather parameters, or are the mean / SD / SE coming from somewhere else? I'm unclear as to how I would perform testing of these metrics, though I am interested in doing so.

@RobBryce (Collaborator, Author):

The output of a scenario that has sub-scenarios is a multi-band TIF; we only added a few more bands for standard deviation and standard error. The existing bands were listed above.
An export from a regular scenario is single-band; an export from a scenario with even one sub-scenario is multi-band.
Sub-scenarios may differ in weather or in other parameters, but for our work it is typically iteration through weather.

@BadgerOnABike (Collaborator):

I get that part; I'm curious what is being averaged here. Multiple scenarios / sub-scenarios is how I'm understanding it; is that correct?

@RobBryce (Collaborator, Author):

Yes, for whatever stat is requested

@BadgerOnABike (Collaborator):

Alright, then I'd be able to replicate this fairly easily. I presume that for the means you're simply accumulating values and dividing by the number of scenarios at the end.

For standard deviation, wouldn't you require all the layers in order to subtract from the mean? Wouldn't we then have the same data required for a median?

@RobBryce (Collaborator, Author) commented Feb 13, 2023:

We are using Welford's method, identified at https://stackoverflow.com/questions/895929/how-do-i-determine-the-standard-deviation-stddev-of-a-set-of-values, which also links to https://www.johndcook.com/blog/standard_deviation/. We don't need to store the complete dataset for these stats. This way, the change in memory consumption is known and predictable, whereas if we stored all data from all simulations we could not necessarily predict memory consumption.
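
For reference, a minimal per-pixel sketch of Welford's online update as described in those references (an illustration only, not the WISE implementation):

```python
import math

class RunningStats:
    """Welford's online algorithm: mean/variance without storing every value."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0             # sum of squared deviations from the running mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std_dev(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def std_error(self):
        return self.std_dev() / math.sqrt(self.n) if self.n > 0 else 0.0

# Feed one sub-scenario value at a time for a single pixel:
rs = RunningStats()
for value in [3.1, 2.7, 3.4, 2.9]:
    rs.push(value)
print(rs.mean, rs.std_dev(), rs.std_error())
```

Only three numbers per pixel per statistic are retained no matter how many sub-scenarios are combined, which is why the memory cost is predictable.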

@BadgerOnABike (Collaborator):

Interesting; I do see some methods to calculate a rolling median as well. I'll continue my search; until then, I think what we have should work. I guess I'll find out when I go to test them again!

@spydmobile assigned RobBryce and unassigned spydmobile Feb 16, 2023
@spydmobile (Collaborator):

@RobBryce is the original work (not the median) completed? If so, is it ready for testing? If it is, assign it to @BadgerOnABike and label it "Needs Testing". Otherwise this remains outstanding, as this was a contribution. Also, this will need some kind of attribution, which we need to resolve before we can close this.

@RobBryce (Collaborator, Author):

The original work has been in use for a while now. Once @BadgerOnABike can run projects, he can validate it, or we can provide outputs. Either way, we had to validate it some months ago. @lizzydchappy can provide specifics, but I believe attribution should go to Alberta Parks.

@RobBryce assigned BadgerOnABike and unassigned RobBryce Feb 17, 2023
@BadgerOnABike (Collaborator):

Answer to the question "Does the median matter in HFI?"

TL;DR: Yes.

I summarised multiple decades of Alberta fire weather history in three ways: everything, everything in June, and everything in June at station C3. All three show the same general trend: massively zero-inflated data yielding a negative-exponential-shaped distribution. This will matter most when running models with stochastic weather information or running the system in a mode that produces any kind of probabilistic output.

Summary statistics for each subset:

Subset        Min.   1st Qu.   Median    Mean      3rd Qu.   Max.
All data      0.00   86.28     1920.72   4832.14   7003.20   206482.70
June          0.00   42.06     949.31    4273.83   6263.56   132813.10
June at C3    0.00   8.59      320.28    2273.88   2532.01   47268.11

(Histograms of each subset were attached as images in the original comment.)

@RobBryce (Collaborator, Author):

That's great work. Now we know. :)
