[WISE FR]: More stat's in multi-band TIF files #131
Comments
I'll continue to rail that we should be including the median. For many of these variables, the mean is relatively meaningless because the data are not normal.
@BadgerOnABike Summary of a private discussion: In addition to this, we have tools to combine these multi-layer grids later on, so that scenarios (not just sub-scenarios) can be combined in meaningful ways. At this point, these multi-band TIFs do not store any individual sub-scenario outputs, just the results of the combinations. To combine many multi-layer grids and then compute a true median (not a median of medians), all sub-scenario outputs need to be stored for later re-analysis. That would produce very large multi-band TIF files to recombine. However, it would provide all the data needed to look at each sub-scenario individually (which somewhat defeats the purpose of a sub-scenario).
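To make the median-of-medians point concrete, here is a small illustrative sketch (the group values are hypothetical, not WISE outputs) showing that combining per-group medians does not recover the true median of the pooled data:

```python
from statistics import median

# Three hypothetical sub-scenario groups of per-pixel values.
groups = [[1, 2, 9], [3, 4, 5], [6, 7, 8]]

# Median of medians: per-group medians are 2, 4, 7 -> 4.
med_of_meds = median(median(g) for g in groups)

# True median needs every raw value (1..9 pooled) -> 5.
true_med = median(v for g in groups for v in g)
```

This is why a true combined median requires retaining every sub-scenario output, unlike the combined mean.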
Perhaps we need to pore over the Burn-P3 code, because it allows us to acquire any percentile we desire and doesn't consume all available RAM until many hundreds of thousands of runs are completed. Additionally, if we aren't providing a full suite of statistics with the sub-scenarios, their utility arguably goes down, as we cannot determine anything about the distribution from the mean when the normality assumption of that statistic isn't met. That is, of course, assuming that BP3 is calculating it correctly and isn't acquiring it some other way.
From memory (I have audited that code in the past, though it may have been updated since): BurnP3 uses a variable-sized (auto-sized) array per grid cell, so it loses the information of which fire provided which value (which isn't important for median calculations). This potentially reduces overall memory usage, but at the expense of more overhead per cell, slower data insertions, and a lot of memory fragmentation. It is a viable in-memory approach, though, particularly if fires are relatively small with respect to the overall dimensions of your plot. I also recall reporting an issue where the median for RAZ was not being calculated correctly (it was treated as linear data rather than circular, though I don't recall whether you care about median RAZ). WISE doesn't limit the stats to the closed set that BurnP3 does. And the BurnP3 approach would need a non-standard file format to export this data in order to calculate medians of combined datasets. Note that you don't need to retain the complete dataset to calculate the mean, standard deviation, or standard error.
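The linear-vs-circular pitfall mentioned for RAZ can be illustrated with the mean (the thread reports the bug for the median, but the mean shows the same failure more simply). A minimal sketch, assuming bearings in degrees:

```python
import math

def linear_mean(degrees):
    """Naive arithmetic mean -- wrong for compass bearings."""
    return sum(degrees) / len(degrees)

def circular_mean(degrees):
    """Average the unit vectors, then convert back to a bearing."""
    x = sum(math.cos(math.radians(d)) for d in degrees)
    y = sum(math.sin(math.radians(d)) for d in degrees)
    return math.degrees(math.atan2(y, x)) % 360.0

# Two bearings just either side of north:
# linear mean says 180 (due south); circular mean says ~0 (north).
print(linear_mean([350.0, 10.0]), circular_mean([350.0, 10.0]))
```

Any statistic on azimuth data (mean or median) needs this kind of circular treatment rather than ordinary arithmetic on the degree values.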
I'm ready to merge this (standard deviation and standard error) back in for evaluation. This is an Alberta Parks contribution.
@RobBryce what is the status of this work? Is it complete? |
Standard deviation and standard error stats were added and received a lot of testing. Outside validation wouldn't hurt once others are generating sub-scenarios.
Am I correct in thinking this applies when fires are burned iteratively and we are calculating the mean per pixel across a range of weather parameters, or are the mean/SD/SE coming from somewhere else? I'm unclear how I would test these metrics, though I am interested in doing so.
The output of a scenario TIF file (with sub-scenarios) is a multi-band TIF. We only added a few more bands for sd/se. Existing bands were listed above. |
I get that part; I'm curious what is being averaged here. Multiple scenarios/sub-scenarios is how I'm understanding it. Is that correct?
Yes, for whatever stat is requested.
Alright, then I'd be able to replicate this fairly easily. I presume that for means you're simply summing and then dividing by the number of scenarios at the end. For standard deviation you would need all the layers, to subtract each from the mean. Wouldn't we then have the same data required for a median?
We are using Welford's method, identified in https://stackoverflow.com/questions/895929/how-do-i-determine-the-standard-deviation-stddev-of-a-set-of-values, which also links to https://www.johndcook.com/blog/standard_deviation/. We don't need to store the complete dataset for these stats. This way, memory consumption changes by a known, fixed amount, whereas if we stored all data from all simulations, we could not necessarily predict memory consumption.
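For readers following along, here is a minimal sketch of Welford's online algorithm as described at the links above (the class and variable names are illustrative, not WISE's). It maintains mean, sample standard deviation, and standard error in O(1) memory per pixel, one sub-scenario value at a time:

```python
import math

class RunningStats:
    """Welford's online algorithm: one accumulator per pixel."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance (n - 1 denominator)."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def stddev(self):
        return math.sqrt(self.variance())

    def stderr(self):
        return self.stddev() / math.sqrt(self.n) if self.n else 0.0

rs = RunningStats()
for v in [1.0, 2.0, 3.0, 4.0, 5.0]:
    rs.push(v)
# mean = 3.0, sample variance = 2.5
```

Only three numbers (`n`, `mean`, `m2`) are kept per pixel regardless of how many sub-scenarios run, which is the predictable-memory property mentioned above.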
Interesting. I do see some methods for calculating a rolling median as well. I'll continue my search; until then, I think what we have should work. I guess I'll find out when I go to test them again!
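For reference, one standard rolling-median approach is the two-heap technique (a sketch, not anything WISE implements): insertions are O(log n), but unlike Welford's method every value is still retained, so memory grows with the number of sub-scenarios, which is exactly the trade-off discussed above.

```python
import heapq

class RunningMedian:
    """Streaming median via two heaps: O(log n) insert, O(n) memory."""
    def __init__(self):
        self.lo = []  # max-heap (values negated) holding the smaller half
        self.hi = []  # min-heap holding the larger half

    def push(self, x):
        # Route through the low heap, then rebalance so the
        # heaps differ in size by at most one (lo >= hi).
        heapq.heappush(self.lo, -x)
        heapq.heappush(self.hi, -heapq.heappop(self.lo))
        if len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def median(self):
        if len(self.lo) > len(self.hi):
            return -self.lo[0]
        return (-self.lo[0] + self.hi[0]) / 2.0
```

So a "rolling" median saves sorting work, not storage: the full per-pixel dataset still has to live somewhere, which is why the median isn't as cheap as SD/SE here.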
@RobBryce is the original work (not the median) completed? If so, is this ready for testing? If so, assign it to @BadgerOnABike and label it "Needs Testing". Otherwise this is outstanding, as this was a contribution. Also, this will need some kind of attribution, which we need to resolve before we can close this.
The original work has been in use for a while now. Once @BadgerOnABike can run projects, he can validate. Or we can provide outputs. Either way, it was validated some months ago. @lizzydchappy can provide specifics, but I believe attribution should go to Alberta Parks.
Answer to the question "Does the median matter for HFI?" TL;DR: Yes.

I summarised multiple decades of Alberta fire weather history in three ways: everything, everything in June, and everything in June at station C3. All three show the same general trend: massively zero-inflated data yielding a negative-exponential distribution. This will matter most when running models with stochastic weather information, or when running the system in a mode that produces any kind of probabilistic output.

All data: [plot omitted]
June: [plot omitted]
June at C3: [plot omitted]
That's great work. Now we know. :) |
Contact Details
[email protected]
What is needed?
Currently, multi-band TIF files contain:
The request here is to add bands for standard deviation and standard error, to improve post-processing analysis.
There will be no cost to the WISE team for this work; it will be a contribution.
How will this improve the project or tool?
More complete statistics available to post-analysis tools.
TODO