R
library(ncdf4)
library(KernSmooth)
library(quantreg)
library(devtools)
load_all()
check()
## To generate / update man files
## (rm NAMESPACE to regenerate it)
library(roxygen2)
roxygenize()
## Run from the parent directory (shell, not R) to generate the docs PDF
R CMD Rd2pdf climod
# INTEGRATION TEST:
(Consider making this into a unit test - then it'll run automatically
with check())
cd tests
R -f full-bc-integration-test.R
# tcsh loop: compare each output file against its checked-in reference
foreach i (*nc)
    ncdump $i > foo
    ncdump check/$i > bar
    echo $i
    diff foo bar
    echo =================
    rm foo bar
end
cd ..
Update: compare outputs from bc.prec.R and bc.temp.R in vignettes
##########################
VIGNETTES:
obs data - basics of netcdf interactions
obs data - climatological slicing
narccap - temp - deep dive into KDDM algorithms
cordex - tmax - normalization of the transient
cordex - prec - normalization of precip
#####
CODE:
nc_ingest:
    make culling optional
    make skipping dummy variables optional
    don't do anything with dimnames;
        create a dimvars = c("lon", "lat", "time") attribute instead
        see if this allows ingestion of coord-only files (e.g., shard time)
    ? need an attribute to track the original type (float, double, etc.)
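
A rough sketch of that ingestion scheme, using only the ncdf4 API
(the file name and object names here are placeholders, not the
package's actual internals):

library(ncdf4)
nc  <- nc_open("example.nc")
out <- list()
for(vname in names(nc$var)){
    out[[vname]] <- ncvar_get(nc, vname)
    ## record coordinate variables by name instead of touching dimnames
    attr(out[[vname]], "dimvars") <-
        sapply(nc$var[[vname]]$dim, function(d){ d$name })
    ## track the original on-disk type (float, double, etc.)
    attr(out[[vname]], "prec") <- nc$var[[vname]]$prec
}
nc_close(nc)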
plot.distmap:
    base qqplot uses approx() when arrays are different sizes
    may be generating misleading points for precip data
    consider thinning the arrays by hand
        need to thin where the data are densest
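
One way to hand-thin where the data are densest: subsample with
probability inversely proportional to a kernel density estimate. A
minimal sketch (the function and argument names are hypothetical):

thin.dense <- function(x, keep=1000){
    ## local density at each data point, interpolated from density()
    d <- approx(density(x), xout=x)$y
    ## subsample, preferentially keeping points from sparse regions
    sort(sample(x, size=min(keep, length(x)), prob=1/d))
}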
#####
DOCUMENTATION:
biascorrect
stub examples:
    slice
    nc_history
    nc_ingest
[netcdf files can't go in the data/ folder of a package; full examples
that use netcdf data need to go in a vignette (q.v. below)]
#####
NEW FEATURES:
Once nc_ingest is linking coord vars to data via (string) dimnames:
* overload [] with arg coordinate=TRUE to do coord range subsetting
  (see the sketch after this list)
* do better netcdf output
  is the internal nc object representation close enough to the netcdf
  data model that nc_scaffold (monkeypatch) is unneeded?
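
A rough sketch of the coordinate-range subset operator. The class
name and the assumption that each array carries its coordinate
vectors in a "coords" attribute are hypothetical, not the package's
actual representation:

"[.climvar" <- function(x, ..., coordinate=FALSE){
    if(!coordinate){ return(NextMethod()) }
    ranges <- list(...)                  # one c(min, max) per dimension
    coords <- attr(x, "coords")          # list of coordinate vectors
    index <- mapply(function(cv, r){ which(cv >= r[1] & cv <= r[2]) },
                    coords, ranges, SIMPLIFY=FALSE)
    do.call(`[`, c(list(unclass(x)), index))
}

## usage: tmax[c(-110,-100), c(35,45), c(1,365), coordinate=TRUE]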
pdf vs pdf plot
PDF A up
PDF B down
overlay A-B (= delta between identity and xfer f'n?)
rugs
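
A minimal sketch of that comparison plot (the name pdf.vs.pdf is a
placeholder):

pdf.vs.pdf <- function(a, b, ...){
    lo <- min(a, b)
    hi <- max(a, b)
    da <- density(a, from=lo, to=hi)
    db <- density(b, from=lo, to=hi)
    plot(da$x, da$y, type="l", ylim=c(-max(db$y), max(da$y)),
         xlab="", ylab="density", ...)
    lines(db$x, -db$y)                   # PDF B, mirrored downward
    lines(da$x, da$y - db$y, lty=2)      # overlay of A - B
    abline(h=0, col="gray")
    rug(a, side=1)                       # rugs for the raw data
    rug(b, side=3)
}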
add plot, print methods for cslice
for plot.cslice: convert the 2 lists -> matrices (pad w/ NA) for mplot
# plot.cslice <- function(cs, inner.args=NULL, ...){
#     itime <- slice(cs$time, cs, outer=FALSE)
#     otime <- slice(cs$time, cs, outer=TRUE)
#     mplot(otime, cs$outer, ...)
#     mapply(points, itime, cs$inner, MoreArgs=inner.args)
# }
mplot has been obsoleted by as.matrix.list(), which lets you convert
lists to matrices to pass to matplot, optionally pulling a sub-element
as you go. Probably I should just remove mplot entirely; it shows up
in the examples of normalize, denormalize, tailskill, and akde.
pfit + plot?
bplot?
xyapply?
helix plot for timeseries:
    r = value of variable
    theta = day of year
    z = time
use in vignette to explain cslicing
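
A quick sketch of the helix, assuming the rgl package (the timeseries
x is a placeholder):

library(rgl)
doy   <- (seq_along(x) - 1) %% 365 + 1   # day of year (ignoring leap days)
theta <- 2 * pi * doy / 365              # angle around the helix
## r = value of the variable, z = time index
lines3d(x * cos(theta), x * sin(theta), seq_along(x))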
#####
OPEN ISSUES:
rename: distmap -> kddm
namelist(obs,cur,fut) -> class bcdata? class ocf?
avoid overfitting of xfer f'n:
    splinefun -> smooth.spline? [need to guarantee monotonicity;
        see the note after this list]
    thin KDE inputs to splinefun?
cslice args: ratio + outer gives an error
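
Note on monotonicity: splinefun() itself offers monotone
interpolation methods, which might avoid the switch to smooth.spline()
entirely. A minimal sketch (q.mod and q.obs are placeholder names for
the quantile pairs behind the transfer function):

## "monoH.FC" and "hyman" both yield monotone interpolants for
## monotone input data
xfer <- splinefun(q.mod, q.obs, method="hyman")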
#####
UNIT TESTS:
# dedrizzle
denormalize, normalize
unzero
attributes: atsign, copyatts
slicing: cslice, slice
biascorrect
distmap
pdf2cdf
predict.distmap
nc_history
nc_ingest
renest
## too hard to test?
pdfskill
tailskill
untested (trivial): namelist, yearlength
untested (graphics): mplot, plot.distmap
###########################
====== FOR MULTIVAR =======
###########################
I think I can use multivar BC to fill in missing data:
First, do the normalization, kddm construction, and covariance-matrix
setup using na.rm=TRUE.
Then set NA to 0 (for two-tailed variables) or the mode(?) for
one-tailed variables and proceed. The NA becomes a climatology guess,
basically, which then gets adjusted based on what all the other
variables are doing.
Obviously, I should test this with synthetic data.
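
A possible synthetic test (all names hypothetical): generate
correlated series, knock holes in one, and check that the fill-in
tracks the held-out values better than climatology alone.

set.seed(42)
n <- 1000
z <- rnorm(n)                  # shared signal
x <- z + rnorm(n, sd=0.3)      # two strongly correlated variables
y <- z + rnorm(n, sd=0.3)
miss <- sample(n, 50)          # indices to hold out
x.na <- x
x.na[miss] <- NA
## ...run the multivar fill-in on cbind(x.na, y), then compare the
## filled values against the held-out truth x[miss]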
#########################
======= FOR LATER =======
#########################
Default qqplot() function in R uses approx() when vectors are not
identical in size. That's misleading for plot.distmap, I think.
Create a nearest-order-statistic function, e.g.:

## x = values, p = probabilities
nearest.order.stat <- function(x, p){
    n  <- length(x) - 1
    xs <- sort(x)
    xs[1 + round(p * n)]
}
Create a q-q plot function that uses nearest-order-statistic (sketch
below):
    default number of points equal to the smaller of nx, ny
    option: multiple samples + jitter, to make a cloud?
    like plot.xy -- adds points, doesn't make an entire plot
Revise plot.distmap to use the new q-q plot function.
Allow the option of skipping the points (new qqplot).
Default xlab = deparse(substitute(object$x)) ?
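
A minimal sketch of the point-adder (the name qqpoints is a
placeholder), building on nearest.order.stat() above:

qqpoints <- function(x, y, npoints=min(length(x), length(y)), ...){
    p <- ppoints(npoints)
    ## adds points to an existing plot, like plot.xy
    points(nearest.order.stat(x, p), nearest.order.stat(y, p), ...)
}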
#########################
Residuals-vs-fit type plot of the transfer function? I.e., its
deviation from the identity line.
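
Something like this, perhaps (xfer and q.mod are placeholder names):

x <- sort(q.mod)
plot(x, xfer(x) - x, type="l",
     xlab="input", ylab="deviation from identity")
abline(h=0, lty=2)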
#########################
Real data instead of synthetic for distmap examples?
################################################
=== DOCUMENT SOMEWHERE ZERO-HANDLING PROCESS ===
################################################
set threshold using entire dataset before doing anything else*
drop zeros when constructing distmap
in predict, conserve zeros
check for negatives between predict & denormalize
is that everything?
*rationale: we do thresholding to correct the wet/dry frequency.
Generally, climate models exhibit excess drizzle, and depending on how
the output was post-processed, you can have very small or even
negative values that will throw off the precipitation frequency and
the distribution mapping. Although the wet/dry frequency has
seasonality, the drizzle problem is more about representation than
about model dynamics, so the cutoff threshold should be fairly
constant in time. Moreover, in arid regions the number of wet days
can be very low in certain seasons, making it difficult or impossible
to estimate a time-varying drizzle cutoff. Therefore, the simplest
and most appropriate way to adjust the probability of precipitation
to compensate for the drizzle problem in model output is:
(1) floor all datasets at zero;
(2) calculate the wet/dry fraction based on the entire timeseries of
    observational data;
(3) sort the model output for the corresponding current period;
(4) use the calculated wet/dry fraction to find a threshold value in
    the model output that equalizes the wet days in the model with
    observations (note: this only works if the model has an excess of
    wet days; if it has an excess of dry days, the threshold will be
    zero, and a univariate bias correction cannot correct the wet/dry
    fraction);
(5) set all values in all model runs (current and future) below the
    threshold to zero.
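
A minimal sketch of that procedure (an illustration of the steps
above, not necessarily what the package's dedrizzle() does; all names
are placeholders):

fix.drizzle <- function(obs, cur, fut){
    ## (1) floor all datasets at zero
    obs <- pmax(obs, 0)
    cur <- pmax(cur, 0)
    fut <- pmax(fut, 0)
    ## (2) wet fraction from the entire obs timeseries
    wetfrac <- mean(obs > 0)
    ## (3)+(4) threshold in the current-period model output that
    ## equalizes the model's wet days with observations
    thresh <- unname(quantile(cur, 1 - wetfrac))
    ## if the model has an excess of dry days, thresh will be 0 and
    ## the wet/dry fraction can't be corrected this way
    ## (5) zero out sub-threshold values in all model runs
    cur[cur < thresh] <- 0
    fut[fut < thresh] <- 0
    list(obs=obs, cur=cur, fut=fut, threshold=thresh)
}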