R
library(ncdf4)
library(KernSmooth)
library(quantreg)
library(devtools)
load_all()
check()
## To generate / update man files
## (rm NAMESPACE to regenerate it)
library(roxygen2)
roxygenize()
## Run from the parent directory (shell, not R) to generate the docs PDF
R CMD Rd2pdf climod
# INTEGRATION TEST:
(Consider making this into a unit test - then it'll run automatically
with check())
cd tests
R -f full-bc-integration-test.R
# tcsh loop: compare each output file against its checked-in reference
foreach i (*nc)
    ncdump $i > foo
    ncdump check/$i > bar
    echo $i
    diff foo bar
    echo =================
    rm foo bar
end
cd ..
Update: compare outputs from bc.prec.R and bc.temp.R in vignettes
##########################
VIGNETTES:
obs data - basics of netcdf interactions
obs data - climatological slicing
narccap - temp - deep dive into KDDM algorithms
cordex - tmax - normalization of the transient
cordex - prec - normalization of precip
#####
CODE:
nc_ingest:
    make culling optional
    make skipping dummy variables optional
    don't do anything with dimnames;
        create a dimvars = c("lon", "lat", "time") attribute instead
        see if this allows ingestion of coord-only files (e.g., shard time)
    ? need an attribute to track the original type (float, double, etc.)
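
A rough sketch of that ingestion scheme, using only the ncdf4 API
(the file name and object names here are placeholders, not the
package's actual internals):

library(ncdf4)
nc  <- nc_open("example.nc")
out <- list()
for(vname in names(nc$var)){
    out[[vname]] <- ncvar_get(nc, vname)
    ## record coordinate variables by name instead of touching dimnames
    attr(out[[vname]], "dimvars") <-
        sapply(nc$var[[vname]]$dim, function(d){ d$name })
    ## track the original on-disk type (float, double, etc.)
    attr(out[[vname]], "prec") <- nc$var[[vname]]$prec
}
nc_close(nc)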
plot.distmap:
    base qqplot uses approx() when arrays are different sizes
    may be generating misleading points for precip data
    consider thinning the arrays by hand
        need to thin where the data are densest
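
One way to hand-thin where the data are densest: subsample with
probability inversely proportional to a kernel density estimate. A
minimal sketch (the function and argument names are hypothetical):

thin.dense <- function(x, keep=1000){
    ## local density at each data point, interpolated from density()
    d <- approx(density(x), xout=x)$y
    ## subsample, preferentially keeping points from sparse regions
    sort(sample(x, size=min(keep, length(x)), prob=1/d))
}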
#####
DOCUMENTATION:
biascorrect
stub examples:
    slice
    nc_history
    nc_ingest
[netcdf files can't go in the data/ folder of a package; full examples
that use netcdf data need to go in a vignette (q.v. below)]
#####
NEW FEATURES:
Once nc_ingest is linking coord vars to data via (string) dimnames:
* overload [] with arg coordinate=TRUE to do coord range subsetting
  (see the sketch after this list)
* do better netcdf output
  is the internal nc object representation close enough to the netcdf
  data model that nc_scaffold (monkeypatch) is unneeded?
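
A rough sketch of the coordinate-range subset operator. The class
name and the assumption that each array carries its coordinate
vectors in a "coords" attribute are hypothetical, not the package's
actual representation:

"[.climvar" <- function(x, ..., coordinate=FALSE){
    if(!coordinate){ return(NextMethod()) }
    ranges <- list(...)                  # one c(min, max) per dimension
    coords <- attr(x, "coords")          # list of coordinate vectors
    index <- mapply(function(cv, r){ which(cv >= r[1] & cv <= r[2]) },
                    coords, ranges, SIMPLIFY=FALSE)
    do.call(`[`, c(list(unclass(x)), index))
}

## usage: tmax[c(-110,-100), c(35,45), c(1,365), coordinate=TRUE]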
pdf vs pdf plot
PDF A up
PDF B down
overlay A-B (= delta between identity and xfer f'n?)
rugs
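
A minimal sketch of that comparison plot (the name pdf.vs.pdf is a
placeholder):

pdf.vs.pdf <- function(a, b, ...){
    lo <- min(a, b)
    hi <- max(a, b)
    da <- density(a, from=lo, to=hi)
    db <- density(b, from=lo, to=hi)
    plot(da$x, da$y, type="l", ylim=c(-max(db$y), max(da$y)),
         xlab="", ylab="density", ...)
    lines(db$x, -db$y)                   # PDF B, mirrored downward
    lines(da$x, da$y - db$y, lty=2)      # overlay of A - B
    abline(h=0, col="gray")
    rug(a, side=1)                       # rugs for the raw data
    rug(b, side=3)
}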
add plot, print methods for cslice
for plot.cslice: convert the 2 lists -> matrices (pad w/ NA) for mplot
# plot.cslice <- function(cs, inner.args=NULL, ...){
#     itime <- slice(cs$time, cs, outer=FALSE)
#     otime <- slice(cs$time, cs, outer=TRUE)
#     mplot(otime, cs$outer, ...)
#     mapply(points, itime, cs$inner, MoreArgs=inner.args)
# }
mplot has been obsoleted by as.matrix.list(), which lets you convert
lists to matrices to pass to matplot, optionally pulling a sub-element
as you go. Probably I should just remove mplot entirely; it shows up
in the examples of normalize, denormalize, tailskill, and akde.
pfit + plot?
bplot?
xyapply?
helix plot for timeseries:
    r = value of variable
    theta = day of year
    z = time
use in vignette to explain cslicing
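
A quick sketch of the helix, assuming the rgl package (the timeseries
x is a placeholder):

library(rgl)
doy   <- (seq_along(x) - 1) %% 365 + 1   # day of year (ignoring leap days)
theta <- 2 * pi * doy / 365              # angle around the helix
## r = value of the variable, z = time index
lines3d(x * cos(theta), x * sin(theta), seq_along(x))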
#####
OPEN ISSUES:
rename: distmap -> kddm
namelist(obs,cur,fut) -> class bcdata? class ocf?
avoid overfitting of xfer f'n:
    splinefun -> smooth.spline? [need to guarantee monotonicity;
        see the note after this list]
    thin KDE inputs to splinefun?
cslice args: ratio + outer gives an error
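
Note on monotonicity: splinefun() itself offers monotone
interpolation methods, which might avoid the switch to smooth.spline()
entirely. A minimal sketch (q.mod and q.obs are placeholder names for
the quantile pairs behind the transfer function):

## "monoH.FC" and "hyman" both yield monotone interpolants for
## monotone input data
xfer <- splinefun(q.mod, q.obs, method="hyman")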
#####
UNIT TESTS:
# dedrizzle
denormalize, normalize
unzero
attributes: atsign, copyatts
slicing: cslice, slice
biascorrect
distmap
pdf2cdf
predict.distmap
nc_history
nc_ingest
renest
## too hard to test?
pdfskill
tailskill
untested (trivial): namelist, yearlength
untested (graphics): mplot, plot.distmap
###########################
====== FOR MULTIVAR =======
###########################
I think I can use multivar BC to fill in missing data:
First, do the normalization, kddm construction, and covariance-matrix
setup using na.rm=TRUE.
Then set NA to 0 (for two-tailed variables) or the mode(?) for
one-tailed variables and proceed. The NA becomes a climatology guess,
basically, which then gets adjusted based on what all the other
variables are doing.
Obviously, I should test this with synthetic data.
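
A possible synthetic test (all names hypothetical): generate
correlated series, knock holes in one, and check that the fill-in
tracks the held-out values better than climatology alone.

set.seed(42)
n <- 1000
z <- rnorm(n)                  # shared signal
x <- z + rnorm(n, sd=0.3)      # two strongly correlated variables
y <- z + rnorm(n, sd=0.3)
miss <- sample(n, 50)          # indices to hold out
x.na <- x
x.na[miss] <- NA
## ...run the multivar fill-in on cbind(x.na, y), then compare the
## filled values against the held-out truth x[miss]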
#########################
======= FOR LATER =======
#########################
Default qqplot() function in R uses approx() when vectors are not
identical in size. That's misleading for plot.distmap, I think.
Create a nearest-order-statistic function, e.g.:

## x = values, p = probabilities
nearest.order.stat <- function(x, p){
    n  <- length(x) - 1
    xs <- sort(x)
    xs[1 + round(p * n)]
}
Create a q-q plot function that uses nearest-order-statistic (sketch
below):
    default number of points equal to the smaller of nx, ny
    option: multiple samples + jitter, to make a cloud?
    like plot.xy -- adds points, doesn't make an entire plot
Revise plot.distmap to use the new q-q plot function.
Allow the option of skipping the points (new qqplot).
Default xlab = deparse(substitute(object$x)) ?
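
A minimal sketch of the point-adder (the name qqpoints is a
placeholder), building on nearest.order.stat() above:

qqpoints <- function(x, y, npoints=min(length(x), length(y)), ...){
    p <- ppoints(npoints)
    ## adds points to an existing plot, like plot.xy
    points(nearest.order.stat(x, p), nearest.order.stat(y, p), ...)
}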
#########################
Residuals-vs-fit type plot of the transfer function? I.e., its
deviation from the identity line.
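
Something like this, perhaps (xfer and q.mod are placeholder names):

x <- sort(q.mod)
plot(x, xfer(x) - x, type="l",
     xlab="input", ylab="deviation from identity")
abline(h=0, lty=2)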
#########################
Real data instead of synthetic for distmap examples?
################################################
=== DOCUMENT SOMEWHERE ZERO-HANDLING PROCESS ===
################################################
set threshold using entire dataset before doing anything else*
drop zeros when constructing distmap
in predict, conserve zeros
check for negatives between predict & denormalize
is that everything?
*rationale: we do thresholding to correct the wet/dry frequency.
Generally, climate models exhibit excess drizzle, and depending on how
the output was post-processed, you can have very small or even
negative values that will throw off the precipitation frequency and
the distribution mapping. Although the wet/dry frequency has
seasonality, the drizzle problem is more about representation than
about model dynamics, so the cutoff threshold should be fairly
constant in time. Moreover, in arid regions the number of wet days
can be very low in certain seasons, making it difficult or impossible
to estimate a time-varying drizzle cutoff. Therefore, the simplest
and most appropriate way to adjust the probability of precipitation
to compensate for the drizzle problem in model output is:
(1) floor all datasets at zero;
(2) calculate the wet/dry fraction based on the entire timeseries of
    observational data;
(3) sort the model output for the corresponding current period;
(4) use the calculated wet/dry fraction to find a threshold value in
    the model output that equalizes the wet days in the model with
    observations (note: this only works if the model has an excess of
    wet days; if it has an excess of dry days, the threshold will be
    zero, and a univariate bias correction cannot correct the wet/dry
    fraction);
(5) set all values in all model runs (current and future) below the
    threshold to zero.
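
A minimal sketch of that procedure (an illustration of the steps
above, not necessarily what the package's dedrizzle() does; all names
are placeholders):

fix.drizzle <- function(obs, cur, fut){
    ## (1) floor all datasets at zero
    obs <- pmax(obs, 0)
    cur <- pmax(cur, 0)
    fut <- pmax(fut, 0)
    ## (2) wet fraction from the entire obs timeseries
    wetfrac <- mean(obs > 0)
    ## (3)+(4) threshold in the current-period model output that
    ## equalizes the model's wet days with observations
    thresh <- unname(quantile(cur, 1 - wetfrac))
    ## if the model has an excess of dry days, thresh will be 0 and
    ## the wet/dry fraction can't be corrected this way
    ## (5) zero out sub-threshold values in all model runs
    cur[cur < thresh] <- 0
    fut[fut < thresh] <- 0
    list(obs=obs, cur=cur, fut=fut, threshold=thresh)
}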