-*- org -*-   C-c C-o follows link [MM: grep-r -e '\(FIXME\|TODO\)']
they should rather use lmrob() !! ==> ~/R/MM/Pkg-ex/robustbase/glmrob_ChrSchoetz-ex.R
and it should have correct rweights[] in {0,1} and residuals[];
(fitted[] computed in R code after .C() call);
==> /R/MM/Pkg-ex/robustbase/ThMang_lmrob.R
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
and >>> tests/subsample.R (bottom) <<<-------
#{zero resid} >= ~ (n + p)/2 {--> theory of modified MAD ==> ~/R/MM/STATISTICS/robust/tmad.R
------- it might use Martin’s tmad() {trimmed mean of absolute deviations from the median}
for “deterministic”
colMedians() -> ask Henrik about "License: Artistic-2.0"
splitFrame() [important for lmrob(.. method = "M-S") --> lmrob.M.S()]: character columns should be treated as factors
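A minimal sketch of the intended behaviour, in base R only. `chr2factor()` is a hypothetical helper name used for illustration; it is not part of robustbase, and how splitFrame() would actually integrate such a step is left open.

```r
## Hypothetical helper (NOT robustbase code): turn every character column
## of a data frame into a factor, leaving all other columns untouched.
chr2factor <- function(df) {
    isChr <- vapply(df, is.character, NA)
    df[isChr] <- lapply(df[isChr], factor)
    df
}

d <- data.frame(y = c(1.2, 3.4, 2.2),
                g = c("a", "b", "a"),
                stringsAsFactors = FALSE)
d2 <- chr2factor(d)
str(d2)  # 'g' is now a factor, 'y' is still numeric
```

After such a conversion, code that separates factor from continuous columns would see `g` as categorical instead of skipping or mishandling it.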
by Eduardo Conceicao
R users already know optim() etc.. so the name seems more logical for them.
use Fortran routines rfncomb() and rfgenp() [in src/rf-common.f -> need F77_NAME(.) / F77_CALL(.)] from C
[Peter Filzmoser, Geneva 2016-07-07 talk]: covMcd() warns when n < 2*p .. should not warn but give message()
[Peter Filzmoser, Geneva 2016-07-07 talk]: solve.default(getCov(mcd)) error with CovControlOgk() init
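For the n < 2*p item, the difference between warning() and message() can be sketched as follows. `checkDims()` and its wording are hypothetical, made up for illustration; this is not the check that covMcd() performs.

```r
## Hypothetical check (NOT the covMcd() code): signal a small sample size
## via message() rather than warning(), so callers can silence it with
## suppressMessages() and it does not count as a warning in tests.
checkDims <- function(x) {
    n <- nrow(x); p <- ncol(x)
    if (n < 2L * p)
        message(sprintf("n = %d < 2 * p = %d: sample size may be too small",
                        n, 2L * p))
    invisible(n >= 2L * p)
}

x <- matrix(rnorm(12), nrow = 3, ncol = 4)
## Classify which condition is signalled:
res <- tryCatch(checkDims(x),
                message = function(m) "message",
                warning = function(w) "warning")
res
```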
As we do want the formula to work ==> we must allow 'lower' & 'upper' as list()s
in R/nlregrob.R, have 14 matches for "eval ( *formula\[\[3L?"
((and *org shows the `[[3L].]` (no ".") as underscored 3L)) :
  123:    y.hat <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
  127:    y.hat <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
  141:    y.hat <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
  175:    res <- y - eval( formula[[3L]], c(data, initial$par) )
  193:    fit <- eval( formula[[3L]], c(data, coef) )
  254:    fit <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
  300:    fit <- eval( formula[[3L]], c(data, coef) )
  355:    fit <- eval( formula[[3L]], c(data, par) )
  361:    fit <- eval( formula[[3L]], c(data, par) )
  366:    fit <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
  390:    fit <- eval( formula[[3L]], c(data, coef) )
  434:    fit <- eval( formula[[3L]], c(data, par) )
  442:    fit <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
  468:    fit <- eval( formula[[3L]], c(data, coef) )
the same as in R/nlrob.R where we had eval(.., c(data, coef)) but now eval(.., c(data, start))
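The eval() pattern in these matches can be illustrated in isolation. The model, data and parameter values below are made up; only the idiom `eval(formula[[3L]], c(data, setNames(par, pnames)))` is taken from the notes.

```r
## Made-up example model and data, for illustration only
formula <- y ~ a * exp(-b * x)
data    <- list(x = c(0, 1, 2), y = c(2, 1.2, 0.8))
pnames  <- c("a", "b")
par     <- c(2, 0.5)   # unnamed, as an optimizer would typically return it

## formula[[3L]] is the right-hand side of the formula (an unevaluated call).
## c(<list>, <named vector>) gives one list, so 'x', 'a' and 'b' are all
## found when the expression is evaluated in it.
y.hat <- eval(formula[[3L]], c(data, setNames(par, pnames)))
y.hat
```

The parameters must carry names for this to work, which is why the unnamed `par` is wrapped in setNames() while an already named `coef` vector can be passed directly.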
since we provide already most of the needed “hooks”.
--> more tests in ./tests/glmrob-1.R
--> glm.fit() instead of glm()
--> vcov() instead of just std.err. {is already there}
i.e. "MVE" and Andreas had a comment that "mcd" is worse.
"covMcd" has been available for a while; now via robXweights() in ./R/glmrobMqle.R
HOWEVER: Need something better when 'X' has (binary!) factors!
"hat" +- works, but needs more work
function or a list containing a robMcd()-like function. Definitely need to test this situation!
–> ./R/glmrobMqle-DQD.R
- gives warning every time {-> easy to fix}
- Default is "V1"; is that a good idea?
[also consider those from man/glmrob.Rd] take those from Martin’s old ‘robGLM1’ package (need more!)
were not yet moved from robGLM1 to glmrob()… (in other words: glmrob() should work
applicable –> e.g. for predict(<glmrob>, interval=”..”) !
we should decide if the current return value is fine.
Test if Huber’s C are different. Need theory to compare different C’s and same model (which includes classical vs robust).
[DONE partly; but undocumented, since bound to change --> file:~/R/MM/STATISTICS/robust/1d-scale.R , 1d-scale-sim.R, etc. -- unfinished!!]
We have quite a few “partial” collections of rho/psi functions; some are “sync”ed now, some not yet::
in file:R/psi-rho-funs.R with further explorations, ideas in file:misc/experi-psi-rho-funs.R
–> ./R/nlrob.R ; consider even more real checks; now in tests/nlrob-tst.R
./R/biweight-funs.R
(./R/psi-funs-AR.R) replaced by using the new psi_func objects
in ./R/lmrob.MM.R
Further files, illustrating features, differences, etc:
  ./vignettes/psi_functions.Rnw   -- with quite a few FIXME
  ./inst/xtraR/plot-psiFun.R      chkPsiDeriv() {and plot utils}
  ./tests/psi-rho-etc.R           compute asymp.efficiency and breakdown point !
  ./tests/lmrob-psifns.R          plot and lmrob()-test them
“hampel”, “bisquare”, “lqq” (polyn!): derive exact formula; others maybe, too <==> psi_func objects above?
~~~~~~~~~~ become part of robustbase, maybe under a better name, e.g. via lmrob( … control ..) or directly. It is much used in the simulations of Koller & Stahel (2011)
Consider lmrob(*, method = "M") --> default init = "ls" (Least Sq; as MASS:::rlm.default), which calls lmrob..M..fit(); that is already documented as a "simple" M-estimator (though the scale is kept fixed; i.e., no 'proposal 2').
e.g., see FIXME in ./R/glmrobMqle.R
The argument name 'weight.fn' is pretty ugly and the default function name 'hard.rejection()' is just awful (we need a globally available function as 'role model').
- Could allow 'n.iter = 0' to simply compute Cov()_ij = rcov(X_i, X_j)
probably use mcd.control() and lts.control()
or forget about *control() completely? since there are only a few in each ??????/
Default for 'ask' should be smarter: depend on prod(par("mfrow")) < #{plots} (which depends on 'classic' and p=2)
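A sketch of such a default, assuming the number of plots is known at call time. `needAsk()` is a hypothetical helper name, not an existing robustbase function.

```r
## Hypothetical helper: TRUE when the requested plots do not all fit on
## one page of the current layout. 'mfrow' defaults to the active device
## setting; it is an explicit argument so the logic can be checked
## without opening a graphics device.
needAsk <- function(nPlots, mfrow = par("mfrow")) {
    prod(mfrow) < nPlots
}

needAsk(4, mfrow = c(1, 1))  # 4 plots on a 1 x 1 layout: paging needed
needAsk(4, mfrow = c(2, 2))  # all 4 plots fit on one page
```

In a plot method this would typically be combined with dev.interactive(), e.g. `ask = needAsk(nPlots) && dev.interactive()`, so that non-interactive devices such as pdf() are never paused.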
in addition to ‘$residuals’ and ‘$raw.residuals’; drop it or document it !
does median() , MAD() instead of using R’s sort() routines
..........................................
There are pure R implementations:
- 'weighted.median()' in limma and I have generalized it --> file:inst/xtraR/ex-funs.R
- more general code (different ‘tie’ strategies; weighted *quantile*s) in file:/u/maechler/R/MM/STATISTICS/robust/weighted-median.R
- The ‘Hmisc’ package has wtd.quantile()
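For orientation, a minimal pure R weighted median could look like the sketch below. `wMedian()` is a hypothetical name; it is not the limma, ex-funs.R or Hmisc code, and it implements only one tie strategy (the "lower" weighted median).

```r
## Hypothetical sketch (NOT the limma / Hmisc implementation):
## lower weighted median = smallest x whose cumulative normalized
## weight reaches 1/2.
wMedian <- function(x, w = rep(1, length(x))) {
    stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
    o  <- order(x)
    x  <- x[o]
    w  <- w[o]
    cw <- cumsum(w) / sum(w)
    x[which(cw >= 0.5)[1L]]
}

wMedian(c(1, 2, 3))               # equal weights, odd n
wMedian(c(3, 1, 2), c(1, 5, 1))   # weight concentrated on the value 1
```

With equal weights and even n this returns the lower of the two middle values rather than their average, which is exactly the kind of tie strategy the more general code would make selectable.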
using Huber's correlation formula which ensures [-1,1] range --> ~/R/MM/Pkg-ex/robustbase/robcorgroesser1.R and ~/R/MM/STATISTICS/robust/pairwise-new.R
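The formula in question can be sketched as follows: standardize both variables by a robust scale S, then use r = (S(a+b)^2 - S(a-b)^2) / (S(a+b)^2 + S(a-b)^2), which lies in [-1,1] by construction. `robCor()` is a hypothetical name and mad() as scale is an assumption made for this sketch; it is not the code in the files referenced above.

```r
## Hypothetical sketch of a pairwise robust correlation:
##   a = x / S(x),  b = y / S(y)
##   r = (S(a+b)^2 - S(a-b)^2) / (S(a+b)^2 + S(a-b)^2)
## Both squared scales are >= 0, hence |r| <= 1 whenever the
## denominator is positive.
robCor <- function(x, y, scale = mad) {
    a  <- x / scale(x)
    b  <- y / scale(y)
    sp <- scale(a + b)^2
    sm <- scale(a - b)^2
    (sp - sm) / (sp + sm)
}

set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)
r <- robCor(x, y)
r
```

The result is undefined (0/0) when both S(a+b) and S(a-b) are zero, e.g. for heavily tied data, so real code would need to handle that case.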
doesn’t Valentin have a version too? otherwise: test this, ask author for “donation” to robustbase
Still a bit problematic when denominator = 0. Currently we leave out all the c/0 = Inf and 0/0 = NaN values.
MM: Maybe, it’s the fact that the coef = 1.5 should really depend on the sample size n and will be too large for small n (??) –> should ask Mia and maybe Guy Brys
Rather, then, take all sub-samples of size p ==> getting a non-random result.
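In plain R, exhaustive sub-sampling can be sketched with combn(); this is only a stand-in to show the idea and is feasible only when choose(n, p) is small. The data are made up, and the exact fit per subsample is an illustrative choice, not the package's algorithm.

```r
## Made-up small regression problem: n observations, p coefficients
set.seed(7)
n <- 6; p <- 2
x <- cbind(1, rnorm(n))   # design matrix with intercept
y <- rnorm(n)

## All choose(n, p) index sets of size p; each column is one subsample
idx <- combn(n, p)

## Exact ("elemental") fit through each subsample of p points.
## No random number is drawn here, so repeated runs on the same data
## give identical results.
coefs <- apply(idx, 2, function(i) solve(x[i, , drop = FALSE], y[i]))
dim(coefs)   # p rows, choose(n, p) columns
```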
(and promised me "the rest" when needed)
Don't like the *.x, *.y sub datasets: They shouldn't be needed when we use a *formula*
In his lts tests, he uses these "data sets from the literature":
(Note that 'stackloss' is already in "datasets") :
  heart.x,    heart.y,     data(heart)
  stars.x,    stars.y,     data(stars)
  phosphor.x, phosphor.y,  data(phosphor)
  stack.x,    stack.loss,  data(stackloss)
  coleman.x,  coleman.y,   data(coleman)
  salinity.x, salinity.y,  data(salinity)
  aircraft.x, aircraft.y,  data(aircraft)
  delivery.x, delivery.y,  data(delivery)
  wood.x,     wood.y,      data(wood)
  hbk.x,      hbk.y,       data(hbk)
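The point about the separate x / y objects can be illustrated with 'stackloss', which ships with R's "datasets" package together with stack.x and stack.loss. Ordinary lm() is used here only as a stand-in so the sketch needs nothing beyond base R; the same contrast applies to a formula interface for the robust fits.

```r
## Matrix interface: needs the separate objects stack.x and stack.loss
fit.xy <- lm.fit(cbind(Intercept = 1, stack.x), stack.loss)

## Formula interface: the single data frame 'stackloss' is enough
fit.f <- lm(stack.loss ~ ., data = stackloss)

## Both give the same coefficients
cbind(xy = unname(coef(fit.xy)), formula = unname(coef(fit.f)))
```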