Is init_theta keyword working? #1588

bratslavia · 2021-04-13T11:59:49Z

I've been trying to start my samplers in "better" regions of the parameter space, since it's a bit slow to converge otherwise on certain parameters. But from what I can tell, whenever I pass init_theta as per the docs, whatever values I have there are just ignored.

MWE:

using Turing

# Model definition.
@model demo(x) = begin
    u ~ MvNormal(zeros(2), ones(2))
    x ~ MvNormal(u, ones(2))
end

a = ones(2)
start_vals = [-10, -10]
m1 = sample(demo(a), MH(diagm([0.001, 0.001.])), MCMCThreads(), 10, 4, init_theta=start_vals)
m2 = sample(demo(a), HMC(0.0001, 1), MCMCThreads(), 10, 4, init_theta=start_vals)

Array(m1)
Array(m2)

Here, I start the sampler in a far-away place. The MH proposals in m1 are tiny, so it should be taking quite a while to get away from the region around (-10,-10); apparently, the starting values are just being ignored. I wasn't sure if it was just a MH issue, so I try HMC as well in the above example, and it appears to also ignore the supplied starting values.

After digging around some more, I tried changing it to the keyword init_params instead of init_theta. This works for the MH code above, but not for HMC (oddly enough, since the HMC sampling code is where I found the init_params keyword in the first place. I also tried NUTS, and didn't see any evidence init_params was being respected there, either (although it's much harder to tell with an algorithm like NUTS, that can move away from starting values so quickly).

So, it's not clear to me if this is just a case of the docs being in need of an update, or if there's something else wrong.

And in any case, if init_theta isn't a valid keyword anymore, perhaps supplying such a keyword should give an error or something? I just spent several days trying to debug code that I thought had some subtle identification error that was making my sampler "blow up" right at the start of sampling, when really it was just sample silently ignoring starting values set by a deprecated(?) keyword.

FWIW, being able to catch stuff like this easily is one reason I strongly favor recording the initial values of samplers as part of the chain itself (per the discussion in #1282), something I always do if writing samplers by hand.

The text was updated successfully, but these errors were encountered:

devmotion · 2021-04-13T12:02:01Z

Unfortunately, the documentation is largely outdated. It is called init_params (e.g.,

Turing.jl/src/inference/hmc.jl

Line 148 in 602aa5f

init_params=nothing,

).

Edit: Ah, I see you already noticed this.

devmotion · 2021-04-13T12:04:46Z

Did you keep the warmup phase in NUTS? Otherwise probably it is difficult to check if your sampler started from your initial setting.

devmotion · 2021-04-13T12:07:41Z

Regarding the implementation: it is actually not part of Turing.jl but defined in DynamicPPL.jl (https://github.com/TuringLang/DynamicPPL.jl/blob/9d4137eb33e83f34c484bf78f9a57f828b3c92a0/src/sampler.jl#L76). Maybe at some point it could be moved to AbstractMCMC.jl even. IIRC HMC just has to be handled in a special way since we set some default values if they are not provided by the user with the sampler, and therefore the keyword argument shows up there (and in the emcee sampler AFAIK) but not in any other sampler in Turing.jl.

bratslavia · 2021-04-13T12:08:27Z

With NUTS, I first used a warmup-phase of 1, but even with a warm-up phase of 0, it still seems to ignore the supplied values.

devmotion · 2021-04-13T12:17:34Z

Can you add a print statement in

Turing.jl/src/inference/hmc.jl

Line 157 in 602aa5f

theta = vi[spl]

and check if your initial values show up there? If not there's a problem with how keyword arguments are forwarded and/or how we forward calls to DynamicPPL and AbstractMCMC. Otherwise it works as expected (but maybe not as it should) since currently we perform one initial step with AdvancedHMC before returning the first sample (

Turing.jl/src/inference/hmc.jl

Lines 194 to 220 in 602aa5f

    
           traj = gen_traj(spl.alg, ϵ) 
        
           # Create initial transition and state. 
        
           # Already perform one step since otherwise we don't get any statistics. 
        
           t = AHMC.step(rng, hamiltonian, traj, z) 
        
           # Adaptation 
        
           adaptor = AHMCAdaptor(spl.alg, hamiltonian.metric; ϵ=ϵ) 
        
           if spl.alg isa AdaptiveHamiltonian 
        
               hamiltonian, traj, _ = 
        
                   AHMC.adapt!(hamiltonian, traj, adaptor, 
        
                               1, nadapts, t.z.θ, t.stat.acceptance_rate) 
        
           end 
        
           # Update `vi` based on acceptance 
        
           if t.stat.is_accept 
        
               vi[spl] = t.z.θ 
        
               setlogp!(vi, t.stat.log_density) 
        
           else 
        
               vi[spl] = theta 
        
               setlogp!(vi, log_density_old) 
        
           end 
        
           transition = HMCTransition(vi, t) 
        
           state = HMCState(vi, 1, traj, hamiltonian, t.z, adaptor) 
        
           return transition, state

).

bratslavia · 2021-04-13T12:31:17Z

I did a println(theta) at line 158. The result was not the values I passed -- should have been [-10,-10], instead I got [1.563217652248911, -0.5123821037824636].

devmotion · 2021-04-13T12:50:39Z

Oh, I assume the problem is that SampleFromUniform that is used for sampling the initial values and rerunning the model for computing the log joint probability when we set the provided initial values is very weird: it always overwrites existing samples (https://github.com/TuringLang/DynamicPPL.jl/blob/9d4137eb33e83f34c484bf78f9a57f828b3c92a0/src/context_implementations.jl#L127-L128). I.e., the initial values that we set in https://github.com/TuringLang/DynamicPPL.jl/blob/9d4137eb33e83f34c484bf78f9a57f828b3c92a0/src/sampler.jl#L77 are resampled in https://github.com/TuringLang/DynamicPPL.jl/blob/9d4137eb33e83f34c484bf78f9a57f828b3c92a0/src/sampler.jl#L80.

It would make sense that the error does only occur with HMC since AFAIK SampleFromUniform is only used by HMC.

IMO we should change the behaviour of SampleFromUniform and only resample values if they are clearly marked to be resampled.

devmotion · 2021-04-13T12:56:30Z

Alternatively, one could always resample when using a sampler and perform evaluation only with a dedicated evaluation context (would require some refactoring along the lines of TuringLang/DynamicPPL.jl#80 probably). One has to be a bit careful though and probably special case the evaluation context for every sampler since depending on the sampler we store the log-likelihood, the log joint probability, or other things in varinfo.logp (and the whole point of rerunning the model after setting the user-provided initial values is to ensure that varinfo.logp is consistent with the variables and would be equivalent to the value that one would get if one would have sampled the variables).

devmotion · 2021-04-13T13:10:58Z

My hypothesis in #1588 (comment) is correct, the problem is that SampleFromUniform resamples the user-provided initial values. Printing the values in DynamicPPL shows that the initial values are set correctly and commenting out https://github.com/TuringLang/DynamicPPL.jl/blob/9d4137eb33e83f34c484bf78f9a57f828b3c92a0/src/sampler.jl#L80 fixes the problem for HMC (but, of course, that's not something we want to do).

evrenmturan · 2021-04-16T15:43:28Z

Sounds like this is the same problem that @mateuszbaran had #1563. Someone was also asking about this in the slack channel a few weeks ago if I remember.

mateuszbaran · 2021-04-16T16:00:35Z

Yes, that looks like the same problem. Unfortunately all I can tell is that commenting out that line didn't improve sampling from my model (but fixing one problem with my data and giving better prior distributions did help).

devmotion · 2021-04-16T16:17:57Z

I opened a PR with a temporary workaround in TuringLang/DynamicPPL.jl#232.

…niform` (#232) This PR is a quick fix for TuringLang/Turing.jl#1563 and TuringLang/Turing.jl#1588. As explained in TuringLang/Turing.jl#1588 (comment), the problem is that currently `SampleFromUniform` always resamples variables in every run, and hence also initial parameters provided by users are resampled in https://github.com/TuringLang/DynamicPPL.jl/blob/9d4137eb33e83f34c484bf78f9a57f828b3c92a0/src/sampler.jl#L80. As mentioned in TuringLang/Turing.jl#1588 (comment), a better long term solution would be to fix this inconsistency and use dedicated evaluation and sampling contexts, as suggested in #80.

devmotion · 2021-04-18T21:39:10Z

The fix is available in the latest release of DynamicPPL. It should be installed automatically if you update your packages.

devmotion mentioned this issue Apr 16, 2021

[Merged by Bors] - Fix initialization of parameters for algorithms that use SampleFromUniform TuringLang/DynamicPPL.jl#232

Closed

bratslavia closed this as completed Apr 23, 2021

devmotion mentioned this issue Oct 27, 2021

Difficulty sampling a model with truncated normal Likelihood #1722

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Is init_theta keyword working? #1588

Is init_theta keyword working? #1588

bratslavia commented Apr 13, 2021

devmotion commented Apr 13, 2021 •

edited

Loading

devmotion commented Apr 13, 2021

devmotion commented Apr 13, 2021

bratslavia commented Apr 13, 2021

devmotion commented Apr 13, 2021 •

edited

Loading

bratslavia commented Apr 13, 2021

devmotion commented Apr 13, 2021

devmotion commented Apr 13, 2021

devmotion commented Apr 13, 2021

evrenmturan commented Apr 16, 2021

mateuszbaran commented Apr 16, 2021

devmotion commented Apr 16, 2021

devmotion commented Apr 18, 2021

Is init_theta keyword working? #1588

Is init_theta keyword working? #1588

Comments

bratslavia commented Apr 13, 2021

devmotion commented Apr 13, 2021 • edited Loading

devmotion commented Apr 13, 2021

devmotion commented Apr 13, 2021

bratslavia commented Apr 13, 2021

devmotion commented Apr 13, 2021 • edited Loading

bratslavia commented Apr 13, 2021

devmotion commented Apr 13, 2021

devmotion commented Apr 13, 2021

devmotion commented Apr 13, 2021

evrenmturan commented Apr 16, 2021

mateuszbaran commented Apr 16, 2021

devmotion commented Apr 16, 2021

devmotion commented Apr 18, 2021

devmotion commented Apr 13, 2021 •

edited

Loading

devmotion commented Apr 13, 2021 •

edited

Loading