Why not sample all the things? HMC sampling of each individual galaxy activation #37
Are you referring to Fig 4 of the paper and saying that the stochasticity in the PS at the same HOD params is the problem? Then the traditional MH way it's done right now (i.e. emcee) will suffer from the same issue, right? Sampling the latent parameters, i.e. the "activation" of every galaxy, would blow up your parameter space from 5 to N_halos*31. On a different note, for this particular problem, what about VI?
Right, so I agree that the "conventional" approach is, I think, also flawed unless you run several sims to obtain the "mean" for the likelihood. We discussed this in the past and kind of agreed it was OK, or at least what people are doing, so not a misrepresentation of a "reference point". But what worries me a little bit with HMC is that I'm not sure the algorithm is guaranteed to work correctly if the gradients and likelihood are noisy (actually it doesn't converge to the target distribution unless you introduce a friction step: https://arxiv.org/abs/1402.4102). I also think sampling over 1 million params or more is not that big of a deal here, because they should be mostly uncorrelated variables in the posterior: the activation of one particular central is not strongly correlated with the activation of any other.
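For reference, here is roughly what that friction step looks like (SGHMC, Chen, Fox & Guestrin 2014, the paper linked above). This is a minimal numpy sketch, not code from our notebooks; `grad_log_prob`, `eps` and `friction` are illustrative names:

```python
import numpy as np

def sghmc_step(theta, v, grad_log_prob, eps=1e-3, friction=0.1, rng=None):
    # One SGHMC update (arXiv:1402.4102). `grad_log_prob` may be a noisy
    # (stochastic) estimate of the gradient of the log-density; the friction
    # term -eps * friction * v, together with injected noise of matched
    # variance (assuming the gradient-noise estimate B-hat is negligible),
    # is what keeps the chain targeting the right distribution, which plain
    # HMC with noisy gradients does not.
    if rng is None:
        rng = np.random.default_rng()
    theta = theta + eps * v
    noise = rng.normal(scale=np.sqrt(2.0 * friction * eps), size=np.shape(v))
    v = v + eps * grad_log_prob(theta) - eps * friction * v + noise
    return theta, v
```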
And in the example above, where I only sample the centrals, it looks like it's working fine.
I should try to do the same thing with the "conventional" approach to compare, I guess...
Yep, so I went ahead and quickly coded up the equivalent of our "traditional" method on this toy model where I only sample the centrals. The comparison between the two approaches should be fair: same forward model, just that in one case the "likelihood" is stochastic, while in the other it is not because we track all latent variables. And surprise surprise... the stochastic HMC sampling is not working well; it's very hard to get a good acceptance rate. This is the best result I've achieved so far :-/ Here are my two twin test notebooks: @modichirag or @bhorowitz, if you want to take a look and check whether I'm doing something obviously wrong with the "conventional" sampling, be my guest :-) And otherwise, if you think about it, tracking all latent variables comes at pretty much no extra cost. The gradients are already computed almost up to the stochastic variables. The only extra cost is storing the "state vector" of the HMC, but it's a small factor times the size of the catalog, nothing super expensive.
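To make the comparison concrete, here is a hedged sketch of the two target densities (written with TensorFlow Probability to match the notebooks' tooling); `p_cen` and `power_spectrum` are hypothetical stand-ins for the actual HOD forward model:

```python
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# Toy stand-ins for the real forward model (hypothetical):
def p_cen(hod_params):              # per-halo central occupation probability
    return tf.sigmoid(hod_params * tf.ones(1000))

def power_spectrum(activations):    # placeholder summary statistic
    return tf.reduce_mean(activations, keepdims=True)

def stochastic_log_prob(hod_params, pk_obs, sigma):
    # "Conventional" target: activations are re-drawn inside the likelihood,
    # so the log-density (and its gradient) is a different random function
    # on every call. A relaxed, reparameterised draw is used here so that
    # gradients can flow back to hod_params at all.
    acts = tfd.RelaxedBernoulli(temperature=0.5, probs=p_cen(hod_params)).sample()
    return tf.reduce_sum(tfd.Normal(power_spectrum(acts), sigma).log_prob(pk_obs))

def latent_log_prob(hod_params, activations, pk_obs, sigma):
    # Latent-tracking target: the activations are part of the chain state,
    # so the log-density is deterministic given (hod_params, activations).
    # (For HMC the binary activations need a continuous relaxation; see the
    # joint sketch further down.)
    prior = tf.reduce_sum(
        tfd.Bernoulli(probs=p_cen(hod_params), dtype=tf.float32).log_prob(activations))
    like = tf.reduce_sum(
        tfd.Normal(power_spectrum(activations), sigma).log_prob(pk_obs))
    return prior + like
```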
I will have a look at the notebooks. What was the acceptance rate of the latent sampling? I agree that cost-wise it should be the same (except maybe memory).
I'll try to upgrade my toy model to also sample the satellite positions and activations, and we'll see what happens. Regarding SGHMC, yeah, it was apparently in edward ^^' but not in edward2 as far as I can see. I found a few implementations in TF online but didn't get a chance to try them. If you want to give it a go @modichirag, let me know; maybe it would outperform the current solution in the stochastic HMC. Also, I haven't tried averaging over several realisations in the stochastic HMC; I guess that should help.
I'm not really convinced we are doing the right thing by running HMC over the stochastically sampled power spectrum. The only correct way to do this would be to draw several power spectra and then use their mean, i.e. use an estimator of the theoretical mean over N samples, and most likely N should be larger than 1. This is essentially the same age-old problem: you can't sample cosmological parameters without sampling all of the latent variables as well.
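For concreteness, a sketch of that mean estimator, reusing the hypothetical `p_cen` / `power_spectrum` stand-ins from the sketch above; `n_real` is an illustrative choice:

```python
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

def mean_pk_log_prob(hod_params, pk_obs, sigma, n_real=16):
    # Average the stochastic forward model over several activation draws,
    # i.e. estimate the theoretical mean power spectrum at these HOD params
    # from n_real realisations instead of a single one.
    probs = p_cen(hod_params)
    pks = tf.stack([
        power_spectrum(
            tfd.RelaxedBernoulli(temperature=0.5, probs=probs).sample())
        for _ in range(n_real)])
    pk_mean = tf.reduce_mean(pks, axis=0)
    return tf.reduce_sum(tfd.Normal(pk_mean, sigma).log_prob(pk_obs))
```

Each draw costs a full forward pass, so the likelihood becomes n_real times more expensive per HMC step.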
So... why don't we bite the bullet and just sample all of the latent variables of the model, i.e. the "activation" (whether the galaxy is on or off) of every single galaxy in the mock, at the same time as we sample the HOD parameters? It turns out this has little extra cost compared to what we were doing before, because the forward and backward passes are strictly the same; the only potential cost is storing the latent variables in the MCMC trace, but we can circumvent that. A sketch of what the joint sampling could look like is just below.
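As an illustration only (not the notebook's actual code): a joint HMC over one HOD parameter and a continuous relaxation of every central's activation, where the Concrete/Gumbel-softmax relaxation, the placeholder `forward` model, and all parameter values are assumptions:

```python
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

n_halos = 512
temp = 0.5                        # relaxation temperature (hypothetical)
pk_obs = tf.constant([0.3])       # fake "observation" for the toy
sigma = 0.02

def forward(activations):         # placeholder for the power spectrum
    return tf.reduce_mean(activations, keepdims=True)

def joint_log_prob(hod_logit, latents):
    # HOD parameter -> per-halo P(central on); placeholder mapping.
    p = tf.sigmoid(hod_logit) * tf.ones(n_halos)
    logit_p = tf.math.log(p) - tf.math.log1p(-p)
    # A RelaxedBernoulli(temp, p) activation is the sigmoid of a
    # Logistic(logit_p / temp, 1 / temp) variable, so the chain can live in
    # the unconstrained `latents` space with no Jacobian bookkeeping.
    prior = tf.reduce_sum(
        tfd.Logistic(loc=logit_p / temp, scale=1.0 / temp).log_prob(latents))
    activations = tf.sigmoid(latents)   # continuous "on/off" per central
    like = tf.reduce_sum(tfd.Normal(forward(activations), sigma).log_prob(pk_obs))
    hyper = tfd.Normal(0.0, 5.0).log_prob(hod_logit)  # weak prior, assumed
    return prior + like + hyper

kernel = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=joint_log_prob,
    step_size=0.01, num_leapfrog_steps=5)
states = tfp.mcmc.sample_chain(
    num_results=500, num_burnin_steps=200,
    current_state=[tf.zeros([]), tf.zeros(n_halos)],
    kernel=kernel, trace_fn=None)
```

The state vector grows by one float per galaxy, which is the small storage overhead mentioned above.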
Here is a proof of concept sampling only the centrals:
https://colab.research.google.com/drive/1jsYwqxvw05LmG6jmYzHa13t1kjhH8q0F?usp=sharing
And it works nicely:
The super nice thing about this approach is that I think you might not need a covariance matrix, only diagonal measurement errors on your power spectrum, because we track all of the latent variables.
Anyways, this looks tractable to me, curious to hear what other people think.