equal p(z) bins for all sources in mock_data #404

cristobal-sifon · 2021-06-13T18:53:52Z

The current implementation with photoz_sigma_unscaled generates a pdf that is sampled differently for every galaxy. From _compute_photoz_pdfs:

for row in galaxy_catalog:
    pdf_range = row['pzsigma']*10.
    zmin, zmax = row['z']-pdf_range, row['z']+pdf_range
    zbins = np.arange(zmin, zmax, 0.03)

so the result is a column like:

               pzbins [121]              
-----------------------------------------
 -0.9189157769563053 .. 2.681084223043698
-0.8755947879144116 .. 2.7244052120855917
 -0.9013367845355461 .. 2.698663215464457
 -1.0683426683023802 .. 2.531657331697623
 -1.0260346754041412 .. 2.573965324595862
-0.8371474311387707 .. 2.7628525688612324
 -0.7837649451484183 .. 2.816235054851585
 -1.0040270867734482 .. 2.595972913226555
  -1.1684000482540333 .. 2.43159995174597
-0.7198019908543756 .. 2.8801980091456274
                                      ...
-1.3521789249221339 .. 2.2478210750778693
-0.9578447603266889 .. 2.6421552396733143
 -0.9215149025501451 .. 2.678485097449858
-0.9071426552874368 .. 2.6928573447125665
-1.0662076783207386 .. 2.5337923216792646
-1.2636985872843276 .. 2.3363014127156756
 -1.136257420102231 .. 2.4637425798977723
 -0.7774128187340363 .. 2.822587181265967
-1.0730126785251919 .. 2.5269873214748113
-0.9649808022321504 .. 2.6350191977678525
-1.3844228023509195 .. 2.2155771976490835

This is both very inconvenient, because the user then has to interpolate if they want to combine the pdfs, and a big waste of space.

I propose to change this for a single binning scheme that depends on (zsrc_min,zsrc_max) and a new optional argument zsrc_bin_width or similar, so that the binning can be stored as a separate attribute of GalaxyCluster, e.g., zsrc_pdf_centers or just zsrc_bins. See related suggestion in #401.

The text was updated successfully, but these errors were encountered:

johannct · 2021-06-13T19:19:44Z

If I understand well, the incriminated lines are

CLMM/clmm/support/mock_data.py

Line 380 in b94f88b

for row in galaxy_catalog:

and subsequent. Certainly one binning per source galaxy in this context is overkill. Another possible way to proceed is to avoid the for loop and get the min and max of the galaxy_catalog['z'] itself, and then extend these zmin and zmax by pdf_range, and proceed. The resulting binning will be unique and applicable to all the galaxies in the catalog (in general I think that this for loop should disappear, but that will demand a bit of careful broadcasting).

cristobal-sifon · 2021-06-13T19:28:16Z

Yes, we can replace that for loop by the following (which I haven't tested):

zbins = np.arange(zsrc_min, zsrc_max+zsrc_bin_width, zsrc_bin_width)
pzpdf_grid = np.exp(-(zbins[:,None]-galaxy_catalog['z'])/(2*pzsigma**2) / ((2*np.pi)**0.5*pzsigma)

or maybe the transpose of that. (And by the way, pzsigma does not need to be a column in galaxy_catalog.) We do need to be careful that this does not start to produce memory issues if the arrays are too large. So perhaps it could be done within a loop but in chunks of say 100 or 1000

m-aguena · 2021-06-14T10:49:42Z

As I mentioned here, this part of the code is not really being used for now.
@combet do you remeber who added that? I am not sure if there is a specific reason to have different binnings for each galaxy or if it was a naive implementation.

johannct · 2021-06-14T11:16:41Z

git blame say Matthew Kirby 2 years ago, for the for loop, and then some refactoring 8 months ago by Céline, Alex, and a couple others. There are many ways to improve the code here, but there is not much point in doing so before it is really used. And I do not think that "really using it" should wait for qp to enter the game. Quite the contrary actually IMHO.....

combet · 2021-06-17T16:45:31Z

I don't recall why we implemented it this way. I guess one question is what will happen in real life? Will photoz codes provide pdf on a single binning for all galaxies or not (it's not obvious to me). If this is the standard then I completely agree that this should be changed to something like @cristobal-sifon proposes.

johannct · 2021-06-17T16:51:42Z

I think that the code here should not care. You should be able to mock pdfs, ingest them in some TBD form by something like qp, and make sure that the code downstream does not depend on the way the pdf is represented. This is the whole point of qp. So I do not think that it matters what real life will be : here we need to abstract away from it. The only real question to my mind is how it is provided, which is a different question : e.g. inside an object catalog, or separately with some id to join to an object catalog?

m-aguena · 2021-06-25T15:08:59Z

Given this week's tag up discussion, we decided to have a more abstract approach for storing the PDF in galcat. The idea is to have the PDF column as an object type and add a pdf attribute to the galaxy catalog to describe the use of this column. So we can have something like:

pdf_description={'type': 'shared_bins', 'info': bins} -> pdf_column: array of size=n bins
pdf_description={'type': 'unique_bins'} -> pdf_column: tuple with (pdf bins, pdf vals)
pdf_description={'type': 'quantiles', 'info': (0.68, 0.95, 0.98)} -> pdf_column: tuple with quantiles (68%, 95%, 99%)

right now we can start implementing this first case with simple modification of what is in branch issue/404/galcat_pdf_improvement

eacharles · 2021-06-28T20:42:42Z

FWIW, this is really pretty heavily overlapping with functionality in qp and in particular the construction of the qp.Ensemble, which represents a set of pdfs.
I.e.,

qp.Ensemble(qp.interp, data=dict(xvals=yvals, yvals=yvals)) # where xvals is an array of nPoints, yVals is an array of (nPoint, nPdfs)
qp.Ensemble(qp.quant, data=dict(quants=quants, locs=locs)) # where quants are the quants (i.e., 0.68, 0.95, 0.98) and locs are the corresponding locations

It would be good to chat a bit about this.

marina-ricci · 2022-05-09T16:03:05Z

This is linked to issue #401 (interfacing 'clmm and qp)

cristobal-sifon mentioned this issue Jun 13, 2021

Add p(z) to GalaxyCluster object #401

Closed

cristobal-sifon self-assigned this Jun 18, 2021

cristobal-sifon added a commit that referenced this issue Jun 18, 2021

addresses #404

df012cc

marina-ricci added the enhancement Tackle this someday but not immediately label May 9, 2022

m-aguena linked a pull request Feb 9, 2023 that will close this issue

Issue/401/use qp #534

Closed

m-aguena removed a link to a pull request Feb 13, 2023

Issue/401/use qp #534

Closed

m-aguena mentioned this issue Feb 13, 2023

Issue/404/galcat pdf improvement #570

Merged

combet closed this as completed in #570 Mar 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

equal p(z) bins for all sources in mock_data #404

equal p(z) bins for all sources in mock_data #404

cristobal-sifon commented Jun 13, 2021

johannct commented Jun 13, 2021 •

edited

Loading

cristobal-sifon commented Jun 13, 2021

m-aguena commented Jun 14, 2021

johannct commented Jun 14, 2021

combet commented Jun 17, 2021

johannct commented Jun 17, 2021

m-aguena commented Jun 25, 2021 •

edited

Loading

eacharles commented Jun 28, 2021

marina-ricci commented May 9, 2022

equal p(z) bins for all sources in mock_data #404

equal p(z) bins for all sources in mock_data #404

Comments

cristobal-sifon commented Jun 13, 2021

johannct commented Jun 13, 2021 • edited Loading

cristobal-sifon commented Jun 13, 2021

m-aguena commented Jun 14, 2021

johannct commented Jun 14, 2021

combet commented Jun 17, 2021

johannct commented Jun 17, 2021

m-aguena commented Jun 25, 2021 • edited Loading

eacharles commented Jun 28, 2021

marina-ricci commented May 9, 2022

johannct commented Jun 13, 2021 •

edited

Loading

m-aguena commented Jun 25, 2021 •

edited

Loading