# Rerun Redrock with dynamic priors based on QN redshift #2298

**Merged** · 7 commits · Aug 5, 2024

## Conversation

**@dylanagreen** (Contributor):

This PR addresses #2293 and updates the QuasarNET afterburner to run with a prior whose width is based on the QuasarNET redshift, rather than a constant-width prior across the entire redshift range.

Using the width of the QN output grid boxes in redshift space, we determine that the width of the prior goes as $\Delta z = a (1 + z)$. We can calculate the necessary value of $a$ from the QN output grid bin widths. The following snippet calculates the width needed for the new RR priors, assuming QuasarNET uses 13 output bins (as all weights files we've used thus far in DESI do) and a logarithmically scaled grid:

```python
# Code to calculate the scaling constant for the dynamic RR prior.
# wave_to_use is the QuasarNET wavelength grid.
import numpy as np

l_min = np.log10(wave_to_use[0])
l_max = np.log10(wave_to_use[-1])
# If the number of boxes changes you must change it here. A future update to
# QuasarNP might store nboxes as a model parameter, but as of 0.2.0 it does not.
nboxes = 13
dl_bins = (l_max - l_min) / nboxes
a = 10**(dl_bins) - 1
log.info(f"Using {a = } for redrock prior scaling")
```

This value is about 0.0814. Note that it differs slightly from the value fit and shown in the linked issue, due to a bug in my plot related to which x coordinate I used to plot the box widths.
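As a standalone sanity check of that value, the snippet below recomputes the scaling constant from fixed grid endpoints. The endpoints 3600 and 9961 Å are assumptions taken from numbers quoted later in this thread; the exact weights-file grid may differ.

```python
import numpy as np

# Assumed QuasarNET grid endpoints in Angstroms (illustrative; in the real
# code the grid comes from the weights file via wave_to_use).
wave_min, wave_max = 3600.0, 9961.0
nboxes = 13

dl_bins = (np.log10(wave_max) - np.log10(wave_min)) / nboxes
a = 10**dl_bins - 1
print(f"{a = :.4f}")  # a = 0.0814
```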

For validation I ran 20 pseudorandomly chosen tiles from jura/cumulative (25574, 83024, 24020, 42016, 24421, 41937, 8829, 24633, 41917, 8981, 20113, 41761, 11089, 22610, 3036, 9378, 7219, 3543, 42115, 1000) and compared the results with the old and new priors, using the iron weights file for QN. In these 20 tiles I get the following statistics (excluding sky fibers, some of which did trigger a rerun):

| Name | C_MAX > 0.5 | C_MAX > 0.95 |
| --- | --- | --- |
| N_SPEC != SKY | 81036 | 81036 |
| N_RERUN_MAIN | 1482 | 654 |
| N_RERUN_BRANCH | 1440 | 623 |
| N_RERUN_BOTH | 1440 | 623 |
| Z_NEW_MAIN != Z_NEW_BRANCH | 153 | 33 |

I've organized the data into two columns based on whether a rerun is triggered by the MTL confidence cut (C_MAX > 0.5) or the more restrictive QSO catalog confidence cut (C_MAX > 0.95). Note that N_RERUN_BRANCH $\leq$ N_RERUN_MAIN is expected, because we do not rerun any spectra whose redrock redshift already falls within the prior width, since redrock would return the same result. In this case, every spectrum that was rerun in main but not in this branch falls within the updated prior widths, so it was not rerun in branch.

In summary ~2% of spectra trigger a rerun, and of those ~5-10% receive a different redshift with the new priors.

The following plot shows these TARGETIDS who received different redshifts due to the new priors, using the MTL confidence cut of C_MAX > 0.5:

*(figure: z_comp — redshift comparison for TARGETIDs that received different redshifts)*

For every orange point within the old prior there is a corresponding blue point outside the old prior with the new redshift.

To validate that this width does not make us susceptible to line confusion, I have produced a version of the above plot overlaying the prior widths that would be needed to reach some common quasar line confusions, confirming that our prior width does not introduce the possibility of RR line confusions.

*(figure: z_comp — line-confusion prior-width overlay)*

I have manually inspected the $\chi^2$ scans of some of these differences to validate that, at least from a redrock perspective, the changes in redshift seem reasonable. The following are a few examples showing that the new redshifts fall into $\chi^2$ minima that were outside or at the edge of the old prior but are now captured accurately by the new prior. I include only a few examples; more can be provided on request.

**Fits (C_MAX > 0.5)**

Most of these are low SNR; for most of them it is clear that the new priors reach a better minimum in redrock, but not necessarily that the result is a better fit.
*($\chi^2$ scan figures for TARGETIDs 39627932024441815, 39628061481633618, 39627833215032246, 39627782698832379, 39627642659406344)*

**Fits (C_MAX > 0.95)**

These tend to be higher SNR. These are only a few examples; some are mixed and some are a success for the new priors.

*($\chi^2$ scan figures for TARGETIDs 39627932028637350, 39628117022607931, 39089837415873131, 39627835207324904)*

Commit summary:

- Previously the code implicitly computed the prior in two different places. This commit reduces that to one place so we can calculate it directly from the QN grid.
- Previously a magic number was stored; now it is calculated from the QN model grid, assuming a logarithmic grid.
**@sbailey** (Contributor) commented Jul 26, 2024:

Thanks @dylanagreen. This PR looks good to me. @julienguy is also going to run it through his simulation study as a final check, then hopefully we can merge.

My main takeaway is that expanding the priors to the width of the QN bins is the right thing to do, but it isn't going to recover a bunch of good targets for us. The specific cases you showed were mostly pretty marginal for both the before and after fits.

```python
sel_QSO_RR_with_no_z_pb[sel_QN] &= np.abs(redrock['Z'][sel_QN] - z_QN[sel_index_with_QN]) <= 0.05
log.info(f"Remove {sel_QSO_RR_with_no_z_pb[sel_QN].sum()} objects with SPECTYPE==QSO and |z_RR - z_QN| < 0.05 (since even with the prior, RR will give the same result)")
prior = a * (z_QN + 1)  # Analytic prior width from QN box width
sel_QSO_RR_with_no_z_pb[sel_QN] &= np.abs(redrock['Z'][sel_QN] - z_QN[sel_index_with_QN]) <= (prior[sel_index_with_QN] / 2)
```
**Reviewer** (Contributor):

If the prior width is the width of the wavelength bin used by QN to identify the lines, shouldn't the delta_z selection cut be the full prior width and not half of it?

**@dylanagreen** (Author):

I used the same convention here as was used in the previous version, which also cut on prior_width / 2 (when prior_width = 0.1).

Redrock accepts an input prior width and a central redshift. The prior is then centered there, extending prior_width / 2 on either side, which is why I (and whoever wrote the previous version) cut on abs(z_rr - z_qn) <= prior_width / 2, since those are the bounds used by the redrock convention.

The prior width in this PR is calculated such that if the Z_QN value is exactly in the middle of a box, the prior width is the entire box, i.e. the prior extends to the edge of the QN box on either side of the redshift, so it is correct as far as I'm aware to cut on prior_width / 2.
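A minimal sketch of this convention (with illustrative names, not the actual afterburner code): redrock spans the prior from `z_qn - width/2` to `z_qn + width/2`, so the rerun cut compares against half the total width.

```python
A = 0.0814  # scaling constant derived from the QN output grid (see PR description)

def needs_rerun(z_rr, z_qn, a=A):
    """True if the existing redrock redshift lies outside the prior
    that would be centered on the QuasarNET redshift."""
    width = a * (1 + z_qn)  # total prior width, Delta z = a (1 + z)
    return abs(z_rr - z_qn) > width / 2

# At z_qn = 2.0 the half-width is ~0.122:
print(needs_rerun(2.05, 2.0))  # False: inside the prior, RR would give the same result
print(needs_rerun(2.30, 2.0))  # True: outside, worth rerunning with the prior
```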

**Reviewer** (Contributor):

Isn't it true that the prior_width should then be twice the range of redshifts defined by the logarithmic wavelength bin, because we don't want to discard solutions where, for instance, the RR redshift is at the lower bound and the QN redshift at the upper bound? I am asking because it seems the scale factor 'a' should then be twice as large as the value used in the code (but I may be wrong).

**@dylanagreen** (Author):

I suppose that is a possibility, but it seems unlikely to me. QuasarNET does tend to clump estimates in the middle of the boxes, but the converse is that when QuasarNET places a line at the edge of a box, it tends to be fairly sure the line is actually there; alternatively, if a line falls at the edge of a box, QuasarNET's most confident line will probably be a different line sitting in the middle of its own box.

Based on my confusion plot we could probably go to 2x the prior width and still be safe with respect to line confusions. Of the 1440 spectra that were rerun in my initial sampling on this branch, only 3 have RR redshifts at or slightly beyond the boundary of the current prior, and only 23 are within 3000 km/s of either edge of the current prior boundary. The overwhelming majority of reruns find a redshift within the prior range.

It might be useful to compare to data we have redshifts on:

*(figure: z_comp — prior width plot in redshift space)*

Based on this version of the prior-width plot in redshift space rather than velocity space, and comparing it to the simulations you presented at desi-data, most of the points comparing the QN redshift to the true simulated redshift would fall within this new prior without needing to double it, although it's hard to say without overlaying the actual data. Comparing the v1.3 weights to the validation dataset I have held out, there are some spectra to be gained outside the current prior_width (y-axis bounds here set to 2*prior_width for comparison):

*(figure: validation_efficiency)*

The gain going from the current width to twice the width seems small compared to the gain from main_width to branch_width. Actually, the new weights significantly improve on this aspect...

I could be convinced to double the prior width, but I don't see a compelling reason to. If there's a desire, though, I can double it, rerun the same 20 tiles as before, and see what the results look like.

**@julienguy** (Contributor):
I tested this branch with the QSO sims from the Lyman-alpha WG. For QN v1.3 (latest version), the results are the same as with the main branch, but for the QN version used in Y1, it is an improvement, with 0.09% outliers (|delta z|>0.05) compared to 0.12% in the final catalog. The improvement is probably more significant with real data where there is more diversity in the quasar spectra.

**@sbailey** (Contributor) commented Jul 28, 2024:

A few thoughts from the brink of vacation:

  1. If QN were just providing "this line is somewhere in this bin" with no other info, then I'd agree with Julien that we need a prior of +/- binwidth to cover the full range from a central value at one bin edge to a possible value at the other bin edge;
  2. However, QN is providing sub-bin information, and as Dylan notes, the issue we're trying to address is that it has a bias towards the center of a bin, i.e. a +/- half-bin bias. I don't think we have evidence that it has a full-bin scatter, so I think Dylan's +/- half-bin prior here is well motivated.

However, I remain confused about a more basic point: A logarithmic wavelength binning is a constant step in $\Delta z$ going from one bin to the next. i.e. if we were just trying to fix for a bias within a bin, I think the original implementation of a constant prior in $\Delta z$ would be correct for a logarithmic binning. For 13 bins from 3600 to 9961, that $\Delta z$ between bins is 0.08, which probably motivated the original implementation of $\Delta z = 0.1$ to be slightly conservative. I think instead we are re-discovering that redshift errors tend to be constant in velocity instead of $\Delta z$, thus a prior in velocity space scaling with (1+z) is more appropriate than a constant $\Delta z$ prior, and the relationship to bin width is perhaps coincidental.

I am disappearing on vacation so won't re-engage with this for a bit, but please do double-check whether the logarithmic bins are indeed a constant $\Delta z$. My concern is that the calculation via $a$ is coincidentally getting us a good recommended prior of +/- 12,000 km/s, but that calculation wouldn't actually be correct if e.g. some future version of QN used more or fewer bins.

On a purely pragmatic level, the simulation study from Julien does show the scatter constant in velocity (not $\Delta z$) and the prior in this PR appears to be generous, especially for the improved scatter of the v1.3 model, so I think we could merge this as-is.
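A quick numeric check of the constant-$\Delta z$ question (a sketch assuming 13 log10-spaced boxes between 3600 and 9961 Å, the numbers quoted above): the redshift step between adjacent bins is not constant, but the velocity-like quantity $\Delta z / (1 + z)$ is, and it equals the scaling constant $a$.

```python
import numpy as np

# 13 log10-spaced boxes between assumed endpoints (Angstroms).
edges = np.logspace(np.log10(3600.0), np.log10(9961.0), 13 + 1)
lam_e = 1215.67  # e.g. Lyman-alpha as the line placing a source in each bin

z_edges = edges / lam_e - 1       # redshift at which the line sits on each bin edge
dz = np.diff(z_edges)             # Delta z spanned by each box
ratio = dz / (1 + z_edges[:-1])   # velocity-like width per box

print(dz[0] < dz[-1])                # True: Delta z grows with redshift
print(np.allclose(ratio, ratio[0]))  # True: Delta z / (1 + z) is constant (= a)
```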

**@dylanagreen** (Author):
@sbailey I can try to provide some clarity, at least on what we use in the code. The equation in the code is derived as follows. Consider the edges of two adjacent bins, $\lambda_1$ and $\lambda_2$, where 1 is the lower and 2 the upper edge, and a completely arbitrary emission line $\lambda_e$ (of any wavelength; as you'll see it is just a placeholder). The redshift difference between the two bin edges is

$$ \Delta z = z_2 - z_1 = \frac{\lambda_2}{\lambda_e} - \frac{\lambda_1}{\lambda_e}$$

where I rewrite

$$\lambda_e = \frac{\lambda_1}{z_1 + 1} = \frac{\lambda_2}{z_2 + 1}$$

so that

$$ \Delta z = \frac{\lambda_2 - \lambda_1}{\lambda_1} (z_1 + 1) = \frac{\lambda_2 - \lambda_1}{\lambda_2} (z_2 + 1)$$

$$ \Delta z = (\frac{\lambda_2}{\lambda_1} - 1)(z_1 + 1) = (1 - \frac{\lambda_1}{\lambda_2}) (z_2 + 1)$$

and evidently from logarithmic properties:

$$ \frac{\lambda_2}{\lambda_1} =10^{\log(\lambda_2) - \log(\lambda_1)} = 10^{\Delta \log(\lambda)}$$

using log base 10, which is what QuasarNET uses. A constant logarithmic grid means that $\Delta \log(\lambda)$ is completely independent of which $\lambda_1$ or $\lambda_2$ we use, which is nice at least. Finally we get the equation used in the code:

$$ \Delta z = (10^{\Delta \log(\lambda)} - 1)(z + 1) $$

I've done a slight abuse of notation to remove the subscript on the redshift here, since $\Delta \log(\lambda)$ is constant regardless of which bin edge, bin center, or arbitrary wavelength is used. If a future version of QuasarNET used a different grid, the value of $a$ would change with the number of bins, since $\Delta \log(\lambda)$ depends on the number of bins used (I noted this in the code).

Incidentally, this rewrite of the third displayed equation above shows why, with constant linear spacing, $\Delta z$ depends on which line is used for redshift estimation but is independent of which bin you use:

$$ \Delta z = \frac{\Delta \lambda}{\lambda_1} (z_1 + 1) = \frac{\Delta \lambda}{\lambda_2} (z_2 + 1) = \frac{\Delta \lambda}{\lambda_e}$$

I think the power of a log-lambda grid is not that it's constantly spaced in $\Delta z$, but that redshifting the grid (say, to the rest frame) maintains the same grid spacing in log-lambda space (i.e. $\Delta \log \lambda$ is constant regardless of redshift), whereas with a linear grid the rest-frame spacing depends on the redshift.

> For 13 bins from 3600 to 9961, that $\Delta z$ between bins is 0.08

I probably have a misunderstanding of my own, since I'm not sure where this value comes from myself, so some clarity would be appreciated.

**@dylanagreen** (Author):
Per discussion with @julienguy I've run main, branch, and branch (2x prior size) on the VI'd QSO tiles: 80605, 80607, and 80609. Here's a table of results, and a few related plots:

| Name | C_MAX > 0.5 | C_MAX > 0.95 |
| --- | --- | --- |
| N_SPEC != SKY (with any VI results) | 3779 | 3779 |
| N_RERUN_MAIN | 165 | 118 |
| N_RERUN_BRANCH | 162 | 116 |
| N_RERUN_BRANCH_2x | 161 | 115 |
| Z_BRANCH != Z_BRANCH_2x\* | 18 (12 with VI_QUAL >= 2.5) | 6 |

\* For the last row I only consider spectra that were rerun in both branch and branch 2x.

I include here a few examples of the 12 that differ (with the VI quality cut), with VI redshifts plotted on the $\chi^2$ scan. If no VI redshift appears in a scan, it fell outside the range of all of the priors:

*($\chi^2$ scan figures for TARGETIDs 39633345029603522, 39633328097202192, 39633328076226914, 39627682710820501, 39627634556014615, 39627700809245727)*

It seems that overall not much changes between the two in terms of triggered reruns, although ~10% of the reruns triggered in this PR change redshift when we double the width of the prior. In most cases these are comparably low-SNR fits, although in two of the examples the 2x-width prior recovers the VI redshift.

**@julienguy** (Contributor):
Merging after deciding to double the width of the prior, because we have two clear cases (with large delta_chi2) where QN got it wrong by more than half the width and redrock got the correct solution.

**@julienguy** merged commit 109e454 into main on Aug 5, 2024 · 26 checks passed