Test candidates for replacing tiles-daily.csv and exposures-daily.fits #2345

Closed
weaverba137 opened this issue Aug 26, 2024 · 50 comments

@weaverba137
Member

I've prepared some patched versions of the top-level tiles-daily.csv and exposures-daily.(fits|csv) files.

The patch data come from:

  • jura, in the case of exposures and the FRAMES HDU in exposures-daily.fits.
  • The raw data files, in the case that MJD is zero.
  • Setting any remaining masked data (e.g. NaN from GFA summary data) to zero.
  • faflavor2program(), in the case of tiles with a missing PROGRAM.
  • Setting tiles with SURVEY == 'unknown' to 'cmx', essentially by hand.

Please take a look at:

  • ${DESI_ROOT}/users/bweaver/exposures-daily-patched-with-jura-20240826.fits
  • ${DESI_ROOT}/users/bweaver/tiles-daily-patched-with-jura-20240826.csv

For the best possible comparison, I made copies of tiles-daily.csv and exposures-daily.fits at the time they were patched:

  • ${DESI_ROOT}/users/bweaver/exposures-daily-original-20240826.fits
  • ${DESI_ROOT}/users/bweaver/tiles-daily-original-20240826.csv

The patching only takes ~1 minute, so we can iterate on any problems found. Then, when we're ready, we can choose a "quiet" time, possibly after kibo, to permanently replace the top-level daily files, which will then be appended to as usual, i.e. by desi_tsnr_afterburner.

@weaverba137
Member Author

PS, the patching code is currently in this file.

@araichoor
Contributor

I'm poking around a bit with the files.
while I suspect that the code is right, one question I think of is: what if the jura values are "worse" than the daily ones?

maybe that is out of scope of this exercise here, I don't know.
maybe we want to fix those "errors" in jura for kibo, and then re-run such a script?

example of one check, which looks for replaced non-zero, finite values (for float/int keys only):

import os
import numpy as np
import fitsio
from astropy.table import Table

# read
mydir = "/global/cfs/cdirs/desi/users/bweaver"
a = Table(fitsio.read(os.path.join(mydir, "exposures-daily-original-20240826.fits"), "EXPOSURES"))
b = Table(fitsio.read(os.path.join(mydir, "exposures-daily-patched-with-jura-20240826.fits"), "EXPOSURES"))
c = Table(fitsio.read("/global/cfs/cdirs/desi/spectro/redux/jura/exposures-jura.fits", "EXPOSURES"))

# check
for k in a.colnames:
    if not isinstance(a[k][0], (int, float)):
        continue
    sel = (a[k] != 0.) & (a[k] != b[k])               # values differ and daily!=0
    sel &= np.isfinite(a[k])                               # ignore case where daily is nan's
    sel2 = (sel) & (~np.isclose(a[k], b[k]))     # restrict to not-close values
    if sel.sum() > 0:
        print("{}\t{}\t{}".format(sel.sum(), sel2.sum(), k))

returns

# N N_NOTCLOSE KEY
3	0	TILEDEC
26	0	MJD
4475	4458	EBV
929	793	SEEING_ETC

for the EBV, maybe it's ok, i.e. it's just that we implemented a slightly different way to retrieve the EBV.
but for SEEING_ETC, for instance, there are ~200 of the 793 rows where the daily has apparently valid values, and patching with jura fills those with zeros.
one example, expid=77930 from 20210224:

>>> a[a["EXPID"] == 77930]["SEEING_ETC"][0]
0.9334579706192017
>>> b[b["EXPID"] == 77930]["SEEING_ETC"][0]
0.0
>>> c[c["EXPID"] == 77930]["SEEING_ETC"][0]
0.0

and from the data header, the 0.933 value looks right:

fitsheader -e 1 -k *SEEING* /global/cfs/cdirs/desi/spectro/data/20210224/00077930/desi-00077930.fits.fz 
# HDU 1 in /global/cfs/cdirs/desi/spectro/data/20210224/00077930/desi-00077930.fits.fz:
SEEING  =    0.933458000292144 / [arcsec] ETC seeing                            
PMSEEING=                 0.95 / [arcsec] PlateMaker GFAPROC seeing             

@weaverba137
Member Author

Ah, I neglected to mention: the only values that are replaced are NaN or empty. So any valid value is better than the existing value.

@araichoor
Contributor

thanks for that clarification.
but, for this specific case of SEEING_ETC for EXPID=77930: isn't the SEEING_ETC=0 value in exposures-daily-patched-with-jura-20240826.fits wrongly replacing a SEEING_ETC=0.933 value in exposures-daily-original-20240826.fits? that looks contradictory with your statement, no? but maybe I'm missing something, I've not followed all the details of the original ticket discussion...

@weaverba137
Member Author

That's unexpected. OK, I will take a look at that in detail when I get a chance.

@araichoor
Contributor

I think the issue comes from this line:
https://github.com/desihub/specprod-db/blob/5cc344c12b8311ca90e05d7b6ac7215fdcf15d83/py/specprodDB/patch.py#L115

        dst_exposures_patched[column][dst_exposures_index] = src_exposures[column][src_exposures_index]

shouldn't only the 'masked' indexes be replaced here? (which can change for each column)
i.e. something like:

        sel = dst_exposures_patched[column][dst_exposures_index].mask
        dst_index, src_index = dst_exposures_index[sel], src_exposures_index[sel]
        log.info("Patching exposures column %s: patch %d masked rows out of %d.", column, dst_index.size, dst_exposures_index.size) 
        dst_exposures_patched[column][dst_index] = src_exposures[column][src_index]
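
For illustration, the masked-only replacement can be reproduced on toy arrays. This is only a sketch: the arrays and index names below are hypothetical stand-ins for one column of the daily and jura tables, not the actual patch.py code.

```python
import numpy as np

# Toy stand-ins for one column of the daily (dst) and jura (src) tables.
dst = np.ma.masked_invalid(np.array([0.93, np.nan, 1.10, np.nan]))
src = np.array([0.0, 0.85, 0.0, 0.0])

# Hypothetical matched row indexes (same EXPID at the same position).
dst_index = np.array([0, 1, 2, 3])
src_index = np.array([0, 1, 2, 3])

# Restrict the replacement to rows that are masked in the destination.
sel = np.ma.getmaskarray(dst[dst_index])
patched = dst.copy()
patched[dst_index[sel]] = src[src_index[sel]]
```

Rows 0 and 2 keep their valid daily values even though the source holds 0.0 there; only the two masked rows are overwritten.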

@araichoor
Contributor

Another subtlety here (I am not sure if it matters or not, but I mention it just in case): it can happen that a column is masked in daily but not in jura -- that's the case for SEEING_ETC.
I don't know if that breaks some logic in the code.

@araichoor
Contributor

@weaverba137 : I ended up coding some modifications in patch_exposures() only in this file.

-rw-rw---- 1 raichoor desi 19K Aug 30 00:49 /global/cfs/cdirs/desi/users/raichoor/spectro/daily/patch-exposures-tiles/patch-ar.py

This is not a complete fix, but I share in case it is useful.

I tried to:

  • implement the remark above;
  • properly handle the (TILERA, TILEDEC) and MJD columns:
    • patch (TILERA, TILEDEC) only if it is (0, 0);
    • elaborate the piece of code searching for the raw data to retrieve these columns, if possible; for that, I do not see why one would restrict to some nights, we can just go check each-and-every exposure, no?
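
A minimal sketch of the "(TILERA, TILEDEC) only if it is (0, 0)" rule, assuming plain NumPy arrays. The replacement values are made up, and how they are recovered from the raw data is outside this sketch.

```python
import numpy as np

# Toy daily values; (0, 0) marks a missing tile position.
tilera = np.array([0.0, 150.2, 0.0])
tiledec = np.array([0.0, 2.1, -5.0])

# Hypothetical replacement values recovered from another source.
new_ra = np.array([24.0, 999.0, 999.0])
new_dec = np.array([-30.0, 999.0, 999.0])

# Patch a row only when both coordinates are exactly zero.
missing = (tilera == 0.0) & (tiledec == 0.0)
tilera[missing] = new_ra[missing]
tiledec[missing] = new_dec[missing]
```

The third row has TILERA == 0 but a non-zero TILEDEC, so it is treated as a real position and left alone.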

If useful, I've made a run of this script in that same folder; in particular, I've dumped the printed output in a log file (patch-ar-20240830.log).

@araichoor
Contributor

And an aside-remark:

if you're ok with bringing a desitarget dependency into this code, I think that these lines (and the equivalent ones in patch_frames()):
https://github.com/desihub/specprod-db/blob/5cc344c12b8311ca90e05d7b6ac7215fdcf15d83/py/specprodDB/patch.py#L83-L95
could be replaced simply by:

from desitarget.geomask import match
src_exposures_index, dst_exposures_index = match(src_exposures['EXPID'], dst_exposures['EXPID'])

That match() function returns matching indexes for two arrays on a unique key, which is the case for EXPID here.
(actually it is a "pure-numpy" function).
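
If the extra dependency is not wanted, the same kind of index matching can be sketched with NumPy alone. This uses np.intersect1d as a stand-in and is not the desitarget implementation; the EXPID values are made up.

```python
import numpy as np

src_expid = np.array([103, 101, 105, 102])
dst_expid = np.array([101, 102, 103, 104])

# Valid only when EXPID is unique within each array.
common, src_index, dst_index = np.intersect1d(
    src_expid, dst_expid, assume_unique=True, return_indices=True
)
```

After this call, src_expid[src_index] and dst_expid[dst_index] list the shared EXPIDs in the same order, so the two index arrays can be used for row-by-row patching.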

@weaverba137
Member Author

@araichoor I think that you are correct that the replacement is happening over all rows instead of the daily rows with invalid values. That somehow got dropped when moving the code from my notebook. However, I won't have time to look at this in detail until next week.

@weaverba137
Member Author

@araichoor I've implemented most of your suggestions, or otherwise found similar solutions, so please examine these new files:

  • ${DESI_ROOT}/users/bweaver/exposures-daily-patched-with-jura-20240904.fits
  • ${DESI_ROOT}/users/bweaver/tiles-daily-patched-with-jura-20240904.fits

@araichoor
Contributor

thanks!

I've given a very quick look at the EXPOSURES extension of the exposures*fits files: this first sanity check looks good, i.e. changed values are NaNs in the original file (or zeros for TILERA, TILEDEC, MJD).
though I notice that the TSNR2_* and SKY_*_SPEC columns are also changed (nan => 0) for (presumably) one tile: I'm not sure I remember what the decision was here (i.e. patch with 0, or leave the nan).

maybe it's because of these lines: https://github.com/desihub/specprod-db/blob/0b3168e3d6e364f4781ab8c719391627a449bd52/py/specprodDB/patch.py#L219-L227?

a = Table(fitsio.read("/global/cfs/cdirs/desi/spectro/redux/daily/exposures-daily.fits", "EXPOSURES"))
b = Table(fitsio.read("/global/cfs/cdirs/desi/users/bweaver/exposures-daily-patched-with-jura-20240904.fits", "EXPOSURES"))
a = a[np.in1d(a["EXPID"], b["EXPID"])]
assert np.all(a["EXPID"] == b["EXPID"])

for k in a.colnames:
    sel = a[k] != b[k]
    if sel.sum() > 0:
        # just display the 5 first unique values
        old_vals = ",".join(np.unique(a[k][sel])[:5].astype(str))
        new_vals = ",".join(np.unique(b[k][sel])[:5].astype(str))
        print("{}\t{}\t{}\t{}".format(sel.sum(), k, old_vals, new_vals))

returns

# NDIFF KEY OLD_VALS NEW_VALS
3180	TILERA	0.0	4.0,5.0,23.4,24.027,24.1
3156	TILEDEC	0.0	-30.0,-27.0,-25.0,-23.2,-20.2
3285	MJD	0.0	59198.17546592,59198.18926673,59198.20253263,59198.21581194,59198.2450305
4	SURVEY	unknown	sv1,sv2,sv3
1	EFFTIME_SPEC	nan	0.0
12	EBV	nan	0.0
105	SEEING_ETC	nan	0.0,0.8039699792861938,0.8321769833564758,0.8420889973640442,0.8522260189056396
1	TSNR2_ELG	nan	0.0
1	TSNR2_QSO	nan	0.0
1	TSNR2_LRG	nan	0.0
1	TSNR2_LYA	nan	0.0
1	TSNR2_BGS	nan	0.0
1	TSNR2_GPBDARK	nan	0.0
1	TSNR2_GPBBRIGHT	nan	0.0
1	TSNR2_GPBBACKUP	nan	0.0
1	LRG_EFFTIME_DARK	nan	0.0
1	ELG_EFFTIME_DARK	nan	0.0
1	BGS_EFFTIME_BRIGHT	nan	0.0
1	LYA_EFFTIME_DARK	nan	0.0
1	GPB_EFFTIME_DARK	nan	0.0
1	GPB_EFFTIME_BRIGHT	nan	0.0
1	GPB_EFFTIME_BACKUP	nan	0.0
382	SKY_MAG_G_SPEC	nan	0.0
382	SKY_MAG_R_SPEC	nan	0.0
382	SKY_MAG_Z_SPEC	nan	0.0

@weaverba137
Member Author

Thank you, the numbers in your table agree with the numbers logged by the patch script.

@sbailey should comment on this, but yes, we decided that any remaining NaN, even after patching, should be set to zero.
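
As a sketch of that final "NaN to zero" step, assuming a plain float array rather than the real table column (the helper name is hypothetical):

```python
import numpy as np

def fill_nan_with_zero(values):
    """Return a copy of a float array with every NaN replaced by 0.0."""
    filled = np.array(values, dtype=float)
    filled[np.isnan(filled)] = 0.0
    return filled

tsnr2_elg = fill_nan_with_zero([12.5, np.nan, 0.0, 31.0])
```

One consequence of this choice is that 0.0 can no longer be distinguished from "missing" in the patched file, so downstream selections such as EFFTIME_SPEC > 0 implicitly drop those rows.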

@araichoor
Contributor

ok great, if that's expected; let's wait for Stephen's confirmation here.

now, if I run the same code snippet on the FRAMES extension of exposures*fits, I do see remaining nan in the patched file:

3150	SEEING_ETC	nan	0.8039699792861938,0.8321769833564758,0.8420889973640442,0.8522260189056396,0.8578180074691772
2	TSNR2_ELG	nan	nan
2	TSNR2_BGS	nan	nan
2	TSNR2_QSO	nan	nan
2	TSNR2_LRG	nan	nan
2	TSNR2_LYA	nan	nan
491514	TSNR2_ALPHA	nan	nan
357	EBV	nan	nan
2	TSNR2_GPBDARK	nan	nan
2	TSNR2_GPBBRIGHT	nan	nan
2	TSNR2_GPBBACKUP	nan	nan

also, I notice that e.g. no MJD column value is changed in the patched file.
that would obviously lead to inconsistency, between the EXPOSURES and FRAMES extensions.
e.g. for EXPID=67710:

for ext in ["EXPOSURES", "FRAMES"]:
    d = Table(fitsio.read("/global/cfs/cdirs/desi/users/bweaver/exposures-daily-patched-with-jura-20240904.fits", ext))
    d = d[d["EXPID"] == 67710]
    print("{}\t{}\t{}".format(ext, len(d), np.unique(d["MJD"]).tolist()))

=>

# EXTENSION NROW UNIQ_MJD_VALS
EXPOSURES	1	[59198.17546592]
FRAMES	30	[0.0]

wouldn't we run the same patching function for both extensions?
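
A generic version of this cross-check can be sketched in plain Python, with (EXPID, MJD) pairs standing in for the two HDUs. EXPID 67711 and its MJD pairing are made up for the example.

```python
# Toy rows: (EXPID, MJD) pairs standing in for the two HDUs.
exposures = [(67710, 59198.17546592), (67711, 59198.18926673)]
frames = [(67710, 0.0), (67710, 0.0), (67711, 59198.18926673)]

exposure_mjd = dict(exposures)

# Collect every EXPID whose frames disagree with the exposure-level MJD.
inconsistent = sorted({expid for expid, mjd in frames
                       if expid in exposure_mjd and mjd != exposure_mjd[expid]})
```

Running the same check on every patched column shared by the two extensions would flag cases where only one of them was patched.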

@weaverba137
Member Author

The patching code still has vestiges of patching only tiles/exposures/frames that would ever be loaded into a database, rather than attempting to patch all possible tiles/exposures/frames. That's what we're seeing here.

I'll take another pass at the frames.

@araichoor
Contributor

re-thanks.

if relevant, in the case we end up replacing nan with 0 for the TSNR2_* columns:
if I'm correct, this line in patch_frames() will exclude the TSNR2_* columns, no? https://github.com/desihub/specprod-db/blob/0b3168e3d6e364f4781ab8c719391627a449bd52/py/specprodDB/patch.py#L83

@weaverba137
Member Author

Again, vestiges...

@weaverba137
Member Author

weaverba137 commented Sep 6, 2024

@araichoor, I've applied your suggestions to the FRAMES hdu, so please test with

  • ${DESI_ROOT}/users/bweaver/exposures-daily-patched-with-jura-20240906.fits
  • ${DESI_ROOT}/users/bweaver/tiles-daily-patched-with-jura-20240906.csv

@weaverba137
Member Author

Also note: TSNR2_ALPHA does not exist in jura. It will get filled with zero.

@weaverba137
Member Author

@sbailey @akremin @araichoor could you please review these latest files with an eye toward making the final patch and substitution of the files this week:

  • ${DESI_ROOT}/users/bweaver/exposures-daily-patched-with-jura-20240913.fits
  • ${DESI_ROOT}/users/bweaver/tiles-daily-patched-with-jura-20240913.csv

@sbailey
Contributor

sbailey commented Sep 19, 2024

@weaverba137 summary of findings:

tiles-daily-patched-with-jura-20240913.csv

  • Good: UPDATED is only new column, and no missing columns
  • Good: Updates to SURVEY and PROGRAM
  • Good: TILEID=80607 corrected TILERA, TILEDEC
  • TILEID=41685 LASTNIGHT=20220101 still has EFFTIME_SPEC,LRG_EFFTIME_DARK,ELG_EFFTIME_DARK,LYA_EFFTIME_DARK,BGS_EFFTIME_BRIGHT=NaN in both daily and patched; in Kibo these are not NaN so patching with Kibo instead of Jura should fix it.
  • TILEIDs 40203, 42262, 80605, 80610, 80679, 80731 still have EFFTIME_GFA=NaN. I have not cross-referenced whether these are the known cases of upstream GFA NaNs, but I thought they would be patched in the tiles file here, since I thought that future runs of desi_tsnr2_afterburner are also intercepting and replacing NaNs.

exposures-daily-patched-with-jura-20240913.fits

  • Good: identical set of column names
  • Good: many corrections to TILERA, TILEDEC, MJD, SURVEY
  • 30 exposures 252511 through 252540 have non-zero values of TRANSPARENCY_GFA, SEEING_GFA, and other *_GFA columns in daily, but have unexpectedly changed to 0.0 in the patched file.

For both tiles and exposures

  • daily has 68 additional tiles observed since 20240911, so we'll need to re-patch before substituting the daily files.
  • Will need to update both *.fits and *.csv versions in daily simultaneously

@weaverba137
Member Author

TILEID=41685 LASTNIGHT=20220101 still has EFFTIME_SPEC,LRG_EFFTIME_DARK,ELG_EFFTIME_DARK,LYA_EFFTIME_DARK,BGS_EFFTIME_BRIGHT=NaN in both daily and patched; in Kibo these are not NaN so patching with Kibo instead of Jura should fix it.

That's unexpected, I'll have to do more debugging.

TILEIDs 40203, 42262, 80605, 80610, 80679, 80731 still have EFFTIME_GFA=NaN. I have not cross-referenced whether these are the known cases of upstream GFA NaNs, but I thought they would be patched in the tiles file here, since I thought that future runs of desi_tsnr2_afterburner are also intercepting and replacing NaNs.

Maybe they are NaN in Jura. Again more debugging.

daily has 68 additional tiles observed since 20240911, so we'll need to re-patch before substituting the daily files.

Acknowledged.

@weaverba137
Member Author

30 exposures 252511 through 252540 have non-zero values of TRANSPARENCY_GFA, SEEING_GFA, and other *_GFA columns in daily, but have unexpectedly changed to 0.0 in the patched file.

Also needs more debugging. Didn't want to forget that one.

@weaverba137
Member Author

@sbailey, here's a quick analysis.

TILEID=41685 LASTNIGHT=20220101 still has EFFTIME_SPEC,LRG_EFFTIME_DARK,ELG_EFFTIME_DARK,LYA_EFFTIME_DARK,BGS_EFFTIME_BRIGHT=NaN in both daily and patched; in Kibo these are not NaN so patching with Kibo instead of Jura should fix it.

This one was easy, as it turns out. Tile 41685 is not in jura. So patching with kibo is the way to go here.

TILEIDs 40203, 42262, 80605, 80610, 80679, 80731 still have EFFTIME_GFA=NaN. I have not cross-referenced whether these are the known cases of upstream GFA NaNs, but I thought they would be patched in the tiles file here, since I thought that future runs of desi_tsnr2_afterburner are also intercepting and replacing NaNs.

For reasons I can no longer recall, I was not actually patching that column. I must have thought there were no NaN in that column. It may be a technical issue. Normally I would detect columns that contain NaN because the Table column has a .mask attribute. But not in this case, which is weird! This will be easy to fix though.
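
One defensive way to avoid depending on a mask being present is to test for NaN directly as well. The sketch below uses NumPy arrays as stand-ins for table columns and a hypothetical helper name; it does not explain why the mask was absent in this case.

```python
import numpy as np

def has_missing(column):
    """True if a float column has masked entries or unmasked NaN."""
    any_masked = np.ma.getmaskarray(column).any()
    return bool(any_masked or np.isnan(np.ma.getdata(column)).any())

plain = np.array([1.0, np.nan, 3.0])        # NaN present, but no mask at all
masked = np.ma.masked_invalid(plain)        # same data with the NaN masked
clean = np.array([1.0, 2.0, 3.0])
```

With this check, has_missing(plain) and has_missing(masked) are both true, so a column like EFFTIME_GFA would be selected for patching whether or not it carries a mask.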

30 exposures 252511 through 252540 have non-zero values of TRANSPARENCY_GFA, SEEING_GFA, and other *_GFA columns in daily, but have unexpectedly changed to 0.0 in the patched file.

I'm still investigating this one, but I'm pretty sure that those exposures are not in jura or kibo. Can you confirm?

@weaverba137
Member Author

@sbailey, I've fixed the two issues with the tiles file, and the issue with the exposures file seems to have gone away. Please take a look:

  • ${DESI_ROOT}/users/bweaver/exposures-daily-patched-with-kibo-20240920.fits
  • ${DESI_ROOT}/users/bweaver/tiles-daily-patched-with-kibo-20240920.csv

@sbailey
Contributor

sbailey commented Oct 1, 2024

@weaver @weaverba137 , thanks, it looks like the previously identified issues have all been fixed. One more:

It looks like the SURVEY, PROGRAM patching was done for tiles but not exposures, leading to inconsistencies in these columns for the same TILEID. Admittedly, daily also has inconsistencies on these columns, but let's get that fixed now too while we're patching these files. For the record, production reruns like Kibo do not have inconsistencies on these columns for the tiles that they cover.

Otherwise this looks good to proceed with making an updated patch and putting it into place, coordinated with @akremin .

@weaverba137
Member Author

@sbailey I will have to run some tests to ensure that the value of FAFLAVOR is consistent among all the tables (i.e. I should also check the frames table). Then I can just run faflavor2program() as needed. I'll do a similar consistency check for SURVEY.

PS, I'm @weaverba137. I know, it's weird.

@weaverba137
Member Author

And good thing I checked, because I found inconsistencies between exposures and frames in both SURVEY and FAFLAVOR. Apparently the frames table does not have PROGRAM.

@weaverba137
Member Author

@sbailey the most recent SURVEY, PROGRAM issue is now fixed, please test:

  • ${DESI_ROOT}/users/bweaver/exposures-daily-patched-with-kibo-20241001.fits
  • ${DESI_ROOT}/users/bweaver/tiles-daily-patched-with-kibo-20241001.csv

@sbailey
Contributor

sbailey commented Oct 2, 2024

With the updated 20241001 files, the updates to SURVEY, PROGRAM, FAPRGRM, and FAFLAVOR look good, but GOALTYPE might be a mix, e.g.

  • tiles 82269-82281 went from GOALTYPE=other to GOALTYPE=unknown
  • tiles 83024 and 83004 went from GOALTYPE=bright/dark to other. That might be right, but @araichoor please check.

there are still 1050 cmx and sv1 exposures entries with FAPRGRM, FAFLAVOR, or GOALTYPE = 'unknown'. @araichoor could you help identify what those should be?

t2 = Table.read('/global/cfs/cdirs/desi/users/bweaver/exposures-daily-patched-with-kibo-20241001.fits')
ii = (t2['FAPRGRM']=='unknown') | (t2['FAFLAVOR']=='unknown') | (t2['GOALTYPE'] == 'unknown')
t2['TILEID', 'NIGHT', 'EXPID', 'SURVEY', 'PROGRAM', 'FAPRGRM', 'FAFLAVOR', 'GOALTYPE'][ii]

<Table length=1050>
TILEID  NIGHT   EXPID  SURVEY PROGRAM FAPRGRM  FAFLAVOR GOALTYPE
int32   int32   int32  bytes7  bytes6 bytes19  bytes19   bytes7 
------ -------- ------ ------ ------- -------- -------- --------
 63075 20200219  50986    cmx   other  unknown  unknown  unknown
 70004 20200219  50988    cmx   other  unknown  unknown  unknown
   ...      ...    ...    ...     ...      ...      ...      ...
 82280 20230806 189020    cmx   other dithprec dithprec  unknown
 82281 20230806 189021    cmx   other dithprec dithprec  unknown

Especially for GOALTYPE the value may not matter, but something like 'n/a' or 'na' (not applicable) or "none" could be better than "unknown".

@weaverba137
Member Author

tiles 82269-82281 went from GOALTYPE=other to GOALTYPE=unknown

Is that in the tiles file or the corresponding exposures in the exposures file?

@weaverba137
Member Author

PS, with the latest update, the goal was to enforce consistency, which is my interpretation of your previous request. This is done by first performing all the patching and then "back-patching" the exposures and frames to match the tiles file. The "back-patching" is done on these columns: ('SURVEY', 'PROGRAM', 'FAPRGRM', 'FAFLAVOR', 'GOALTYPE').
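
The "back-patching" idea can be sketched with plain dictionaries standing in for table rows. The function name and all row values are hypothetical; only the column list comes from the comment above.

```python
BACKPATCH_COLUMNS = ('SURVEY', 'PROGRAM', 'FAPRGRM', 'FAFLAVOR', 'GOALTYPE')

def back_patch(tiles, rows):
    """Copy the back-patch columns from each tile onto its matching rows."""
    by_tileid = {tile['TILEID']: tile for tile in tiles}
    for row in rows:
        tile = by_tileid.get(row['TILEID'])
        if tile is None:
            continue  # no matching tile: leave the row untouched
        for column in BACKPATCH_COLUMNS:
            if column in row:  # skip columns the table does not have
                row[column] = tile[column]
    return rows

# Made-up rows for illustration only.
tiles = [{'TILEID': 1, 'SURVEY': 'sv1', 'PROGRAM': 'dark',
          'FAPRGRM': 'elg', 'FAFLAVOR': 'sv1elg', 'GOALTYPE': 'dark'}]
exposures = [{'TILEID': 1, 'EXPID': 10, 'SURVEY': 'unknown', 'PROGRAM': 'other',
              'FAPRGRM': 'unknown', 'FAFLAVOR': 'unknown', 'GOALTYPE': 'unknown'},
             {'TILEID': 2, 'EXPID': 11, 'SURVEY': 'cmx', 'PROGRAM': 'other',
              'FAPRGRM': 'unknown', 'FAFLAVOR': 'unknown', 'GOALTYPE': 'unknown'}]
frames = [{'TILEID': 1, 'EXPID': 10, 'SURVEY': 'unknown', 'FAFLAVOR': 'unknown'}]
back_patch(tiles, exposures)
back_patch(tiles, frames)
```

Treating the tiles file as the single source of truth keeps the three tables consistent, at the cost of propagating any wrong value in the tiles file to every matching exposure and frame.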

@sbailey
Contributor

sbailey commented Oct 2, 2024

My latest statements were about the exposures file, and I agree that they are now consistent with the tiles file. What I was raising with the "unknown" was a newly discovered issue that I hope we can fix (or otherwise decide that some remaining unknowns are ok).

@weaverba137
Member Author

@akremin @araichoor please see the comments above about GOALTYPE.

In the hopes that this will be wrapped up relatively soon, @sbailey & I had a short offline discussion about what daily tiles we want to load once the patching is done. We had previously proposed only loading tiles that have (in the tiles file):

LASTNIGHT >= 20201214 & EFFTIME_SPEC > 0

where 20201214 is the earliest night in jura/kibo.

However, there are tiles which satisfy this condition that have the following anomaly: when you examine the corresponding exposures for the tile, they have sum(EFFTIME_SPEC) == 0, in contradiction to the tiles file.

To identify these, I used the following snippet:

import numpy as np
from astropy.table import Table

tiles = Table.read('tiles-daily-patched...csv')  # Adjust path as needed
exposures = Table.read('exposures-daily-patched...fits', hdu='EXPOSURES')
candidate_tiles = tiles[(tiles['LASTNIGHT'] >= 20201214) & (tiles['EFFTIME_SPEC'] > 0)]
for new_tile in candidate_tiles:
    row_index = np.where((exposures['TILEID'] == new_tile['TILEID']) & (exposures['EFFTIME_SPEC'] > 0))[0]
    if len(row_index) == 0:
        print("ERROR: No valid exposures found for tile {0:d}, even though EFFTIME_SPEC == {1:f}!".format(new_tile['TILEID'], new_tile['EFFTIME_SPEC']))
        bad_index = np.where((exposures['TILEID'] == new_tile['TILEID']))[0]
        print(exposures[['TILEID', 'EXPID', 'NIGHT', 'SURVEY', 'PROGRAM', 'FAPRGRM', 'FAFLAVOR', 'EFFTIME_SPEC']][bad_index])

@sbailey & I agree on the following:

One option would be to:

  1. Set EFFTIME_SPEC = 0 in tiles-daily for tiles 1825 & 21273.
  2. Patch the exposures associated with the other tiles with values from kibo, assuming they appear in kibo, otherwise leave them alone.
  3. If after patching in step 2, the tile still has no exposures with EFFTIME_SPEC > 0, set EFFTIME_SPEC = 0 in tiles-daily.
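
The three steps of this option can be sketched in plain Python. The function name, the data layout (dictionaries keyed by TILEID and EXPID), and the toy values are all assumptions; this illustrates the option as listed, not necessarily the procedure that was finally adopted.

```python
def apply_option(tile_efftime, exposures, kibo_efftime, special=(1825, 21273)):
    """Apply the three-step option in place to toy tile/exposure data."""
    # Step 1: zero the two special-cased tiles.
    for tileid in special:
        if tileid in tile_efftime:
            tile_efftime[tileid] = 0.0
    for tileid, efftime in tile_efftime.items():
        if efftime <= 0:
            continue
        rows = [e for e in exposures if e['TILEID'] == tileid]
        if any(e['EFFTIME_SPEC'] > 0 for e in rows):
            continue  # the tile already has at least one valid exposure
        # Step 2: patch from kibo where kibo has the exposure.
        for e in rows:
            if e['EXPID'] in kibo_efftime:
                e['EFFTIME_SPEC'] = kibo_efftime[e['EXPID']]
        # Step 3: still nothing valid, so zero the tile as well.
        if not any(e['EFFTIME_SPEC'] > 0 for e in rows):
            tile_efftime[tileid] = 0.0

# Made-up values; tiles 2 and 3 and all EXPIDs are hypothetical.
tile_efftime = {1825: 500.0, 2: 900.0, 3: 700.0}
exposures = [{'TILEID': 1825, 'EXPID': 10, 'EFFTIME_SPEC': 0.0},
             {'TILEID': 2, 'EXPID': 20, 'EFFTIME_SPEC': 0.0},
             {'TILEID': 3, 'EXPID': 30, 'EFFTIME_SPEC': 0.0}]
kibo_efftime = {20: 850.0}
apply_option(tile_efftime, exposures, kibo_efftime)
```

In this toy run, tile 2 is rescued by the kibo value while tile 3, which has no kibo entry, ends up with EFFTIME_SPEC = 0 in the tiles data.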

In any case, we should at least try to understand what happened to create the anomaly and finalize a patching decision based on that.

@weaverba137
Member Author

In fact, many, but not all, of the sv2 or sv3 backup tiles have exposures with non-zero EFFTIME_SPEC in kibo, but there are numerical differences between sum(EFFTIME_SPEC) over exposures and EFFTIME_SPEC reported in the tiles file. So we would probably want to patch the value in the tiles file to match the sum over the non-zero exposures.
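
A sketch of how such numerical differences could be listed, using plain Python. The function name, the relative tolerance, and the toy values are assumptions.

```python
import math
from collections import defaultdict

def efftime_mismatches(tile_efftime, exposures, rel_tol=1e-3):
    """Map TILEID -> (tile value, exposure sum) where the two disagree."""
    sums = defaultdict(float)
    for tileid, efftime in exposures:
        if efftime > 0:  # sum only over the non-zero exposures
            sums[tileid] += efftime
    return {tileid: (value, sums[tileid])
            for tileid, value in tile_efftime.items()
            if not math.isclose(value, sums[tileid], rel_tol=rel_tol)}

# Made-up tiles and (TILEID, EFFTIME_SPEC) exposure pairs.
tile_efftime = {1: 1000.0, 2: 180.0}
exposures = [(1, 600.0), (1, 400.0), (2, 61.3), (2, 115.7), (2, 0.0)]
mismatches = efftime_mismatches(tile_efftime, exposures)
```

Here tile 1 agrees exactly, while tile 2 is reported because its exposures sum to about 177 rather than 180.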

@araichoor
Contributor

@sbailey, about:
"there are still 1050 cmx and sv1 exposures entries with FAPRGRM, FAFLAVOR, or GOALTYPE = 'unknown'. @araichoor could you help identify what those should be?"

if I'm correct, the goaltype keyword in the fiberassign header was introduced through this PR merged on Mar. 22, 2021: desihub/fiberassign#319
we introduced that once we had an ETC.

so I guess there's no "correct" value here, it's just a question of what to fill those with.
I see that those 1050 exposures have PROGRAM="other", so maybe just set GOALTYPE="other"?

if informative, here's the FAFLAVOR for those 1050 exposures:

# NIGHTMIN NIGHTMAX NEXP FAFLAVOR
20210916	20210919	26	cmxposmapping
20210219	20210423	51	dithfocus
20201220	20201220	12	dithlost
20201214	20230806	405	dithprec
20210110	20210115	6	sv1m31      # <- before Mar. 22, 2021
20210130	20210130	4	sv1rosette  # <- before Mar. 22, 2021
20200219	20200315	546	unknown

@araichoor
Contributor

re- "tiles 83024 and 83004 went from GOALTYPE=bright/dark to other. That might be right, but @araichoor please check."
=>
mmhh that does not sound right to me.
those two calibration tiles are "recent" tiles, with everything properly set.

btw, I notice that those "other" values are partially there in the daily, there in jura, but not in kibo, i.e. it is a bit messy:

in daily:

>>> d = Table.read("/global/cfs/cdirs/desi/spectro/redux/daily/exposures-daily.csv")
>>> d[np.in1d(d["TILEID"],[83004,83024])]["SURVEY", "PROGRAM", "GOALTYPE", "GOALTIME", "NIGHT", "EFFTIME_SPEC"]
<Table length=6>
 SURVEY PROGRAM GOALTYPE GOALTIME  NIGHT   EFFTIME_SPEC
  str7    str6    str7   float64   int64     float64   
------- ------- -------- -------- -------- ------------
special    dark    other   1000.0 20231114       1081.5
special  bright    other    180.0 20231204         61.3
special  bright    other    180.0 20231206        115.7
special  bright   bright    180.0 20240922        156.0
special    dark     dark   1000.0 20240926        366.8
special    dark     dark   1000.0 20240927         67.6

in jura:

>>> d = Table.read("/global/cfs/cdirs/desi/spectro/redux/jura/exposures-jura.csv")
>>> d[np.in1d(d["TILEID"],[83004,83024])]["SURVEY", "PROGRAM", "GOALTYPE", "GOALTIME", "NIGHT", "EFFTIME_SPEC"]
<Table length=3>
 SURVEY PROGRAM GOALTYPE GOALTIME  NIGHT      EFFTIME_SPEC   
  str7    str6    str6   float64   int64        float64      
------- ------- -------- -------- -------- ------------------
special   other    other   1000.0 20231114  1071.662841796875
special   other    other    180.0 20231204  61.35043716430664
special   other    other    180.0 20231206 116.04764556884766

in kibo:

>>> d = Table.read("/global/cfs/cdirs/desi/spectro/redux/kibo/exposures-kibo.csv")
>>> d[np.in1d(d["TILEID"],[83004,83024])]["SURVEY", "PROGRAM", "GOALTYPE", "GOALTIME", "NIGHT", "EFFTIME_SPEC"]
<Table length=3>
 SURVEY PROGRAM GOALTYPE GOALTIME  NIGHT      EFFTIME_SPEC   
  str7    str6    str6   float64   int64        float64      
------- ------- -------- -------- -------- ------------------
special    dark     dark   1000.0 20231114   1072.22900390625
special  bright   bright    180.0 20231204  66.21401977539062
special  bright   bright    180.0 20231206 122.33030700683594

@sbailey
Contributor

sbailey commented Oct 3, 2024

Summarizing the outstanding issues with action items:

  • tile 83004 should have GOALTYPE="dark" (like Kibo) instead of "other"
  • tile 83024 should have GOALTYPE="bright" (like Kibo) instead of "other"
  • All GOALTYPE='unknown' entries come from PROGRAM='other'; change these to GOALTYPE='other'
  • The FAPRGRM=FAFLAVOR='unknown' cases are from early cmx tiles not included in Kibo; leave them as-is
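
These action items amount to a small set of rules, sketched here with a dictionary standing in for a table row. The function and constant names are hypothetical; the tile IDs and values come from the list above.

```python
# Per-tile overrides taken from the action items above.
GOALTYPE_BY_TILE = {83004: 'dark', 83024: 'bright'}

def fix_goaltype(row):
    """Return the GOALTYPE a row should carry under the rules listed above."""
    if row['TILEID'] in GOALTYPE_BY_TILE:
        return GOALTYPE_BY_TILE[row['TILEID']]
    if row['GOALTYPE'] == 'unknown' and row['PROGRAM'] == 'other':
        return 'other'
    return row['GOALTYPE']

rows = [{'TILEID': 83004, 'PROGRAM': 'dark', 'GOALTYPE': 'other'},
        {'TILEID': 83024, 'PROGRAM': 'bright', 'GOALTYPE': 'other'},
        {'TILEID': 82280, 'PROGRAM': 'other', 'GOALTYPE': 'unknown'}]
fixed = [fix_goaltype(row) for row in rows]
```

FAPRGRM and FAFLAVOR are deliberately not touched, matching the decision to leave the early cmx 'unknown' values as they are.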

Outstanding issue to understand:

  • Some tiles have exposures sum(EFFTIME_SPEC)=0 but tiles EFFTIME_SPEC>0. I'll discuss with @akremin.

@weaverba137
Member Author

Thanx @sbailey, I'll get started on those as soon as I can.

@sbailey
Contributor

sbailey commented Oct 3, 2024

Summarizing analog conversation with @akremin about the exposures vs. tiles EFFTIME_SPEC inconsistencies:

  • we don't know how we got into this situation, and it is concerning, but we don't want it to be a major blocking factor here
  • for the two retired main survey tiles (1825, 21273) it doesn't matter whether they are in the daily DB or not, so do whatever patching is easiest (special case tiles EFFTIME_SPEC=0, or patch exposures EFFTIME_SPEC like the next bullet, either is fine)
    • UPDATE: upon further investigation, Anthony recommends EFFTIME_SPEC=0 for both of these
  • for other tiles, patch exposures EFFTIME_SPEC from Kibo if it has an entry, otherwise leave it as-is
  • Do not enforce exposures sum(EFFTIME_SPEC) = tiles EFFTIME_SPEC, since that could have downstream impacts on tiles-specstatus.ecsv and survey operations if a tile re-crosses the "done" boundary. Leave that for another day.

@weaverba137
Member Author

@sbailey, one point of clarification: the above doesn't cover the situation where all exposures have EFFTIME_SPEC = 0 even after patching with kibo. Should I set EFFTIME_SPEC = 0 in the tiles file in that case?

@sbailey
Contributor

sbailey commented Oct 3, 2024

@sbailey, one point of clarification: the above doesn't cover the situation where all exposures have EFFTIME_SPEC = 0 even after patching with kibo. Should I set EFFTIME_SPEC = 0 in the tiles file in that case?

I think we should leave those as EFFTIME_SPEC=0 in exposures and EFFTIME_SPEC>0 in tiles, i.e. leave them as glaringly inconsistent pending further study. If/when we do update them again, the new tiles UPDATED column will help trigger a refresh of the DB.

And after further inspection, @akremin recommends that "For 21273 and 1825 I think we want both to be EFFTIME=0"

@weaverba137
Member Author

In that case I suggest we proceed as follows: for tiles in this situation, they will appear in the daily.tile table, but not in any other table. If and when any exposures are shown to be valid, then the UPDATED column should be updated (as you say), then a tile-based load should trigger loading exposures, frames, photometry, targeting, redshifts, fiberassign.

@weaverba137
Member Author

So it looks like my concern may be moot. After excluding tiles 1825, 21273, all other tiles can be at least partially patched with kibo, which means that there should be zero other tiles (that we would attempt to load anyway) that have sum(EFFTIME_SPEC) = 0 in exposures and EFFTIME_SPEC >0 in tiles.

@weaverba137
Member Author

@sbailey, @araichoor, @akremin, I've implemented the latest round of suggested patches:

  • ${DESI_ROOT}/users/bweaver/exposures-daily-patched-with-kibo-20241004.fits
  • ${DESI_ROOT}/users/bweaver/tiles-daily-patched-with-kibo-20241004.csv

@sbailey
Contributor

sbailey commented Oct 4, 2024

@weaverba137 thanks. This looks good to me. No further requests.

For the record: there are still discrepancies where exposures sum(EFFTIME_SPEC) != tiles EFFTIME_SPEC which we are punting to study later(never?). I have checked and agree that we don't have any more cases where tiles EFFTIME_SPEC>0 but exposures sum(EFFTIME_SPEC)==0. They are always at least semi-close.

I will leave it to @weaverba137 and @akremin to coordinate the final integration with dailyops: patch+install these into the actual daily prod, plus merge #2773 and update at NERSC so that the next update will include the UPDATED timestamp column. I suggest that you do this on Monday so that if there are any unexpected side-effects, you are debugging them during the work week not during the weekend.

@weaverba137
Member Author

For the record: there are still discrepancies where exposures sum(EFFTIME_SPEC) != tiles EFFTIME_SPEC which we are punting to study later(never?). I have checked and agree that we don't have any more cases where tiles EFFTIME_SPEC>0 but exposures sum(EFFTIME_SPEC)==0. They are always at least semi-close.

Yes, that agrees with my findings. If there are additional cases with this situation, they are prior to 20201214.

I will leave it to @weaverba137 and @akremin to coordinate the final integration with dailyops:...

Monday would be fine with me.

@weaverba137
Member Author

PS, and to be clear, we would re-patch the files to ensure that any tiles that came in over the weekend would be included.

@akremin
Member

akremin commented Oct 4, 2024

Perfect, thank you Ben. Yes, on Monday we would want to do the following:

  1. Create new patched exposures and tiles files based on the most up to date files in daily (ideally after Abhijeet has signed off on it for the day at around 10am Pacific).
  2. Merge the new code that updates the UPDATED column.
  3. Update git repo at NERSC
  4. Run a test night to make sure everything works
  5. Hopefully close this ticket

@weaverba137
Member Author

Closed by #2373 and desihub/specprod-db#16.
