Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reuse of SRCTYPE keyword leads to incorrect ASDF/FITS synchronization #381

Open
braingram opened this issue Jan 28, 2025 · 3 comments
Open

Comments

@braingram
Copy link
Collaborator

Both meta.target.source_type:


and (for example) exposures[<index>].source_type:

use the SRCTYPE keyword in the SCI extension. This leads to confusion and desynchronization between the FITS and ASDF portions of the datamodel files.

For example, if we construct a minimal MultiExposureModel:

>> import stdatamodels.jwst.datamodels as dm
>> m = dm.MultiExposureModel()
>> m.meta.target.source_type = 'POINT'
>> for i in range(2):
>>     m.exposures.append({})
>>     m.exposures[i].source_type = 'EXTENDED'
>> m.meta.target.source_type
'POINT'
>> m.exposures[0].source_type
'EXTENDED'
>> m.exposures[1].source_type
'EXTENDED'
>> m.save("foo.fits")

We have a datamodel where:

  • meta.target.source_type contains POINT (which should map to SCI/SRCTYPE)
  • exposures[0].source_type contains EXTENDED (which also maps to SCI/SRCTYPE)
  • exposures[1].source_type contains EXTENDED (which maps to a second SCI extension, so no conflict here)

The saved file contains

>> fitsheader -k SRCTYPE
# HDU 0 in foo.fits:

# HDU 1 in foo.fits:
SRCTYPE = 'EXTENDED'           / Source type used for calibration

# HDU 2 in foo.fits:
SRCTYPE = 'EXTENDED'           / Source type used for calibration

# HDU 3 in foo.fits:

Note that POINT is not listed. However if we open the file with dm.open:

>> m = dm.open("foo.fits")
>> m.meta.target.source_type
'POINT'

This is due to the fits_hash calculated during reading matching the stored fits_hash and stdatamodels skipping the fits update code which actually loads values from the FITS headers and synchronizes them into the datamodel tree as seen in:

>> m = dm.open("foo.fits", skip_fits_update=False)
>> m.meta.target.source_type
'EXTENDED'

Here's a "real world" file with the issue:
https://bytesalad.stsci.edu:443/artifactory/jwst-pipeline/dev/truth/test_nirspec_fs_spec3/jw01309-o022_s000000025_nirspec_f290lp-g395h-s200b1-allslits_cal.fits
where all SRCTYPE keywords are EXTENDED yet opening the model results in:

>> m = dm.open("jw01309-o022_s000000025_nirspec_f290lp-g395h-s200b1-allslits_cal.fits")
>> m.meta.target.source_type
'POINT'
>> m = dm.open("jw01309-o022_s000000025_nirspec_f290lp-g395h-s200b1-allslits_cal.fits", skip_fits_update=False)
>> m.meta.target.source_type
'EXTENDED'
@melanieclarke
Copy link
Contributor

Even worse for the MultiExposureModel case, there is a copy of meta.target.source_type inside each exposure that is not touched when skip_fits_update=False:

>>> m = dm.open("jw01309-o022_s000000025_nirspec_f290lp-g395h-s200b1-allslits_cal.fits")
>>> m.meta.target.source_type
'POINT'
>>> m.exposures[0].meta.target.source_type
'POINT'
>>> m.exposures[0].source_type
'EXTENDED'
>>> m = dm.open("jw01309-o022_s000000025_nirspec_f290lp-g395h-s200b1-allslits_cal.fits", skip_fits_update=False)
>>> m.meta.target.source_type
'EXTENDED'
>>> m.exposures[0].meta.target.source_type
'POINT'
>>> m.exposures[0].source_type
'EXTENDED'

The reason this happens for multi-slit NIRSpec FS data is that the meta.target.source_type value is copied from the user intent for the exposure, recorded in meta.target.source_type_apt. For fixed slit observations, this is the source type for the primary target. If there are other slits present, they always get default type EXTENDED. Since the primary target is not necessarily the first SCI extension in MultiSlitModels, or included at all in MultiExposureModels separated by target, this discrepancy results.

For NIRSpec MOS exposures, the top-level target source type is cleared out in the srctype step: it is set to None, so that only the slit-level source type appears. We could do the same for FS, so that at least we don’t get this sort of confusing reference to POINT in an EXTENDED target product.

That doesn’t fix the syncing issue, though. For a similar MOS product at:
https://bytesalad.stsci.edu/ui/repos/tree/General/jwst-pipeline/dev/truth/test_nirspec_mos_spec3/jw01345-o066_s000004385_nirspec_f170lp-g235m_cal.fits:

>>> m = dm.open("jw01345-o066_s000004385_nirspec_f170lp-g235m_cal.fits")
>>> m.meta.target.source_type
>>> m.exposures[0].meta.target.source_type
>>> m.exposures[0].source_type
'POINT'
>>> m = dm.open("jw01345-o066_s000004385_nirspec_f170lp-g235m_cal.fits", skip_fits_update=False)
>>> m.meta.target.source_type
'POINT'
>>> m.exposures[0].meta.target.source_type
>>> m.exposures[0].source_type
'POINT'

The fits_update fills in the top level target from the first SCI extension for the top-level meta (but not the exposure-level meta). That’s probably okay for this MultiExposureModel that contains only one target but would be undesirable for a MOS MultiSlitModel, where the slits have different targets, appear in any order, and do not have any kind of top-level intended source type.

I think the simplest and most preferable solution from the datamodels perspective would be to either move meta.target.source_type to the primary header, since it’s only ever relevant if there’s just one source type for the whole product, or else to give them separate FITS keyword names. Both of these options would likely be very disruptive for users: some products would have SRCTYPE in the SCI extension like they are used to, some would have it in the primary header or else under a different name.

From the pipeline processing and user perspective, the existing state is a non-issue. During pipeline processing, the slit value for source_type is always checked before the meta.target.source_type, so it does not matter what is stored in the top-level value. For FITS users, the SCI value set from the slit metadata is the correct value for the target and it does not matter what value the datamodel has in meta.target.source_type. For datamodels users of multislit products, the relevant value is the slit source type and it doesn’t matter which value is in meta.target.source_type.

TL;DR: I propose we don’t fix this. The problem is likely never to be noticed by users, but the “solution” definitely would.

@emolter
Copy link
Contributor

emolter commented Feb 6, 2025

@braingram Melanie and I discussed this and plan to close it as wontfix because of how disruptive the potential fixes are likely to be c.f. the benefit, let me know if you have any concerns with that.

@braingram
Copy link
Collaborator Author

Thanks for the update but I don't think we should close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants