-
-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: SDSS-V SpectrumList format ambiguity, mwmVisit BOSS load fail #1185
Conversation
- no longer checks for "date_obs"; calculate that yourself - also adds "sdss_id" to metadata now
…ases - added new test cases for BOSS-only mwmVisit and mwmStar files - added new checks to SpectrumList mwmVisit/mwmStar test to check verified filetype is correct - forced override on default SpectrumList loaders -- now SpectrumList is no longer ambiguous and doesn't require a format specification - relevant areas in tests are updated accordingly - added print warnings to when HDU is not specified on Spectrum1D loaders for files with multiple spectra. - ensured tests now remove tempfiles with os.remove - arguably, this could work better with tmpfile, but i don't know how tests are deployed on the server-side
Please add a change log. @weaverba137 , would you be interested to review this or ping someone who would be? Thanks! |
- three points outlining changes listed in PR as per astropy#1185
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1185 +/- ##
==========================================
+ Coverage 86.89% 86.95% +0.05%
==========================================
Files 63 63
Lines 4564 4568 +4
==========================================
+ Hits 3966 3972 +6
+ Misses 598 596 -2 ☔ View full report in Codecov by Sentry. |
"SDSS-V spec", | ||
identifier=spec_sdss5_identify, | ||
dtype=SpectrumList, | ||
force=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the multi
removed, what is the difference between this data loader and the above data loader at line 318? Same with the mwm
data loaders?
With this removal, what would be the correct format
to specify to load all data extensions? Is it now the default? Jdaviz only supports using format
during data load to provide hints to the type of filepath.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My goal with this was to simplify the interface. The difference previously was that multi
loaded every valid HDU extension, whereas the single SpectrumList
ones would load just a specified HDU. Since it's a SpectrumList
, I felt the default loader for that type should just load every extension.
Across both the Spectrum1D
and SpectrumList
loaders there should be a single format (SDSS-V [DATATYPE]
), which should be detected automagically and no longer needs to be manually specified on SpectrumList
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue might be mostly in Specviz. I'd like to retain the workflow of passing Specviz a single file and having it load all spectra from all extensions. Specviz uses the following https://github.com/spacetelescope/jdaviz/blob/main/jdaviz/configs/specviz/plugins/parsers.py#L78-L87 to load a specutils object from a file, which first attempts to load a Spectrum1D with a format option, and on fail, tries to load the SpectrumList with a format option.
With the multi
option, I can explicitly tell Specviz to load using the SpectrumList loader, which will break will this change. Specviz only allows me to pass in a format
keyword to specutils. Previously using SDSS-V mwm multi
on the mwmStar
files allowed me to load all spectra, since it triggered line 87. With this, it only loads the first spectrum from the first extension found.
In my mind, the easiest fix would be to reintroduce the multi
here, but maybe it's more appropriate to fix Specviz instead. @rosteen since you're a maintainer/dev of both specutils
and jdaviz
, do you have thoughts on the best approach, and/or where the fix should go?
I am testing with the mwmStar and mwmVisit files for id 61731453, which has spectra in both the BOSS and APOGEE extensions. the mwmStar file has 1 spectrum per extension. The mwmVisit file has 1 spectrum in the BOSS extension, and 3 spectra in the APOGEE extension, loaded as a SpectrumList of 2 Spectrum1D objects, with flux.shapes
of ((4648,), (3, 8575))
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm planning to catch up on this this afternoon, I might have thoughts then.
@@ -462,14 +467,17 @@ def load_sdss_mwm_1d(file_obj, hdu: Optional[int] = None, **kwargs): | |||
for i in range(len(hdulist)): | |||
if hdulist[i].header.get("DATASUM") != "0": | |||
hdu = i | |||
print('HDU not specified. Loading spectrum at (HDU{})'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just a warning or is specifying an HDU required? For production, we use Jdaviz to load the file, and want by default all data extensions loaded, without having to specify an HDU, which I think will be the most common use case by users as well. Does this result in many constant warning messages printed to the user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how we want to handle this case generally. mwm
files contain a max of 4 extensions which can each have N spectra. These shouldn't really be loaded into a single Spectrum1D
object because APOGEE/BOSS are at different wavelength ranges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree that the APOGEE/BOSS extensions should not be loaded into a single Spectrum1D. They should be loaded into a SpectrumList. I think the warning is fine. It's useful for anyone using specutils
directly, and will be buried within Specviz and Zora.
I think we might need the
Before I could load all extensions with I can load a
with
|
Co-authored-by: Brian Cherinka <[email protected]>
- all loaders now only load for a single datatype, avoiding prior knowledge of SDSS datatypes - updated to only load as SpectrumList - updated to load all visits in mwmVisit files as individual Spectrum1D objects in the SpectrumList - relevant tests removed - relevant import __all__ adjusted
…ss-loaders into mwmvisit-boss-fix
- describes changes shown previously
This is again the same issue we run into previously where jdaviz can't load
The helper for For both of these, I've been debating how to keep I/O simple without waiting for any jdaviz fixes, so I've done this:
This is how the DESI default loaders handle I/O, and this avoids any confusion on a user (student's) part who is unfamiliar with the SDSS-V datatype structure. I'm not sure if this is exactly what we want for edit: with this change the
|
This is exactly the problem that was occurring, and why we need an explicitly different format for a list of spectra, until we can fix the Specviz data loader.
This is NOT true. The majority of BHM spectra are single spectra. Only the spec-full files contain multiple spectra for individual exposures, and most users do not use these files.
IMO, we should avoid doing this, and I'm hesitant to remove it. I agree that it'd be nice to have a single format for each SDSS data product. I think it's ok to have users specify
For extensions with many spectra on the same wavelength grid, a single It's not ideal, but if we also add back in the explicit
While indeed this does "fix" Specviz data loading, it does so at the expense of usability within the |
return [ | ||
Spectrum1D( | ||
spectral_axis=spectral_axis, | ||
flux=flux[i], | ||
uncertainty=e_flux[i], | ||
mask=mask[i], | ||
meta=metadicts[i], | ||
) for i in range(flux.shape[0]) | ||
] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not already in place, we will need to do a similar thing for the SDSS-V spec
loader, but only for any of the spec-full files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't spec-full
visits dispersed across the HDUs past the spall
and zall
HDU's? Or am I mistaken about the format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's right. They're appended at the end as
5 MJD_EXP_60052-00 1 BinTableHDU 273 4648R x 8C [E, E, E, J, J, E, E, E]
6 MJD_EXP_60052-01 1 BinTableHDU 271 4648R x 8C [E, E, E, J, J, E, E, E]
7 MJD_EXP_60052-02 1 BinTableHDU 273 4648R x 8C [E, E, E, J, J, E, E, E]
8 MJD_EXP_60052-03 1 BinTableHDU 273 4648R x 8C [E, E, E, J, J, E, E, E]
...
So, yeah, a change isn't needed here since they are 1d spectra in separate extensions.
revert back to 4bee136
I did forget about the individual spectra. My mistake. I agree with you on your points. I didn't really consider how I'd like to keep the interface consistent -- if we output 1D flux here, we output it everywhere. So users should also have to specify the specific
It does get a lil confusing but it keeps our interface supported on all datatypes incl. the man one.
I would really like to keep this change since it fixes a Is it possible in the case of |
- readded print -> warnings conversion - can specify the visit to load on mwmVisit load. - added relevant tests for the new mwmVisit case
I agree with @havok2063's points here - we don't want to sacrifice If my understanding is correct, at this point the main sticking point seems to be Specviz not handling 2D flux arrays. I opened a PR in Jdaviz with a quick patch to do that, it simply unpacks the 2D array into separate |
That's fine. Although Does adding the
Then we'll also need to change the specviz loading logic at https://github.com/spacetelescope/jdaviz/blob/main/jdaviz/configs/specviz/plugins/parsers.py#L78-L87
Yeah it is, but I'd like to avoid that. I'd prefer to pass the file and have |
Thanks! I will test this out. If we use your PR, then would we likely need to rollback the equivalent changes here? The other issue is the logic of the Specviz data loader when the input is a filepath, at https://github.com/spacetelescope/jdaviz/blob/main/jdaviz/configs/specviz/plugins/parsers.py#L78-L87. It always tries Spectrum1D first and only on fail, does it try SpectrumList. Previously we had loader formats I'm not sure the right solution here. Is there a preference in
I don't have a strong opinion on the metadata. I'd be fine copying the full dict. I'm not aware of any visit-specific metadata but @rileythai might know. @rileythai what are you currently copying over for metadata?
Yeah our use case has the nd array coming from an extension in a file. |
Hm, how about adding a |
That works for me! |
Great, I implemented it in that PR as |
…ss-loaders into mwmvisit-boss-fix
- no longer unpacks nD spectrum in all cases for mwm - will still unpack flux from 2D to 1D in the event its single visit or coadd
I'm copying over lists of some visit-specific stuff that isn't in the header (SNR, MJD, telescope) that could let users figure out which visits could be bad and where they come from, etc. As of now when imported and copied they'll just become a list of values for each of the imported and unpacked spectra. We could make the 2D-to-1D converter split list-like objects with the same length as
@rosteen check my PR for your PR, which has some fixes for the edge case and simplifies the
While this works for the CLI case, the SDSS-V data loaded through the
There's a minor |
…ss-loaders into mwmvisit-boss-fix
- fixed flux array length check so that the 2D mwm visits are checked properly
- removed HDU is empty warning on mwm SpecList, not really worth a warning - moved test cases which are designed to throw exceptions into their own test function
This PR should be good to go now @rosteen . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good now, but I'm going to wait until I merge spacetelescope/jdaviz#3229 to merge it.
Just merged the downstream PR, I have one more bugfix I'd like to get in the next |
This PR addresses #1182 , and an issue mentioned in #1107 about
SpectrumList
loader ambiguity.Changes are:
mwmVisit
files now correctly load BOSS-only files.mwmVisit
default loader no longer loads metadata about the "date observed" -- only the MJD of the observation (which is enough)SpectrumList
.SpectrumList
loaders generated from theSpectrum1D
default loaders.SpectrumList
, they no longer exist and it will fail.Spectrum1D
loader for SDSS-V files with multiple spectra and specific HDU is not specified.sdss_id
formeta
field.tests/test_sdss_v.py
have been adjustedmwm
filetype ismwmVisit
ormwmStar
os.remove
)mwmVisit
andmwmStar
files.