Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use NDAstroData.unit #140

Open
wants to merge 20 commits into
base: master
Choose a base branch
from
Open

Use NDAstroData.unit #140

wants to merge 20 commits into from

Conversation

saimn
Copy link
Contributor

@saimn saimn commented Mar 15, 2021

This was suggested by @chris-simpson in Trac#828, cc @KathleenLabrie as well.

This PR allows to read the unit from BUNIT and set it to NDAstroData.unit, and the opposite when writing files.
However there is a problem with this. When a NDData object has a unit, then it uses this unit in computations (using Astropy Quantity objects). Which means that all operands should have a unit. Adding or subtracting a scalar or a Numpy array without a unit raises a UnitConversionError.

There are I think 3 solutions:

  • forget about this change ;)
  • update the code everywhere to use units ...
  • or we could probably customize the arithmetic code in NDAstroData so we don't require the operand to have a unit. But then we loose the benefits of having units, so not sure about this.

Another thing to mention here is that currently there is a small performance overhead when doing arithmetic with NDData, because it uses self.data * self.unit to attach the unit which makes a copy of the array. This could be fixed by changing the code in Astropy to use self.data << self.unit which does not make a copy: astropy/astropy#11107

@chris-simpson
Copy link
Contributor

  1. Will we have data without units after this is merged, given that units are added when the AD is created? The addition/subtraction operations we do are things like dark or bias removal, and those will now have units, so it's not clear your "problem" is actually a problem.

  2. Is there a performance difference between these (if gain is a scalar):

ext.multiply(gain * u.electron / u.adu)

and

ext.multiply(gain)
ext.unit = u.electron

@saimn
Copy link
Contributor Author

saimn commented Mar 15, 2021

  1. You mean the .data attribute? It will still be a plain Numpy array, so indeed the problem is maybe not that worse. All the places where we use ext.data would not be impacted, the problem is only when using .add() or .subtract(). As I just pushed this branch we will have a better view once Jenkins has run.

  2. If ext has a unit it will be the same for both since ext.multiply(gain) will do the multiplication with gain * u.dimensionless_unscaled and self.data * self.unit which currently does a copy.

@saimn
Copy link
Contributor Author

saimn commented Mar 16, 2021

Tests are passing after setting the unit to ADU in _append_nddata and with GeminiDRSoftware/AstroFaker#9. For .append I'm not sure what's the best way: we could add a unit argument that would default to ADU and would be set to the NDData object if it doesn't have a unit ?

@chris-simpson
Copy link
Contributor

Yes, I think just like we assume that data in a FITS file are in ADU if there's no BUNIT, then we should assume they're in ADU when created from an NDData with no unit.

@saimn
Copy link
Contributor Author

saimn commented Mar 17, 2021

Ok, it works and I replaced various places where BUNIT was parsed. I'm not completely happy with the changes in fluxCalibrate but it's a bit tricky because .uncertainty has its own .unit attribute which isn't updated if e.g. ext.unit is set directly.

@saimn saimn marked this pull request as ready for review March 17, 2021 22:15
@chris-simpson
Copy link
Contributor

I'm not clear why the code at the end of fluxCalibrate() has had to change the way it has. What's wrong with the existing code that multiplies ext rather than ext.nddata?

@saimn
Copy link
Contributor Author

saimn commented Mar 18, 2021

Yes, that's the part I'm not happy with, but the problem with the previous version is that as it does the computation without units it doesn't update .uncertainty.unit. So I choose to copy .convert_unit_to from Astropy because it may be useful in other cases, and then I need to update in-place ext.nddata.
Another option would be to use the former code and update .uncertainty.unit with .uncertainty._data_unit_to_uncertainty_unit(ext.unit), but that's a private method.
Or we could set directly data and uncertainty with what's done in .convert_unit_to e.g. ext.data = ext.unit.to(final_unit, ext.data, equivalencies=equivalencies) and same for uncertainty.
Last option would be to allow to set the nddata attribute, ext.nddata = new_nddata. That would be the best option, but replacing the nddata object which is stored in the parent AstroData object may be a problem if there are other references to it. But now I'm thinking that I could do the in-place replacement as done here in the nddata setter. That would be a good compromise and could be useful in other places.

@chris-simpson
Copy link
Contributor

But the previous version used ext.multiply, which calls nddata.multiply so is the failure to update units an omission from NDData? If the only operation that's missing from the previous code is the failure to update .uncertainty.unit then can't we keep the previous code and just update the unit manually as one additional line?

Although a quick look through nduncertainty.py looks like the uncertainty propagation is OK if .uncertainty.unit is None. So should we simply not worry about updating it?

@saimn
Copy link
Contributor Author

saimn commented Mar 18, 2021

In the previous version ext.multiply was called with a Numpy array, without units, so NDData was not updating the unit. ext.unit was updated manually, but ext.uncertainty.unit was not so it was still in electron2 / s2 which would brake if some other operations are done after on the spectrum. So yeah we could also update ext.uncertainty.unit manually, that would be the simplest, but I was thinking that it would be nicer if we could use directly NDData with a quantity.

@saimn
Copy link
Contributor Author

saimn commented Mar 18, 2021

So the other option I was thinking is ae59e96 which I just pushed. I'm happy to revert to the original version if you prefer it, and setting manually ext.uncertainty.unit. Not sure what is best, having the possibility to set a new NDData object to ext.nddata may be more generally useful ?

@chris-simpson
Copy link
Contributor

NDAstroData is hardcoded to use VarianceUncertainty (actually ADVarianceUncertainty), so if you want to set the .nddata directly then you either need to raise an error if not isinstance(nddata.uncertainty, VarUncertainty) or convert the array of the uncertainty subclass into variance (which I don't think is possible).

Because we use ADVarianceUncertainty, the units are always determined by the units of the data array, so can we just define the unit property in that class:

@property
def unit(self):
    return self.parent_nddata.unit ** 2

@saimn
Copy link
Contributor Author

saimn commented Mar 18, 2021

That's also an option, but then something must be done for the setter method as well.

@chris-simpson
Copy link
Contributor

Can't you simply raise an exception in the ADVarianceUncertainty.unit.setter? Or maybe have unit return _unit if that's set and what I said above if not? It looks like NDUncertainty is written to check compatibility of units so you can have data in W/m^2/nm and uncertainty in (W/m^2/Hz) ** 2 but why would you do that? And, even though it might be allowed by NDData, we don't really allow it in NDAstroData since we have a variance attribute that gets set without concern about units and is assumed to have units equal to the square of the data units. If we genuinely allow this sort of unit mixing (and support it, rather than just not worry about it), then the .variance property needs to take a look at.uncertainty.unit and do a unit conversion so that a calculation like data / sqrt(variance) really is returning the signal-to-noise ratio. At the moment it just returns uncertainty.array and perhaps we want our ADVarianceUncertainty subclass to keep things tightly-constrained.

Sorry this seems to have opened a can of worms but I think it's clearly the correct idea to use .unit rather than have it in the BUNIT keyword of the metadata.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants