Function that divides by the exposure time #75

Open
LisaAltinier wants to merge 434 commits into main

Conversation

LisaAltinier

The function divides data and error by the exposure time. Header updated. Test to check that the function works.
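
For orientation, here is a minimal sketch of what the step does, pieced together from the snippets quoted in the review below (the function name divide_by_exptime and the history message are illustrative, not necessarily what the PR uses):

    import numpy as np

    def divide_by_exptime(data):
        # Normalize every frame and its error by that frame's exposure time.
        all_data_new = np.zeros(data.all_data.shape)
        all_err_new = np.zeros(data.all_err.shape)
        for i in range(len(data.frames)):
            exposure_time = float(data.frames[i].ext_hdr['EXPTIME'])
            data.frames[i].data = data.frames[i].data / exposure_time
            data.frames[i].err = data.frames[i].err / exposure_time
            all_data_new[i] = data.frames[i].data
            all_err_new[i] = data.frames[i].err
        history_msg = "Divided data and error by the exposure time"
        data.update_after_processing_step(history_msg,
                                          new_all_data=all_data_new,
                                          new_all_err=all_err_new)
        return data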

Contributor

@mygouf mygouf left a comment

Looks good. A few changes needed and some comments.

data.frames[i].err = data.frames[i].err / exposure_time

all_data_new[i] = data.frames[i].data
all_err_new[i] = data.frames[i].err
Contributor

It works like this. Just for efficiency: is it necessary to create the variables all_data_new and all_err_new? Can't we directly use the updated "data" instance?

Author

If I understood correctly, your suggestion is to update data.frames.data and then extract the data.all_data array. Is that what you mean? The data.frames.data (and .err) arrays are independent from data.all_data (and .all_err), so I think I have to update them separately.

Contributor

I think the way the code is currently written is to enable the use of the update_after_processing_step function, which is our preferred way to update the object. Is that the case, Lisa?

Author

Yes, exactly

Contributor

What I meant is: would it be possible to update the dataset this way: data.update_after_processing_step(history_msg, new_all_data=data.all_data, new_all_err=data.all_err),
without the need to create the "all_data_new" and "all_err_new" arrays, making the code slightly more efficient?

Contributor

I think either solution is adequate here, given that this won't be a resource- or time-limiting step. We're probably talking about ms-level time savings. I actually like Lisa's implementation, since it explicitly sets a variable (all_data_new) and then uses it to set the new data in update_after_processing_step. There's a slight danger/opaqueness in setting data.frames[i].data and then passing data.all_data into update_after_processing_step: there's an implicit assumption that those two variables are views of the same data, which is open to bugs when they aren't actually pointing to the same piece of memory, as well as making the code slightly less transparent.

Either way, I think the implementation is probably good enough for us!
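
To illustrate the distinction with plain numpy (not the actual Dataset/Image classes), whether an in-place edit on a frame shows up in the stacked array depends on whether the frame is a view or a copy:

    import numpy as np

    stack = np.ones((2, 4, 4))        # stands in for data.all_data
    frame = stack[0]                  # a view: shares memory with the stack
    frame /= 2.0
    print(stack[0, 0, 0])             # 0.5 -- the stack sees the change

    frame_copy = stack[1].copy()      # a copy: separate memory
    frame_copy /= 2.0
    print(stack[1, 0, 0])             # still 1.0 -- the stack is unchanged

If data.frames[i].data is not guaranteed to be a view of data.all_data, passing data.all_data straight into update_after_processing_step would silently drop the division.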

Contributor

That is a good question. I think just setting all_data in update_after_processing_step won't work in this case, since the division by the exposure time is not reflected in all_data yet. Or am I wrong?

Contributor

@JuergenSchreiber, are you referring to Marie's suggestion? I think you might be right, although there should be some quick fixes that would make it work again. Either way, I think we shouldn't spend too much time analyzing this, unless folks are really interested in coming up with the most succinct solution.

all_data_new[i] = data.frames[i].data
all_err_new[i] = data.frames[i].err

data.frames[i].ext_hdr.set('EXPTIME', 1 )
Contributor

I'm not sure I understand why EXPTIME is set to 1 here.

Author

Because I am normalizing by the exposure time.

Contributor

Even though you're normalizing by this number, I don't think we want to change this header keyword. It's still valuable to know what the exposure time was.

Contributor

(Unless this is a required behavior in the FDD or something)

Author

OK, then I'll correct this part.

Contributor

Yes, we don't want to update the EXPTIME keyword.

all_err_back = np.zeros(not_normalized_dataset.all_err.shape)
for i in range(len(not_normalized_dataset.frames)):

# test that the exposure time in the header is 1
Contributor

EXPTIME is not 1 in the default header, so this number needs to be updated.

Author

Sorry, I don't understand. In not_normalized_dataset the EXPTIME is not 1, but in norm_dataset I set it to 1.

Contributor

As discussed in a comment above, we don't want to update the EXPTIME keyword.
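
One way the test could be adjusted instead (illustrative only; it assumes the test keeps an unmodified copy of the input dataset around, as the names suggest):

    # EXPTIME should be left untouched by the normalization step
    assert norm_dataset.frames[i].ext_hdr['EXPTIME'] == not_normalized_dataset.frames[i].ext_hdr['EXPTIME']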

all_data_back[i,:,:] = exposure_times[i] * norm_dataset.all_data[i,:,:]
all_err_back[i,:,:] = exposure_times[i] * norm_dataset.all_err[i,:,:]

#check that if you multiply the output by the exposure time you can recover the input
Contributor

Isn't it a duplicate test?

@maxwellmb mentioned this pull request on May 24, 2024

sim_data = np.asarray(np.random.poisson(lam=150.0, size=(1024, 1024)), dtype=float)
sim_err = np.asarray(np.random.poisson(lam=1.0, size=(1024, 1024)), dtype=float)
sim_dq = np.asarray(np.ones((1024, 1024)), dtype=int)
Contributor

Setting this to ones makes all the pixels bad!

I think you want np.zeros
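
i.e., something like the following (following the point above that a DQ value of 0 marks a good pixel):

    sim_dq = np.zeros((1024, 1024), dtype=int)   # all pixels flagged as good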

exposure_time = float(data.frames[i].ext_hdr['EXPTIME'])

data.frames[i].data = data.frames[i].data / exposure_time
data.frames[i].err = data.frames[i].err / exposure_time
Contributor

@ramya-anche is implementing a new function, rescale_errors, for PR #88 that we should use here. We should maybe hold off on finalizing this PR until she's done with that function.

Contributor

@JuergenSchreiber Jun 13, 2024

@maxwellmb, I think that is not necessary in this case: since exposure_time has no uncertainty, a simple division of err by exposure_time should be OK. Dividing by the flat field is something different; there you need proper error propagation.
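
(For reference: for y = x / t with the exposure time t taken as exact, standard propagation gives sigma_y = sigma_x / t, so the plain division of err by exposure_time already is the full propagation; a flat-field division additionally mixes in the flat's own uncertainty.)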

Contributor

I think we'll still want to use the rescale_errors function here; it's just that it won't have a pixel-dependent scaling as we do in the flats. I do agree that the flat is different though, since there are kind of two steps there with the additive component.

assert norm_dataset.frames[i].ext_hdr['UNITS'] == "electrons/s"

#check that the quality flag exists and it's 1 everywhere
assert np.mean(not_normalized_dataset.frames[i].dq) == pytest.approx(1, abs=1e-6)
Contributor

As above, it should be 0 if everything is good. Please see here.
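
For example, the assertion could become something like:

    # DQ of 0 means a good pixel; check that nothing is flagged
    assert np.all(not_normalized_dataset.frames[i].dq == 0)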

@JuergenSchreiber
Contributor

@LisaAltinier, please have a look at e.g. l2a_to_l2b.convert_to_electrons(). I have added a dict parameter to update_after_processing_step() with which you can easily update the header entries as you need for the units.
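
A rough sketch of how the call could then look (the keyword name for the header dict below is a guess; check convert_to_electrons() for the actual signature and key names):

    history_msg = "Divided data and error by the exposure time"
    data.update_after_processing_step(history_msg,
                                      new_all_data=all_data_new,
                                      new_all_err=all_err_new,
                                      header_entries={'UNITS': 'electrons/s'})  # keyword name assumed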

Sergi Hildebrandt and others added 26 commits June 18, 2024 13:50
To avoid confusion with the existing test_kgain.py in main, which tests the KGain class and the conversion to e/s
Using native KGain class instead of single float value
…ion to construct master dark from noise maps
Avoiding some 1-dimensional slices created when using stacks of Dataset objects.

UTs have a variety of cases, and I needed to add a case for stack2 to be just an Image object.
These two UT tests are in fact checked at every single run of calibrate_nonlin.py:
These are the lines in calibrate_nonlin.py that always check for the conditions of the two removed UTs.

The issue comes from the fact that we now import parameters from detector.py and not an external yaml file. It would be odd to change the input arguments of calibrate_nonlin.py to accept a flag or new parameter values for only two UTs that are checked all the time in the main function anyway.

    check.real_nonnegative_scalar(rms_low_limit, 'rms_low_limit', TypeError)
    check.real_nonnegative_scalar(rms_upp_limit, 'rms_upp_limit', TypeError)
    if rms_low_limit >= rms_upp_limit:
        raise CalNonlinException('rms_upp_limit must be greater than rms_low_limit')
manduhmia and others added 30 commits August 6, 2024 12:10
end-to-end processing of L1 to L2b