Combine images #271

Merged
merged 25 commits into from Dec 20, 2024

Commits (25)
bdf4770
adjust data dq when dividing by flat = 0
maxwellmb Dec 10, 2024
7b707b2
test the DQ setting
maxwellmb Dec 10, 2024
93b5b2d
fixing typo
maxwellmb Dec 10, 2024
5440517
Revert "update README release notes"
semaphoreP Dec 11, 2024
7485f43
Revert "Revert "update README release notes""
semaphoreP Dec 11, 2024
93eddec
Merge pull request #268 from roman-corgi/dq_set_when_flat_zero
semaphoreP Dec 11, 2024
e4e79ee
v1.1.2
semaphoreP Dec 11, 2024
d506fde
Fixed typo in error message
bensutlieff Dec 11, 2024
ddcb952
Checked error propagation calculation
bensutlieff Dec 11, 2024
0efbdd7
Fixed misplaced brackets that were causing a TypeError
bensutlieff Dec 11, 2024
a35f80b
Fixed typos in test_combine.py docstring
bensutlieff Dec 11, 2024
0340f47
Added test for non-divisible case
bensutlieff Dec 17, 2024
370ad69
Added test for invalid collapse type
bensutlieff Dec 17, 2024
3d2b031
Added test for case where num_frames_scaling is False
bensutlieff Dec 17, 2024
862fbd5
Cleaned up test_mean_combine_subexposures_without_scaling
bensutlieff Dec 17, 2024
5554157
Added dq assert statements
bensutlieff Dec 17, 2024
51c231e
Fixed typo
bensutlieff Dec 17, 2024
9597d0f
Fixed typos in readme.md
bensutlieff Dec 17, 2024
7c9e82d
Added np.copy for dq_subset in combine.py
bensutlieff Dec 17, 2024
d8dffba
Added test for median combination with multiple bad pixels
bensutlieff Dec 17, 2024
7d9622a
Updated docstrings and fixed typos
bensutlieff Dec 17, 2024
56d0bd8
Clarified test comments
bensutlieff Dec 17, 2024
10a1031
Added test to confirm combining different values and nans works as ex…
bensutlieff Dec 17, 2024
66cf2bd
Ensured that data were copied before modification
bensutlieff Dec 17, 2024
2bee643
Fixed typo in combine.py
bensutlieff Dec 17, 2024
9 changes: 6 additions & 3 deletions README.md
@@ -95,11 +95,11 @@ def example_step(dataset, calib_data, tuneable_arg=1, another_arg="test"):

The inside of the function can be nearly anything you want, but the function signature and the start/end of the function should follow a few rules (a minimal sketch of a conforming step follows this list).

* Each function should include a docstring that descibes what the function is doing, what the inputs (including units if appropriate) are and what the outputs (also with units). The dosctrings should be [goggle style docstrings](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html).
* Each function should include a docstring that describes what the function is doing, what the inputs (including units if appropriate) are and what the outputs (also with units). The dosctrings should be [google style docstrings](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html).
* The input dataset should always be the first input
* Additional arguments and keywords exist only if you need them--many relevant parameters might already be in Dataset headers. A pipeline step can have just a single argument (the input dataset) if that is all it needs.
* All additional function arguments/keywords should only consist of the following types: int, float, str, or a class defined in corgidrp.Data.
* (Long explaination for the curious: The reason for this is that pipeline steps can be written out as text files. Int/float/str are easily represented succinctly by textfiles. All classes in corgidrp.Data can be created simply by passing in a filepath. Therefore, all pipeline steps have easily recordable arguments for easy reproducibility.)
* (Long explanation for the curious: The reason for this is that pipeline steps can be written out as text files. Int/float/str are easily represented succinctly by textfiles. All classes in corgidrp.Data can be created simply by passing in a filepath. Therefore, all pipeline steps have easily recordable arguments for easy reproducibility.)
* The first line of the function generally should be creating a copy of the dataset (which will be the output dataset). This way, the output dataset is not the same instance as the input dataset. This will make it easier to ensure reproducibility.
* The function should always end with updating the header and (typically) the data of the output dataset. The history of running this pipeline step should be recorded in the header.
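
To make these rules concrete, here is a minimal sketch of a conforming step, reusing the example_step signature from the excerpt above. The body is illustrative only: the placeholder processing and the Dataset.copy() call are the editor's assumptions, while update_after_processing_step is the bookkeeping method used by steps elsewhere in this PR.

def example_step(dataset, calib_data, tuneable_arg=1, another_arg="test"):
    """
    Example pipeline step that applies placeholder processing.

    Args:
        dataset (corgidrp.data.Dataset): the input dataset (always the first argument)
        calib_data (corgidrp.data.Image): an illustrative calibration frame (unused in this sketch)
        tuneable_arg (int): an illustrative tuning parameter
        another_arg (str): an illustrative keyword argument

    Returns:
        corgidrp.data.Dataset: the processed copy of the input dataset
    """
    # start by copying, so the output is not the same instance as the input
    output_dataset = dataset.copy()

    # placeholder processing: scale the data by the tuneable argument
    new_all_data = output_dataset.all_data * tuneable_arg

    # end by recording this step's history along with the new data
    history_msg = "Scaled data by {0} using {1}".format(tuneable_arg, another_arg)
    output_dataset.update_after_processing_step(history_msg, new_all_data=new_all_data)
    return output_dataset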

@@ -133,7 +133,7 @@ End-to-end testing refers to processing data as one would when we get the real d
- if you need to create mock L1 data, please do it in the script as well.
- See the existing tests in `tests/e2e_tests/` for how to structure this script. You should only need to write a single script.
4. Test that the script runs successfully on your local machine and produces the expected output. Debug as necessary. When appropriate, test your results against those obtained from the II&T/TVAC pipeline using the same input data.
5. Determine how resource intensive your recipe is. There are many ways to do this, but Linux users can run `/usr/bin/time -v python your_e2e_test.py` and Mac userse can run `/usr/bin/time -l -h -p python <your_e2e_test.py>`. Record elapsed (wall clock) time, the percent of CPU this job got (only if parallelization was used), and total memory used (labelled "Maximum resident set size").
5. Determine how resource intensive your recipe is. There are many ways to do this, but Linux users can run `/usr/bin/time -v python your_e2e_test.py` and Mac users can run `/usr/bin/time -l -h -p python <your_e2e_test.py>`. Record elapsed (wall clock) time, the percent of CPU this job got (only if parallelization was used), and total memory used (labelled "Maximum resident set size").
6. Document your recipe on the "Corgi-DRP Implementation Document" on Confluence (see the big table in Section 2.0). You should fill out an entire row with your recipe. Under additional notes, note if your recipe took significant run time (> 1 minute) and significant memory (> 1 GB).
7. PR!

@@ -209,6 +209,9 @@ Before creating a pull request, review the design Principles below. Use the Gith

## Change Log

**v1.1.2**
* Flat field correction marks pixels divided by 0 as bad

**v1.1.1**
* Fix unit test that wasn't cleaning up environment properly

2 changes: 1 addition & 1 deletion corgidrp/__init__.py
@@ -3,7 +3,7 @@
import pathlib
import configparser

__version__ = "1.1.1"
__version__ = "1.1.2"
version = __version__ # temporary backwards compatability

#### Create a configuration file for the corgidrp if it doesn't exist.
8 changes: 4 additions & 4 deletions corgidrp/combine.py
@@ -32,7 +32,7 @@ def combine_images(data_subset, err_subset, dq_subset, collapse, num_frames_scal
n_samples = np.sum(n_samples, axis=0)
if collapse.lower() == "mean":
data_collapse = np.nanmean(data_subset, axis=0)
err_collapse = np.sqrt(np.nanmean(err_subset**2, axis=0)) /np.sqrt(n_samples) # not sure if this is correct, but good enough for now
err_collapse = np.sqrt(np.nanmean(err_subset**2, axis=0)) /np.sqrt(n_samples) # correct assuming standard error propagation
elif collapse.lower() == "median":
data_collapse = np.nanmedian(data_subset, axis=0)
err_collapse = np.sqrt(np.nanmean(err_subset**2, axis=0)) /np.sqrt(n_samples) * np.sqrt(np.pi/2) # inflate median error
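
# Illustrative standalone check (editor's sketch, not part of this diff) of
# the error combination above: standard error propagation for the mean, and
# the Gaussian sqrt(pi/2) inflation for the median.
import numpy as np

err_subset = np.ones((2, 4, 4))                      # two frames, unit errors
n_samples = np.sum(np.isfinite(err_subset), axis=0)  # good samples per pixel

# mean combine: sigma = sqrt(mean(sigma_i**2)) / sqrt(N)
err_mean = np.sqrt(np.nanmean(err_subset**2, axis=0)) / np.sqrt(n_samples)
assert np.allclose(err_mean, 1 / np.sqrt(2))

# median combine: inflated by sqrt(pi/2), the large-N ratio of the standard
# error of the median to that of the mean for Gaussian noise
err_median = err_mean * np.sqrt(np.pi / 2)
assert np.allclose(err_median, np.sqrt(np.pi) / 2)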
@@ -72,7 +72,7 @@ def combine_subexposures(input_dataset, num_frames_per_group=None, collapse="mea
num_frames_per_group = len(input_dataset)

if len(input_dataset) % num_frames_per_group != 0:
raise ValueError("Input dataset of lenght {0} cannot be grouped in sets of {1}".format(len(input_dataset, num_frames_per_group)))
raise ValueError("Input dataset of length {0} cannot be grouped in sets of {1}".format(len(input_dataset), num_frames_per_group))

if collapse.lower() not in ["mean", "median"]:
raise ValueError("combine_subexposures can only collapse with mean or median")
@@ -82,7 +82,7 @@
for i in range(num_groups):
data_subset = np.copy(input_dataset.all_data[num_frames_per_group*i:num_frames_per_group*(i+1)])
err_subset = np.copy(input_dataset.all_err[num_frames_per_group*i:num_frames_per_group*(i+1)])
dq_subset = input_dataset.all_dq[num_frames_per_group*i:num_frames_per_group*(i+1)]
dq_subset = np.copy(input_dataset.all_dq[num_frames_per_group*i:num_frames_per_group*(i+1)])

data_collapse, err_collapse, dq_collapse = combine_images(data_subset, err_subset, dq_subset, collapse=collapse,
num_frames_scaling=num_frames_scaling)
@@ -91,7 +91,7 @@
pri_hdr = input_dataset[num_frames_per_group*i].pri_hdr.copy()
ext_hdr = input_dataset[num_frames_per_group*i].ext_hdr.copy()
err_hdr = input_dataset[num_frames_per_group*i].err_hdr.copy()
dq_hdr = input_dataset[num_frames_per_group*i].err_hdr.copy()
dq_hdr = input_dataset[num_frames_per_group*i].dq_hdr.copy()
hdulist = input_dataset[num_frames_per_group*i].hdu_list.copy()
new_image = data.Image(data_collapse, pri_hdr=pri_hdr, ext_hdr=ext_hdr, err=err_collapse, dq=dq_collapse, err_hdr=err_hdr,
dq_hdr=dq_hdr, input_hdulist=hdulist)
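
The np.copy() added around dq_subset above fixes an aliasing hazard: basic slicing of a numpy array returns a view that shares memory with the original, so any in-place modification of the subset would propagate back to the input dataset's DQ array. A minimal standalone numpy sketch of the behavior (illustrative, not corgidrp code):

import numpy as np

all_dq = np.zeros((4, 2, 2), dtype=int)

view = all_dq[0:2]           # basic slicing returns a view
view[0, 0, 0] = 1
assert all_dq[0, 0, 0] == 1  # the original array was silently modified

all_dq[0, 0, 0] = 0            # reset
subset = np.copy(all_dq[0:2])  # an independent copy, as combine.py now makes
subset[0, 0, 0] = 1
assert all_dq[0, 0, 0] == 0    # the original array is untouched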
9 changes: 8 additions & 1 deletion corgidrp/l2a_to_l2b.py
@@ -1,5 +1,6 @@
# A file that holds the functions that transmogrify l2a data to l2b data
import numpy as np
import copy
import corgidrp.data as data
from corgidrp.darks import build_synthesized_dark
from corgidrp.detector import detector_areas
@@ -128,6 +129,12 @@ def flat_division(input_dataset, flat_field):
#Divide by the master flat
flatdiv_cube = flatdiv_dataset.all_data / flat_field.data

#Find where the flat_field is 0 and set a DQ flag:
where_zero = np.where(flat_field.data == 0)
flatdiv_dq = copy.deepcopy(flatdiv_dataset.all_dq)
for i in range(len(flatdiv_dataset)):
flatdiv_dq[i][where_zero] = np.bitwise_or(flatdiv_dataset[i].dq[where_zero], 4)

# propagate the error of the master flat frame
if hasattr(flat_field, "err"):
flatdiv_dataset.rescale_error(1/flat_field.data, "FlatField")
@@ -138,7 +145,7 @@ def flat_division(input_dataset, flat_field):
history_msg = "Flat calibration done using Flat field {0}".format(flat_field.filename)

# update the output dataset with this new flat calibrated data and update the history
flatdiv_dataset.update_after_processing_step(history_msg,new_all_data=flatdiv_cube)
flatdiv_dataset.update_after_processing_step(history_msg,new_all_data=flatdiv_cube, new_all_dq = flatdiv_dq)

return flatdiv_dataset

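The DQ handling added above treats the DQ plane as a bitmask: each power of two is an independent flag, and np.bitwise_or sets the divide-by-zero flag (value 4 in this diff) without clobbering any flags a pixel already carries. A small standalone sketch of those semantics (illustrative, not corgidrp code):

import numpy as np

dq = np.array([0, 1, 4, 5])        # assorted pre-existing flags
dq_flagged = np.bitwise_or(dq, 4)  # set bit 2 (value 4) on every pixel
assert np.array_equal(dq_flagged, np.array([4, 5, 4, 5]))

# testing for the flag uses bitwise_and, as tests/test_flat_div.py does below
assert np.all(np.bitwise_and(dq_flagged, 4))
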
2 changes: 1 addition & 1 deletion setup.py
@@ -10,7 +10,7 @@ def get_requires():

setup(
name='corgidrp',
version="1.1.1",
version="1.1.2",
description='(Roman Space Telescope) CORonaGraph Instrument Data Reduction Pipeline',
#long_description="",
#long_description_content_type="text/markdown",
182 changes: 181 additions & 1 deletion tests/test_combine.py
@@ -61,6 +61,37 @@ def test_mean_combine_subexposures():
assert combined_dataset_2[0].ext_hdr['FILE2'] in ["2.fits", "1.fits", "3.fits", "4.fits"]
assert combined_dataset_2[0].ext_hdr['FILE3'] in ["2.fits", "1.fits", "3.fits", "4.fits"]

def test_mean_combine_subexposures_without_scaling():
"""
Test mean combine of subexposures for case where num_frames_scaling=False
"""

image1 = data.Image(img1, err=err1, dq=dq, pri_hdr = prhd, ext_hdr = exthd)
image1.filename = "1.fits"
image2 = image1.copy()
image2.filename = "2.fits"
image3 = image1.copy()
image3.filename = "3.fits"
image4 = image1.copy()
image4.filename = "4.fits"

dataset = data.Dataset([image1, image2, image3, image4])

combined_dataset = combine.combine_subexposures(dataset, 2, num_frames_scaling=False)

# Check that data and error values are not scaled up
assert(len(combined_dataset) == 2)
assert(np.all(combined_dataset[0].data == 1))
assert(np.all(combined_dataset[0].err == pytest.approx(1/np.sqrt(2))))
assert(np.all(combined_dataset[0].dq == 0))

# combine again
combined_dataset_2 = combine.combine_subexposures(combined_dataset, 2, num_frames_scaling=False)

assert(len(combined_dataset_2) == 1)
assert(np.all(combined_dataset_2[0].data == 1))
assert(np.all(combined_dataset_2[0].err == pytest.approx((1/np.sqrt(2))/np.sqrt(2))))
assert(np.all(combined_dataset_2[0].dq == 0))

def test_mean_combine_subexposures_with_bad():
"""
@@ -91,9 +122,11 @@ def test_mean_combine_subexposures_with_bad():
# the pixel with one bad pixel should have same value but higher error
assert combined_dataset[0].data[0][0] == 2
assert combined_dataset[0].err[0][0][0] == pytest.approx(2)
assert combined_dataset[0].dq[0][0] == 0 # 0 because one of the two frames had a good value
# compare against a pixel without any bad pixels
assert combined_dataset[1].data[0][0] == 2
assert combined_dataset[1].err[0][0][0] == pytest.approx(np.sqrt(2))
assert combined_dataset[1].dq[0][0] == 0

# the pixel with two bad pixels should be nan
assert np.isnan(combined_dataset[0].data[0][1])
@@ -102,7 +135,7 @@

def test_median_combine_subexposures():
"""
Test median combine of subexposures. And tests defualt case wihere num_frames_per_group isn't specified.
Test median combine of subexposures. And tests default case where num_frames_per_group isn't specified.
"""

image1 = data.Image(img1, err=err1, dq=dq, pri_hdr = prhd, ext_hdr = exthd)
@@ -123,6 +156,153 @@
assert(np.all(combined_dataset[0].err == pytest.approx(np.sqrt(2*np.pi))))
assert(np.all(combined_dataset[0].dq == 0))

def test_median_combine_subexposures_with_bad():
"""
Test median combine of subexposures with bad pixels over multiple combinations
"""
# use copies since we are going to modify their values
image1 = data.Image(np.copy(img1), err=np.copy(err1), dq=np.copy(dq),
pri_hdr = prhd, ext_hdr = exthd)
image1.filename = "1.fits"
image2 = image1.copy()
image2.filename = "2.fits"
image3 = image1.copy()
image3.filename = "3.fits"
image4 = image1.copy()
image4.filename = "4.fits"

# (0,0) has one bad frame
image1.dq[0][0] = 1
# (0,1) has both pixels bad
image1.dq[0][1] = 1
image2.dq[0][1] = 1

dataset = data.Dataset([image1, image2, image3, image4])

combined_dataset = combine.combine_subexposures(dataset, 2, collapse="median")

assert(len(combined_dataset) == 2)
# the pixel with one bad pixel should have same value but higher error. In both cases the error should be inflated by np.sqrt(np.pi/2) compared to mean error.
assert combined_dataset[0].data[0][0] == 2
assert combined_dataset[0].err[0][0][0] == pytest.approx(2 * np.sqrt(np.pi/2))
assert combined_dataset[0].dq[0][0] == 0 # 0 because one of the two frames had a good value
# compare against a pixel without any bad pixels
assert combined_dataset[1].data[0][0] == 2
assert combined_dataset[1].err[0][0][0] == pytest.approx(np.sqrt(2) * np.sqrt(np.pi/2))
assert combined_dataset[1].dq[0][0] == 0

# the pixel with two bad pixels should be nan
assert np.isnan(combined_dataset[0].data[0][1])
assert np.isnan(combined_dataset[0].err[0][0][1])
assert combined_dataset[0].dq[0][1] == 1

# combine again
combined_dataset_2 = combine.combine_subexposures(combined_dataset, 2, collapse="median")

assert(len(combined_dataset_2) == 1)
assert(np.all(combined_dataset_2[0].data == 4))

# error for pixel with no bad pixels in original data (i.e. most pixels in data)
assert combined_dataset_2[0].err[0][5][0] == pytest.approx(np.pi)

# error for pixel with one bad pixel in original data (i.e. no nans after first combination)
assert combined_dataset_2[0].err[0][0][0] == pytest.approx(0.5 * np.pi * np.sqrt(6))

# error for pixel with two bad pixels in original data (i.e. 1 nan after first combination)
assert combined_dataset_2[0].err[0][0][1] == pytest.approx(np.pi * np.sqrt(2))

assert(np.all(combined_dataset_2[0].dq == 0))
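
# Illustrative standalone check (editor's sketch, not part of this diff) of
# the error values asserted above, applying the same combine rule twice:
#   err = sqrt(nanmean(err**2)) / sqrt(n_good) * sqrt(pi/2) * n_frames
# The helper name below is the editor's, not corgidrp API.
import numpy as np

def _median_combined_err(errs, n_frames):
    errs = np.asarray(errs, dtype=float)
    n_good = np.sum(np.isfinite(errs))
    return np.sqrt(np.nanmean(errs**2)) / np.sqrt(n_good) * np.sqrt(np.pi / 2) * n_frames

# first combination: unit errors; pixel (0,0) has one bad frame out of two
assert np.isclose(_median_combined_err([1, 1], 2), np.sqrt(np.pi))
assert np.isclose(_median_combined_err([np.nan, 1], 2), 2 * np.sqrt(np.pi / 2))

# second combination of the two already-combined frames
good, onebad = np.sqrt(np.pi), 2 * np.sqrt(np.pi / 2)
assert np.isclose(_median_combined_err([good, good], 2), np.pi)
assert np.isclose(_median_combined_err([onebad, good], 2), 0.5 * np.pi * np.sqrt(6))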

def test_combine_different_values():
"""
Test whether the function correctly combines different values.
"""

# use copies since we are going to modify their values
image1 = data.Image(np.copy(img1), err=np.copy(err1), dq=np.copy(dq),
pri_hdr = prhd, ext_hdr = exthd)
image1.filename = "1.fits"
image2 = image1.copy()
image2.filename = "2.fits"
image3 = image1.copy()
image3.filename = "3.fits"
image4 = image1.copy()
image4.filename = "4.fits"

# (0,0) has different values in each frame. Some are bad pixels.
image1.data[0][0] = 5
image2.data[0][0] = 6
image3.data[0][0] = 9
image4.data[0][0] = 19

image2.dq[0][0] = 1
image4.dq[0][0] = 1

# (0,1) is a bad pixel in every frame
image1.dq[0][1] = 1
image2.dq[0][1] = 1
image3.dq[0][1] = 1
image4.dq[0][1] = 1

dataset = data.Dataset([image1, image2, image3, image4])

combined_dataset = combine.combine_subexposures(dataset, collapse="median")

assert(len(combined_dataset) == 1)

# Most pixels had good values of 1 in all frames
assert combined_dataset[0].data[0][2] == 4
assert combined_dataset[0].err[0][0][2] == pytest.approx(2*np.sqrt(np.pi/2))

# (0,0) has a different median value calculated ignoring nans
assert combined_dataset[0].data[0][0] == 7 * 4 # median value scaled by number of images
assert combined_dataset[0].err[0][0][0] == pytest.approx(2 * np.sqrt(np.pi))

# (0,1) is a nan
assert np.isnan(combined_dataset[0].data[0][1])
assert np.isnan(combined_dataset[0].err[0][0][1])

# the updated bad pixel map only contains one bad pixel (i.e. the pixel for which there were no good values)
assert combined_dataset[0].dq[0][0] == 0
assert combined_dataset[0].dq[0][1] == 1

def test_not_divisible():
"""
Tests that function correctly fails when the length of the dataset is not divisible by num_frames_per_group.
"""

image1 = data.Image(img1, err=err1, dq=dq, pri_hdr = prhd, ext_hdr = exthd)
image1.filename = "1.fits"
image2 = image1.copy()
image2.filename = "2.fits"
image3 = image1.copy()
image3.filename = "3.fits"
image4 = image1.copy()
image4.filename = "4.fits"

dataset = data.Dataset([image1, image2, image3, image4])

with pytest.raises(ValueError):
combined_dataset = combine.combine_subexposures(dataset, 3) # Should fail as 4 % 3 != 0

def test_invalid_collapse():
"""
Tests that function correctly fails when collapse type is not valid.
"""

image1 = data.Image(img1, err=err1, dq=dq, pri_hdr = prhd, ext_hdr = exthd)
image1.filename = "1.fits"
image2 = image1.copy()
image2.filename = "2.fits"
image3 = image1.copy()
image3.filename = "3.fits"
image4 = image1.copy()
image4.filename = "4.fits"

dataset = data.Dataset([image1, image2, image3, image4])

with pytest.raises(ValueError):
combined_dataset = combine.combine_subexposures(dataset, collapse="invalid_option")

if __name__ == "__main__":
test_mean_combine_subexposures()
18 changes: 17 additions & 1 deletion tests/test_flat_div.py
@@ -77,9 +77,25 @@ def test_flat_div():
print("Error estimated:",err_estimated)
assert(err_flatdiv == pytest.approx(err_estimated, abs = 1e-2))

print(flatdivided_dataset[0].ext_hdr)
# print(flatdivided_dataset[0].ext_hdr)
corgidrp.track_individual_errors = old_err_tracking


### Check to make sure DQ gets set when flatfield zero. ###

## Injected some 0s.
pixel_list = [[110,120],[50,50], [100,100]]
for pixel in pixel_list:
flatfield.data[pixel[0],pixel[1]] = 0.

## Perform flat division
flatdivided_dataset_w_zeros = l2a_to_l2b.flat_division(simflat_dataset, flatfield)

## Check that all the pixels that were zeroed out have the DQ flag set to 4.
for pixel in pixel_list:
for i in range(len(simdata_filenames)):
assert np.bitwise_and(flatdivided_dataset_w_zeros.all_dq[i,pixel[0],pixel[1]],4)


if __name__ == "__main__":
test_flat_div()
2 changes: 1 addition & 1 deletion tests/test_mean_combine.py
@@ -29,7 +29,7 @@


def test_mean_im():
"""Verify method calculates mean image and erorr term."""
"""Verify method calculates mean image and error term."""
tol = 1e-13

check_combined_im = np.mean(check_ims, axis=0)