Fix NaN handling in Record.adc, and other fixes #481

bemoody · 2024-04-18T20:05:35Z

Fix several bugs in Record.adc:

Previously, the function would try to convert all samples to integers and then, for any samples that were NaN, replace the corresponding elements with the appropriate sentinel value. Even though this was probably safe in most cases, casting NaN to an integer is implementation-defined behavior, and raises a warning by default (issue Record.adc: RuntimeWarning: invalid value encountered in cast #480).
NaN just plain wasn't handled for the inplace=True, expanded=False case. (Currently, we don't use inplace=True anywhere internally; although it saves a bit of memory, it's destructive and so it's probably wise for high-level functions like wrsamp to avoid it.)
The expanded=True case relied on self.n_sig (in contrast to expanded=False, which operates based on the dimensions of p_signal.) This meant it would fail if the caller didn't explicitly set n_sig, which was an annoying inconsistency.

Also, tidy up duplicated code and make things a little more efficient.

A side note: I don't think the inplace=True mode is particularly great to have. It conflates two things (modifying the Record object attributes, which many applications want; and modifying the array contents, which you may think you want until you realize it subtly breaks something.) It does save some memory, but not as much as you'd hope. (That copy=False is pretty much a lie.) And of course I don't like functions whose return type is dependent on their arguments. So I would definitely put inplace on the chopping block for 5.0.0. Still, I think the updated code here isn't too terribly ugly.

This set of changes is the first step to making wfdb.wrsamp work for multi-frequency (issue #336). Next is to fix Record.calc_adc_params, then Record.set_d_features.

When converting physical to digital sample arrays, we must replace NaN values (which represent a missing sample) with the appropriate invalid-sample sentinel value. This is done correctly for normal uses of the package, but if the application directly invoked adc(inplace=True), NaNs would not have been handled (and were instead set to an implementation-defined value.) (Note that we don't use inplace=True internally because this overwrites the original floating-point array. Applications may want to use inplace=True to save memory, but this requires knowing that the original array is no longer needed.)

When converting physical to digital sample arrays, all the information we need is contained in self.e_p_signal, self.adc_gain, self.baseline, and self.fmt. We don't need to rely on self.n_sig here, and we don't use n_sig in the expanded=False case, so for consistency, don't use n_sig in the expanded=True case either.

When converting physical to digital sample arrays, we must replace NaN values (which represent a missing sample) with the appropriate invalid-sample sentinel value. Attempting to convert a floating-point NaN to an integer, as was done here, is implementation-defined behavior (and is controlled, to an extent, by the global numpy configuration.) We don't want to be dependent on the hardware or the global numpy configuration, and for efficiency it's best to avoid triggering floating-point errors to begin with. So instead of converting the floating-point array to integers, and fixing up the integer array after the fact, we want to replace the floating-point values *first*, and then convert to integers.

- Test that Record.adc works when n_sig is not set. (Previously, this didn't work with expanded=True.) - Test that Record.adc handles NaN by mapping it to the correct invalid-sample value. (Previously, this didn't work with expanded=False and inplace=True.) Use multiple formats to test that this takes the format into account. Furthermore, the previous code relied on implementation-defined behavior to handle NaN, which normally results in a RuntimeWarning. Within the test suite, we set the numpy error handling mode to "raise", so such implementation-defined conversions actually result in a FloatingPointError.

When converting physical to digital sample arrays, we must replace NaN values with the appropriate invalid-sample sentinel value. To do this, we need to call np.isnan and use the result as a mask to replace entries in the output array. (Although the function np.nan_to_num also exists, it's less efficient: it literally does just this, but also handles infinities.) What we don't need to do is to call any() to check whether there are any true entries - that just means we're iterating through the same array three times rather than once. Furthermore, np.copyto can broadcast d_nans across the rows of p_signal, so all the channels can be handled at once. Also use copyto in adc_inplace_1d for consistency.

tompollard · 2024-04-19T18:38:16Z

@bemoody this looks good to me. were the tests intentionally stopped?

tompollard · 2024-04-19T18:53:11Z

Hmm, it looks like maybe it is failing on format checks (https://github.com/MIT-LCP/wfdb-python/actions/runs/8744000168/job/24038235919) and then the remaining tests are being cancelled.

bemoody · 2024-04-19T19:31:16Z

Yeah, I think it should work, it's failing because the latest version of black requires formatting changes (not related to this pull AFAIK.)

tompollard · 2024-04-19T19:34:08Z

Yeah, I think it should work, it's failing because the latest version of black requires formatting changes (not related to this pull AFAIK.)

I assume these issues were fixed in #482? If so, please could you rebase this PR on main?

Benjamin Moody added 12 commits April 18, 2024 15:32

adc (inplace=True, expanded=False): refactor for clarity.

dc05f44

adc (inplace=False, expanded=False): refactor for clarity.

2eabaa5

adc (expanded=False): move shared logic to a function.

d629897

adc (inplace=True, expanded=True): refactor for clarity.

7ae21d4

adc (inplace=False, expanded=True): refactor for clarity.

8945861

adc (expanded=True): move shared logic to a function.

235e142

adc: rename variables for clarity.

a901163

adc: combine shared logic into one inner function.

5b56407

bemoody force-pushed the bm/adc branch from 44f6dc0 to 5b56407 Compare April 19, 2024 19:34

tompollard approved these changes Apr 19, 2024

View reviewed changes

tompollard merged commit 34b989e into main Apr 19, 2024
14 checks passed

tompollard deleted the bm/adc branch April 19, 2024 19:43

bemoody mentioned this pull request Apr 25, 2024

Record.adc: RuntimeWarning: invalid value encountered in cast #480

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix NaN handling in Record.adc, and other fixes #481

Fix NaN handling in Record.adc, and other fixes #481

bemoody commented Apr 18, 2024

tompollard commented Apr 19, 2024

tompollard commented Apr 19, 2024

bemoody commented Apr 19, 2024

tompollard commented Apr 19, 2024

Fix NaN handling in Record.adc, and other fixes #481

Fix NaN handling in Record.adc, and other fixes #481

Conversation

bemoody commented Apr 18, 2024

tompollard commented Apr 19, 2024

tompollard commented Apr 19, 2024

bemoody commented Apr 19, 2024

tompollard commented Apr 19, 2024