Eliminate the `Variable` class #262

jagerber48 · 2024-08-12T03:37:04Z

Closes # (insert issue number)
Executed pre-commit run --all-files with no errors
The change is fully covered by automated unit tests
Documented in docs/ as appropriate
Added an entry to the CHANGES file

This PR:

Eliminates the Variable class and eliminates the AffineScalarFunc.derivatives property.
The AffineScalarFunc class is renamed to UFloat with AffineScalarFunc left as a legacy name.
The LinearCombo class is refactored into the UCombo class but the architectural role is similar: track the uncertainties of UFloat objects as a lazily expanded combination of objects with uncertainty
The UAtom object is introduced as the fundamental element which captures uncertainty. Each UAtom models a random variable with zero mean and unity standard deviation/variance. Each UAtom has a uuid and all UAtom are mutually statistically independent. The expanded version of the UCombo maps dict[UAtom, float].

For more detail/discussion see #251 (comment)

Note that as of the time opening this PR the PR is still a work in progress. Not all elements in the bullet list above are implemented and a good handful of tests are still failing.

jagerber48 · 2024-11-16T21:18:04Z

tests/test_uncertainties.py

@@ -104,9 +105,6 @@ def test_ufloat_fromstr():
        "3.4(nan)e10": (3.4e10, float("nan")),
        # NaN value:
        "nan+/-3.14e2": (float("nan"), 314),
-        # "Double-floats"


These floats are too large to work with. Prior to this PR the large float value could be saved into the AffineScalarFunc object and displayed, but any arithmetic with such large values would result in an OverflowError. In the new framework even displaying a UFloat the first time requires a calculation, even if it's as simple as sqrt(x**2). So in the new framework the OverflowError appears immediately.

jagerber48 · 2024-12-07T13:03:40Z

All the tests are passing except some having to do with the uarray inverse and pseudo inverse functions. The code for this is pretty complicated. It might make sense to just rewrite the code to convert array-compatible functions to ufloat array compatible functions.

…_refactor

codecov · 2024-12-12T23:18:11Z

Codecov Report

Attention: Patch coverage is 88.03922% with 61 lines in your changes missing coverage. Please review.

Project coverage is 93.62%. Comparing base (61f688f) to head (9607a0d).

Files with missing lines	Patch %	Lines
uncertainties/unumpy/core.py	78.41%	30 Missing ⚠️
uncertainties/ucombo.py	73.95%	25 Missing ⚠️
uncertainties/core.py	92.15%	4 Missing ⚠️
tests/helpers.py	97.22%	1 Missing ⚠️
uncertainties/ops.py	92.30%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #262      +/-   ##
==========================================
- Coverage   96.55%   93.62%   -2.94%     
==========================================
  Files          17       18       +1     
  Lines        1947     2039      +92     
==========================================
+ Hits         1880     1909      +29     
- Misses         67      130      +63

Flag	Coverage Δ
macos-latest-3.10	`92.25% <88.03%> (-2.77%)`	⬇️
macos-latest-3.11	`92.25% <88.03%> (-2.77%)`	⬇️
macos-latest-3.12	`92.25% <88.03%> (-2.77%)`	⬇️
macos-latest-3.8	`92.23% <88.01%> (-2.78%)`	⬇️
macos-latest-3.9	`92.23% <88.01%> (-2.78%)`	⬇️
no-numpy	`72.24% <63.33%> (-2.90%)`	⬇️
ubuntu-latest-3.10	`92.25% <88.03%> (-2.77%)`	⬇️
ubuntu-latest-3.11	`92.25% <88.03%> (-2.77%)`	⬇️
ubuntu-latest-3.12	`92.25% <88.03%> (-2.77%)`	⬇️
ubuntu-latest-3.8	`92.23% <88.01%> (-2.78%)`	⬇️
ubuntu-latest-3.9	`92.23% <88.01%> (-2.78%)`	⬇️
windows-latest-3.10	`92.25% <88.03%> (-2.77%)`	⬇️
windows-latest-3.11	`92.25% <88.03%> (-2.77%)`	⬇️
windows-latest-3.12	`92.25% <88.03%> (-2.77%)`	⬇️
windows-latest-3.8	`92.23% <88.01%> (-2.78%)`	⬇️
windows-latest-3.9	`92.23% <88.01%> (-2.78%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

jagerber48 · 2024-12-13T12:41:52Z

Wow all tests are passing now. That's a big milestone for this PR. If folks are interested I'd be curious to hear general feedback on the source code changes in this PR. I tried to do as little wholesale rewriting as possible but did chose to do some of that in unumpy/core.py.

Here's a summary of the changes so far. See #251 for a more thorough discussion of these changes and their motivations.

Remove the Variable class.
Remove the LinearCombination class.
Add the UAtom class. A UAtom models a standard normally distributed random variable (zero mean, unity variance).
Add the UCombo class. This class essentially maps UAtoms to float. This class models a linear combination of UAtoms. Exposes a method to extract its own standard deviation and its covariance with other UCombos. This class supports multiplication by floats/ints and additions with other UCombo's. This simple arithmetic support makes uncertainty propagation calculations nice.
Swap the names UFloat and AffineScalarFunc. Now UFloat defines the actual class and AffineScalarFunc is an alias (perhaps it could be deprecated later).
The UFloat class now stores uncertainty as a UCombo rather than a LinearCombination. Previously uncertainty was roughly recorded as the derivatives of an AffineScalarFunc with respect to Variables. This information was stored and accessed in the derivatives attribute. Now there is no longer a derivatives attribute. There is only error_components which now returns a mapping from UAtom to float indicating the weighting of each UAtom within a UFloat. This is the biggest user-facing conceptual change in this PR.
It was relatively straightforward to give UFloat a simple hash. This will resolve Pandas sort_values with multiple columns does not work for AffineScalarFunc #186 and Should `Variable.__eq__` be based on `id()` or a separate identifier? #218 and Implement hash for AffineScalarFunc #219. I'll point out that these conversations were major concrete motivators for this refactor (together with a less concrete desire to simplify the codebase)
Previously AffineScalarFunc and Variable could be tagged with a user-readable tag. Now UFloats cannot be tagged, but UAtoms can.
Because of the previous architecture there were some confusing behaviors with respect to copying/pickling. In the new framework I think copying and pickling are much more straightforward. The behaviors are different than before, but I argue they are better. You can look at how the tests changed for more details onthis. See also Strange behavior on copying #260.
The code in unumpy/core.py responsible for extending np.linalg.inv and np.linalg.pinv to operate sensibly on umatrix or arrays of UFloat were rewritten. The old code (1) relied centrally on AffineScalarFunc.derivatives and (2) was quite confusing to understand. In this case I found it easier to rewrite the code than to understand and modify the old code. The old code could only uncertainty-ify functions that took one array as an argument. The new code detect UFloats or arrays containing UFloat's in any argument. By default it uses numerical differentiation, but the function can be passed functions to calculate analytical derivatives.

Some todo items

Check that performance isn't impacted
Multiple passes of clean up on the code
Some passes on the source code for the tests. Many tests were changed, some dramatically. Some tests were added. I think the tests as they are now are probably ok. I think some checks that this PR can go through would be good, but a more thorough addressing of tests should be done in support of resolving Clean up tests #273 in later PRs.
Correct and clean up comments within the source code. The legacy uncertainties code has a lot of comments and documentation. This PR is a part of a thrust to help simplify the source code so that it is a bit more self-documenting. With this, in the places touched by this PR, I want to slim down any comments, or at the very least make sure they are accurate. Especially with the removal of the derivatives attribute, I think there may be comments in lots of places that are no longer accurate. There are probably places I could add some comments.
Update the documentation. I haven't touched the docs documentation at all. I think there will need to be a lot of updating there. Off the top of my head I know the docs make reference to the "derivatives of variables" philosophy that is being replaced with the "linear combination of random variables" philosophy. I also know there will be some docs about copying/pickling that need to be updated, and of course docs on derivatives and error_components will need to be updated.
Bikeshed the naming of the UFloat, UCombo, and UAtom classes.
Write the changelog entries

Other notes:

COMMENT FOR REVIEWERS: Obviously I would appreciate wholistic reviews, But I'm especially curious for reviews on changes to the tests that start to illustrate how the public API is changing.
See this change. The to_uarray_func conversion function assumes inputs are numpy arrays and doesn't "respect" numpy matrices. I made this choice because of the comment on numpy.matrix documentation discouraging their use. See 4.x Release plans #244 (comment). It seems like the maintainers are in some agreement that we can drop matrix support, so maybe consider this a step in that direction.
I'll update this comment or later ones as I think of more changes or todos. I'll likely use this comment as a start for the changelog.

This test wasn't running because the generator defining ufloat_args was consumed while calculating output so it couldn't be iterated over again for the for loop. Need to save the result as a list before using it.

…put a little

These functions need to be overhauled so for now trying to make minimal changes while keeping functionality to keep the PR as slim as possibly (though it's already pretty huge)

…ng commit message)

andrewgsavage · 2024-12-26T19:11:39Z

uncertainties/ucombo.py

+        return self._hash
+
+    def __lt__(self: UAtom, other: UAtom) -> bool:
+        return self.uuid < other.uuid


codecov gives a load of warnings for UAtom and UCombo. Would bit be worth having tests to show methods like eq, lt behave as expected?

it's also a bit suprising some of these lines (eg for UCombo's eq) don't get hit by any of the existing tests

Thanks for taking a look. I haven't looked at codecov yet. The story for why UCombos __eq__ doesn't get hit is worth sharing.

First off, before this PR and as of this comment, UFloat.__eq__ does not check for equality of the nominal value and equality of the uncertainty as I would have naively implemented. Rather, it looks at the difference between self and other and checks that the resulting UFloat has zero nominal value and zero std_dev. Because subtraction is supported between UFloat and float, this allows __eq__ to easily support comparison between UFloat and float. In going through the tests I've learned that, for better or for worse, uncertainties has definitely chosen the convention that a UFloat with a std_dev of zero should behave like a regular float.

I am still using this difference calculation for UFloat.__eq__. It would be more natural to me to have __eq__ be something like

def __eq__(self, other): if isinstance(other, UFloat): return (self.nominal_value == other.nominal_value and self.uncertainty = other.uncertainty) else: if self.std_dev != 0: return False else: return self.nominal_value == other

But currently the code implementing __eq__ is defined in ops.py and gets monkey patched onto UFloat. I can't import UFloat into this function because it would create circular imports between ops.py and core.py.

Anyways, I can look into disentangling this.

Note though, that I tried something like the above code for a second and found out that the current version of UCombo.__eq__ was broken because I didn't know sorted returned a list! Easy fix, but emphasizes the importance of testing.

Note that I made the decision that UCombo __eq__ and __hash__ use the expanded dictionary for comparison. Thuis means that two UCombos with different ucombo_tuples could compare equal. For UFloat I think it is clear that we want two UFloats to compare based on the expanded dictionary from the uncertainty. For UCombo it wasn't as obvious to me. For example, if dx is a UAtom then it would be possible for there to be a nested UCombo storing something like

dx + (-dx + (dx + (-dx + (dx + (-dx + ...)))))

Which ultimately is equal to 0. But the former might take a longer time to compute.

Actually I think I need to add a special case for 0 so that the presence of a UAtom in one UCombo expanded dict but not the other doesn't matter if the weight is equal to 0.

In any case, does this make sense to allow two UCombo to compare the same even if they have different ucombo_tuple? Or should that distinction be made apparent in __eq__ and __hash__ (but UFloat will still compare based on expanded dict)?

sorted returns a (unhashable) list, so much convert this list to a tuple before hashing.

- [x] Closes #274 - [x] Executed `pre-commit run --all-files` with no errors - [x] The change is fully covered by automated unit tests - [x] Documented in docs/ as appropriate - [x] Added an entry to the CHANGES file add a performance benchmark test. This test is important especially to ensure #262 doesn't introduce a performance regression. --------- Co-authored-by: andrewgsavage <[email protected]>

codspeed-hq · 2024-12-28T02:21:35Z

CodSpeed Performance Report

Merging #262 will degrade performances by 35.86%

_{Comparing jagerber48:feature/linear_combo_refactor (9607a0d) with master (61f688f)}

Summary

❌ 5 regressions

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`master`	`jagerber48:feature/linear_combo_refactor`	Change
❌	`test_repeated_summation_speed[100000]`	2.9 s	4.3 s	-33.86%
❌	`test_repeated_summation_speed[10000]`	274.9 ms	424.7 ms	-35.26%
❌	`test_repeated_summation_speed[1000]`	27 ms	42.1 ms	-35.86%
❌	`test_repeated_summation_speed[100]`	2.8 ms	4.3 ms	-34.39%
❌	`test_repeated_summation_speed[10]`	401.2 µs	562.5 µs	-28.68%

jagerber48 · 2024-12-28T18:04:24Z

Ok, it is time to pick the names of the classes because I want to start updating the docs (I'm not technically blocked but would be nice to get it right earlier). Right now we have UFloat, UComboandUAtom`. These classes are modeling the following.

For reference, suppose we have

x = x_0 + dx
y = y_0 + dy
z = x*y = x_0 * y_0 + (y_0 * dx + x_0 * dy)

dx and dy are UAtoms (individual random elements with zero mean and unity standard deviation/variance), 1*dx, 1*dy, and y_0*dx + x_0*dy are UCombos (float-weighted linear combinations of UAtoms) and x, y, and z are UFloats (nominal value + uncertainty given as a UCombo).

Notes on names:

UFloat: I think we're all in agreement that this name is fine. AffineScalarFunc will remain a legacy alias that I suggest we add a FutureWarning too in an upcoming release (we shouldn't do that in this PR however since I don't want to get bogged down in that discussion yet. For the record, I will say there has been discussion about the name UNumber for something that would support more types of nominal values and uncertainties than just floats. E.g. Decimal or possibly complex numbers or something. But given UFloat is already in use I say it makes sense to keep it.
UCombo: This class roughly replaces the old LinearCombination class. I don't have strong opinions on this name. LinearCombination would be fine, but I would like to somehow emphasize that is a linear combination of zero-mean random variables which is not captured by just LinearCombination. Other obvious options: ULinCombo, ULinearCombo, ULinearCombination. The same options without a U. Again, my opinions aren't strong here. If I don't hear otherwise I will keep UCombo.
UAtom. I think this is the most controversial. UAtom kind of serves to replace the old Variable class, but it is much simpler than the Variable class and this reduction in complexity is what drives the overall improvement in the architectures (IMO). I chose the name UAtom because it is an atomic/irreducible/indivisible unit (this is what is meant by "atom") of uncertainty. A UAtom has zero mean, unity variance, and is not correlated with any other UAtom. This is in contrast with the UCombo which is a combination of many UAtom. However, it's been pointed out that users outside of the physical sciences may be less familiar with the term "Atom" and what it might mean here. Open to proposals for alternative names. I don't have good ideas. UnitUncertainty, StdNormUnit (standing for "standard normal unit" referring to the standard normal distribution), RandomUnit, URandom. I prefer UAtom to all of these but open to other proposals and discussion.

…proves performance.

…er checks for absence of uncertainty terms.

jagerber48 added 5 commits August 9, 2024 08:00

A first pass on linear_combo -> ucombo

3a53d8c

revert test changes

01ef70d

rework UCombo, delete Variable

f338549

some test updates

5f64b9e

Pass more tests

80e3d31

jagerber48 changed the title ~~Feature/linear combo refactor~~ Eliminate the Variable class Aug 12, 2024

jagerber48 added 2 commits November 4, 2024 21:01

Merge branch 'master' into feature/linear_combo_refactor

f939766

remove large float test

d7b9a2e

jagerber48 commented Nov 16, 2024

View reviewed changes

jagerber48 added 12 commits November 16, 2024 22:59

Clean up UFloat class

48c3973

some formatting changes

1731b5a

uncertainty instead of _linear_combo

f0d4225

variety of updates and test cleanup

1b385f2

copy tests

fe849ff

pickling tests

c8fc497

no negative std_dev in tests

681bef9

modify test

2c1b11d

more tests

ac6b497

handle some edge cases with 0 uncertainty

b3f7fb2

fix test_hypot

fc6e592

compare analytic and numerical derivatives on math functions

f2bfc60

jagerber48 mentioned this pull request Dec 7, 2024

Clean up tests #273

Open

unumpy

2361849

jagerber48 added 2 commits December 13, 2024 07:12

mainly code to handle inv and pinv array function

3c428a2

Merge remote-tracking branch 'lmfit/master' into feature/linear_combo…

24f1249

…_refactor

jagerber48 mentioned this pull request Dec 12, 2024

Refactor and use Cholesky decomposition for correlated_values and correlated_values_norm #271

Open

5 tasks

fix import

4084e7f

jagerber48 added 7 commits December 17, 2024 21:53

fix broken test that wasn't running

7ad82ba

This test wasn't running because the generator defining ufloat_args was consumed while calculating output so it couldn't be iterated over again for the for loop. Need to save the result as a list before using it.

ucombo clean up and refactor

fb69b1f

don't calculate std_dev with kwargs inputs, cleanup f_with_affine_out…

7c35233

…put a little

some correlated_values_norm cleanup

0eb135b

These functions need to be overhauled so for now trying to make minimal changes while keeping functionality to keep the PR as slim as possibly (though it's already pretty huge)

todo on untested, probably broken functions

c6f116f

codecov CI

a8fd413

revert previous commit (it was made to the wrong branch and had a wro…

0abc0f6

…ng commit message)

andrewgsavage reviewed Dec 26, 2024

View reviewed changes

fixed ucombo hash

0086a0a

sorted returns a (unhashable) list, so much convert this list to a tuple before hashing.

Merge branch 'master' into feature/linear_combo_refactor

e6211ca

jagerber48 added 3 commits December 29, 2024 04:15

start updating docs

7aeeed0

instance check for ABC real seems to be slow. See if removing that im…

ddf2e1d

…proves performance.

need to include int

40b088c

jagerber48 mentioned this pull request Dec 29, 2024

Eliminate less than and greater than comparisons for UFloat #278

Open

jagerber48 added 12 commits December 30, 2024 13:59

documentation and doctest

968feef

fix index

d5edb5f

cast default dict to dict for ucombo expanded dict. Also don't do eag…

76a4b80

…er checks for absence of uncertainty terms.

Add back make.bat

1da88c3

add doctest sphinx extension in conf.py so doctests can run

54179b7

Merge branch 'feature/add_make.bat' into feature/linear_combo_refactor

e8aba3f

UAtom repr

1da127e

error_components doctest

17bc5a7

doctests

34701c2

more documentation

5c309d4

don't document Variable

79e00d0

add tuples instead of nesting for addition. Will this be faster?

9607a0d

jagerber48 mentioned this pull request Dec 31, 2024

Problem with special casing std_dev = 0 #283

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Eliminate the `Variable` class #262

Eliminate the `Variable` class #262

jagerber48 commented Aug 12, 2024

jagerber48 Nov 16, 2024

jagerber48 commented Dec 7, 2024

codecov bot commented Dec 12, 2024 •

edited

Loading

jagerber48 commented Dec 13, 2024 •

edited

Loading

andrewgsavage Dec 26, 2024

jagerber48 Dec 26, 2024

codspeed-hq bot commented Dec 28, 2024 •

edited

Loading

jagerber48 commented Dec 28, 2024

Eliminate the Variable class #262

Are you sure you want to change the base?

Eliminate the Variable class #262

Conversation

jagerber48 commented Aug 12, 2024

jagerber48 Nov 16, 2024

Choose a reason for hiding this comment

jagerber48 commented Dec 7, 2024

codecov bot commented Dec 12, 2024 • edited Loading

Codecov Report

jagerber48 commented Dec 13, 2024 • edited Loading

andrewgsavage Dec 26, 2024

Choose a reason for hiding this comment

jagerber48 Dec 26, 2024

Choose a reason for hiding this comment

codspeed-hq bot commented Dec 28, 2024 • edited Loading

CodSpeed Performance Report

Merging #262 will degrade performances by 35.86%

Summary

Benchmarks breakdown

jagerber48 commented Dec 28, 2024

Eliminate the `Variable` class #262

Eliminate the `Variable` class #262

codecov bot commented Dec 12, 2024 •

edited

Loading

jagerber48 commented Dec 13, 2024 •

edited

Loading

codspeed-hq bot commented Dec 28, 2024 •

edited

Loading