new ufloat_from_sample function #276

Myles244 · 2024-12-16T17:38:25Z

Hello, I often use this package to analyse data from experiments, and I usually have a list of measurements of a variable. These measurements are usually normally distributed. So I combine them using the mean as the true value and the sample standard deviation divided by the square root of the number of measurements as the standard deviation of the mean. Something like this:
value = ufloat(numpy.mean(sample), numpy.std(sample, ddof=1) / numpy.sqrt(len(sample)))
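
For reference, the one-liner above computes the sample mean and the standard error of the mean. Written out, with notation that is not in the original post ($x_i$ for the $n$ measurements, $s$ for the sample standard deviation with `ddof=1`):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}, \qquad
\sigma_{\bar{x}} = \frac{s}{\sqrt{n}}
```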

In an attempt to save myself and hopefully a few others some time, would it be suitable to add a function uncertainties.unumpy.ufloat_from_sample()?

P.S. I'm new to contributing to open source, so if I've made any faux pas, I'm sorry.

jagerber48 · 2024-12-20T03:40:51Z

I'm not sure if this should be added to uncertainties or not. I see that it is convenient, and I would indeed use it. But I don't know that uncertainties is the spot to hold these kinds of convenience functions.

It's also important to note that this function only really works if the sample data are normally distributed. It could and would get used for non-normally distributed data and might lead to wrong or at least biased conclusions. At the very least this caution needs to be documented and maybe appear in the function name or something.

val = np.mean(sample)
err = np.std(sample, ddof=1) / np.sqrt(len(sample))
# or
# err = scipy.stats.sem(sample)
u_val = ufloat(val, err)

compared to

ufloat_from_sample(sample)

Is this convenience function really needed as more code to maintain in uncertainties?

My concerns are that there could be a slippery slope towards adding many such convenience functions, increasing the maintenance burden.
And then also if for some reason we decide we don't want such functions in uncertainties it will be another burden to remove them.

I would consider myself -0 on this right now.

newville · 2024-12-20T04:07:44Z

@Myles244 @jagerber48 I agree with @jagerber48.

This function would provide one of potentially many approaches to converting a collection of values into a ufloat. It is kind of a "one-liner".

I would not be opposed to having a module that supported such conversions of values to ufloats. I'm not sure where that should live, but I think that it should not be in unumpy.core.

andrewgsavage · 2024-12-20T13:46:59Z

> I see that it is convenient, and I would indeed use it.

I think that's enough of a reason to include it in uncertainties.
The normal distribution assumption is documented in the docstring and documentation.

> My concerns are that there could be a slippery slope towards adding many such convenience functions, increasing the maintenance burden. And then also if for some reason we decide we don't want such functions in uncertainties it will be another burden to remove them.

In which case we should decide where such functions should go. I agree unumpy.core does not feel right for this function. I'd suggest it could be a constructor for UFloat, e.g. UFloat.from_sample(...). Alternatively, it could go in another module, perhaps uncertainties.util?
In general I think functions that return uncertainties objects should live inside uncertainties.

Myles244 · 2024-12-20T16:10:20Z

> I'm not sure where that should live

I originally included the function in unumpy.core because of the function's dependence on the numpy library.

> At the very least this caution needs to be documented and maybe appear in the function name or something

I agree the assumptions of the function could be more explicit, perhaps ufloat_from_gaussian_sample() or UFloat.from_gaussian_sample(). I expect adding a check, e.g. a chi-squared test, and emitting a warning if it fails, is beyond the scope of this library.

> This function would provide one of potentially many approaches to converting a collection of values into a ufloat.

If it would be better for the function to be more general, then ufloat_from_sample could include an optional argument, 'method=gaussian', that could be used to specify alternative approaches. method could even be a required argument to force the user to see the assumption.
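
A minimal sketch of what such a `method` argument might look like, using only the standard library. The name `mean_and_sem` and the tuple return value are assumptions made for illustration and are not taken from the pull request; with uncertainties installed, the two numbers would simply be passed to `ufloat(nominal, std_dev)`.

```python
import math
import statistics


def mean_and_sem(sample, method="gaussian"):
    """Return (nominal value, standard uncertainty) for a 1-D sample.

    Hypothetical helper: only the 'gaussian' method is sketched here,
    i.e. the mean and the standard error of the mean.
    """
    values = [float(x) for x in sample]
    if method == "gaussian":
        if len(values) < 2:
            raise ValueError("need at least two measurements")
        nominal = statistics.mean(values)
        # statistics.stdev is the sample standard deviation (n - 1 denominator)
        std_dev = statistics.stdev(values) / math.sqrt(len(values))
        return nominal, std_dev
    raise ValueError(f"unknown method: {method!r}")


nominal, std_dev = mean_and_sem([9.8, 10.1, 10.0, 9.9, 10.2])
# With uncertainties installed: value = ufloat(nominal, std_dev)
```

Dropping the default (`method` with no `="gaussian"`) would turn this into the "required argument" variant mentioned above.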

newville · 2024-12-20T19:05:53Z

I like a plain function named ufloat_from_samples (or ufloat_from_sample), though UFloat.from_samples would be okay too. I also like the idea of optional methods, defaulting to Gaussian, though supporting the options might be some work...

My main concern would be what inputs we would be supporting. Uncertainties does not require Numpy, so it should handle a plain list of numbers without Numpy or Scipy installed, and "no Numpy installed" should be handled gracefully. I imagine someone will expect it to "just work" with a Pandas Series, too. And maybe xarray, and others.

Maybe that can be handled with: if Numpy is available and the value is not a list or a ndarray, it should assume that the object has a to_numpy() method and use that?
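
The duck-typing idea above could be sketched roughly as follows. This is only an illustration: `as_float_list` is a hypothetical helper name, not part of uncertainties, and it handles 1-D input only. It does not import NumPy itself; it just relies on the object offering `to_numpy()` (as a Pandas Series does) or `tolist()` (as an ndarray does).

```python
def as_float_list(sample):
    """Coerce a 1-D sample to a plain list of floats (hypothetical helper)."""
    if isinstance(sample, (list, tuple)):
        return [float(x) for x in sample]
    # Objects such as a pandas Series expose to_numpy()
    to_numpy = getattr(sample, "to_numpy", None)
    if callable(to_numpy):
        sample = to_numpy()
    # numpy arrays expose tolist(); otherwise fall back to plain iteration
    tolist = getattr(sample, "tolist", None)
    if callable(tolist):
        sample = tolist()
    return [float(x) for x in sample]


values = as_float_list((1, 2, 3))
```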

Also: if someone gives a multi-dimensional array, should that take an axis argument to sample along that axis?

Really, not trying to make it more complicated, just pointing out the inherent complications of the idea ;).

andrewgsavage · 2024-12-21T11:26:14Z

Yea the optional method is a nice idea. Where would you place the function `ufloat_from_sample`? The PR uses numpy but could be changed to use math or raise an error if numpy isn't available. I suspect using numpy allows this function to work for pandas or xarray. Most objects from other libraries have support for numpy so `np.mean(obj)` returns a float. Writing functions to accept *any* input from the get go is unreasonable. People are free to submit PRs adding support if they want support for a library. I think it's reasonable PRs should work with lists and numpy arrays at a minimum.

Myles244 · 2024-12-22T00:05:01Z

I am unsure of the proper place for this function, so I have moved it into uncertainties.core, right next to ufloat_fromstr(). If it should instead be a constructor of UFloat, that would be a relatively easy change.

I have now made the following changes to the pull request:

- The function ufloat_from_sample() is now in uncertainties.core
- The function works without numpy but can only handle 1-D lists. This uses Python's statistics package.
- With numpy, the function can handle n-D arrays using the optional argument axis=None
- The function now has an optional argument method='gaussian'
- Updated the tests
- Updated docs and changes files
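
As a rough illustration of the behaviour listed above (not the code in the pull request), the numpy/no-numpy split might look like the sketch below. The name `sample_mean_and_sem` and the tuple return value are assumptions; the actual function returns ufloat objects.

```python
import math
import statistics

try:
    import numpy as np
except ImportError:  # numpy is optional
    np = None


def sample_mean_and_sem(sample, axis=None):
    """Return (mean, standard error of the mean) for a sample.

    Without numpy: flat sequences of numbers only, and axis must be None.
    With numpy: n-D input; axis=None treats the input as one flat sample.
    """
    if np is None:
        if axis is not None:
            raise ImportError("the axis argument requires numpy")
        values = [float(x) for x in sample]
        sem = statistics.stdev(values) / math.sqrt(len(values))
        return statistics.mean(values), sem
    arr = np.asarray(sample, dtype=float)
    # Number of measurements contributing to each mean
    n = arr.size if axis is None else arr.shape[axis]
    return np.mean(arr, axis=axis), np.std(arr, axis=axis, ddof=1) / np.sqrt(n)


mean, sem = sample_mean_and_sem([1.0, 2.0, 3.0, 4.0, 5.0])
```

With numpy available, passing `axis=1` for a 2-D input gives one mean and one standard error per row.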

I ran into some circular import errors while trying to generate uarrays when handling n-D arrays, so I duplicated some of the code inside unumpy.uarray definition. If there is a better solution, please let me know.

If there are any other common ways to extract ufloats from a sample of measurements, I'm happy to add them as options.

Myles244 linked a pull request Dec 16, 2024 that will close this issue

Added ufloat_from_sample function #277
