Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new ufloat_from_sample function #276

Open
Myles244 opened this issue Dec 16, 2024 · 7 comments · May be fixed by #277
Open

new ufloat_from_sample function #276

Myles244 opened this issue Dec 16, 2024 · 7 comments · May be fixed by #277

Comments

@Myles244
Copy link

Hello, I often use this package to analyse data from experiments, and I usually have a list of measurements of a variable. These measurements are usually normally distributed. So I combine them using the mean as the true value and the sample standard deviation divided by the square root of the number of measurements as the standard deviation of the mean. Something like this:
value=ufloat(numpy.mean(sample),numpy.std(sample,ddof=1)/numpy.sqrt(len(sample)))

In an attempt to save myself and hopefully a few others some time, would it be suitable to add a function uncertainties.unumpy.ufloat_from_sample()

ps I'm new to contributing to open source so if I've made any faux pas, I'm sorry.

@Myles244 Myles244 linked a pull request Dec 16, 2024 that will close this issue
3 tasks
@jagerber48
Copy link
Contributor

I'm not sure if this should be added to uncertainties or not. I see that it is convenient, and I would indeed use it. But I don't know that uncertainties is the spot to hold these kinds of convenience functions.

It's also important to note that this function only really works if the sample data are normally distributed. It could and would get used for non-normally distributed data and might lead to wrong or at least biased conclusions. At the very least this caution needs to be documented and maybe appear in the function name or something.

val = np.mean(sample)
err = np.std(sample, ddof=1) / np.sqrt(len(sample)
# or
# err = scipy.stats.sem(sample)
u_val = ufloat(val, err)

compared to

ufloat_from_sample(sample)

Is this convenience function really needed as more code to maintain in uncertainties?

My concerns are that there could be a slippery slope towards adding many such convenience functions, increasing the maintenance burden.
And then also if for some reason we decide we don't want such functions in uncertainties it will be another burden to remove them.

I would consider myself -0 on this right now.

@newville
Copy link
Member

@Myles244 @jagerber48 I agree with @jagerber48.

This function would provide one of potentially many approaches to converting a collection of values into a ufloat. It is kind of a "one-liner".

I would not be opposed to having a module that supported such conversions of values to ufloats. I'm not sure where that should live, but I think that it should not be in unumpy.core.

@andrewgsavage
Copy link
Contributor

I see that it is convenient, and I would indeed use it.

I think that's enough of a reason to include it in uncertaintes.
The normal distribution assumption is documented in the docstring and documentation.

My concerns are that there could be a slippery slope towards adding many such convenience functions, increasing the maintenance burden. And then also if for some reason we decide we don't want such functions in uncertainties it will be another burden to remove them.

In which case we should decide where such functions should go. I agree unumpy.core does not feel right for this function, I'd suggest it could be a constructor for UFloat, eg UFloat.from_sample(...) Alternatively another module, perhaps uncertainties.util?
In general I think functions that return uncertainties objects should live inside uncertainties.

@Myles244
Copy link
Author

Myles244 commented Dec 20, 2024

I'm not sure where that should live

I originally included the function in unumpy.core because of the function's dependence on the numpy library.

At the very least this caution needs to be documented and maybe appear in the function name or something

I agree the assumptions of the function could be more explicit, perhaps ufoat_from_gaussian_sample()
or UFloat.from_gaussian_sample(). I expect adding in a check, e.g. a chi-squared test, and returning a warning if it fails, is beyond the scope of this library.

This function would provide one of potentially many approaches to converting a collection of values into a ufloat.

If it would be better for the function to be more general, then ufloat_from_sample could include an optional argument, 'method=gaussian', that could be used to specify alternative approaches. method could even be a required argument to force the user to see the assumption.

@newville
Copy link
Member

I like a plain function named ufloat_from_samples (or ufloat_from_sample), though Ufloat.from_samples would be okay too. I also like the idea of optional methods, defaulting to Gaussian, though supporting the options might be some work.....

My main concern would be what inputs we would be supporting. Uncertainties does not require Numpy, so it should handle a plain list of numbers without Numpy or Scipy installed, and "no Numpy installed" should be handled gracefully. I imagine someone will expect it to "just work" with a Pandas Series, too. And maybe xarray, and others.

Maybe that can be handled with: if Numpy is available and the value is not a list or a ndarray, it should assume that the object has a to_numpy() method and use that?

Also: if someone gives a multi-dimensional array, should that take an axis argument to sample along that axis?

Really, not trying to make it more complicated, just pointing out the inherent complications of the idea ;).

@andrewgsavage
Copy link
Contributor

andrewgsavage commented Dec 21, 2024 via email

@Myles244
Copy link
Author

Myles244 commented Dec 22, 2024

I am unsure of the proper place for this function. So I have moved into uncertainties.core, right next to ufloat_fromstr(), but if instead it should be a constructor of ufloat, this should be a relatively easy change.

I have now made the following changes to the pull request:

-The function ufloat_from_sample() is now in uncertainties.core
-The function works without numpy but can only handle 1-D lists. This uses Python's statistics package.
-With numpy, the function can handle n-D arrays using the optional argument axis=None
-The function now has an optional argument method='gaussian'
-Updated the tests
-Updated docs and changes files

I ran into some circular import errors while trying to generate uarrays when handling n-D arrays, so I duplicated some of the code inside unumpy.uarray definition. If there is a better solution, please let me know.

If there are any other common ways to extract ufloats from a sample of measurements, I'm happy to add them as options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants