Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mean_and_var and mean_and_std default value for "corrected" in source does not agree with documentation #604

Open
zierenberg opened this issue Oct 1, 2020 · 7 comments
Labels
breaking A breaking change that needs a semver major release bug

Comments

@zierenberg
Copy link

Let me start with a big thank you for the nice package.

I have noticed that there is a mismatch between the documentation and implementation of the default values for the "corrected" flag in mean_and_var and mean_and_std. In the documentation it says

mean_and_std(x, [w::AbstractWeights], [dim]; corrected=false) -> (mean, std)

but everywhere in the code (whenever corrected::Bool is explicitly set which is somehow only for mean_and_std and mean_and_var) it is set to true, e.g.,

function mean_and_std(A::RealArray; corrected::Bool=true)
    m = mean(A)
    s = stdm(A, m; corrected=corrected)
    m, s
end

I would have started to change it, but I was unsure which one you agreed on to be the proper default case. After finding depcheck in common.jl I was even more unsure. Probably those function that set corrected explicitly should be updated to use depcheck?

@nalimilan
Copy link
Member

Good catch. The right value is corrected=true, like in Statistics.std. Would you make a pull request to fix the documentation?

@zierenberg
Copy link
Author

Hi, I would be happy to try but let me clarify some things first:

Also the Statistics.std (in StatsBase moments.jl line 121) is currently documented with corrected=true:

"""
    std(x::AbstractArray, w::AbstractWeights, [dim]; mean=nothing, corrected=false)
[...]
"""
std(v::RealArray, w::AbstractWeights; mean=nothing, corrected::DepBool=nothing) =
    sqrt.(var(v, w; mean=mean, corrected=depcheck(:std, corrected)))

from which I understand that the current implementation would call depcheck, which in turn (defined in common.jl) would give a warning that the current implementation is corrected=false but will be changed in future to corrected=true

function depcheck(fname::Symbol, b::DepBool)
    if b == nothing
        msg = "$fname will default to corrected=true in the future. Use corrected=false for previous behaviour."
        Base.depwarn(msg, fname)
        false
    else
        b
    end
end

But this does not happen when I do a simple test like

julia>using Random; using StatsBase;
julia>test=rand(100);
julia>StatsBase.std(test)==StatsBase.std(test, corrected=true)
true

while

julia> StatsBase.depcheck(:std, nothing)
┌ Warning: std will default to corrected=true in the future. Use corrected=false for previous behaviour.
│   caller = ip:0x0
└ @ Core :-1
false

So I naively would have thought that mean_and_std should have the same default as std such that I would also have changed the explicit default with the depcheck version of std, but I do not understand the behavior above.

Can you explain this to me? If everything is correct I would work on the pull request (which I do not have a lot of experience with, so it takes a while for me, but I am eager to learn) and would also use the depcheck method for consistency, correct?.

@nalimilan
Copy link
Member

AFAICT you're confusing std(x::AbstractArray), which lives in Statistics, with std(x::AbstractArray, w::AbstractWeights, which lives in StatsBase. The former uses corrected=true by default, but the latter uses corrected=nothing, which is equivalent to false but with a warning. Same for mean_and_std. I think that historically the weighted methods didn't apply the correction because it wasn't supported due to the lack of specific weights vectors that allow for it. So we added the deprecation to make weighted and unweighted methods consistent in the long run.

The docstring for mean_and_var and mean_and_std are wrong because they do not distinguish these. The best fix would be to remove these deprecations, which have been there for a very long time. Though that will require a breaking release.

@zierenberg
Copy link
Author

zierenberg commented Oct 5, 2020

You are right, I did confuse these two. Thanks for clarifying!

Concerning your last point, however, I wonder if it would be more consistent to make mean_and_var consistent with StatsBase.var and add the deprecation check. Then all of them have the same consistent behavior (yet inconsistent with Statistics) but emphasize the need for the breaking release.

Otherwise, I will try to just fix the documentation string tonight.

@nalimilan
Copy link
Member

Concerning your last point, however, I wonder if it would be more consistent to make mean_and_var consistent with StatsBase.var and add the deprecation check. Then all of them have the same consistent behavior (yet inconsistent with Statistics) but emphasize the need for the breaking release.

What do you mean? mean_and_var(x) is already consistent with var(x) (from Statistics) since it uses corrected=true; and mean_and_var(x, w) has the depcheck just like var(x, w).

@zierenberg
Copy link
Author

Okay, now I start to understand the logic here. I thought that everything within StatsBase should be consistent. If consistency with Statistics is the goal, I will try to create a pull request for the documentation tonight .

Thanks for bearing with me and explaining!

@zierenberg
Copy link
Author

Here I am again, I am more confused than ever. Nevertheless to get some practice I created a pull request to demonstrate the arising inconsistency within StatsBase that would arise from making the one dispatch consistent with Statistics.std and the other with StatsBase.std(x,w).

I hope I did not make too many mistakes along the way. I am still trying to learn the right procedures with github and julia. Thanks for letting me try this out.

@nalimilan nalimilan added the breaking A breaking change that needs a semver major release label Feb 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking A breaking change that needs a semver major release bug
Projects
None yet
Development

No branches or pull requests

3 participants