
add signrank distribution #173

Open · wants to merge 3 commits into master
Conversation

@ArnoStrouwen

Implementation of the signrank (Wilcoxon signed-rank) distribution.
The cdf is heavily optimized, since that is what is needed for the popular hypothesis test associated with this distribution.

The testing in this package seems quite involved; I could use some pointers on where to add the tests.
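For reference, here is a minimal, unoptimized sketch of the distribution itself, obtained by counting subsets of {1,...,n} by their sum. This is not the optimized code added in the PR, just a way to see what the functions in the comparison below should return for n = 4.

# Unoptimized reference (a sketch, not the implementation in this PR):
# P(W = w) = (number of subsets of {1,...,n} summing to w) / 2^n, for w in 0:n*(n+1)÷2.
function signrank_counts(n)
    maxw = n * (n + 1) ÷ 2
    counts = zeros(Int, maxw + 1)        # counts[w + 1] = number of subsets summing to w
    counts[1] = 1                        # the empty subset sums to 0
    for k in 1:n, w in maxw:-1:k
        counts[w + 1] += counts[w - k + 1]
    end
    return counts
end

pdf_ref = signrank_counts(4) ./ 2.0^4    # pmf of W for n = 4, over 0:10
cdf_ref = cumsum(pdf_ref)                # cdf; cdf_ref[1] == 0.0625, cdf_ref[2] == 0.125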

using StatsFuns
using Rmath

# Compare each StatsFuns signrank function against the corresponding Rmath call.

# pdf and logpdf
signrankpdf.(-2:12,4)
dsignrank.(-2:12,4,false)

signranklogpdf.(-2:12,4)
dsignrank.(-2:12,4,true)

# cdf and logcdf
signrankcdf.(-2:12,4)
psignrank.(-2:12,4,true,false)

signranklogcdf.(-2:12,4)
psignrank.(-2:12,4,true,true)

# ccdf and logccdf
signrankccdf.(-2:12,4)
psignrank.(-2:12,4,false,false)

signranklogccdf.(-2:12,4)
psignrank.(-2:12,4,false,true)

# inverse cdf, probing just below, at, and just above the jump at 1/16 = 0.0625
signrankinvcdf.(-0.01:0.01:1.01,4)
qsignrank.(-0.01:0.01:1.01,4,true,false)
signrankinvcdf.(0.0624,4)
signrankinvcdf.(0.0625,4)
signrankinvcdf.(0.0626,4)
qsignrank.(0.0624,4,true,false)
qsignrank.(0.0625,4,true,false)
qsignrank.(0.0626,4,true,false)

# inverse logcdf
signrankinvlogcdf.(log.(0.0:0.01:1.01),4)
qsignrank.(log.(0.0:0.01:1.01),4,true,true) # 0.0 does not match; is the R definition correct?
qsignrank.(log(0.0),4,true,true)
qsignrank.(0.0,4,true,false)
signrankinvlogcdf.(log(0.0624),4)
signrankinvlogcdf.(log(0.0625),4)
signrankinvlogcdf.(log(0.0626),4)
qsignrank.(log(0.0624),4,true,true)
qsignrank.(log(0.0625),4,true,true)
qsignrank.(log(0.0626),4,true,true)


# inverse ccdf, again probing around 1/16 = 0.0625
signrankinvccdf.(-0.01:0.01:1.01,4)
qsignrank.(-0.01:0.01:1.01,4,false,false)
signrankinvccdf.(0.0624,4)
signrankinvccdf.(0.0625,4)
signrankinvccdf.(0.0626,4)
qsignrank.(0.0624,4,false,false)
qsignrank.(0.0625,4,false,false)
qsignrank.(0.0626,4,false,false)

# inverse logccdf
signrankinvlogccdf.(log.(0.0:0.01:1.01),4)
qsignrank.(log.(0.0:0.01:1.01),4,false,true) # 0.0 does not match
signrankinvlogccdf.(log(0.0624),4)
signrankinvlogccdf.(log(0.0625),4)
signrankinvlogccdf.(log(0.0626),4)
qsignrank.(log(0.0624),4,false,true)
qsignrank.(log(0.0625),4,false,true)
qsignrank.(log(0.0626),4,false,true)

@codecov-commenter

codecov-commenter commented Feb 19, 2025

Codecov Report

Attention: Patch coverage is 97.40260% with 2 lines in your changes missing coverage. Please review.

Project coverage is 66.71%. Comparing base (8f50565) to head (7b8c9ee).

Files with missing lines  | Patch %  | Lines
src/distrs/signrank.jl    | 97.40%   | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #173      +/-   ##
==========================================
+ Coverage   62.99%   66.71%   +3.72%     
==========================================
  Files          14       15       +1     
  Lines         635      712      +77     
==========================================
+ Hits          400      475      +75     
- Misses        235      237       +2     


@andreasnoack
Member

I'd try to add it to `test/rmath.jl`, similarly to

StatsFuns.jl/test/rmath.jl

Lines 233 to 241 in 8f50565

rmathcomp_tests("chisq", [
((1,), 0.0:0.1:8.0),
((4,), 0.0:0.1:8.0),
((9,), 0.0:0.1:8.0),
((9,), 0:8),
((1,), 0f0:0.1f0:8f0),
((1,), Float16(0):Float16(0.1):Float16(8)),
((9,), 0//1:8//1),
])
to compare with the Rmath results.
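For what it's worth, a sketch of what the corresponding signrank entry might look like, modelled on the chisq block above; the parameter values and x ranges here are only illustrative, not taken from the PR, and assume the signrank Rmath wrappers are registered with the `rmathcomp_tests` helper.

rmathcomp_tests("signrank", [
    ((4,), 0:10),      # full support for n = 4 is 0:10
    ((10,), 0:55),     # full support for n = 10 is 0:55
    ((50,), 0:100),    # partial range for a larger n
])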

@ArnoStrouwen
Author

Some differences with R remain:
One is how minus infinity is handled in invlogcdf; in my opinion it is correct to return zero here and not NaN.

julia> signrankinvcdf(4, 0.0)
0.0
julia> signrankinvlogcdf(4, log(0))
0.0
julia> qsignrank.(0.0,4,true,false) 
0.0
julia> qsignrank.(log(0.0),4,true,true)
NaN

Rounding in the quantiles is also an issue:

julia> qsignrank.(-2.1,4,true,true)
1.0
julia> qsignrank.(-2.0794415416798357,4,true,true)
1.0
julia> qsignrank.(-2.0,4,true,true)
2.0

julia> signrankinvlogcdf(4,-2.1)
1.0
julia> signrankinvlogcdf(4,-2.0794415416798357)
2.0
julia> signrankinvlogcdf(4,-2.0)
2.0

The true cdf jumps at 0.125, but that value does not round-trip through log and exp: log(0.125) cannot be represented exactly, so the exp/log round trip lands one ulp above 0.125, just past the jump.

julia> exp(-2.0794415416798357)
0.12500000000000003
julia> log(0.125)
-2.0794415416798357
julia> exp(log(0.125))
0.12500000000000003

@ArnoStrouwen
Author

ArnoStrouwen commented Feb 23, 2025

I improved the rounding by getting rid of all uses of exp in my code.
Still, some rounding differences with R remain for invlogccdf, which I don't know how to resolve.
https://github.com/JuliaStats/Rmath-julia/blob/6f2d37ff112914d65559bc3e0035b325c11cf361/src/dpq.h#L52-L53
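A rough sketch of the idea behind dropping exp (the names here are hypothetical; this is not the PR's actual code): accumulate the log-cdf with logaddexp and compare the target logp against it directly, so the exp/log round trip never enters.

using LogExpFunctions: logaddexp

# Hypothetical sketch: invert a discrete log-cdf without ever calling exp.
# logpdf_vals[w + 1] holds the log pmf at w = 0, 1, 2, ...
function invlogcdf_sketch(logpdf_vals, logp)
    acc = -Inf                           # log of the running cdf
    for (i, lp) in enumerate(logpdf_vals)
        acc = logaddexp(acc, lp)
        acc >= logp && return i - 1      # first w with log cdf(w) >= logp
    end
    return length(logpdf_vals) - 1
end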

@ArnoStrouwen
Author

ArnoStrouwen commented Feb 27, 2025

I'm stuck on this: for higher n, the R version seems to behave weirdly.

julia> psignrank(1,4,true,false)
0.125
julia> qsignrank(0.125,4,true,false)
1.0
julia> signrankinvcdf(4,0.125)
1.0

julia> psignrank(1,50,true,false)
1.776356839400249e-15
julia> qsignrank(1.776356839400249e-15,50,true,false)
0.0
julia> signrankinvcdf(50,1.776356839400249e-15)
1.0
julia> qsignrank.(psignrank.(-1:8,4,true,false),4,true,false)
10-element Vector{Float64}:
 0.0
 0.0
 1.0
 2.0
 3.0
 4.0
 5.0
 6.0
 7.0
 8.0

julia> qsignrank.(psignrank.(-1:8,50,true,false),50,true,false)
10-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 2.0
 3.0
 5.0
 6.0
 7.0
 8.0

Our own version does not have this issue:

julia> signrankinvcdf.(50,signrankcdf.(50,-1:8))
10-element Vector{Float64}:
 0.0
 0.0
 1.0
 2.0
 3.0
 4.0
 5.0
 6.0
 7.0
 8.0

@andreasnoack
Member

It would be good to file a bug report in the R bug tracker: https://bugs.r-project.org/. Here you can avoid testing these cases against R for now and instead test some invariants, such as the round-tripping of p and q.

@ArnoStrouwen marked this pull request as ready for review on February 28, 2025
@ArnoStrouwen
Author

It seems like Windows and macOS have slightly different rounding going on somewhere?

For the older Julia versions, what would be the best way to round to the nearest integer?
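In case it helps (not specific to this PR), Base's round has supported nearest-integer rounding with explicit modes since Julia 1.0:

round(2.5)                        # 2.0 (ties round to even)
round(Int, 2.5)                   # 2, returned as an Int
round(2.5, RoundNearestTiesAway)  # 3.0, ties away from zero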

test/rmath.jl Outdated
Comment on lines 386 to 394
@test signrankinvcdf.(10, signrankcdf.(10, -1:56)) ≈ [0; 0:55; 55] atol = 1e-12 rtol = 1e-12
@test signrankinvccdf.(10, signrankccdf.(10, -1:56)) ≈ [0; 0:55; 55] atol = 1e-12 rtol = 1e-12
@test signrankinvlogcdf.(10, signranklogcdf.(10, -1:56)) ≈ [NaN; 0:55; 55] nans = true atol = 1e-12 rtol = 1e-12
@test signrankinvlogccdf.(10, signranklogccdf.(10, -1:56)) ≈ [0; 0:54; NaN; NaN] nans = true atol = 1e-12 rtol = 1e-12

@test signrankinvcdf.(50, signrankcdf.(50, -1:1276)) ≈ [0; 0:1275; 1275] atol = 1e-12 rtol = 1e-12
@test signrankinvccdf.(50, signrankccdf.(50, -1:1276)) ≈ [0; 0:1275; 1275] atol = 1e-12 rtol = 1e-12
@test signrankinvlogcdf.(50, signranklogcdf.(50, -1:1276)) ≈ [NaN; 0:1275; 1275] nans = true atol = 1e-12 rtol = 1e-12
@test signrankinvlogccdf.(50, signranklogccdf.(50, -1:1276)) ≈ [0; 0:1274; NaN; NaN] nans = true atol = 1e-12 rtol = 1e-12
Member


I think it would be good to compare evaluations for each value separately, so that differences are not masked by some large values. All existing tests seem to loop over the desired inputs.

@ArnoStrouwen
Author

This failure is unique to Ubuntu:
https://github.com/JuliaStats/StatsFuns.jl/actions/runs/13596765039/job/38015305607?pr=173#step:6:1507
It is not present on my Windows machine or on the Windows CI machine. Can somebody with Linux verify?

@ArnoStrouwen
Author

This is what is causing the test failures:

julia> psignrank(18,10,false,true) # windows
-0.2076393647782445
julia> psignrank(18,10,false,true) # linux
-0.20763936477824452
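The two results differ only in the final bits (on the order of one ulp), so one option might be to compare against Rmath with a small relative tolerance instead of requiring exact equality. A hedged sketch, not necessarily how the test suite handles this:

using Test, StatsFuns, Rmath

# allow a few ulps of platform-dependent rounding in the Rmath comparison
@test signranklogccdf(10, 18) ≈ psignrank(18, 10, false, true) rtol = 8 * eps()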

@ArnoStrouwen requested a review from devmotion on March 9, 2025