
add signrank distribution #173

Open · wants to merge 3 commits into master
Conversation

@ArnoStrouwen

Implementation of the signrank (Wilcoxon signed-rank) distribution.
The cdf is heavily optimized, since that is what is needed for the popular hypothesis test associated with this distribution.

The testing in this package seems quite involved; I could use some pointers on where to add the tests.
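For reference, here is a minimal, unoptimized sketch of the distribution itself, obtained by counting subsets of {1,...,n} by their sum. This is not the optimized code added in the PR, just a way to see what the functions in the comparison below should return for n = 4.

# Unoptimized reference (a sketch, not the implementation in this PR):
# P(W = w) = (number of subsets of {1,...,n} summing to w) / 2^n, for w in 0:n*(n+1)÷2.
function signrank_counts(n)
    maxw = n * (n + 1) ÷ 2
    counts = zeros(Int, maxw + 1)        # counts[w + 1] = number of subsets summing to w
    counts[1] = 1                        # the empty subset sums to 0
    for k in 1:n, w in maxw:-1:k
        counts[w + 1] += counts[w - k + 1]
    end
    return counts
end

pdf_ref = signrank_counts(4) ./ 2.0^4    # pmf of W for n = 4, over 0:10
cdf_ref = cumsum(pdf_ref)                # cdf; cdf_ref[1] == 0.0625, cdf_ref[2] == 0.125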

using StatsFuns
using Rmath

# Compare each StatsFuns signrank function against the corresponding Rmath call.

# pdf and logpdf
signrankpdf.(-2:12,4)
dsignrank.(-2:12,4,false)

signranklogpdf.(-2:12,4)
dsignrank.(-2:12,4,true)

# cdf and logcdf
signrankcdf.(-2:12,4)
psignrank.(-2:12,4,true,false)

signranklogcdf.(-2:12,4)
psignrank.(-2:12,4,true,true)

# ccdf and logccdf
signrankccdf.(-2:12,4)
psignrank.(-2:12,4,false,false)

signranklogccdf.(-2:12,4)
psignrank.(-2:12,4,false,true)

# inverse cdf, probing just below, at, and just above the jump at 1/16 = 0.0625
signrankinvcdf.(-0.01:0.01:1.01,4)
qsignrank.(-0.01:0.01:1.01,4,true,false)
signrankinvcdf.(0.0624,4)
signrankinvcdf.(0.0625,4)
signrankinvcdf.(0.0626,4)
qsignrank.(0.0624,4,true,false)
qsignrank.(0.0625,4,true,false)
qsignrank.(0.0626,4,true,false)

# inverse logcdf
signrankinvlogcdf.(log.(0.0:0.01:1.01),4)
qsignrank.(log.(0.0:0.01:1.01),4,true,true) # 0.0 does not match; is the R definition correct?
qsignrank.(log(0.0),4,true,true)
qsignrank.(0.0,4,true,false)
signrankinvlogcdf.(log(0.0624),4)
signrankinvlogcdf.(log(0.0625),4)
signrankinvlogcdf.(log(0.0626),4)
qsignrank.(log(0.0624),4,true,true)
qsignrank.(log(0.0625),4,true,true)
qsignrank.(log(0.0626),4,true,true)


# inverse ccdf, again probing around 1/16 = 0.0625
signrankinvccdf.(-0.01:0.01:1.01,4)
qsignrank.(-0.01:0.01:1.01,4,false,false)
signrankinvccdf.(0.0624,4)
signrankinvccdf.(0.0625,4)
signrankinvccdf.(0.0626,4)
qsignrank.(0.0624,4,false,false)
qsignrank.(0.0625,4,false,false)
qsignrank.(0.0626,4,false,false)

# inverse logccdf
signrankinvlogccdf.(log.(0.0:0.01:1.01),4)
qsignrank.(log.(0.0:0.01:1.01),4,false,true) # 0.0 does not match
signrankinvlogccdf.(log(0.0624),4)
signrankinvlogccdf.(log(0.0625),4)
signrankinvlogccdf.(log(0.0626),4)
qsignrank.(log(0.0624),4,false,true)
qsignrank.(log(0.0625),4,false,true)
qsignrank.(log(0.0626),4,false,true)

@codecov-commenter

codecov-commenter commented Feb 19, 2025

Codecov Report

Attention: Patch coverage is 97.40260% with 2 lines in your changes missing coverage. Please review.

Project coverage is 66.71%. Comparing base (8f50565) to head (7b8c9ee).

Files with missing lines  | Patch %  | Lines
src/distrs/signrank.jl    | 97.40%   | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #173      +/-   ##
==========================================
+ Coverage   62.99%   66.71%   +3.72%     
==========================================
  Files          14       15       +1     
  Lines         635      712      +77     
==========================================
+ Hits          400      475      +75     
- Misses        235      237       +2     


@andreasnoack
Member

I'd try to add it to `test/rmath.jl`, similarly to

StatsFuns.jl/test/rmath.jl

Lines 233 to 241 in 8f50565

rmathcomp_tests("chisq", [
((1,), 0.0:0.1:8.0),
((4,), 0.0:0.1:8.0),
((9,), 0.0:0.1:8.0),
((9,), 0:8),
((1,), 0f0:0.1f0:8f0),
((1,), Float16(0):Float16(0.1):Float16(8)),
((9,), 0//1:8//1),
])
to compare with the Rmath results.
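For what it's worth, a sketch of what the corresponding signrank entry might look like, modelled on the chisq block above; the parameter values and x ranges here are only illustrative, not taken from the PR, and assume the signrank Rmath wrappers are registered with the `rmathcomp_tests` helper.

rmathcomp_tests("signrank", [
    ((4,), 0:10),      # full support for n = 4 is 0:10
    ((10,), 0:55),     # full support for n = 10 is 0:55
    ((50,), 0:100),    # partial range for a larger n
])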

@ArnoStrouwen
Author

Some differences with R remain:
One is how minus infinity is handled in invlogcdf; in my opinion it is correct to return zero here and not NaN.

julia> signrankinvcdf(4, 0.0)
0.0
julia> signrankinvlogcdf(4, log(0))
0.0
julia> qsignrank.(0.0,4,true,false) 
0.0
julia> qsignrank.(log(0.0),4,true,true)
NaN

Rounding in the quantiles is also an issue:

julia> qsignrank.(-2.1,4,true,true)
1.0
julia> qsignrank.(-2.0794415416798357,4,true,true)
1.0
julia> qsignrank.(-2.0,4,true,true)
2.0

julia> signrankinvlogcdf(4,-2.1)
1.0
julia> signrankinvlogcdf(4,-2.0794415416798357)
2.0
julia> signrankinvlogcdf(4,-2.0)
2.0

The true cdf jumps at 0.125, but that value does not round-trip through log and exp: log(0.125) cannot be represented exactly, so the exp/log round trip lands one ulp above 0.125, just past the jump.

julia> exp(-2.0794415416798357)
0.12500000000000003
julia> log(0.125)
-2.0794415416798357
julia> exp(log(0.125))
0.12500000000000003

@ArnoStrouwen
Author

ArnoStrouwen commented Feb 23, 2025

I improved the rounding by getting rid of all uses of exp in my code.
Still, some rounding differences with R remain for invlogccdf, which I don't know how to resolve.
https://github.com/JuliaStats/Rmath-julia/blob/6f2d37ff112914d65559bc3e0035b325c11cf361/src/dpq.h#L52-L53
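A rough sketch of the idea behind dropping exp (the names here are hypothetical; this is not the PR's actual code): accumulate the log-cdf with logaddexp and compare the target logp against it directly, so the exp/log round trip never enters.

using LogExpFunctions: logaddexp

# Hypothetical sketch: invert a discrete log-cdf without ever calling exp.
# logpdf_vals[w + 1] holds the log pmf at w = 0, 1, 2, ...
function invlogcdf_sketch(logpdf_vals, logp)
    acc = -Inf                           # log of the running cdf
    for (i, lp) in enumerate(logpdf_vals)
        acc = logaddexp(acc, lp)
        acc >= logp && return i - 1      # first w with log cdf(w) >= logp
    end
    return length(logpdf_vals) - 1
end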

@ArnoStrouwen
Author

ArnoStrouwen commented Feb 27, 2025

I'm stuck on this: for higher n, the R version seems to behave weirdly.

julia> psignrank(1,4,true,false)
0.125
julia> qsignrank(0.125,4,true,false)
1.0
julia> signrankinvcdf(4,0.125)
1.0

julia> psignrank(1,50,true,false)
1.776356839400249e-15
julia> qsignrank(1.776356839400249e-15,50,true,false)
0.0
julia> signrankinvcdf(50,1.776356839400249e-15)
1.0
julia> qsignrank.(psignrank.(-1:8,4,true,false),4,true,false)
10-element Vector{Float64}:
 0.0
 0.0
 1.0
 2.0
 3.0
 4.0
 5.0
 6.0
 7.0
 8.0

julia> qsignrank.(psignrank.(-1:8,50,true,false),50,true,false)
10-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 2.0
 3.0
 5.0
 6.0
 7.0
 8.0

Our own version does not have this issue:

julia> signrankinvcdf.(50,signrankcdf.(50,-1:8))
10-element Vector{Float64}:
 0.0
 0.0
 1.0
 2.0
 3.0
 4.0
 5.0
 6.0
 7.0
 8.0

@andreasnoack
Member

It would be good to file a bug report in the R bug tracker: https://bugs.r-project.org/. Here you can avoid testing these cases against R for now and instead test some invariants, such as the round-tripping of p and q.

@ArnoStrouwen marked this pull request as ready for review on February 28, 2025
@ArnoStrouwen
Author

It seems like Windows and macOS have slightly different rounding going on somewhere?

For the older Julia versions, what would be the best way to round to the nearest integer?
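In case it helps (not specific to this PR), Base's round has supported nearest-integer rounding with explicit modes since Julia 1.0:

round(2.5)                        # 2.0 (ties round to even)
round(Int, 2.5)                   # 2, returned as an Int
round(2.5, RoundNearestTiesAway)  # 3.0, ties away from zero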

test/rmath.jl Outdated
Comment on lines 386 to 394
@test signrankinvcdf.(10, signrankcdf.(10, -1:56)) ≈ [0; 0:55; 55] atol = 1e-12 rtol = 1e-12
@test signrankinvccdf.(10, signrankccdf.(10, -1:56)) ≈ [0; 0:55; 55] atol = 1e-12 rtol = 1e-12
@test signrankinvlogcdf.(10, signranklogcdf.(10, -1:56)) ≈ [NaN; 0:55; 55] nans = true atol = 1e-12 rtol = 1e-12
@test signrankinvlogccdf.(10, signranklogccdf.(10, -1:56)) ≈ [0; 0:54; NaN; NaN] nans = true atol = 1e-12 rtol = 1e-12

@test signrankinvcdf.(50, signrankcdf.(50, -1:1276)) ≈ [0; 0:1275; 1275] atol = 1e-12 rtol = 1e-12
@test signrankinvccdf.(50, signrankccdf.(50, -1:1276)) ≈ [0; 0:1275; 1275] atol = 1e-12 rtol = 1e-12
@test signrankinvlogcdf.(50, signranklogcdf.(50, -1:1276)) ≈ [NaN; 0:1275; 1275] nans = true atol = 1e-12 rtol = 1e-12
@test signrankinvlogccdf.(50, signranklogccdf.(50, -1:1276)) ≈ [0; 0:1274; NaN; NaN] nans = true atol = 1e-12 rtol = 1e-12
Member


I think it would be good to compare evaluations for each value separately, so that differences are not masked by some large values. All existing tests seem to loop over the desired inputs.

@ArnoStrouwen
Author

This failure is unique to Ubuntu:
https://github.com/JuliaStats/StatsFuns.jl/actions/runs/13596765039/job/38015305607?pr=173#step:6:1507
It is not present on my Windows machine or on the Windows CI machine. Can somebody with Linux verify?

@ArnoStrouwen
Author

This is what is causing the test failures:

julia> psignrank(18,10,false,true) # windows
-0.2076393647782445
julia> psignrank(18,10,false,true) # linux
-0.20763936477824452
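The two results differ only in the final bits (on the order of one ulp), so one option might be to compare against Rmath with a small relative tolerance instead of requiring exact equality. A hedged sketch, not necessarily how the test suite handles this:

using Test, StatsFuns, Rmath

# allow a few ulps of platform-dependent rounding in the Rmath comparison
@test signranklogccdf(10, 18) ≈ psignrank(18, 10, false, true) rtol = 8 * eps()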

@ArnoStrouwen requested a review from devmotion on March 9, 2025