sumlog #48
base: master
Conversation
Nice, thanks!
I think there are some problems with the current implementation but I made some suggestions that hopefully can fix most of them.
Can you also update the version number and add it to the docs?
src/sumlog.jl (Outdated)

```julia
Since `log(2)` is constant, `sumlog` only requires a single `log` evaluation.
"""
function sumlog(x::AbstractArray{T}) where {T}
```
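The claim in that docstring line can be spelled out; here is a minimal sketch of the identity it relies on (illustration only, not the PR's code):

```julia
# Each finite positive float satisfies x == significand(x) * 2^exponent(x), so
#   sum(log, x) == log(prod(significand.(x))) + log(2) * sum(exponent.(x))
# and only one `log` of the accumulated significand product is needed,
# as long as that running product is kept from overflowing.
x = rand(5) .+ 0.5
sum(log, x) ≈ log(prod(significand.(x))) + log(2) * sum(exponent.(x))  # true
```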
It seems this will fail for non-floating point types `T` such as `Int`, `BigFloat`, etc., and for complex numbers?
@devmotion I guess we also need some `ChainRules` methods...
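For reference, a minimal sketch of what such a method might look like, assuming ChainRulesCore and the `sumlog` from this PR (this is not code from the PR):

```julia
using ChainRulesCore

# Since d/dxᵢ sum(log, x) = 1/xᵢ, the pullback divides the incoming
# cotangent elementwise by x.
function ChainRulesCore.rrule(::typeof(sumlog), x::AbstractArray{<:Real})
    sumlog_pullback(Δ) = (NoTangent(), Δ ./ x)
    return sumlog(x), sumlog_pullback
end
```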
We could, but it's not necessary to do in this PR IMO. Apart from the element type (see discussion above), I think the main problem left is that I assume the code is problematic for GPU arrays. Other array implementations in LogExpFunctions are written in a GPU-friendly way and should work with all array types.
Do you see a nice way of doing this? I see two other potential things to add, both less critical, so we can come back to them.
Got it.
Maybe we can use `mapreduce` to support more general and in particular GPU arrays (similar to how we use `reduce` in `logsumexp`).
The function has to be added to the docs for the tests to pass.
It got faster! Check it out:

```julia
julia> x = rand(1000);

julia> @btime sum(log, $x)
  6.362 μs (0 allocations: 0 bytes)
-1027.6

julia> @btime sumlog($x)
  986.857 ns (0 allocations: 0 bytes)
-1027.6
```
I don't understand what you mean by this. Docstrings are usually added automatically, what's left to do?
I changed it to

```julia
function sumlog(x)
    T = float(eltype(x))
    _sumlog(T, values(x))
end
```

There's no need to restrict the type of `x`.
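For illustration (not from the PR), `float` on the element type performs exactly the promotion this relies on:

```julia
julia> float(eltype(1:10))          # integer eltypes promote to Float64
Float64

julia> float(eltype(Float32[1, 2])) # float eltypes pass through unchanged
Float32
```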
You have to add the function to `docs/src/index.md`.
test/sumlog.jl (Outdated)

```julia
@testset "sumlog" begin
    for T in [Int, Float16, Float32, Float64, BigFloat]
```
I noticed that you removed the type restriction. Thus we should extend the tests and e.g. check more general iterables (also with different types, abstract eltype, etc., since `sum(log, x)` would work for them) and also complex numbers.
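A rough sketch of the kinds of extra cases meant here, assuming the `sumlog` name (not the PR's test code):

```julia
using Test

@test sumlog(x for x in 1:10) ≈ sum(log, 1:10)                   # generator
@test sumlog(Real[1, 2.0, 3.0f0]) ≈ sum(log, [1.0, 2.0, 3.0])    # abstract eltype
@test sumlog([1 + 2im, 3 - 1im]) ≈ sum(log, [1 + 2im, 3 - 1im])  # complex numbers
```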
`eltype` doesn't work well for `Base.Generator`s. Usually this is when I'd turn to something like

```julia
julia> Core.Compiler.return_type(gen.f, Tuple{eltype(gen.iter)})
Float64
```

We could instead have it fall back on the default, but I'd guess that will sacrifice performance.
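For context (illustration, not from the PR): a generator's `eltype` is `Any` even when the compiler could infer the element type, which is why the inferred return type is tempting here:

```julia
julia> gen = (log(x) for x in rand(3));

julia> eltype(gen)  # Base.Generator carries no element type information
Any
```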
I bet you could write an equally fast version which explicitly calls `iterate`, and widens if the type changes. (But usually the compiler will prove that it won't.) One reason to keep `mapreduce` for arrays is that you can give it `dims`.
Should check more carefully, but this appears to work & is as fast as the current version:

```julia
function sumlog(x)
    iter = iterate(x)
    if isnothing(iter)
        return eltype(x) <: Number ? zero(float(eltype(x))) : 0.0
    end
    x1 = float(iter[1])
    x1 isa AbstractFloat || return sum(log, x)
    sig, ex = significand(x1), exponent(x1)
    iter = iterate(x, iter[2])
    while iter !== nothing
        xj = float(iter[1])
        xj isa AbstractFloat || return sum(log, x)  # maybe not ideal, re-starts iterator
        sig, ex = _sumlog_op((sig, ex), (significand(xj), exponent(xj)))
        iter = iterate(x, iter[2])
    end
    return log(sig) + IrrationalConstants.logtwo * ex
end

sumlog(f, x) = sumlog(Iterators.map(f, x))
sumlog(f, x, ys...) = sumlog(f(xy...) for xy in zip(x, ys...))
```
And for dims:
```julia
sumlog(x::AbstractArray{T}; dims=:) where T = _sumlog(float(T), dims, x)

function _sumlog(::Type{T}, ::Colon, x) where {T<:AbstractFloat}
    sig, ex = mapreduce(_sumlog_op, x; init=(one(T), zero(exponent(one(T))))) do xj
        float_xj = float(xj)
        significand(float_xj), exponent(float_xj)
    end
    return log(sig) + IrrationalConstants.logtwo * ex
end

function _sumlog(::Type{T}, dims, x) where {T<:AbstractFloat}
    sig_ex = mapreduce(_sumlog_op, x; dims=dims, init=(one(T), zero(exponent(one(T))))) do xj
        float_xj = float(xj)
        significand(float_xj), exponent(float_xj)
    end
    map(sig_ex) do (sig, ex)
        log(sig) + IrrationalConstants.logtwo * ex
    end
end
```
Should I make a PR to the PR?
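The helper `_sumlog_op` used above is not shown in this excerpt; a minimal sketch of what it could look like, operating on (significand, exponent) pairs (the PR's actual definition may differ):

```julia
# Combine two (significand, exponent) pairs: multiply the significands, add the
# exponents, and renormalize the significand back into [1, 2) so the running
# product never overflows.
function _sumlog_op((sig1, ex1), (sig2, ex2))
    sig = sig1 * sig2   # in [1, 4) when both factors are in [1, 2)
    ex = ex1 + ex2
    if sig >= 2
        sig /= 2
        ex += one(ex)
    end
    return sig, ex
end
```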
cscherrer#1 is a tidier version of the above.
I think it might be easier to focus on goal 1 (accuracy and consistency with `sum(log, x)`). Generally, I can see that the function can be useful in some cases, but I would like to avoid increasing the code complexity of this package too much, so I think a simple implementation should be another main goal.
Ideally, yes. But none of these are requirements in any way. The idea is more that if we start with an idealized wish list, it might be easier to talk about the design space and decide together where to make compromises. Maybe this is just me, but after ten or so updates I find it too easy to get lost in the weeds. Maybe this can help keep us from going in circles in the discussion.
I think to start we should focus on the real case. We can come back to complex numbers - maybe this could be a kwarg or optional type parameter, or even a separate function.
If we use
I like "as simple as possible, but no simpler". I can understand wanting to avoid Also... We could consider changing this function name to So if it's easy to set it up so "double negatives cancel", |
This also points to another application - maybe you really want to just compute a product, but you'd like to avoid underflow and overflow. So for example @cjdoris's LogarithmicNumbers.jl would seem to benefit from adding

```julia
Base.prod(::Type{ULogarithmic}, x) = exp(ULogarithmic, logprod(x))
```
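For context, a sketch of how that would be used, relying on LogarithmicNumbers.jl's `exp(ULogarithmic, x)` representation as used in the line above (illustration only):

```julia
using LogarithmicNumbers

# exp(ULogarithmic, x) represents exp(x) by storing x itself, so a product that
# would underflow Float64 stays representable:
p = exp(ULogarithmic, -1000.0)  # ≈ exp(-1000), which underflows to 0.0 in Float64
log(p)                          # recovers -1000.0
```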
Yes, this is part of the code complexity goal but will also improve the stability of LogExpFunctions. All such internal functions and "hacks" should be removed from the PR, in particular since it seems they can be avoided easily. Even standard libraries such as Statistics don't use `_return_type` to handle empty iterators, see e.g. https://github.com/JuliaLang/Statistics.jl/blob/cdd95fea3ce7bf31c68e01412548688fbd505903/src/Statistics.jl#L204 and https://github.com/JuliaLang/Statistics.jl/blob/cdd95fea3ce7bf31c68e01412548688fbd505903/src/Statistics.jl#L170.
@cscherrer: Regarding stepping back and agreement: I always think in terms of costs and benefits (code complexity and maintainability vs how useful the code is), and personally I would just go to logs as soon as possible, even at a slight performance cost. But if you really need this, I am fine with including it. Regarding the goals:
I've pushed another version. This time:

```julia
julia> using LinearAlgebra

julia> using BenchmarkTools

julia> x = LowerTriangular(randn(1000, 1000));

julia> @btime logabsdet($x)
  8.687 μs (0 allocations: 0 bytes)
(-631.836, -1.0)

julia> d = diag(x);

julia> @btime logabsprod($d)
  1.202 μs (0 allocations: 0 bytes)
(-631.836, 1.0)
```

I cheated here a little, since we don't have (that I know of) a lazy `diag`.
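To spell out the connection being benchmarked (a sketch assuming the `logabsprod` from this PR): the determinant of a triangular matrix is the product of its diagonal, so the two calls should agree:

```julia
using LinearAlgebra

L = LowerTriangular(randn(5, 5))

la, s = logabsdet(L)            # (log |det L|, sign(det L))
lp, sp = logabsprod(diag(L))    # this PR's function, applied to the diagonal
la ≈ lp && s == sp              # true
```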
```julia
"""
    logprod(X::AbstractArray{T}; dims)
```
Suggested change:

```diff
-    logprod(X::AbstractArray{T}; dims)
+    logprod(x)
```
```julia
    x1 = float(iter[1])
    x1 isa AbstractFloat || return sum(log, x)
    x1 < 0 && Base.Math.throw_complex_domainerror(:log, x1)
    sig, ex = significand(x1), _exponent(x1)
```
Suggested change:

```diff
-    sig, ex = significand(x1), _exponent(x1)
+    sig, ex = frexp(x1)
```
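For reference (illustration, not from the PR), `frexp` gives the same decomposition with a different normalization, and the log identity holds either way:

```julia
julia> x = 6.0;

julia> frexp(x)                     # fraction in [0.5, 1): 6.0 == 0.75 * 2^3
(0.75, 3)

julia> significand(x), exponent(x)  # significand in [1, 2): 6.0 == 1.5 * 2^2
(1.5, 2)

julia> log(0.75) + 3 * log(2) ≈ log(1.5) + 2 * log(2) ≈ log(6.0)
true
```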
```julia
    x1 isa AbstractFloat || return sum(log, x)
    x1 < 0 && Base.Math.throw_complex_domainerror(:log, x1)
    sig, ex = significand(x1), _exponent(x1)
    nonfloat = zero(x1)
```
Suggested change:

```diff
-    nonfloat = zero(x1)
```
```julia
    while iter !== nothing
        xj = float(iter[1])
        if xj isa AbstractFloat
            sig, ex = _logprod_op((sig, ex), (significand(xj), _exponent(xj)))
```
Suggested change:

```diff
-            sig, ex = _logprod_op((sig, ex), (significand(xj), _exponent(xj)))
+            sig, ex = _logabsprod_op((sig, ex), frexp(xj))
```
```julia
        if xj isa AbstractFloat
            sig, ex = _logprod_op((sig, ex), (significand(xj), _exponent(xj)))
        else
            nonfloat += log(xj)
```
Suggested change:

```diff
-            nonfloat += log(xj)
+            y = prod(x)
+            return log(abs(y)), sign(y)
```
```julia
        end
        iter = iterate(x, iter[2])
    end
    return log(sig) + IrrationalConstants.logtwo * oftype(sig, ex) + nonfloat
```
Suggested change:

```diff
-    return log(sig) + IrrationalConstants.logtwo * oftype(sig, ex) + nonfloat
+    return (log(abs(sig)) + IrrationalConstants.logtwo * oftype(sig, ex), sign(sig))
```
I am lost in all the noise on minor details, but checking
Thanks! I think the name is more natural too :)
I had assumed if anything it might be more efficient, but maybe not. Here's a quick check:

```julia
julia> @benchmark frexp(x) setup=(x=100 * rand())
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.063 ns (0.00% GC)
  median time:      2.084 ns (0.00% GC)
  mean time:        2.085 ns (0.00% GC)
  maximum time:     6.112 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark (significand(x), exponent(x)) setup=(x=100 * rand())
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.613 ns (0.00% GC)
  median time:      1.633 ns (0.00% GC)
  mean time:        1.640 ns (0.00% GC)
  maximum time:     10.330 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
```

May not be a real effect since this is sub-nanosecond.
I had missed this, but @devmotion caught it too. Very surprising it would be so different. I'm also wondering... Was this wrong in the first place? A factor of 2 should work if we assume it's sequential. But some implementations might exploit the associativity. In this case I think we need
Both great points.
Do we want to merge this?
I lost track of the various changes, but I am fine with merging. We can always micro-optimize things later; this is already more efficient than `sum(log, x)`.
Same. I expected this would be relatively simple, and was surprised by the number of cases to worry about. I agree merging and then handling various cases as they come up is reasonable. It looks like the tests need some updates though; they're currently failing because they use
I don't remember the details either. It seems there are unaddressed comments and the tests don't pass yet, so I guess the PR needs some updates and another round of review before merging.
Friendly ping: this is a great contribution and it would be unfortunate to leave it dormant. @cscherrer, if you have the time to address the May 10 review comments by @devmotion and fix CI, I would be happy to merge.
This PR adds `sumlog`, a more efficient way to compute `sum(log, x)`. There's more discussion on this on Discourse here: https://discourse.julialang.org/t/sum-of-logs/80370

EDIT: I think we have a good enough understanding of what's possible to lay out some design criteria. That ought to be more efficient than taking each line of code in isolation.

As a starting point, I suggest:

1. Whenever `sum(log ∘ f, x)` is defined, `sumlog(f, x)` should give the same result (within some tolerance, etc.)
2. `sumlog(x) == sumlog(identity, x)`
3. `sumlog(f, x)` should support a `dims` keyword argument whenever `sum(log ∘ f, x)` does (i.e., when `x` is an AbstractArray)
4. `sumlog` should be type-stable and compiler-friendly when possible
5. `sumlog` should use the optimized method requiring a single `log` application, whenever that's possible.

@devmotion @mcabbott thoughts on these?
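A rough sketch of property checks for criteria 1-3, assuming the `sumlog` name from this PR (not the PR's test code):

```julia
using Test

x = rand(10, 10)
@test sumlog(x) ≈ sumlog(identity, x)                       # criterion 2
@test sumlog(sqrt, x) ≈ sum(log ∘ sqrt, x)                  # criterion 1
@test sumlog(sqrt, x; dims=1) ≈ sum(log ∘ sqrt, x; dims=1)  # criterion 3
```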