Add more summation functions #16004

lbartoletti · 2020-11-16T20:43:45Z

I don't know whether all of this code should be integrated into Nim or whether I should make a dedicated library. Nevertheless, since std already contains a sums module, I am taking the liberty of proposing some new functions.

AFAIK, Julia uses pairwise summation, and Python's fsum uses Shewchuk's robust summation [1]. I propose adding an fsum func, as Python has.
BTW, the PR also includes the classical twoSum functions and a Kahan variant/improvement.

Tiny example

import sequtils
import std/sums

let tab = @[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

echo "foldl (naive): ", foldl(tab, a + b)
# foldl (naive): 0.9999999999999999
echo "pairs: ", sumPairs(tab)
# pairs: 0.9999999999999999
echo "kbn: ", sumKbn(tab)
# kbn: 1.0
echo "fsum: ", fsum(tab)
# fsum: 1.0
var part = shewchuckSum_add(tab)
echo "part: ", part
# part: @[5.551115123125783e-17, 1.0]
echo sumPairs(part)
# 1.0
echo sumKbn(part)
# 1.0

If you consider that these functions deserve to be integrated, I will improve the PR with tests, comments and cleanup as usual.

[1] https://en.wikipedia.org/wiki/Kahan_summation_algorithm#Support_by_libraries
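
For readers who want to see why the compensated variants in the tiny example recover 1.0, here is a minimal self-contained sketch. The names naiveSum and neumaierSum are hypothetical and chosen only for illustration; they are not the functions from this PR or from std/sums, although neumaierSum follows the same Kahan-Babuska-Neumaier scheme that sumKbn is named after.

```nim
# Illustrative sketch only: `naiveSum` and `neumaierSum` are hypothetical names.

func naiveSum[T: SomeFloat](x: openArray[T]): T =
  ## Plain left-to-right accumulation; rounding errors are simply dropped.
  for v in x:
    result += v

func neumaierSum[T: SomeFloat](x: openArray[T]): T =
  ## Kahan-Babuska summation with Neumaier's improvement.
  var c = T(0)                  # running compensation for lost low-order bits
  for v in x:
    let t = result + v
    if abs(result) >= abs(v):
      c += (result - t) + v     # low-order bits of `v` were lost
    else:
      c += (v - t) + result     # low-order bits of `result` were lost
    result = t
  result += c                   # apply the compensation once, at the end

when isMainModule:
  let tab = @[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
  echo "naive:    ", naiveSum(tab)
  echo "neumaier: ", neumaierSum(tab)
```

The compensation variable collects what each addition rounds away, which is why the result for ten copies of 0.1 matches the kbn line in the example above.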


planetis-m · 2020-11-16T20:57:59Z

Can we follow this naming convention please? Summation functions start with the sum prefix. Makes it easier to auto-complete.

Also should we choose long names or abbreviations? Maybe per case.
I understand that it is hard to tell what sumKbn is; I have counted two occasions where it caused confusion. But maybe better docs are the answer? (Btw, they are fixed in devel, see: https://nim-lang.github.io/Nim/sums.html)

Actually, I would argue that surnames should be abbreviated, because they're hard to remember.

planetis-m · 2020-11-16T21:00:45Z

Btw noob question, is the fastTwoSum, twoSum, shewchuckSum_add and shewchuckSum_total really useful for users? Because it seems to me they should be private functions.

planetis-m · 2020-11-16T21:06:08Z

Overall great additions, much appreciated!

juancarlospaco · 2020-11-16T21:49:52Z

lib/std/sums.nim

@@ -53,6 +53,136 @@ func sumPairs*[T](x: openArray[T]): T =
  let n = len(x)
  if n == 0: T(0) else: sumPairwise(x, 0, n)

+func fastTwoSum*[T : SomeFloat](a, b: T): (T, T) =


I would prefer naming the function after the algorithm,
because this fastThing() will not scale in the long run.
What's next: fastestThing(), soFastBroThing(), gottaGoFastThing()?
🙂

Unfortunately, it's the name of the algorithm. At least widely used 😄
For reference: https://hal.archives-ouvertes.fr/hal-00164607/document (§2.2)
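
To make the naming point concrete, here is a minimal sketch of the two error-free transformations as they appear in the diff hunks of this PR. The demo values are my own, and the function bodies may differ in detail from the final code. fastTwoSum is "fast" because it uses three floating-point operations instead of six, at the price of requiring |a| >= |b|.

```nim
# Sketch only; bodies follow the textbook formulations quoted in the PR hunks.

func fastTwoSum[T: SomeFloat](a, b: T): (T, T) =
  ## Dekker's Fast2Sum: the error term is exact only when |a| >= |b|.
  let s = a + b
  (s, b - (s - a))

func twoSum[T: SomeFloat](a, b: T): (T, T) =
  ## Knuth's TwoSum: no ordering precondition, three extra operations.
  let s = a + b
  let z = s - a
  (s, (a - (s - z)) + (b - z))

when isMainModule:
  # Precondition respected: both recover the rounding error of 1.0 + 1e-17.
  echo fastTwoSum(1.0, 1e-17)
  echo twoSum(1.0, 1e-17)
  # Precondition violated (|a| < |b|): fastTwoSum loses the error term.
  echo fastTwoSum(1e-17, 1.0)
  echo twoSum(1e-17, 1.0)
```

With the arguments swapped, only twoSum still returns the small addend as the error term, which is the practical difference between the two names.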


lbartoletti · 2020-11-18T13:28:20Z

Can we follow this naming convention please? Summation functions start with the sum prefix. Makes it easier to auto-complete.

Sure! Will be fixed

Also should we choose long names or abbreviations? Maybe per case.
I understand that it is hard to tell what sumKbn is; I have counted two occasions where it caused confusion. But maybe better docs are the answer? (Btw, they are fixed in devel, see: https://nim-lang.github.io/Nim/sums.html)

Actually, I would argue that surnames should be abbreviated, because they're hard to remember.

I prefer to use the name of the algorithm as it is used, but I follow the naming convention of the project :)



planetis-m · 2020-11-18T20:22:06Z

Minor nitpick: you start the iteration from 0, like the Wikipedia article. However, it's more algorithmically correct to do var s = a[0]; for i in 1 ..< a.len

The last one doesn't matter; I can clean it up after it is merged.
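
A minimal sketch of the loop shape suggested in the nitpick, assuming it refers to a twoSum-based compensated sum. The thread does not say which function was meant, so sumFromFirst is a hypothetical name used only for illustration.

```nim
# Hypothetical illustration: seed the sum with the first element and
# iterate from index 1, instead of starting from zero.

func sumFromFirst[T: SomeFloat](a: openArray[T]): T =
  if a.len == 0:
    return T(0)
  var s = a[0]                        # seed with the first element
  var e = T(0)                        # accumulated rounding errors
  for i in 1 ..< a.len:
    let t = s + a[i]
    let z = t - s
    e += (s - (t - z)) + (a[i] - z)   # twoSum error term of s + a[i]
    s = t
  result = s + e

when isMainModule:
  echo sumFromFirst([0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
```

Seeding with a[0] skips one trivial addition with zero and forces the empty-input case to be handled explicitly.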

planetis-m · 2020-11-19T19:39:12Z

Looks great!

lbartoletti · 2020-11-20T05:44:17Z

Should fsum be in math?

planetis-m · 2020-11-20T09:58:24Z

Should fsum be in math?

Ask @Araq

timotheecour · 2020-11-21T21:40:22Z

lib/std/sums.nim

+  ## http://www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps
+  sumShewchuck_total(sumShewchuck_add(x))
+
+func fsum*[T: SomeFloat](x: openArray[T]): T =


inline

But do we really need an alias? Instead, why not use :idx: in the doc comment, so that it is searchable in docsearch?

timotheecour · 2020-11-21T21:42:42Z

lib/std/sums.nim

+    setLen(result, i + 1)
+    result[i] = x
+
+func sumShewchuck_total[T: SomeFloat](partials: openArray[T]): T =


sumShewchuck_total -> sumShewchuckTotal

ditto elsewhere

timotheecour · 2020-11-21T21:44:54Z

lib/std/sums.nim

@@ -53,13 +53,111 @@ func sumPairs*[T](x: openArray[T]): T =
  let n = len(x)
  if n == 0: T(0) else: sumPairwise(x, 0, n)

+func fastTwoSum*[T: SomeFloat](a, b: T): (T, T) =
+  ## Deker's algorithm


add wikipedia link

ditto elsewhere

timotheecour · 2020-11-21T21:50:33Z

lib/std/sums.nim

+    var x = x
+    var i = 0
+    for y in result:
+      let sum = twoSum(x, y)


let (hi, lo) = twoSum(x, y)

timotheecour · 2020-11-21T22:02:40Z

lib/std/sums.nim

+      lo = sum[1]
+      if lo != 0.0:
+        break
+      if (n > 0 and


if n > 0 and ((lo < 0.0 and partials[n - 1] < 0.0) or
              (lo > 0.0 and partials[n - 1] > 0.0)):
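
For context, the condition under review guards the final half-way rounding correction of the Shewchuk total. Below is a minimal sketch of that final step, modelled on the structure of CPython's fsum rather than copied from this PR; totalPartials is a hypothetical name, and the partials are assumed to be non-overlapping and ordered by increasing magnitude.

```nim
# Sketch only: sums the partials from the largest down and stops at the
# first inexact addition, then fixes up a possible half-way rounding case.

func totalPartials[T: SomeFloat](partials: openArray[T]): T =
  var n = partials.len
  if n == 0:
    return T(0)
  dec n
  var hi = partials[n]
  var lo = T(0)
  while n > 0:
    let x = hi
    dec n
    let y = partials[n]
    hi = x + y
    lo = y - (hi - x)          # rounding error of x + y
    if lo != T(0):
      break                    # remaining partials cannot change `hi` further
  # If the next partial has the same sign as `lo`, `hi` may have been
  # rounded the wrong way from an exact half-way case.
  if n > 0 and ((lo < T(0) and partials[n - 1] < T(0)) or
                (lo > T(0) and partials[n - 1] > T(0))):
    let y = lo * T(2)
    let x = hi + y
    if y == x - hi:
      hi = x
  result = hi

when isMainModule:
  echo totalPartials([5.551115123125783e-17, 1.0])
```

Grouping the two sign tests inside one pair of parentheses matters because `and` binds tighter than `or` in Nim, so without it the n > 0 guard would only cover the first alternative.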

timotheecour · 2020-11-21T22:03:31Z

lib/std/sums.nim

+            (lo > 0.0 and partials[n - 1] > 0.0)
+          )
+        ):
+        y = lo * 2.0


timotheecour · 2020-11-21T22:04:26Z

lib/std/sums.nim

+  result[0] = a + b
+  result[1] = b - (result[0] - a)
+
+func twoSum*[T](a, b: T): (T, T) =


{.inline.}

ditto fastTwoSum

timotheecour · 2020-11-21T22:06:06Z

lib/std/sums.nim

+  result[1] = (a - (result[0] - z)) + (b - z)
+
+func sum2*[T: SomeFloat](v: openArray[T]): T =
+  ## sum an array v using twoSum function


## sum `v` using `twoSum`

this enables future docgen improvements to auto-generate clickable links


timotheecour · 2020-11-21T22:10:39Z

lib/std/sums.nim

+  let z = result[0] - a
+  result[1] = (a - (result[0] - z)) + (b - z)
+
+func sum2*[T: SomeFloat](v: openArray[T]): T =


timotheecour · 2020-11-21T22:15:11Z

lib/std/sums.nim

+
+func sumShewchuck_total[T: SomeFloat](partials: openArray[T]): T =
+  var hi = 0.0
+  if len(partials) > 0:


cmon, simplify this.

var n = partials.len
if n > 0: ...

timotheecour · 2020-11-21T22:16:54Z

lib/std/sums.nim

+        y = lo * 2.0
+        x = hi + y
+        var yr = x - hi
+        if y == yr:


simplify this!

if y == x - hi:

ditto elsewhere in your code

timotheecour · 2020-11-21T22:20:44Z

lib/std/sums.nim

  assert sumPairs(tc1) == 7.485470860550345

  var tc2: seq[float]
  for n in 1 .. 1000:
    tc2.add pow(-1.0, n.float) / n.float
  assert sumKbn(tc2) == -0.6926474305598203
+  assert sumShewchuck(tc2) == -0.6926474305598203


doAssert in all tests

planetis-m

Every float literal (mostly 0.0) needs to be written like T(0) to avoid type conversions from float to float32
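
A minimal sketch of why this matters in a generic func; sumPlain is a hypothetical name. A literal such as 0.0 has type float (64-bit), so a local initialised with it does not follow T, whereas T(0) does.

```nim
# Hypothetical illustration of keeping the accumulator in the element type.

func sumPlain[T: SomeFloat](x: openArray[T]): T =
  var acc = T(0)       # float32 when T is float32, float64 when T is float64
  # `var acc = 0.0` would make `acc` a float64 regardless of T, so a float32
  # instantiation would mix types in the additions and in the return value.
  for v in x:
    acc += v
  acc

when isMainModule:
  let small = sumPlain([1.5'f32, 2.25'f32])
  echo small, " ", typeof(small)
```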

timotheecour · 2020-11-21T23:04:42Z

lib/std/sums.nim

+          hi = x
+  result = hi
+
+func sumShewchuck*[T: SomeFloat](x: openArray[T]): T =


this API prevents using it in an online manner. see nim-lang/RFCs#288

-1 this will require rewriting every algorithm. Out of scope of this PR
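
To illustrate what "online" could mean here, below is one possible shape for an accumulator-style API, using Neumaier compensation because its state is just two numbers. This is only a sketch under my own assumptions: KbnAccumulator, push and total are hypothetical names and are not taken from the RFC or from this PR.

```nim
# Hypothetical online API: values are fed one at a time, no openArray needed.

type
  KbnAccumulator[T: SomeFloat] = object
    sum, comp: T               # running sum and running compensation

func push[T: SomeFloat](acc: var KbnAccumulator[T], v: T) =
  ## Feed one more value into the running compensated sum.
  let t = acc.sum + v
  if abs(acc.sum) >= abs(v):
    acc.comp += (acc.sum - t) + v
  else:
    acc.comp += (v - t) + acc.sum
  acc.sum = t

func total[T: SomeFloat](acc: KbnAccumulator[T]): T =
  ## Current compensated total; can be queried at any point.
  acc.sum + acc.comp

when isMainModule:
  var acc: KbnAccumulator[float]
  for i in 1 .. 10:
    acc.push 0.1
  echo acc.total
```

An openArray-based function can then be written as a thin wrapper over such an accumulator, which is roughly the rewrite the reply above considers out of scope.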

timotheecour · 2020-11-21T23:20:25Z

lib/std/sums.nim

+  return s + e
+
+func sumShewchuck_add[T: SomeFloat](v: openArray[T]): seq[T] =
+  for x in v:


I was concerned that this allocates and could be inefficient, but in my limited tests 2 was the maximum value of result.len.
Maybe add a comment to explain this.
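
A small sketch that may help when writing that comment. It assumes the partials are maintained with twoSum as in the hunk above; addPartial is a hypothetical name. The number of partials depends on how widely the magnitudes of the inputs are spread, so 2 is not a hard upper bound.

```nim
# Sketch of the partials update; names and structure are illustrative only.

func twoSum[T: SomeFloat](a, b: T): (T, T) =
  let s = a + b
  let z = s - a
  (s, (a - (s - z)) + (b - z))

func addPartial[T: SomeFloat](partials: var seq[T], value: T) =
  ## Fold `value` into the list of non-overlapping partial sums.
  var x = value
  var i = 0
  for j in 0 ..< partials.len:
    let (hi, lo) = twoSum(x, partials[j])
    if lo != T(0):
      partials[i] = lo         # keep the non-zero rounding error
      inc i
    x = hi
  partials.setLen(i + 1)
  partials[i] = x

when isMainModule:
  var close: seq[float]
  for i in 1 .. 10:
    close.addPartial 0.1
  echo close.len               # similar magnitudes: few partials

  var spread: seq[float]
  for v in [1e100, 1.0, 1e-100]:
    spread.addPartial v
  echo spread                  # widely spread magnitudes: one partial each
```

The second input needs three partials because none of the three values can absorb another without losing it entirely.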

c-blake · 2020-11-23T12:41:31Z

I think if we are going to add robust sums to the stdlib then there are always going to be various methods with different accuracy-speed-memory-onlineness tradeoffs. You should call the proc robustSum or sumRobust or accurateSum or something and it should take an enum type SumMethod that names the algorithm which could be smKahan or smShewchuk or whatever. This will be less confusing if algorithms proliferate, people want to use some online variant, etc.
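
A rough sketch of what such an enum-dispatched API could look like. The names sumRobust, SumMethod and smKahan come from the suggestion above; smNaive, smNeumaier, the parameter name meth and the default value are my own assumptions, and a real module would presumably add smShewchuk and others.

```nim
# Sketch of a single entry point that selects the algorithm by enum value.

type
  SumMethod = enum
    smNaive, smKahan, smNeumaier

func sumRobust[T: SomeFloat](x: openArray[T], meth = smNeumaier): T =
  case meth
  of smNaive:
    for v in x:
      result += v
  of smKahan:
    var c = T(0)               # compensation carried between iterations
    for v in x:
      let y = v - c
      let t = result + y
      c = (t - result) - y
      result = t
  of smNeumaier:
    var c = T(0)
    for v in x:
      let t = result + v
      if abs(result) >= abs(v):
        c += (result - t) + v
      else:
        c += (v - t) + result
      result = t
    result += c

when isMainModule:
  let tab = @[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
  for m in SumMethod:
    echo m, ": ", sumRobust(tab, m)
```

New algorithms then only add an enum value and a case branch, and callers keep a single function name to search for.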

Personally, I think that this should be a separate nimble package with weak initial backward compatibility standards until the API settles down. Then you could iterate a few times on the API ideas mentioned above, get user feedback and maybe have a new stdlib or fusion robustsums module. Julia is for numerical methods, but that is still a bit off the beaten path/a special case for most Nim work that I have seen. Once you have something solid, if you want to make something in the stdlib depend upon it, like stats.RunningStat, then that might be a strong argument for pulling it into the stdlib.

Araq · 2020-11-23T19:21:47Z

Rejected, please create a Nimble package for it.

lbartoletti · 2020-11-27T06:30:58Z

Rejected, please create a Nimble package for it.

Done. https://gitlab.com/lbartoletti/accuratesums and added in packages.

planetis-m · 2020-11-27T20:04:12Z

Is std/sums going to be deprecated? Also @lbartoletti you could have addressed the PR comments, most are valid.

lbartoletti · 2020-11-28T04:48:51Z

could have addressed the PR comments, most are valid.

@planetis-m I think I have addressed all, or at least many, of the comments in my package. If that isn't the case, I'll be glad to update and improve it.
Thanks.

timotheecour · 2020-11-28T05:02:49Z

IMO:

- std/sums should be deprecated; it's too specialized for the stdlib
- fusion/sums would be perfectly in scope
- fusion desperately needs a proper way to allow staging (related: RFC: adopt semver, like almost all nimble packages fusion#30), to avoid the current situation where an API needs to mature before being stabilized. Right now everything reaching fusion is automatically assumed frozen-must-be-backward-compatible, which prevents fixing API design issues (e.g. online vs openArray for some APIs, but there are more of course). Even the stdlib allows some form of staging: breaking changes are allowed in the devel branch if the API hasn't yet hit stable.

fusion/sums would have more quality control / peer review feedback than an individually owned nimble package.

std/experimental was IMO a good idea (but was only used for lib/experimental/diff.nim) and allows 0-change migration for clients using the usual import/export or include trick in case it wants to migrate to std/diff once stabilized.

we should IMO do the same for fusion:
modules can mature in fusion/experimental, breaking changes are allowed there, and client code that decides to use those (directly or via transitive closure) should not complain about those.

Araq · 2020-11-28T07:55:07Z

I still think "experimental" is a bad idea. I could sympathize with fusion/v1/sums vs fusion/v2/sums, but that's not a proven design either.

Add more summation functions

c8b3ec9
