Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more summation functions #16004

Closed
wants to merge 11 commits into from
Closed

Conversation

lbartoletti
Copy link
Contributor

I don't know if all the code has to be integrated in nim or if I have to make a dedicated library, nevertheless as there is already a library in std I allow myself to propose some new functions.

AFAIK, Julia uses pairwise summation and Python, Shewchuk robust summation [1]. I propose to add a fsum func as python does.
BTW, there are the classical twoSum functions and kahan variant/improvement.

Tiny example

import sequtils
import std/sums

let tab = @[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

echo "foldl (naive): ", foldl(tab, a + b)
# foldl: 0.9999999999999999
echo "pairs: ", sumPairs(tab)
# pairs: 0.9999999999999999
echo "kbn: ", sumKbn(tab)
# kbn: 1.0
echo "fsum: ", fsum(tab)
# fsum: 1.0
var part = shewchuckSum_add(tab)
echo "part: ", part
# part @[5.551115123125783e-17, 1.0]
echo sumPairs(part)
# 1.0
echo sumKbn(part)
# 1.0

If you consider that these functions deserve to be integrated, I will improve the PR with tests, comments and cleanup as usual.

[1] https://en.wikipedia.org/wiki/Kahan_summation_algorithm#Support_by_libraries

lib/std/sums.nim Outdated Show resolved Hide resolved
lib/std/sums.nim Outdated Show resolved Hide resolved
@planetis-m
Copy link
Contributor

planetis-m commented Nov 16, 2020

Can we follow this naming convention please? Summation functions start with the sum prefix. Makes it easier to auto-complete.

Also should we choose long names or abbreviations? Maybe per case.
I understand sumKbn is hard to understand what it is, I counted twice that it was confused. But maybe better docs is the answer? (Btw they are fixed in devel see: https://nim-lang.github.io/Nim/sums.html)

Actually i will argue that surnames should be abbreviated, because they're hard to remember.

@planetis-m
Copy link
Contributor

Btw noob question, is the fastTwoSum, twoSum, shewchuckSum_add and shewchuckSum_total really useful for users? Because it seems to me they should be private functions.

@planetis-m
Copy link
Contributor

Overall great additions, much appreciated!

lib/std/sums.nim Outdated
@@ -53,6 +53,136 @@ func sumPairs*[T](x: openArray[T]): T =
let n = len(x)
if n == 0: T(0) else: sumPairwise(x, 0, n)

func fastTwoSum*[T : SomeFloat](a, b: T): (T, T) =
Copy link
Collaborator

@juancarlospaco juancarlospaco Nov 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer naming the function using the Algorithm name,
because this fastThing() will not scale in the long run,
whats next fastestThing(), soFastBroThing(), gottaGoFastThing().
🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, it's the name of the algorithm. At least widely used 😄
For reference https://hal.archives-ouvertes.fr/hal-00164607/document ($2.2)

lib/std/sums.nim Outdated Show resolved Hide resolved
@lbartoletti
Copy link
Contributor Author

Can we follow this naming convention please? Summation functions start with the sum prefix. Makes it easier to auto-complete.

Sure! Will be fixed

Also should we choose long names or abbreviations? Maybe per case.
I understand sumKbn is hard to understand what it is, I counted twice that it was confused. But maybe better docs is the answer? (Btw they are fixed in devel see: https://nim-lang.github.io/Nim/sums.html)

Actually i will argue that surnames should be abbreviated, because they're hard to remember.

I prefer to use the name of the algorithm as it is used, but I follow the naming convention of the project :)

Co-authored-by: Juan Carlos <[email protected]>
lib/std/sums.nim Outdated Show resolved Hide resolved
lib/std/sums.nim Outdated Show resolved Hide resolved
lib/std/sums.nim Outdated Show resolved Hide resolved
lib/std/sums.nim Outdated Show resolved Hide resolved
lib/std/sums.nim Outdated Show resolved Hide resolved
@planetis-m
Copy link
Contributor

Minor nitpick you start the iteration from 0 like the wikipedia article. However it's more algorithmically correct to do var s = a[0]; for i in 1 ..< a.len

Last one doesn't matter I can clean it up, after it is merged.

@planetis-m
Copy link
Contributor

Looks great!

@lbartoletti
Copy link
Contributor Author

Should fsum be in math?

@planetis-m
Copy link
Contributor

Should fsum be in math?

Ask 4raq

## http://www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps
sumShewchuck_total(sumShewchuck_add(x))

func fsum*[T: SomeFloat](x: openArray[T]): T =
Copy link
Member

@timotheecour timotheecour Nov 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • inline
  • but do we really need an alias? instead, why not use idx: so it makes it searchable in docsearch

setLen(result, i + 1)
result[i] = x

func sumShewchuck_total[T: SomeFloat](partials: openArray[T]): T =
Copy link
Member

@timotheecour timotheecour Nov 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sumShewchuck_total
sumShewchuckTotal

ditto elsewhere

@@ -53,13 +53,111 @@ func sumPairs*[T](x: openArray[T]): T =
let n = len(x)
if n == 0: T(0) else: sumPairwise(x, 0, n)

func fastTwoSum*[T: SomeFloat](a, b: T): (T, T) =
## Deker's algorithm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add wikipedia link

ditto elsewhere

var x = x
var i = 0
for y in result:
let sum = twoSum(x, y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let (hi,lo) = toSum(x,y)

lo = sum[1]
if lo != 0.0:
break
if (n > 0 and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if n > 0 and (lo < 0.0 and partials[n - 1] < 0.0) or
(lo > 0.0 and partials[n - 1] > 0.0))

(lo > 0.0 and partials[n - 1] > 0.0)
)
):
y = lo * 2.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lo+lo ?

result[0] = a + b
result[1] = b - (result[0] - a)

func twoSum*[T](a, b: T): (T, T) =
Copy link
Member

@timotheecour timotheecour Nov 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • {.inline.}
  • ditto fastTwoSum

result[1] = (a - (result[0] - z)) + (b - z)

func sum2*[T: SomeFloat](v: openArray[T]): T =
## sum an array v using twoSum function
Copy link
Member

@timotheecour timotheecour Nov 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

## sum `v` using `twoSum`

this enables future docgen improvements to auto-generate clickable links

let z = result[0] - a
result[1] = (a - (result[0] - z)) + (b - z)

func sum2*[T: SomeFloat](v: openArray[T]): T =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sumToSum ?


func sumShewchuck_total[T: SomeFloat](partials: openArray[T]): T =
var hi = 0.0
if len(partials) > 0:
Copy link
Member

@timotheecour timotheecour Nov 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmon, simplify this.

var n = partials.len
if n > 0:
  ...

y = lo * 2.0
x = hi + y
var yr = x - hi
if y == yr:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simplify this!

if y == x - hi:

ditto elsewhere in your code

assert sumPairs(tc1) == 7.485470860550345

var tc2: seq[float]
for n in 1 .. 1000:
tc2.add pow(-1.0, n.float) / n.float
assert sumKbn(tc2) == -0.6926474305598203
assert sumShewchuck(tc2) == -0.6926474305598203
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doAssert in all tests

Copy link
Contributor

@planetis-m planetis-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Every float literal (mostly 0.0) needs to be written like T(0) to avoid type conversions from float to float32

hi = x
result = hi

func sumShewchuck*[T: SomeFloat](x: openArray[T]): T =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this API prevents using it in an online manner. see nim-lang/RFCs#288

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 this will require rewriting every algorithm. Out of scope of this PR

return s + e

func sumShewchuck_add[T: SomeFloat](v: openArray[T]): seq[T] =
for x in v:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was concerned about the fact this allocates and could be inefficient but in my limited tests, 2 was the max size for result.len.
maybe add a comment to explain this

@c-blake
Copy link
Contributor

c-blake commented Nov 23, 2020

I think if we are going to add robust sums to the stdlib then there are always going to be various methods with different accuracy-speed-memory-onlineness tradeoffs. You should call the proc robustSum or sumRobust or accurateSum or something and it should take an enum type SumMethod that names the algorithm which could be smKahan or smShewchuk or whatever. This will be less confusing if algorithms proliferate, people want to use some online variant, etc.

Personally, I think that this should be a separate nimble package with weak initial backward compatibility standards until the API settles down. Then you could iterate a few times on the API ideas mentioned above, get user feedback and maybe have a new stdlib or fusion robustsums module. Julia is for numerical methods, but that is still a bit off the beaten path/a special case for most Nim work that I have seen. Once you have something solid, if you want to make something in the stdlib depend upon it, like stats.RunningStat, then that might be a strong argument for pulling it into the stdlib.

@Araq
Copy link
Member

Araq commented Nov 23, 2020

Rejected, please create a Nimble package for it.

@lbartoletti
Copy link
Contributor Author

Rejected, please create a Nimble package for it.

Done. https://gitlab.com/lbartoletti/accuratesums and added in packages.

@planetis-m
Copy link
Contributor

Is std/sums going to be deprecated? Also @lbartoletti you could have addressed the PR comments, most are valid.

@lbartoletti
Copy link
Contributor Author

could have addressed the PR comments, most are valid.

@planetis-m I think I have addressed all, at least a lot of, comments in my packages. If isn't the case, I'll be glad to update and improve this package.
Thanks.

@timotheecour
Copy link
Member

timotheecour commented Nov 28, 2020

IMO:

  • std/sums should be deprecated, it's too specialized for stdlib
  • fusion/sums would be perfectly in scope
  • fusion desperately needs a proper way to allow staging (related: RFC: adopt semver, like almost all nimble packages fusion#30), to avoid the current situation where an API needs to mature before being stabilized. Right now everything reaching fusion is automatically assumed frozen-must-be-backward-compatible, which prevents fixing API design issues (eg online vs openArray for some API's, but there are more of course). Even stdlib allows some form of staging, breaking changes are allowed in devel branch if it hasn't yet hit stable.

fusion/sums would have more quality control / peer review feedback than an individually owned nimble package.

std/experimental was IMO a good idea (but was only used for lib/experimental/diff.nim) and allows 0-change migration for clients using the usual import/export or include trick in case it wants to migrate to std/diff once stabilized.

we should IMO do the same for fusion:
modules can mature in fusion/experimental, breaking changes are allowed there, and client code that decides to use those (directly or via transitive closure) should not complain about those.

@Araq
Copy link
Member

Araq commented Nov 28, 2020

I still think "experimental" is a bad idea. I could sympathize with fusion/v1/sums vs fusion/v2/sums but that it's not a proven design either.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants