Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor co-primes algorithm to be generic & avoid allocation. #36

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

saeta
Copy link
Owner

@saeta saeta commented May 23, 2020

Thank you @dabrahams for the suggestions!

@saeta saeta requested a review from dabrahams May 23, 2020 23:19
@saeta
Copy link
Owner Author

saeta commented May 23, 2020

This is a follow-up to #30

@saeta
Copy link
Owner Author

saeta commented May 23, 2020

This relates to umbrella issue: #33

Copy link
Collaborator

@dabrahams dabrahams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some work to do on this one…

}

/// A sequence of numbers that are co-prime with `n`, up to `n`.
public struct PositiveCoprimes<Number: BinaryInteger>: Sequence {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a deterministic multi-pass sequence, so it should conform to Collection.

Suggested change
public struct PositiveCoprimes<Number: BinaryInteger>: Sequence {
public struct LesserPositiveCoprimes<Number: BinaryInteger>: Sequence {

I realize this is getting a little unwieldy. That said, I hadn't got it through my thick brain before that there are numbers coprime with and greater than N, so we should consider whether we can make the name more accurate. PositiveValuesCoprimeAndLess

Surely we can come up with better names than Number and n?

Strawmen:

s/Number/Domain/ ?
s/n/limit/ ?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather instead of making it conform to collection instead make it a sequence. Concretely, we know there's infinitely many primes, and there's no reason why we should stop at the number itself, so I propose that this actually be an (infinite) sequence instead.

I also dislike the collection protocol here because the size of the collection is unknown until it's been computed, which reduces the value of it being a constant storage abstraction.

I like Domain, but I dislike limit, and think n (or perhaps b) is more appropriate appropriate.

Edit: I'm fine with target (as you suggested), and have made the corresponding edits, but I'm still not fully convinced this is the right choice...

Copy link
Collaborator

@dabrahams dabrahams May 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it was infinite, Collection is the right abstraction. Please see “But sequences can be infinite” at https://forums.swift.org/t/pitch-make-collection-super-convenient-and-retire-sequence/12103

The fundamental difference between Collection and Sequence is single vs multipass, and all that implies in terms of algorithms: the ability to subdivide it, to binary search, etc.

I don't know what you're saying about the count and the value of it being a constant storage abstraction. There are many examples of collections whose count is an O(n) computation. I lost the battle to make count be a method because of this long ago.

The one argument I would consider worthwhile against Collection conformance is that the cost of traversal from one element to the next is probably not O(1). But even that is IMO not a good enough reason to avoid collection conformance in Swift; what you do is note in docs that it doesn't provide expected performance, as lazy filter collection does. In general, it's more valuable to have the functionality available even if it's going to be slow.

I'm not at all in love with target either. It's weird that it plays two roles: both the limit and a coprime of the all the elements, which makes it hard to name. n or b would make sense for function parameter names, or if that was a standard name used for a coprime in math, like you use m and b for a linear equation y=mx+b. Otherwise, a stored property has a persistent meaning that is not available from the doc comment near its use site and is thus usually worth naming more explicitly.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the first post in https://forums.swift.org/t/pitch-make-collection-super-convenient-and-retire-sequence/12103:

A true stream would require significant additional storage or computation to
support multiple passes over its elements. [...]
The initial state of the element generation procedure, which is modified as new elements are generated, is very costly to construct and to store, such as in the Mersenne Twister 4 pseudo-random number generator.
[...]

I would argue that this type falls into roughly the same category. This algorithm seems to me to fall into the same category. Concretely: while it's possible to store previous values (a-la Mersenne Twister) to facilitate indexing, or do an increasing amount of re-computation, these are both unsatisfactory.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to store previous values. The index just needs to store the same thing as the Iterator does. https://gist.github.com/dabrahams/6119b718aab45d8ef59e141eb37c711e

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, by switching to Collection, I've gotten rid of all mutable stored properties, and also got rid of the need to create a nested Iterator or Index type (by re-using Domain). While it demands a bit more code, and I'm pretty sure operations like prefix are probably somewhat inefficient, this does seem like an interesting design. It also makes certain functionality pretty easy, such as finding the next positive coprime greater than any arbitrary number.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

operations like prefix are probably somewhat inefficient

You could always define a lazy SubSequence type to get around that.

It also makes certain functionality pretty easy, such as finding the next positive coprime greater than any arbitrary number.

Exactly. That's why, “if it can conform, it should conform.”

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could always define a lazy SubSequence type to get around that.

I wondered about that, but the implementation of prefix(_:) doesn't appear to admit an implementation where the subsequence is lazy, as it relies upon directly computing the end index from the call to index(_:offsetBy:limitedBy:) which can't be lazy, and IIUC, there's no .lazy.prefix(n) (although there is a LazyPrefixWhileSequence... haven't played in enough contexts to fully grok what would happen in all situations).

Sources/PenguinStructures/NumberOperations.swift Outdated Show resolved Hide resolved
Sources/PenguinStructures/NumberOperations.swift Outdated Show resolved Hide resolved
Sources/PenguinStructures/NumberOperations.swift Outdated Show resolved Hide resolved
Sources/PenguinStructures/NumberOperations.swift Outdated Show resolved Hide resolved
Sources/PenguinStructures/NumberOperations.swift Outdated Show resolved Hide resolved
@dabrahams
Copy link
Collaborator

I wondered whether computing GCD for every value < N was the best way and landed on this: https://math.stackexchange.com/questions/621109/co-prime-numbers-less-than-n

I didn't analyze it, but something to think about… someday. ;-)

@saeta saeta requested a review from dabrahams May 27, 2020 21:02
@saeta
Copy link
Owner Author

saeta commented Jun 7, 2020

Switched from Sequence to Collection. It's an interesting point in design space. As alluded to, I'm somewhat skeptical that we're getting appropriately efficient operations by default. (e.g. Sequence.prefix avoids redundant computations (at the cost of memory consumption), whereas (as written for now), Collection.prefix doesn't. Probably something to refine further over time, such as defining a SubSequence instead of accepting the Collection's default, but not sure if that's enough; haven't thought enough about it yet.)

I'm sure some of the doc comments could be improved... feedback welcome!

@saeta
Copy link
Owner Author

saeta commented Jun 9, 2020

@dabrahams aside from our discussion on Sequence vs Collection, do you have further changes you'd like to see?

var positiveSelf = self
if positiveSelf < 0 {
positiveSelf *= -1 // Workaround for lack of `Swift.abs` on `BinaryInteger`.
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a weird definition of "smaller" that apparently(?) from reading the code implies absolute value.
Why not just have the array be empty when self is <= 0?
Why not return PositiveCoprimes.SubSequence?
How can this possibly even compile? .prefix(…) is supposed to return a SubSequence, i.e. slice.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just have the array be empty when self is <= 0?

Yeah, that's reasonable. Fixed.

How can this possibly even compile? .prefix(…) is supposed to return a SubSequence, i.e. slice.

As it turns out, IIUC it's calling Sequence.prefix(while:)

As for why not return PositiveCoprimes.SubSequence, the concern I have is that this will cause us to traverse the collection at least twice: once to find the endIndex, and then a second time to access the elements. Of course, we could make this be a LazyPrefixWhileSequence<SubSequence>, but this forces closure allocation.

The final option would be to do something quite fancy with the SubSequence type. My preference is to avoid getting that fancy without more experience to understand what operations we'd like to support efficiently (as we only get one SubSequence type).

Copy link
Owner Author

@saeta saeta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for the review!

var positiveSelf = self
if positiveSelf < 0 {
positiveSelf *= -1 // Workaround for lack of `Swift.abs` on `BinaryInteger`.
}
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just have the array be empty when self is <= 0?

Yeah, that's reasonable. Fixed.

How can this possibly even compile? .prefix(…) is supposed to return a SubSequence, i.e. slice.

As it turns out, IIUC it's calling Sequence.prefix(while:)

As for why not return PositiveCoprimes.SubSequence, the concern I have is that this will cause us to traverse the collection at least twice: once to find the endIndex, and then a second time to access the elements. Of course, we could make this be a LazyPrefixWhileSequence<SubSequence>, but this forces closure allocation.

The final option would be to do something quite fancy with the SubSequence type. My preference is to avoid getting that fancy without more experience to understand what operations we'd like to support efficiently (as we only get one SubSequence type).

Sources/PenguinStructures/NumberOperations.swift Outdated Show resolved Hide resolved
Sources/PenguinStructures/NumberOperations.swift Outdated Show resolved Hide resolved
@saeta
Copy link
Owner Author

saeta commented Jun 19, 2020

Whoops, looks like your additional comments raced with me fixing & pushing the fixes! PTAL?

@saeta saeta requested a review from dabrahams June 20, 2020 21:22
@dabrahams
Copy link
Collaborator

Of course, we could make this be a LazyPrefixWhileSequence, but this forces closure allocation.

Forces allocation? Only if there's a capture and only if you store the thing somewhere such that the closure can't be inlined.

Sources/PenguinStructures/NumberOperations.swift Outdated Show resolved Hide resolved
Sources/PenguinStructures/NumberOperations.swift Outdated Show resolved Hide resolved
Sources/PenguinStructures/NumberOperations.swift Outdated Show resolved Hide resolved
Sources/PenguinStructures/NumberOperations.swift Outdated Show resolved Hide resolved
/// Computes the next index after `index`.
public func index(after index: Index) -> Index {
var nextCandidate = index
while true {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do this without a raw loop; use a range and index(where:)

Sources/PenguinStructures/NumberOperations.swift Outdated Show resolved Hide resolved
Sources/PenguinStructures/NumberOperations.swift Outdated Show resolved Hide resolved
Sources/PenguinStructures/NumberOperations.swift Outdated Show resolved Hide resolved

/// `Int.max`, as there are infinitely many prime numbers, and thus infinitely many coprimes
/// to a given target.
public var count: Int {
Copy link
Owner Author

@saeta saeta Jul 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: let the default implementation handle this, or do some fancy tricks to actually compute what the numbers of values are, given Domain. (Note: this is tricky, due to not being restricted to BinaryInteger and not FixedWidthInteger.)

import PenguinStructures
import XCTest

final class NumberOperationsTests: XCTestCase {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Add collection tests.

@texasmichelle texasmichelle changed the base branch from master to main December 8, 2020 01:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants