Generalize sieving #64
7f6fab1
to
b5a6566
Compare
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #64      +/-   ##
==========================================
+ Coverage   99.37%   99.44%   +0.07%
==========================================
  Files           9       10       +1
  Lines        1280     1449     +169
==========================================
+ Hits         1272     1441     +169
  Misses          8        8
☔ View full report in Codecov by Sentry.
The code lgtm.
A couple of thoughts though:
- Shouldn't searching for safe primes be implemented as a custom sieve? The extra `bool` argument seems a bit like a left-over from the current API. Like, if someone came and asked for a "twin primes" sieve, we'd point them to implementing a "TwinPrimeSieve" (or whatever), so we should perhaps do the same for safe primes.
- I'm a bit hesitant about how this PR assumes that users will want to create a new sieve when a candidate is not prime. Is that always the best strategy? Maybe? Is it always safe to do so? Probably? This objection is pretty much the same point I made on Add support for custom Sieves. #61 when asking if the current (admittedly minimal) API is not enough.
- The `make_sieve` method taking the exhausted sieve. I assume this is to allow custom impls to inspect the state of the old sieve before making a new one? Perhaps that's worth documenting.
src/hazmat/sieve.rs
Outdated
/// If `safe_primes` is `true`, additionally filters out such `n` that `(n - 1) / 2` are divisible
/// by any of the small factors tested.
- /// If `safe_primes` is `true`, additionally filters out such `n` that `(n - 1) / 2` are divisible
- /// by any of the small factors tested.
+ /// If `safe_primes` is `true`, filters out `n` such that `(n - 1) / 2` are divisible
+ /// by any of the small factors tested.
I think "additionally" is justified here, since the original filtering of "n" is still in effect.
Ok, but there's something wrong with the grammar around "additionally" (the reason I suggested we remove it was I couldn't quite figure out what was wrong and suggest a correction).
How about "If `safe_primes` is `true`, extend the filter to skip `n`s such that `(n - 1) / 2` are divisible…"?
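To make the doc comment under discussion concrete, here is a minimal sketch of the filtering rule it describes. The `SMALL_PRIMES` table and the `survives_sieve` function are hypothetical names chosen for illustration and are not the crate's implementation. The sketch assumes odd candidates `n` that are larger than `2 * 13 + 1`, so that neither `n` nor `(n - 1) / 2` can itself be one of the small factors.

```rust
/// Hypothetical table of small odd factors (assumption: the real sieve uses a much larger table).
const SMALL_PRIMES: [u64; 5] = [3, 5, 7, 11, 13];

/// Returns `true` if `n` is NOT ruled out by the small factors.
///
/// The base filter rejects `n` divisible by a small factor.
/// With `safe_primes` set, the filter is extended to also reject `n`
/// for which `(n - 1) / 2` is divisible by a small factor.
fn survives_sieve(n: u64, safe_primes: bool) -> bool {
    for &p in SMALL_PRIMES.iter() {
        // Original filtering of `n`, always in effect.
        if n % p == 0 {
            return false;
        }
        // Additional filtering, only when looking for safe primes.
        if safe_primes && ((n - 1) / 2) % p == 0 {
            return false;
        }
    }
    true
}
```

For example, 53 passes the base filter, but `(53 - 1) / 2 = 26` is divisible by 13, so it is skipped when `safe_primes` is `true`. This illustrates why "additionally" describes the behaviour: the check on `n` itself stays in place in both modes.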
/// Sieves through the results of `sieve_factory` and returns the first item for which `predicate` is `true`.
///
/// If `sieve_factory` signals that no more results can be created, returns `None`.
When would this be the case?
As I mentioned in another comment, for some algorithms you would only want to create one sieve (e.g. #65). After that goes through the whole range, there is nothing else to create.
It could be, but they'd share 90% of the (somewhat complicated) code.
Could you elaborate? The way I see it the new sieve is created when the previous one is exhausted (for the default one, if it reached the increment limit). It does not depend on the predicate.
Yes, but also to distinguish the cases when a new sieve is created from the cases when another sieve replaces the previously exhausted one. For example, the algorithm in #65 would only ever want to create one sieve.
The context here is generalized sieving, with emphasis on "generalized". My worry is not that the code in this PR is bad or doesn't address the request from #61. It's more about not being sure what the API should look like for custom sieves. What is the correct abstraction that accommodates most reasonable sieves? I don't know enough about the topic to definitely tell but I do know there are plenty of sieving strategies. Do they all fit this mold? In other words: are we ready to commit to an API for generalized sieving? The code proposed here suggests that the sieving flow for all sieves is:
My worry is that the above flow is fine for some strategies but sub-optimal for others.
Sub-optimal in what sense: not allowing some optimizations, or not covering some use cases? In any case, we could have asked ourselves the same questions at the v0.5 release. It is possible that someone else will file an issue mentioning a scenario we haven't anticipated, and we'll have to expand the API. I don't see a big problem with it. And it's not like we even decide here what should be available and what's not: #61 can already be solved with existing
4f090ae
to
0061609
Compare
@dvdplm
We are thinking about how exactly we should expose it in the API. Out of curiosity, what is your use case? So far we only know @lleoha's one
Thank you for your reply!
I'm interested in the following issue, which is related to #61, so I'm following this pull request's progress. I hope your project will be a success.
@mepi262 I'll make sure to go over the API one more time. As stated above, my concern here is not about the code per se but more of a desire to be sure we commit to the right API.
I spent some time today working with this code. Here are a few comments and observations:
If I want my Sieve to keep a reference to the CSPRNG in itself, then I run into trouble. The Consider:
pub struct PrimeincSieve<'a, T, R: CryptoRngCore> {
lower_bound: T,
upper_bound: T,
start: Odd<T>,
rng: &'a mut R,
}
For the above to work, the
pub trait SieveFactory<'a, T, R: CryptoRngCore> {
/// The resulting sieve.
type Sieve: Iterator<Item = T>;
/// Makes a sieve given an RNG and the previous exhausted sieve (if any).
///
/// Returning `None` signals that the prime generation should stop.
fn make_sieve(&mut self, rng: &'a mut R, previous_sieve: Option<&Self::Sieve>) -> Option<Self::Sieve>;
}
…but that doesn't work, because now
Maybe it's a terrible idea to want to have the CSPRNG as part of the sieve implementation, but are we sure it is a terrible idea for ALL possible sieve implementations? Sure, users that want to do this can just skip implementing
It's reasonable for a sieve iterator to yield e.g.
As an experiment, I implemented the standard PRIMEINC algorithm ("pick a random odd number, check if prime, else increase by 2 and try again"). Here's what a test using the proposed API in this PR looks like:
#[test_log::test]
fn primeinc() {
let mut rng = ChaCha8Rng::from_seed(*b"01234567890123456789012345678901");
let upper = U64::from_be_hex("0000000000000fff");
let lower = U64::from_be_hex("00000000000000ff");
let sieve = PrimeincSieve::new(lower, upper, &mut rng);
let r = sieve_and_find(&mut rng, sieve, |rng, candidate| {
debug!("candidate={candidate:?}");
is_prime_with_rng(rng, candidate)
});
info!("Found prime. r={r:?}");
}
…and this is what it would look like without the new API (
#[test_log::test]
fn primeinc_no_frills() {
let mut rng = ChaCha8Rng::from_seed(*b"01234567890123456789012345678901");
let upper = U64::from_be_hex("0000000000000fff");
let lower = U64::from_be_hex("00000000000000ff");
let sieve = PrimeincSieve::new(lower, upper, &mut rng);
let mut prime = None;
for candidate in sieve.into_iter() {
debug!("candidate={candidate:?}");
if is_prime_with_rng(&mut rng, candidate.as_ref()) {
prime = Some(candidate);
break;
}
}
info!("Found prime. prime={prime:?}");
}
They are very similar and I'm not convinced that the new API actually makes users' lives better. Maybe I should try with a more complex sieve, but right now I am skeptical.
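For readers unfamiliar with PRIMEINC, the candidate sequence described above can be sketched as a plain iterator. This is an illustrative, dependency-free sketch and not the `PrimeincSieve` from the comment: it uses `u64` instead of a big-integer type, a toy `ToyRng` in place of a CSPRNG (an assumption made only to avoid external crates, and not suitable for real key generation), and it consumes the RNG only at construction rather than holding a reference to it. It also assumes `lower <= upper` and bounds well below `u64::MAX`.

```rust
/// Toy linear congruential generator. NOT cryptographically secure;
/// a hypothetical stand-in for a CSPRNG so the sketch has no dependencies.
struct ToyRng(u64);

impl ToyRng {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// PRIMEINC candidates: a random odd start in `[lower, upper]`, then steps of 2 up to `upper`.
struct PrimeincSieve {
    next: u64,
    upper: u64,
}

impl PrimeincSieve {
    /// Assumes `lower <= upper` and that `upper - lower + 1` does not overflow.
    fn new(lower: u64, upper: u64, rng: &mut ToyRng) -> Self {
        let start = lower + rng.next_u64() % (upper - lower + 1);
        // Setting the lowest bit makes the starting candidate odd.
        Self { next: start | 1, upper }
    }
}

impl Iterator for PrimeincSieve {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        if self.next > self.upper {
            return None; // range exhausted without wrapping around
        }
        let candidate = self.next;
        self.next += 2;
        Some(candidate)
    }
}

/// Trial division, standing in for a real primality test.
fn is_prime(n: u64) -> bool {
    if n < 2 {
        return false;
    }
    let mut d = 2u64;
    while d * d <= n {
        if n % d == 0 {
            return false;
        }
        d += 1;
    }
    true
}
```

With this shape, the "no frills" search is a single call such as `PrimeincSieve::new(lower, upper, &mut rng).find(|c| is_prime(*c))`, which returns `None` if the range runs out before a prime is found.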
I think I'd prefer leaving it to the factory/predicate. Most RNGs are
This is the right choice imo.
Unsure tbh. By ref I think?
Fixed, now it's just
As for item 1, I wouldn't really want to add the
961e6ad
to
83fbfdf
Compare
This PR is a little controversial, and we may continue tinkering with the API. But I think the generalization itself justifies the merge, even if we don't publicly expose the
Fixes #61
Introduces a `SieveFactory` trait and generic `sieve_and_find()` and `par_sieve_and_find()` functions allowing one to combine an arbitrary sieve factory and an arbitrary predicate (like `is_prime_with_rng()`).
I changed the multicore tests to be more randomized (no predefined RNG seed) to do a more reliable comparison. Perhaps it should be done for all the tests?
Why we need a factory: we need to be able to produce a new sieve if the previous one is exhausted.
Why does `sieve_and_find()` return an `Option`: with a custom sieve and predicate it's not guaranteed that the loop will ever stop. The user should make sure of that by returning `None` from their `SieveFactory::make_sieve()` impl.
@lleoha's use case is really covered by a custom predicate and the default sieve, although a custom sieve wrapper may be needed to cover corner cases.
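The factory-plus-predicate flow described above can be sketched roughly as follows. This is a simplified, dependency-free illustration and not the signatures merged in this PR: the RNG parameter is left out, the predicate takes only the candidate, and `OddWindows` and `is_prime` are hypothetical helpers invented for the example.

```rust
/// Simplified stand-in for the `SieveFactory` trait (assumption: RNG argument omitted).
trait SieveFactory {
    type Item;
    type Sieve: Iterator<Item = Self::Item>;

    /// Makes a sieve given the previous exhausted sieve (if any).
    /// Returning `None` signals that the search should stop.
    fn make_sieve(&mut self, previous_sieve: Option<&Self::Sieve>) -> Option<Self::Sieve>;
}

/// Returns the first candidate accepted by `predicate`,
/// or `None` once the factory refuses to produce another sieve.
fn sieve_and_find<F: SieveFactory>(
    mut factory: F,
    predicate: impl Fn(&F::Item) -> bool,
) -> Option<F::Item> {
    let mut sieve = factory.make_sieve(None)?;
    loop {
        if let Some(found) = sieve.by_ref().find(|c| predicate(c)) {
            return Some(found);
        }
        // The current sieve is exhausted: ask the factory for a replacement.
        sieve = factory.make_sieve(Some(&sieve))?;
    }
}

/// Hypothetical factory: consecutive windows of odd numbers, limited to `remaining` sieves.
struct OddWindows {
    next_start: u64,
    width: u64,
    remaining: usize,
}

impl SieveFactory for OddWindows {
    type Item = u64;
    type Sieve = std::iter::StepBy<std::ops::Range<u64>>;

    fn make_sieve(&mut self, _previous_sieve: Option<&Self::Sieve>) -> Option<Self::Sieve> {
        if self.remaining == 0 {
            return None; // this is what guarantees termination
        }
        self.remaining -= 1;
        let start = self.next_start | 1; // force an odd start
        self.next_start = start + self.width;
        Some((start..start + self.width).step_by(2))
    }
}

/// Trial division, standing in for a real primality test.
fn is_prime(n: &u64) -> bool {
    let n = *n;
    if n < 2 {
        return false;
    }
    let mut d = 2u64;
    while d * d <= n {
        if n % d == 0 {
            return false;
        }
        d += 1;
    }
    true
}
```

Starting at 90 with windows of width 4, the first sieve yields 91 and 93 and the second yields 95 and 97, so a factory allowed two sieves finds 97 while a factory allowed only one returns `None`. Without the `remaining` budget, a predicate that never accepts would loop forever, which is the reason for the `Option` return type.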
Design choices to be made:
- […] `SieveFactory` and `sieve_and_find()`, or leave it to the factory implementations and predicates. The latter requires an additional `Clone` bound on the RNG since it has to be both kept in the factory and used in the predicate. The former (currently implemented) allows us to avoid the `Clone` requirement for the single-threaded version.
- […] `make_sieve()` to take `&mut self` (to modify the factory)? Currently that's the case.
- […] `sieve_and_find()`? Currently it's taken by value (which makes `make_sieve(&mut self)` less valuable - only further calls to `make_sieve()` will be able to use the mutated state).