Generalize sieving #64

fjarri · 2024-11-10T23:20:19Z

Fixes #61

Introduces a SieveFactory trait and a generic sieve_and_find() and par_sieve_and_find() functions allowing one to combine an arbitrary sieve factory and an arbitrary predicate (like is_prime_with_rng()).

I changed the multicore tests to be more randomized (no predefined RNG seed) to do a more reliable comparison. Perhaps it should be done for all the tests?

Why we need a factory: we need to be able to produce a new sieve if the previous one is exhausted.
Why does sieve_and_find() return an Option: with a custom sieve and predicate it's not guaranteed that the loop will ever stop. The user should make sure of that by returning None from their SieveFactory::make_sieve() impl.

@lleoha's usecase is really covered by a custom predicate and the default sieve. Although a custom sieve wrapper may be needed to cover corner cases.

Design choices to be made:

Use an RNG parameter explicitly in SieveFactory and sieve_and_find(), or leave it to the factory implementations and predicates. The latter requires an additional Clone bound on the RNG since it has to be both kept in the factory and used in the predicate. The former (currently implemented) allows us to avoid the Clone requirement for the single-threaded version.
Allow make_sieve() to take &mut self (to modify the factory)? Currently that's the case.
Take the factory by reference or by value in sieve_and_find()? Currently it's taken by value (which makes make_sieve(&mut self) less valuable - only further calls to make_sieve() will be able to use the mutated state).

codecov · 2024-11-10T23:24:39Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.44%. Comparing base (4ea84ca) to head (83fbfdf).
Report is 5 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master      #64      +/-   ##
==========================================
+ Coverage   99.37%   99.44%   +0.07%     
==========================================
  Files           9       10       +1     
  Lines        1280     1449     +169     
==========================================
+ Hits         1272     1441     +169     
  Misses          8        8

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

dvdplm

The code lgtm.

A couple of thoughts though:

Shouldn't searching for safe primes be implemented as a custom sieve? The extra bool argument seems a bit like a left-over from the current API. Like, if someone came and asked for a "twin primes" sieve, we'd point them to implementing a "TwinPrimeSieve" (or whatever), so we should perhaps do the same for safe primes.
I'm a bit hesitant about how this PR assumes that users will want to create a new sieve when a candidate is not prime. Is that always the best strategy? Maybe? Is it always safe to do so? Probably? This objection is pretty much the same point I made on Add support for custom Sieves. #61 when asking if the current (admittedly minimal) API is not enough.
The make_sieve method taking the exhausted sieve. I assume this is to allow custom impls to inspect the state of the old sieve before making a new one? Perhaps that's worth documenting.

dvdplm · 2024-11-11T17:51:19Z

src/hazmat/sieve.rs

+    /// If `safe_primes` is `true`, additionally filters out such `n` that `(n - 1) / 2` are divisible
+    /// by any of the small factors tested.


Suggested change

/// If `safe_primes` is `true`, additionally filters out such `n` that `(n - 1) / 2` are divisible

/// by any of the small factors tested.

/// If `safe_primes` is `true`, filters out `n` such that `(n - 1) / 2` are divisible

/// by any of the small factors tested.

I think "additionally" is justified here, since the original filtering of "n" is still in effect.

Ok, but there's something wrong with the grammar around "additionally" (the reason I suggested we remove it was I couldn't quite figure out what was wrong and suggest a correction).
How about "If safe_primes is true, extend the filter to skip ns such that (n - 1) / 2 are divisible…"?

src/traits.rs

dvdplm · 2024-11-11T17:55:06Z

src/generic.rs

+
+/// Sieves through the results of `sieve_factory` and returns the first item for which `predicate` is `true`.
+///
+/// If `sieve_factory` signals that no more results can be created, returns `None`.


When would this be the case?

As I mentioned in another comment, for some algorithms you would only want to create one sieve (e.g. #65). After that goes through the whole range, there is nothing else to create.

fjarri · 2024-11-11T22:39:04Z

Shouldn't searching for safe primes be implemented as a custom sieve?

It could be, but they'd share 90% of the (somewhat complicated) code.

I'm a bit hesitant about how this PR assumes that users will want to create a new sieve when a candidate is not prime.

Could you elaborate? The way I see it the new sieve is created when the previous one is exhausted (for the default one, if it reached the increment limit). It does not depend on the predicate.

The make_sieve method taking the exhausted sieve. I assume this is to allow custom impls to inspect the state of the old sieve before making a new one? Perhaps that's worth documenting.

Yes, but also to distinguish the cases when a new sieve is created from the cases when another sieve replaces the previously exhausted one. For example, the algorithm in #65 would only ever want to create one sieve.

dvdplm · 2024-11-12T08:12:49Z

I'm a bit hesitant about how this PR assumes that users will want to create a new sieve when a candidate is not prime.

Could you elaborate? The way I see it the new sieve is created when the previous one is exhausted (for the default one, if it reached the increment limit). It does not depend on the predicate.

The context here is generalized sieving, with emphasis on "generalized". My worry is not that the code in this PR is bad or doesn't address the request from #61. It's more about not being sure what the API should look like for custom sieves. What is the correct abstraction that accommodates most reasonable sieves? I don't know enough about the topic to definitely tell but I do know there are plenty of sieving strategies. Do they all fit this mold?

In other words: are we ready to commit to an API for generalized sieving?

The code proposed here suggests that the sieving flow for all sieves is:

Run the sieve until a candidate is found
Test the candidate for primality (return if true)
Construct a new sieve and repeat

My worry is that the above flow is fine for some strategies but sub-optimal for others.

fjarri · 2024-11-12T18:51:05Z

Sub-optimal in what sense, not allowing to do some optimizations, or not covering some use cases?

In any case, we could have asked ourselves the same questions at the v0.5 release. It is possible that someone else will file an issue mentioning a scenario we haven't anticipated, and we'll have to expand the API. I don't see a big problem with it. And it's not like we even decide here what should be available and what's not: #61 can already be solved with existing hazmat API, we're just adding some convenience shortcuts.

mepi262 · 2024-12-05T03:39:42Z

@dvdplm
Why this pull request is not be merged?

fjarri · 2024-12-05T03:59:33Z

We are thinking about how exactly we should expose it in the API. Out of curiosity, what is your use case? So far we only know @lleoha's one

mepi262 · 2024-12-05T04:12:33Z

Thank you for your reply!
I can understand a current situation.

Out of curiosity, what is your use case?

I'm interested in the following issue and it is related to #61.
RustCrypto/RSA#454 (comment)

Thus I'm interested in this pull request's progress.

I hope your project will be success.

dvdplm · 2024-12-05T08:57:49Z

@mepi262 I'll make sure to go over the API one more time. As stated above, my concern here is not about the code per se but more of a desire to be sure we commit to the right API.

dvdplm · 2024-12-05T13:29:08Z

I spent some time today working with this code. Here are a few comments and observations:

Comitting to a trait restricts the design of possible sieves

If I want my Sieve to keep a reference to the CSPRNG in itself, then I run into trouble. The SieveFactory trait would need to change.

Consider:

pub struct PrimeincSieve<'a, T, R: CryptoRngCore> {
    lower_bound: T,
    upper_bound: T,
    start: Odd<T>,
    rng: &'a mut R,
}

For the above to work, the SieveFactory now has to be:

pub trait SieveFactory<'a, T, R: CryptoRngCore> {
    /// The resulting sieve.
    type Sieve: Iterator<Item = T>;

    /// Makes a sieve given an RNG and the previous exhausted sieve (if any).
    ///
    /// Returning `None` signals that the prime generation should stop.
    fn make_sieve(&mut self, rng: &'a mut R, previous_sieve: Option<&Self::Sieve>) -> Option<Self::Sieve>;
}

…but that doesn't work, because now sieve_and_find cannot call the predicate (sieve.find(|num| predicate(rng, num))), not without cloning the CSPRNG at least.

Maybe it's a terrible idea to want to have the CSPRNG as part of the sieve implementation, but are we sure it is a terrible idea for ALL possible sieve implementations? Sure, users that want to do this can just skip implementing SieveFactory and do their own thing, but for such users this work does not help (if anything they might waste time figuring out that it's inconvenient for them).

SieveFactory<T> expects sieves to yield Ts

It's reasonable for a sieve iterator to yield e.g. Odd<T> but the SieveFactory expects Ts. Perhaps this is a silly complaint and we should just change the trait to yield Odd<T> anyway, but my point is: custom sieves may want to have their own wrappers (e.g. BoundedOdd<T> or BlumInt<T> or MersenneNumber or…) and it's not obvious how to deal with that elegantly (we could have two generic arguments I guess, one for the Integer and one for the wrapper). This impacts the ergonomics of the API.

YAGNI?

As an experiment, I implemented the standard PRIMEINC algorithm ("pick a random odd number, check if prime, else increase by 2 and try again"). Here's what a test using the proposed API in this PR looks like:

    #[test_log::test]
    fn primeinc() {
        let mut rng = ChaCha8Rng::from_seed(*b"01234567890123456789012345678901");
        let upper = U64::from_be_hex("0000000000000fff");
        let lower = U64::from_be_hex("00000000000000ff");
        let sieve = PrimeincSieve::new(lower, upper, &mut rng);
        let r = sieve_and_find(&mut rng, sieve, |rng, candidate| {
            debug!("candidate={candidate:?}");
            is_prime_with_rng(rng, candidate)
        });
        info!("Found prime. r={r:?}");
    }

…and this is what it would look like without the new API (SieveFactory and sieve_and_find):

    #[test_log::test]
    fn primeinc_no_frills() {
        let mut rng = ChaCha8Rng::from_seed(*b"01234567890123456789012345678901");
        let upper = U64::from_be_hex("0000000000000fff");
        let lower = U64::from_be_hex("00000000000000ff");
        let sieve = PrimeincSieve::new(lower, upper, &mut rng);
        let mut prime = None;
        for candidate in sieve.into_iter() {
            debug!("candidate={candidate:?}");
            if is_prime_with_rng(&mut rng, candidate.as_ref()) {
                prime = Some(candidate);
                break;
            }
        }
        info!("Found prime. prime={prime:?}");
    }

They are very similar and I'm not convinced that the new API actually makes users' lives better. Maybe I should try with a more complex sieve, but right now I am skeptical.

dvdplm · 2024-12-05T13:31:23Z

@mepi262 The issue you refer to would not really be impacted by this PR I believe. The parallelized search for primes is already available and merged (#60), but it is not released yet. Can you help us understand better what it is you're trying to achieve?

dvdplm · 2024-12-05T13:36:53Z

Use an RNG parameter explicitly in SieveFactory and sieve_and_find(), or leave it to the factory implementations and predicates. The latter requires an additional Clone bound on the RNG since it has to be both kept in the factory and used in the predicate. The former (currently implemented) allows us to avoid the Clone requirement for the single-threaded version.

I think I'd prefer leaving it to the factory/predicate. Most RNGs are Clone (and reasonably cheap to clone).

Allow make_sieve() to take &mut self (to modify the factory)? Currently that's the case.

This is the right choice imo.

Take the factory by reference or by value in sieve_and_find()? Currently it's taken by value (which makes make_sieve(&mut self) less valuable - only further calls to make_sieve() will be able to use the mutated state).

Unsure tbh. By ref I think?

fjarri · 2024-12-07T23:23:25Z

…and this is what it would look like without the new API

sieve_and_find() also manages resetting the sieve when it's exhausted. And don't forget par_sieve_and_find() which allows you to easily switch your algorithm to parallel execution.

fjarri · 2024-12-07T23:29:39Z

SieveFactory expects sieves to yield Ts

Fixed, now it's just SieveFactory and an associated Item type.

fjarri · 2024-12-08T08:23:40Z

As for the item 1, I wouldn't really want to add the Clone bound for RNGs in the simple (single-threaded) case, which is probably something most people will use. It'll have to be propagated through whatever libraries use crypto-primes. I don't mind not covering some of the user scenarios, I'm sure we can't predict everything, if something comes up, we'll deal with it then.

fjarri · 2024-12-09T20:42:09Z

This PR is a little controversial, and we may continue tinkering with the API. But I think the generalization itself justifies the merge, even if we don't publicly expose the SieveFactory trait.

fjarri mentioned this pull request Nov 10, 2024

Add support for custom Sieves. #61

Closed

fjarri force-pushed the replaceable-sieve branch from 7f6fab1 to b5a6566 Compare November 10, 2024 23:21

fjarri requested a review from dvdplm November 10, 2024 23:22

fjarri marked this pull request as draft November 10, 2024 23:27

dvdplm reviewed Nov 11, 2024

View reviewed changes

Randomize benchmarks

d2c4a39

fjarri force-pushed the replaceable-sieve branch 2 times, most recently from 4f090ae to 0061609 Compare November 13, 2024 01:29

fjarri marked this pull request as ready for review November 13, 2024 01:29

fjarri self-assigned this Nov 15, 2024

fjarri added 3 commits December 9, 2024 12:36

Generalize sieving

0267b9d

Update CI

871e5a5

Rename the CI config

83fbfdf

fjarri force-pushed the replaceable-sieve branch from 961e6ad to 83fbfdf Compare December 9, 2024 20:36

fjarri merged commit cb600ef into entropyxyz:master Dec 9, 2024
10 checks passed

fjarri deleted the replaceable-sieve branch December 9, 2024 20:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generalize sieving #64

Generalize sieving #64

fjarri commented Nov 10, 2024 •

edited

Loading

codecov bot commented Nov 10, 2024 •

edited

Loading

dvdplm left a comment

dvdplm Nov 11, 2024

fjarri Nov 11, 2024

dvdplm Nov 12, 2024

dvdplm Nov 11, 2024

fjarri Nov 11, 2024

fjarri commented Nov 11, 2024

dvdplm commented Nov 12, 2024

fjarri commented Nov 12, 2024

mepi262 commented Dec 5, 2024

fjarri commented Dec 5, 2024

mepi262 commented Dec 5, 2024

dvdplm commented Dec 5, 2024

dvdplm commented Dec 5, 2024

dvdplm commented Dec 5, 2024

dvdplm commented Dec 5, 2024

fjarri commented Dec 7, 2024

fjarri commented Dec 7, 2024

fjarri commented Dec 8, 2024

fjarri commented Dec 9, 2024

		/// If `safe_primes` is `true`, additionally filters out such `n` that `(n - 1) / 2` are divisible
		/// by any of the small factors tested.

Generalize sieving #64

Generalize sieving #64

Conversation

fjarri commented Nov 10, 2024 • edited Loading

codecov bot commented Nov 10, 2024 • edited Loading

Codecov Report

dvdplm left a comment

Choose a reason for hiding this comment

dvdplm Nov 11, 2024

Choose a reason for hiding this comment

fjarri Nov 11, 2024

Choose a reason for hiding this comment

dvdplm Nov 12, 2024

Choose a reason for hiding this comment

dvdplm Nov 11, 2024

Choose a reason for hiding this comment

fjarri Nov 11, 2024

Choose a reason for hiding this comment

fjarri commented Nov 11, 2024

dvdplm commented Nov 12, 2024

fjarri commented Nov 12, 2024

mepi262 commented Dec 5, 2024

fjarri commented Dec 5, 2024

mepi262 commented Dec 5, 2024

dvdplm commented Dec 5, 2024

dvdplm commented Dec 5, 2024

dvdplm commented Dec 5, 2024

dvdplm commented Dec 5, 2024

fjarri commented Dec 7, 2024

fjarri commented Dec 7, 2024

fjarri commented Dec 8, 2024

fjarri commented Dec 9, 2024

fjarri commented Nov 10, 2024 •

edited

Loading

codecov bot commented Nov 10, 2024 •

edited

Loading