Support Request - Generating values within a range #12

Open
JeremyLWright opened this issue Aug 19, 2015 · 3 comments

Comments

@JeremyLWright
Collaborator

Good Morning John,

I'm using autocheck, with great success, and I now want to generate values within a range.

I am testing some date functions and I want to generate years in the range [1970, 2015], inclusive. I tried the ac::fix() combinator, but that just sets an upper limit on the generated value.

I see how I can use discard_if() to throw away values that don't match, but that feels so wasteful. Additionally, I want to generate days, months and years so I'd throw away a lot of data before I generated a valid tuple of values.

How would you recommend generating a range? If it doesn't exist in autocheck today, what guidance could you provide on how I should add this feature?

Thank you for your time.

Sincerely,
Jeremy Wright

@thejohnfreeman
Owner

There's a section in the original SmallCheck paper covering these situations. The general method is to define a custom generator for a new "type" representing the search space, based on a bijection with an existing type. In this case you want to generate integers within a finite range. We could imagine it as an algebraic data type with a large number of zero-arity constructors. The generator would just iterate from one end of the range to the other. For large ranges and few tests, though, that gives just about the worst possible coverage.

I think a good option is to treat the range as a large list and index into it with the signed integral generator (0 returns the low boundary, -1 returns the high) modulo the size of the range. That way you test the edge cases early but quickly spread out across the range.
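
A minimal sketch of that indexing scheme, assuming a closed integer range [low, high]. The helper name index_into_range is hypothetical and not part of autocheck. It applies a floored modulus so that negative indices count back from the high end:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of autocheck: maps any signed index onto
// the closed range [low, high]. Index 0 yields low, index -1 yields high.
inline std::int64_t index_into_range(std::int64_t low, std::int64_t high,
                                     std::int64_t index) {
    assert(low <= high);
    const std::int64_t size = high - low + 1;  // number of values in the range
    std::int64_t offset = index % size;        // C++ remainder keeps the sign of index
    if (offset < 0) {
        offset += size;                        // shift into [0, size) so -1 maps to size - 1
    }
    return low + offset;
}
```

With low = 1970 and high = 2015, an index of 0 gives 1970, -1 gives 2015, 1 gives 1971 and -2 gives 2014. A generator that tries small magnitudes first would therefore reach both boundaries early and then move inward.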

@JeremyLWright
Collaborator Author

Hello John,

I tried elements of your suggestion, but I didn't understand how -1 returns the high end of the range. Here's what I have:

autocheck::catch_reporter rep;
    auto ms = boost::irange(0, 60);
    auto m = boost::irange(1, 13);
    auto h = boost::irange(0, 24);
    auto y = boost::irange(1970, 2020);
    auto d = boost::irange(1, 32);
    const std::vector<int> minute_second_range(ms.begin(), ms.end());
    const std::vector<int> month_range(m.begin(), m.end());
    const std::vector<int> hour_range(h.begin(), h.end());
    const std::vector<int> years_range(y.begin(), y.end());
    const std::vector<int> day_range(d.begin(), d.end());

    autocheck::check<
            std::uint8_t,
            std::uint8_t,
            std::uint8_t,
            std::uint8_t,
            std::uint8_t,
            std::uint8_t>(
        [&minute_second_range,
        &hour_range,
        &month_range,
        &years_range,
        &day_range](
            std::uint8_t& year_idx,
            std::uint8_t& month_idx,
            std::uint8_t& day_idx,
            std::uint8_t& hour_idx,
            std::uint8_t& minute_idx,
            std::uint8_t& second_idx)
    {
        //Using a modulus here to map into the desired range results in a non-uniform distribution.
        //The non-uniformity can be calculated by:
        // auto range_coverage = (2.0^(sizeof(uint8_t)*8))/range.size()
        // auto range_overlap_percentage = range_coverage - floor(range_coverage)
        // auto range_of_higher_probability = range_overlap_percentage * range.size()
        const auto year{years_range[year_idx % years_range.size()]}; //years [1970 - 1975] more likely.
        const auto month{month_range[month_idx % month_range.size()]}; //months [Jan - April] more likely.
        const auto day{day_range[day_idx % day_range.size()]}; //days [1 - 8] more likely.
        const auto hour{hour_range[hour_idx % hour_range.size()]}; //hours [0-15] more likely.
        const auto minute{minute_second_range[minute_idx % minute_second_range.size()]}; //minutes [0-15] more likely.
        const auto second{minute_second_range[second_idx % minute_second_range.size()]}; //seconds [0-15] more likely.

                // Use the year, month, day ...
        return REQUIRE(true);
    },
        1000,
        autocheck::make_arbitrary<std::uint8_t,std::uint8_t,std::uint8_t,std::uint8_t,std::uint8_t,std::uint8_t>(),
        rep);

Can you recommend any improvements?
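
The bias described in the code comments can be checked by brute force. The sketch below assumes every std::uint8_t value is equally likely, which autocheck's generator may not guarantee. The function names modulo_hits and favored_slots are hypothetical, introduced only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many of the 256 possible std::uint8_t values land on each slot
// of a range with `range_size` elements when mapped with `% range_size`.
inline std::vector<int> modulo_hits(std::size_t range_size) {
    assert(range_size > 0);
    std::vector<int> hits(range_size, 0);
    for (int raw = 0; raw <= 255; ++raw) {
        ++hits[static_cast<std::size_t>(raw) % range_size];
    }
    return hits;
}

// Number of leading slots that receive one extra hit: 256 mod range_size.
inline std::size_t favored_slots(std::size_t range_size) {
    assert(range_size > 0);
    return 256 % range_size;
}
```

For the 50-element year range, favored_slots(50) is 6: the first six indices (1970 through 1975) each receive 6 of the 256 raw values, and the remaining indices receive 5.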

@thejohnfreeman
Owner

The non-uniformity is inherent to the SmallCheck approach. SmallCheck uses a depth parameter (named size in autocheck, which should maybe change) to define a subset of values for a given type, then tests every value in each subset up to a maximum depth, trying to find the "smallest" failing example. The subsets are not guaranteed to be disjoint, and often overlap, so values common to many subsets will appear many times. autocheck does not test every value for a given depth (which I'm not opposed to changing, by the way), but will still exhibit the same phenomenon. I'm going to put you on code review for a range generator.
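
To illustrate the overlap, here is a small sketch that assumes the subset for depth d is every integer in [-d, d]. That choice and the name visits_up_to_depth are illustrative assumptions, not autocheck's or SmallCheck's actual implementation:

```cpp
#include <cassert>
#include <map>

// Illustrative assumption: the subset for depth d is every integer in [-d, d].
// Enumerating every subset from depth 0 to max_depth, this records how many
// times each value is visited in total.
inline std::map<int, int> visits_up_to_depth(int max_depth) {
    assert(max_depth >= 0);
    std::map<int, int> visits;
    for (int depth = 0; depth <= max_depth; ++depth) {
        for (int value = -depth; value <= depth; ++value) {
            ++visits[value];
        }
    }
    return visits;
}
```

With max_depth = 3, the value 0 is visited 4 times, 1 and -1 are visited 3 times each, and 3 and -3 only once. Values shared by many subsets dominate the test inputs.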

thejohnfreeman added a commit that referenced this issue Sep 4, 2015