Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Days algorithms #9

Open
Roman-Koshelev opened this issue Jun 5, 2021 · 7 comments
Open

Days algorithms #9

Roman-Koshelev opened this issue Jun 5, 2021 · 7 comments

Comments

@Roman-Koshelev
Copy link

Roman-Koshelev commented Jun 5, 2021

Found your algorithms days_from_civil and civil_from_days (http://howardhinnant.github.io/date_algorithms.html). l saw their use in the hffix (https://github.com/jamesdbrock/hffix) and Microsoft STL (https://github.com/microsoft/STL) libraries. Based on the description on the site, it seems that this is the fastest of the existing implementations, but this is not the case. The gmtime implementation in musl libc (https://github.com/ifduyue/musl) is 40% more efficient. I think it is worth writing that there are more efficient implementations, and that it is worth using them.

@HowardHinnant
Copy link
Owner

Can you point more specifically to the algorithm? I found https://github.com/ifduyue/musl/blob/cfdfd5ea3ce14c6abf7fb22a531f3d99518b5a1b/src/time/__secs_to_tm.c, but I have doubts that is what you're referring to.

@Roman-Koshelev
Copy link
Author

it's him.

@HowardHinnant
Copy link
Owner

You sure? I haven't timed it, but I see iteration in __secs_to_tm.

The next thing I'm trying to find is a claim that the algorithms on http://howardhinnant.github.io/date_algorithms.html are the fastest of the existing implementations. At http://howardhinnant.github.io/date_algorithms.html I see "fast", but not "fastest".

Also check out my stars page: https://github.com/HowardHinnant?tab=stars Take a look at anything by cassioneri. I'm pretty sure I have __secs_to_tm beat. And I know the work at cassioneri has mine beat. He does sacrifice some range, but nothing that the C++20 chrono spec doesn't allow. And he's gotten his implementation into the gcc C++20 implementation: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/chrono#L2448-L2513 It's really nice work.

@HowardHinnant
Copy link
Owner

I got around to doing some comparative timing between https://github.com/ifduyue/musl/blob/cfdfd5ea3ce14c6abf7fb22a531f3d99518b5a1b/src/time/__secs_to_tm.c and civil_from_days.

Problem: These two algorithms do different things.

  • __secs_to_tm computes year, month, day, weekday, day-of-year [0, 365], hour, minute and second.
  • civil_from_days computes year, month and day.

So to do a fair comparison, I did it two ways:

  1. I wrote the equivalent of __secs_to_tm using date.h which uses civil_from_days under the hood.
  2. I stripped out the compuation of weekday, day-of-year, hour, minute and second from both function in the first comparison and re-compared.

Here is the first comparison:

#include <time.h>
#include <limits.h>

/* 2000-03-01 (mod 400 year, immediately after feb29 */
#define LEAPOCH (946684800LL + 86400*(31+29))

#define DAYS_PER_400Y (365*400 + 97)
#define DAYS_PER_100Y (365*100 + 24)
#define DAYS_PER_4Y   (365*4   + 1)

int __secs_to_tm(long long t, volatile tm *tm)
{
    long long days, secs, years;
    int remdays, remsecs, remyears;
    int qc_cycles, c_cycles, q_cycles;
    int months;
    int wday, yday, leap;
    static const char days_in_month[] = {31,30,31,30,31,31,30,31,30,31,31,29};

    // Reject time_t values whose year would overflow int 
    if (t < INT_MIN * 31622400LL || t > INT_MAX * 31622400LL)
        return -1;

    secs = t - LEAPOCH;
    days = secs / 86400;
    remsecs = secs % 86400;
    if (remsecs < 0) {
        remsecs += 86400;
        days--;
    }

    wday = (3+days)%7;
    if (wday < 0) wday += 7;

    qc_cycles = days / DAYS_PER_400Y;
    remdays = days % DAYS_PER_400Y;
    if (remdays < 0) {
        remdays += DAYS_PER_400Y;
        qc_cycles--;
    }

    c_cycles = remdays / DAYS_PER_100Y;
    if (c_cycles == 4) c_cycles--;
    remdays -= c_cycles * DAYS_PER_100Y;

    q_cycles = remdays / DAYS_PER_4Y;
    if (q_cycles == 25) q_cycles--;
    remdays -= q_cycles * DAYS_PER_4Y;

    remyears = remdays / 365;
    if (remyears == 4) remyears--;
    remdays -= remyears * 365;

    leap = !remyears && (q_cycles || !c_cycles);
    yday = remdays + 31 + 28 + leap;
    if (yday >= 365+leap) yday -= 365+leap;

    years = remyears + 4*q_cycles + 100*c_cycles + 400LL*qc_cycles;

    for (months=0; days_in_month[months] <= remdays; months++)
        remdays -= days_in_month[months];

    if (months >= 10) {
        months -= 12;
        years++;
    }

    if (years+100 > INT_MAX || years+100 < INT_MIN)
        return -1;

    tm->tm_year = years + 100;
    tm->tm_mon = months + 2;
    tm->tm_mday = remdays + 1;
    tm->tm_wday = wday;
    tm->tm_yday = yday;

    tm->tm_hour = remsecs / 3600;
    tm->tm_min = remsecs / 60 % 60;
    tm->tm_sec = remsecs % 60;

    return 0;
}

#include "date/date.h"

int
secs_to_tm(long long t, volatile tm *tm)
{
    using namespace date;
    using namespace std::chrono;

    constexpr sys_seconds min = sys_days{year::min()/1/1};
    constexpr sys_seconds max = sys_days{year::max()/12/31};
    sys_seconds tp{seconds{t}};
    if (tp < min || tp > max)
        return -1;
    auto tp_days = floor<days>(tp);
    year_month_day ymd = tp_days;

    tm->tm_year = int{ymd.year()} - 1900;
    tm->tm_mon = unsigned{ymd.month()} - 1;
    tm->tm_mday = unsigned{ymd.day()};
    tm->tm_wday = weekday{tp_days}.c_encoding();
    tm->tm_yday = (tp_days - sys_days{ymd.year()/1/1})/days{1};

    hh_mm_ss tod{tp - tp_days};
    tm->tm_hour = tod.hours().count();
    tm->tm_min = tod.minutes().count();
    tm->tm_sec = tod.seconds().count();;

    return 0;
}

#include <cassert>
#include <iostream>

bool
operator==(struct tm& x, struct tm& y)
{
    return x.tm_year == y.tm_year &&
           x.tm_mon == y.tm_mon &&
           x.tm_mday == y.tm_mday &&
           x.tm_wday == y.tm_wday &&
           x.tm_yday == y.tm_yday &&
           x.tm_hour == y.tm_hour &&
           x.tm_min == y.tm_min &&
           x.tm_sec == y.tm_sec;
}

int
main()
{
    using namespace date;
    using namespace std::chrono;
    auto constexpr m = (sys_days{year::min()/1/1} - sys_seconds{})/1s;
    auto constexpr M = (sys_days{year::max()/12/31} - sys_seconds{})/1s;
    int i = 0;
    int count = 0;
    auto tp0 = steady_clock::now();
    for (auto t = m; t <= M; t += 10'000)
    {
        struct tm tm1{};
        struct tm tm2{};
        assert(__secs_to_tm(t, &tm1) == secs_to_tm(t, &tm2));
        assert(tm1 == tm2);
        ++count;
    }
    auto tp1 = steady_clock::now();
    volatile tm tm1{};
    for (auto t = m; t <= M; t += 10'000)
    {
        i += __secs_to_tm(t, &tm1);
    }
    auto tp2 = steady_clock::now();
    volatile tm tm2{};
    for (auto t = m; t <= M; t += 10'000)
    {
        i += secs_to_tm(t, &tm2);
    }
    auto tp3 = steady_clock::now();
    using ps = duration<int64_t, std::pico>;
    std::cout << (tp1-tp0)/count << '\n';
    std::cout << (tp2-tp1)/count << '\n';
    std::cout << (tp3-tp2)/count << '\n';
    return i;
}

compiled with clang++ -03 on a 2013, 2.4 GHz Intel Core i5, MacBook Pro.

The test first warms everything up and ensures that both functions are getting identical results. Results are spot checked every 10,000s between -32767-01-01 and 32767-12-31. Then the test repeats for timing purposes, looping over each function. Care is used to ensure that the optimizer doesn't eliminate the subject being tested. Results are printed out in terms of nanoseconds per iteration.

My results for this test are:

59ns
28ns
29ns

Meaning the correctness test took 59ns, __secs_to_tm took 28ns and the date.h version took 29ns. __secs_to_tm performed much better than I thought it would!

Here's what got commented out for part 2: Just compute year, month and day:

#include <time.h>
#include <limits.h>

/* 2000-03-01 (mod 400 year, immediately after feb29 */
#define LEAPOCH (946684800LL + 86400*(31+29))

#define DAYS_PER_400Y (365*400 + 97)
#define DAYS_PER_100Y (365*100 + 24)
#define DAYS_PER_4Y   (365*4   + 1)

int __secs_to_tm(long long t, volatile tm *tm)
{
    long long days, secs, years;
    int remdays, remsecs, remyears;
    int qc_cycles, c_cycles, q_cycles;
    int months;
    int wday, yday, leap;
    static const char days_in_month[] = {31,30,31,30,31,31,30,31,30,31,31,29};

    // Reject time_t values whose year would overflow int 
    if (t < INT_MIN * 31622400LL || t > INT_MAX * 31622400LL)
        return -1;

    secs = t - LEAPOCH;
    days = secs / 86400;
    remsecs = secs % 86400;
    if (remsecs < 0) {
//         remsecs += 86400;
        days--;
    }

//     wday = (3+days)%7;
//     if (wday < 0) wday += 7;

    qc_cycles = days / DAYS_PER_400Y;
    remdays = days % DAYS_PER_400Y;
    if (remdays < 0) {
        remdays += DAYS_PER_400Y;
        qc_cycles--;
    }

    c_cycles = remdays / DAYS_PER_100Y;
    if (c_cycles == 4) c_cycles--;
    remdays -= c_cycles * DAYS_PER_100Y;

    q_cycles = remdays / DAYS_PER_4Y;
    if (q_cycles == 25) q_cycles--;
    remdays -= q_cycles * DAYS_PER_4Y;

    remyears = remdays / 365;
    if (remyears == 4) remyears--;
    remdays -= remyears * 365;

//     leap = !remyears && (q_cycles || !c_cycles);
//     yday = remdays + 31 + 28 + leap;
//     if (yday >= 365+leap) yday -= 365+leap;

    years = remyears + 4*q_cycles + 100*c_cycles + 400LL*qc_cycles;

    for (months=0; days_in_month[months] <= remdays; months++)
        remdays -= days_in_month[months];

    if (months >= 10) {
        months -= 12;
        years++;
    }

    if (years+100 > INT_MAX || years+100 < INT_MIN)
        return -1;

    tm->tm_year = years + 100;
    tm->tm_mon = months + 2;
    tm->tm_mday = remdays + 1;
//     tm->tm_wday = wday;
//     tm->tm_yday = yday;
// 
//     tm->tm_hour = remsecs / 3600;
//     tm->tm_min = remsecs / 60 % 60;
//     tm->tm_sec = remsecs % 60;

    return 0;
}

#include "date/date.h"

int
secs_to_tm(long long t, volatile tm *tm)
{
    using namespace date;
    using namespace std::chrono;

    constexpr sys_seconds min = sys_days{year::min()/1/1};
    constexpr sys_seconds max = sys_days{year::max()/12/31};
    sys_seconds tp{seconds{t}};
    if (tp < min || tp > max)
        return -1;
    auto tp_days = floor<days>(tp);
    year_month_day ymd = tp_days;

    tm->tm_year = int{ymd.year()} - 1900;
    tm->tm_mon = unsigned{ymd.month()} - 1;
    tm->tm_mday = unsigned{ymd.day()};
//     tm->tm_wday = weekday{tp_days}.c_encoding();
//     tm->tm_yday = (tp_days - sys_days{ymd.year()/1/1})/days{1};
// 
//     hh_mm_ss tod{tp - tp_days};
//     tm->tm_hour = tod.hours().count();
//     tm->tm_min = tod.minutes().count();
//     tm->tm_sec = tod.seconds().count();;

    return 0;
}

And now the results are:

41ns
21ns
16ns

@yangbowen
Copy link

I suppose that the most major bottleneck could be division, since division is significantly slower than addition, subtraction, multiplication.
The musl __secs_to_tm uses a lookup table instead of doing integer arithmetic to find out the month.

Here's some of my thoughts:

  1. Tuning the constants so that the divisor is power-of-2 may be beneficial, since the division can be optimized to a bitshift. I used an Excel spreadsheet to search for candidates and found the following:
// days from month
(1958 * (m > 2 ? m - 3 : m + 9) + 35) / 64;
// month from days
(535 * doy + 333) / 16384;

image

  1. __secs_to_tm uses linear search in the lookup table, perhaps a binary search can be used instead?
{
    static const int days_month_end[] = {31,61,92,122,153,184,214,245,275,306,337,366};
    int month_lbound = 0, month_rbound = 11;
    while (month_lbound < month_rbound) {
        month_middle = (month_lbound + month_rbound) / 2;
        int days_middle = days_month_accumulate[month_middle];
        if (remdays >= days_middle) {
            month_lbound = month_middle + 1;
        } else {
            month_rbound = month_middle;
        }
    }
    month = month_lbound;
    remdays -= days_in_month[month];
}

@jbrower2
Copy link

yeah I confirmed @yangbowen's formulae, and improved the second one slightly. here are what i believe are the most optimized versions:

(535 * doy + 332) >> 14
(979 * mp + 16) >> 5

@HowardHinnant
Copy link
Owner

See this repository for some impressive work in this area: https://github.com/cassioneri/eaf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants