Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significant performance degradation of Poco::DateTimeParser #4592

Open
tyler92 opened this issue Jun 22, 2024 · 5 comments
Open

Significant performance degradation of Poco::DateTimeParser #4592

tyler92 opened this issue Jun 22, 2024 · 5 comments

Comments

@tyler92
Copy link
Contributor

tyler92 commented Jun 22, 2024

Consider the following code:

#include <Poco/DateTimeFormat.h>
#include <Poco/DateTimeFormatter.h>
#include <Poco/DateTimeParser.h>
#include <stdexcept>

std::string toString(const Poco::DateTime& dt)
{
    return Poco::DateTimeFormatter::format(dt, Poco::DateTimeFormat::ISO8601_FRAC_FORMAT);
}

Poco::DateTime parsePoco(const std::string& text)
{
    int tzd = 0;
    const auto format = Poco::DateTimeFormat::ISO8601_FRAC_FORMAT;
    auto parsed = Poco::DateTimeParser::parse(format, text, tzd);
    parsed.makeUTC(tzd);
    return parsed;
}

void ASSERT_EQ(const std::string& a, const std::string& b)
{
    if (a != b)
    {
        throw std::runtime_error("Test FAILED");
    }
}

int main()
{
    for (int i = 0; i < 100'000; ++i)
    {
        auto parsed = parsePoco("2024-06-21T09:43:11.340159Z");
        ASSERT_EQ(toString(parsed), "2024-06-21T09:43:11.340159Z");

        parsed = parsePoco("2024-06-21T09:43:11.340159+07:00");
        ASSERT_EQ(toString(parsed), "2024-06-21T02:43:11.340159Z");
    }

    return 0;
}

Execution time (Release configuration):

Poco 1.12.5:    96 ms
Poco 1.13.3: 17179 ms

With the format Poco::DateTimeFormat::ISO8601_FRAC_FORMAT + " " (with an additional space) execution time becomes better for 1.13.3.

I guess the reason is the following commit: 4f1cf68 (#4330).

Up to 8 regex expressions are compiled on each call

for (const auto& f : REGEX_LIST)
{
if (RegularExpression(*f).match(dateTime)) return true;
}

if the provided format is known:

if (fmt.empty() || str.empty() || (DateTimeFormat::hasFormat(fmt) && !DateTimeFormat::isValid(str)))

@aleks-f
Copy link
Member

aleks-f commented Jun 23, 2024

Yeah, well - it's either that or we silently accept garbage, as we used to. If you have an improvement proposal, send a PR

@andrewauclair
Copy link
Contributor

bool DateTimeFormat::isValid(const std::string& dateTime)
{

        static const RegularExpression regs[] = {
                RegularExpression(DateTimeFormat::ISO8601_REGEX),
                RegularExpression(DateTimeFormat::RFC822_REGEX),
                RegularExpression(DateTimeFormat::RFC1123_REGEX),
                RegularExpression(DateTimeFormat::HTTP_REGEX),
                RegularExpression(DateTimeFormat::RFC850_REGEX),
                RegularExpression(DateTimeFormat::RFC1036_REGEX),
                RegularExpression(DateTimeFormat::ASCTIME_REGEX),
                RegularExpression(DateTimeFormat::SORTABLE_REGEX)
        };

        for (const auto& f : regs)
        {
                if (f.match(dateTime)) return true;
        }
        return false;
}

Creating the RegularExpressions once in isValid takes it from over 12s to .16s for me.

@tyler92
Copy link
Contributor Author

tyler92 commented Jun 23, 2024

Why are all regexes checked, and not just one? E.g. for the following code

const auto format = Poco::DateTimeFormat::ISO8601_FRAC_FORMAT;
auto parsed = Poco::DateTimeParser::parse(format, text, tzd);

I would expect that only ISO8601_REGEX will be checked.
And why is user input not checked against these regexes if DateTimeFormat::hasFormat(fmt) is false?

@tyler92
Copy link
Contributor Author

tyler92 commented Jun 23, 2024

About handling garbage input. Let's consider the following code:

case 'Y':
SKIP_JUNK();
PARSE_NUMBER_N(year, 4);

#define SKIP_JUNK() \
while (it != end && !Ascii::isDigit(*it)) ++it

#define PARSE_NUMBER_N(var, n) \
{ int i = 0; while (i++ < n && it != end && Ascii::isDigit(*it)) var = var*10 + ((*it++) - '0'); }

Points for discussion:

  1. SKIP_JUNK accepts garbage input
  2. PARSE_NUMBER_N doesn't report an error if the input doesn't contain a number at all

I hope the parser body can be improved so that it is more strict and regular expressions can be avoided.

@aleks-f
Copy link
Member

aleks-f commented Jun 24, 2024

@andrewauclair can you look at this issue and propose improvements?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

No branches or pull requests

4 participants