DateTimeParser Validation and Performance Improvements #4593

Merged Jun 29, 2024 (5 commits)

andrewauclair
Contributor

Improvements for #4592.

Improve the validation of DateTimeParser by validating the numbers parsed for day, month, year, hour, minute, second, millis, and micros. The value 0 will be used if no digits were found for micros.

These changes add new cases in which DateTimeParser::parse and DateTimeParser::parseTZD throw SyntaxException: when one of the fields listed above does not contain the required number of digits, or when its string of digits is not a valid number.
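
For illustration, a minimal sketch of the kind of digit-count check described above. This is not the code from this PR: the helper name parseFixedDigits is hypothetical, and std::invalid_argument stands in for Poco::SyntaxException so that the example compiles without POCO.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of POCO): reads exactly `width` decimal
// digits starting at `it`, advances `it` past them, and returns their value.
// Throws std::invalid_argument (a stand-in for Poco::SyntaxException) when
// fewer than `width` digits are available.
int parseFixedDigits(std::string::const_iterator& it,
                     std::string::const_iterator end,
                     int width)
{
    int value = 0;
    for (int i = 0; i < width; ++i)
    {
        if (it == end || !std::isdigit(static_cast<unsigned char>(*it)))
            throw std::invalid_argument("field does not contain the required number of digits");
        value = value * 10 + (*it - '0');
        ++it;
    }
    return value;
}
```

With this sketch, a two-digit field such as "07" parses to 7, while "7x" is rejected instead of being silently accepted as 7.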

Improve performance by caching the RegularExpressions created in DateTimeFormat::isValid.

Additional change: validate the milliseconds field when a '.' or ',' separator is present.
RegularExpression(DateTimeFormat::SORTABLE_REGEX)
};
// make sure the regex list and the array of regexes are in sync
poco_assert((sizeof(regs) / sizeof(regs[0])) == REGEX_LIST.size());
Contributor

REGEX_LIST is used only here. I propose removing it as a class member and making it a static local variable in this function.

Contributor Author

I left it where it is because it's part of the public API of this class, unfortunately.

Wasn't sure if this would be 1.13.x or 1.14 and didn't want to make a breaking change just in case.

Member

@andrewauclair 1.14, you can break it
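
For illustration, a minimal sketch of the caching idea discussed in this thread: the compiled regexes live in a function-local static, so they are built once on the first call and reused afterwards. It uses std::regex as a stand-in for Poco::RegularExpression, and both the function name matchesKnownFormat and the two patterns are assumptions made for the example, not POCO's actual format regexes.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for DateTimeFormat::isValid: returns true when `text`
// matches any of the cached patterns. The patterns are illustrative only.
bool matchesKnownFormat(const std::string& text)
{
    // Function-local static: constructed once, on the first call, and reused
    // on every later call. Its initialization is thread-safe since C++11.
    static const std::vector<std::regex> patterns = {
        std::regex(R"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)"),
        std::regex(R"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
    };

    for (const auto& pattern : patterns)
    {
        if (std::regex_match(text, pattern)) return true;
    }
    return false;
}
```

Keeping the list local to the function also removes the need for a separate list of pattern strings that must be kept in sync with the compiled regexes.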


namespace Poco {
[[nodiscard]] parse_iter skip_non_digits(parse_iter it, parse_iter end)
Contributor

Shall these helper functions be inlined?

Contributor Author

No. inline is used in headers to avoid ODR issues. Free functions in cpp files should be either static or in an anonymous namespace to avoid IFNDR issues.

Member

@andrewauclair please don't introduce new naming conventions, we have our own
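
For illustration, a minimal sketch of the anonymous-namespace approach mentioned in the reply about inline above. It is not the code that was merged: skipNonDigits is a hypothetical renaming of the helper under review, and offsetOfFirstDigit is an invented caller added only to show the helper in use.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

namespace {

// Internal linkage: this helper is visible only inside this translation unit,
// so it cannot collide with a same-named function defined in another .cpp file.
std::string::const_iterator skipNonDigits(std::string::const_iterator it,
                                          std::string::const_iterator end)
{
    while (it != end && !std::isdigit(static_cast<unsigned char>(*it))) ++it;
    return it;
}

} // namespace

// Hypothetical externally visible function that uses the file-local helper.
// Returns the index of the first digit, or text.size() if there is none.
std::size_t offsetOfFirstDigit(const std::string& text)
{
    return static_cast<std::size_t>(skipNonDigits(text.begin(), text.end()) - text.begin());
}
```

Declaring the helper static instead of wrapping it in an anonymous namespace would give it internal linkage as well.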

matejk added this to the Release 1.14.0 milestone Jun 26, 2024


REGEX_LIST was only used in isValid, which now uses a cached list of RegularExpressions instead.
Contributor

matejk commented Jun 27, 2024

@andrewauclair Do you have any performance measurements with these changes compared to 1.13?

andrewauclair
Contributor Author

> @andrewauclair Do you have any performance measurements with these changes compared to 1.13?

Came up with the following when running in WSL with the code from issue #4592:

1.12.5: 0m0.063s
1.13.3: 0m11.991s (almost 200x slower?)
this pr: 0m0.185s (about 3x slower than 1.12.5, likely due to the regex match)

Switching to a format string that doesn't match one of the expected ones, to avoid regex matches (const auto format = Poco::DateTimeFormat::ISO8601_FRAC_FORMAT + " "):

1.12.5: 0m0.063s
1.13.3: 0m0.065s
this pr: 0m0.085s

matejk merged commit a82b766 into pocoproject:main Jun 29, 2024
3 checks passed