Speedup parsing of datetime fieldtypes initialization by string #87

Merged (4 commits) on Oct 16, 2023

yunzheng
Member

This change mainly removes the expensive regexes and exception handling from the code path that initializes datetime fieldtypes from a string, which speeds up parsing significantly.
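
As a rough illustration of the idea, and not the code from this PR: on Python versions before 3.11, `datetime.fromisoformat` only accepts what `isoformat()` emits, so a trailing `Z` or a nine-digit fraction has to be normalized first. The sketch below does that with slicing and `str.find` instead of a regex or a try/except fallback. `parse_iso` is a hypothetical name, the input is assumed to look like `YYYY-MM-DDTHH:MM:SS[.fraction][Z|±HH:MM]`, and extra fractional digits are truncated rather than rounded. The real implementation in `flow/record/fieldtypes/__init__.py` may differ on all of these points.

```python
from datetime import datetime, timezone


def parse_iso(value: str) -> datetime:
    """Hypothetical helper: normalize an ISO 8601 string with plain string
    operations, then hand it to datetime.fromisoformat()."""
    # A trailing "Z" means UTC; fromisoformat() before Python 3.11 rejects it.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"

    # Assumption: the first 19 characters are "YYYY-MM-DDTHH:MM:SS".
    head, rest = value[:19], value[19:]

    # Whatever follows is an optional fraction and an optional UTC offset.
    # Searching only in `rest` avoids confusing the date dashes with a sign.
    tz_pos = max(rest.find("+"), rest.find("-"))
    if tz_pos >= 0:
        frac, tz = rest[:tz_pos], rest[tz_pos:]
    else:
        frac, tz = rest, ""

    # Keep exactly six fractional digits (microseconds): truncate or zero-pad.
    if frac:
        frac = "." + frac[1:7].ljust(6, "0")

    return datetime.fromisoformat(head + frac + tz)


if __name__ == "__main__":
    print(parse_iso("2023-09-04T13:33:37.123456789999999Z"))
```

Because every branch is a cheap string check, no exception is raised on the common path, which is the property the description above attributes to the speedup.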

@codecov

codecov bot commented Oct 11, 2023

Codecov Report

Merging #87 (9b268b2) into main (f0a2608) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #87      +/-   ##
==========================================
+ Coverage   79.22%   79.27%   +0.04%     
==========================================
  Files          32       32              
  Lines        2932     2939       +7     
==========================================
+ Hits         2323     2330       +7     
  Misses        609      609              

| Flag | Coverage Δ |
|------|------------|
| unittests | 79.27% <100.00%> (+0.04%) ⬆️ |

| Files | Coverage Δ |
|-------|------------|
| flow/record/fieldtypes/__init__.py | 91.53% <100.00%> (+0.13%) ⬆️ |

@yunzheng
Member Author

yunzheng commented Oct 11, 2023

Benchmark script

Benchmarked using the following script.

#!/bin/sh -x

# default datetime str without any TZ data
python3 -m timeit -n 100000  "from flow.record.fieldtypes import datetime" 'datetime("2023-09-04T13:33:37")'

# default isoformat(), this is how it's serialized in msgpack for tz aware datetime objects
python3 -m timeit -n 100000  "from flow.record.fieldtypes import datetime" 'datetime("2023-09-04T13:33:37+03:00")'

# RFC3339Nano, 2006-01-02T15:04:05.999999999, used by Docker
python3 -m timeit -n 100000  "from flow.record.fieldtypes import datetime" 'datetime("2023-09-04T13:33:37.123456789999999")'
python3 -m timeit -n 100000  "from flow.record.fieldtypes import datetime" 'datetime("2023-09-04T13:33:37.123456789999999-02:00")'
python3 -m timeit -n 100000  "from flow.record.fieldtypes import datetime" 'datetime("2023-09-04T13:33:37.123456789999999Z")'

# other variants, but less common
python3 -m timeit -n 100000  "from flow.record.fieldtypes import datetime" 'datetime("2023-09-04T13:33:37Z")'
python3 -m timeit -n 100000  "from flow.record.fieldtypes import datetime" 'datetime("2023-09-04T13:33:37.123456-02:00")'
python3 -m timeit -n 100000  "from flow.record.fieldtypes import datetime" 'datetime("2023-09-04T13:33:37.123456789+03:00")'
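
For readers who prefer to run the comparison from Python rather than the shell, the same measurement can be approximated with the `timeit` module. This is only a sketch: `bench` is a hypothetical helper, and the standard library's `datetime.fromisoformat` stands in for `flow.record.fieldtypes.datetime` so the example runs without the package installed. The lambda wrapper adds a small constant call overhead to every measurement.

```python
import timeit
from datetime import datetime


def bench(func, samples, number=100_000, repeat=5):
    """Return the best per-call time in microseconds for each sample string."""
    results = {}
    for sample in samples:
        # timeit.repeat() returns the total seconds for `number` calls,
        # once per repeat; the minimum is the least noisy estimate.
        totals = timeit.repeat(lambda: func(sample), number=number, repeat=repeat)
        results[sample] = min(totals) / number * 1e6
    return results


if __name__ == "__main__":
    # Two of the inputs from the script above that the stdlib parser
    # accepts on every supported Python version.
    samples = ["2023-09-04T13:33:37", "2023-09-04T13:33:37+03:00"]
    for sample, usec in bench(datetime.fromisoformat, samples).items():
        print(f"{sample!r}: {usec:.2f} usec per loop")
```

To benchmark the fieldtype itself, pass `flow.record.fieldtypes.datetime` as `func` and extend `samples` with the remaining strings from the script.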

Benchmark results

Python 3.8

| # | old | new |
|---|-----|-----|
| 0 | 3.48 usec per loop | 3.5 usec per loop |
| 1 | 15.9 usec per loop | 2.51 usec per loop |
| 2 | 8.79 usec per loop | 4.35 usec per loop |
| 3 | 31.5 usec per loop | 3.5 usec per loop |
| 4 | 27.6 usec per loop | 3.2 usec per loop |
| 5 | 13.4 usec per loop | 2.57 usec per loop |
| 6 | 18.3 usec per loop | 3.47 usec per loop |
| 7 | 32.8 usec per loop | 3.29 usec per loop |

Python 3.9

| # | old | new |
|---|-----|-----|
| 0 | 3.6 usec per loop | 3.53 usec per loop |
| 1 | 16.6 usec per loop | 2.55 usec per loop |
| 2 | 9.07 usec per loop | 4.33 usec per loop |
| 3 | 32.1 usec per loop | 3.46 usec per loop |
| 4 | 28.2 usec per loop | 3.36 usec per loop |
| 5 | 13.7 usec per loop | 2.63 usec per loop |
| 6 | 18.4 usec per loop | 3.48 usec per loop |
| 7 | 32.0 usec per loop | 3.18 usec per loop |

Python 3.10

| # | old | new |
|---|-----|-----|
| 0 | 3.11 usec per loop | 3.14 usec per loop |
| 1 | 13.2 usec per loop | 2.29 usec per loop |
| 2 | 7.14 usec per loop | 4.05 usec per loop |
| 3 | 24.6 usec per loop | 3.61 usec per loop |
| 4 | 21.7 usec per loop | 3.05 usec per loop |
| 5 | 9.99 usec per loop | 2.55 usec per loop |
| 6 | 13.8 usec per loop | 3.29 usec per loop |
| 7 | 24.5 usec per loop | 2.99 usec per loop |

Python 3.11

Note: Python 3.11 was already fast-pathed before this change.

| # | old | new |
|---|-----|-----|
| 0 | 2.12 usec per loop | 2.12 usec per loop |
| 1 | 1.54 usec per loop | 1.33 usec per loop |
| 2 | 2.24 usec per loop | 2.12 usec per loop |
| 3 | 1.45 usec per loop | 1.43 usec per loop |
| 4 | 1.37 usec per loop | 1.34 usec per loop |
| 5 | 1.41 usec per loop | 1.3 usec per loop |
| 6 | 1.45 usec per loop | 1.38 usec per loop |
| 7 | 1.46 usec per loop | 1.41 usec per loop |
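
To put a number on the improvement, the old/new ratio can be computed per case. The snippet below is a small worked example using only the Python 3.8 and Python 3.11 timings reported above; the variable and function names are made up for illustration.

```python
# Timings in usec per loop, copied from the Python 3.8 and 3.11 results above.
OLD_38 = [3.48, 15.9, 8.79, 31.5, 27.6, 13.4, 18.3, 32.8]
NEW_38 = [3.5, 2.51, 4.35, 3.5, 3.2, 2.57, 3.47, 3.29]
OLD_311 = [2.12, 1.54, 2.24, 1.45, 1.37, 1.41, 1.45, 1.46]
NEW_311 = [2.12, 1.33, 2.12, 1.43, 1.34, 1.3, 1.38, 1.41]


def speedups(old, new):
    """Return old/new for each benchmark case (values > 1 mean faster)."""
    return [o / n for o, n in zip(old, new)]


if __name__ == "__main__":
    for label, old, new in (("3.8", OLD_38, NEW_38), ("3.11", OLD_311, NEW_311)):
        ratios = speedups(old, new)
        print(f"Python {label}:", ", ".join(f"#{i} {r:.2f}x" for i, r in enumerate(ratios)))
```

On Python 3.8, case 0 (no timezone data) is essentially unchanged, while the other cases come out roughly 2x to 10x faster, with case 7 showing the largest gain. On Python 3.11 every ratio stays below 1.2x, consistent with the note that it was already fast-pathed.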

@yunzheng yunzheng marked this pull request as ready for review October 11, 2023 11:07
@yunzheng yunzheng marked this pull request as draft October 11, 2023 11:27
@yunzheng yunzheng force-pushed the improvement/datetime-fromisoformat branch from 26574bf to fdd571a Compare October 16, 2023 08:45
@yunzheng yunzheng marked this pull request as ready for review October 16, 2023 09:04
@yunzheng yunzheng requested a review from Schamper October 16, 2023 09:04
@yunzheng yunzheng merged commit 53c744b into main Oct 16, 2023
28 of 32 checks passed
@yunzheng yunzheng deleted the improvement/datetime-fromisoformat branch October 16, 2023 13:06