Fix performance issues in float parsing. #210

Merged
merged 2 commits into from
Jan 11, 2025

Conversation

Alexhuszagh
Owner

This fixes a few performance issues caused by a lack of inlining in the Eisel-Lemire algorithm.
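The diff itself is not reproduced in this conversation. As a rough illustration of the kind of change "a lack of inlining" points at, here is a minimal sketch of how Rust's `#[inline]` and `#[cold]` attributes are typically combined in a parser. All function names (`accumulate_digits`, `slow_path`, `parse_toy`) are hypothetical and are not taken from lexical:

```rust
/// Hot helper (hypothetical): accumulate up to 19 ASCII digits into a u64.
/// Any 19-digit decimal number fits in a u64, so no overflow check is needed.
/// `#[inline(always)]` asks the compiler to inline this into every caller.
#[inline(always)]
fn accumulate_digits(bytes: &[u8]) -> Option<u64> {
    if bytes.is_empty() || bytes.len() > 19 {
        return None;
    }
    let mut value = 0u64;
    for &b in bytes {
        if !b.is_ascii_digit() {
            return None;
        }
        value = value * 10 + u64::from(b - b'0');
    }
    Some(value)
}

/// Rare fallback (hypothetical): `#[cold]` hints that this is seldom called,
/// and `#[inline(never)]` keeps its body out of the hot caller.
#[cold]
#[inline(never)]
fn slow_path(text: &str) -> Option<f64> {
    text.parse::<f64>().ok()
}

/// Toy entry point: plain integers up to 2^53 convert to f64 exactly and
/// stay on the hot path; everything else goes to the cold fallback.
pub fn parse_toy(text: &str) -> Option<f64> {
    match accumulate_digits(text.as_bytes()) {
        Some(v) if v <= (1u64 << 53) => Some(v as f64),
        _ => slow_path(text),
    }
}
```

Inlining attributes are hints about code layout, so whether they help has to be confirmed by benchmarking, as the comment below does.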

@Alexhuszagh Alexhuszagh added the regression Performance regressions. label Jan 11, 2025
@Alexhuszagh Alexhuszagh self-assigned this Jan 11, 2025
@Alexhuszagh Alexhuszagh added the performance Related to the performance of the conversion routines. label Jan 11, 2025
Owner Author

Alexhuszagh commented Jan 11, 2025

The results from the benchmarks in fast-float2, run against the latest commit, are:

Canada

=====================================================================================
|                         canada.txt (111126, 1.93 MB, f64)                         |
|===================================================================================|
|                                                                                   |
| ns/float                min       5%      25%   median      75%      95%      max |
|-----------------------------------------------------------------------------------|
| fast-float2           10.37    10.45    10.70    10.98    11.36    13.17    26.49 |
| fast-float             9.96    10.18    10.31    10.46    11.02    12.34    16.35 |
| lexical               10.40    10.49    10.52    10.65    11.13    11.88    14.43 |
| from_str              11.56    11.72    11.84    11.96    12.48    13.12    25.68 |
|                                                                                   |
| Mfloat/s                min       5%      25%   median      75%      95%      max |
|-----------------------------------------------------------------------------------|
| fast-float2           37.75    76.05    88.03    91.09    93.46    95.73    96.46 |
| fast-float            61.16    81.04    90.72    95.57    97.02    98.23   100.40 |
| lexical               69.32    84.26    89.89    93.90    95.03    95.37    96.12 |
| from_str              38.94    76.22    80.11    83.61    84.48    85.31    86.48 |
|                                                                                   |
| MB/s                    min       5%      25%   median      75%      95%      max |
|-----------------------------------------------------------------------------------|
| fast-float2          656.91  1323.30  1531.80  1585.04  1626.36  1665.87  1678.60 |
| fast-float          1064.19  1410.26  1578.57  1663.01  1688.27  1709.31  1747.15 |
| lexical             1206.25  1466.18  1564.27  1634.06  1653.62  1659.58  1672.64 |
| from_str             677.53  1326.30  1394.09  1454.93  1470.08  1484.53  1504.86 |
|                                                                                   |
=====================================================================================

Mesh

=====================================================================================
|                          mesh.txt (73019, 0.54 MB, f64)                           |
|===================================================================================|
|                                                                                   |
| ns/float                min       5%      25%   median      75%      95%      max |
|-----------------------------------------------------------------------------------|
| fast-float2            5.02     5.13     5.22     5.45     5.91    10.35    18.15 |
| fast-float             4.83     4.94     5.01     5.17     5.31     6.20    10.26 |
| lexical                4.83     4.90     4.97     5.04     5.39     8.44    10.61 |
| from_str               5.78     5.86     5.93     6.00     6.10     6.63    10.04 |
|                                                                                   |
| Mfloat/s                min       5%      25%   median      75%      95%      max |
|-----------------------------------------------------------------------------------|
| fast-float2           55.10    97.33   169.50   183.56   191.70   194.98   199.02 |
| fast-float            97.48   163.54   188.44   193.27   199.56   202.61   206.85 |
| lexical               94.25   118.83   185.70   198.58   201.38   204.02   206.85 |
| from_str              99.62   151.05   163.90   166.71   168.56   170.84   173.15 |
|                                                                                   |
| MB/s                    min       5%      25%   median      75%      95%      max |
|-----------------------------------------------------------------------------------|
| fast-float2          404.44   714.49  1244.22  1347.43  1407.22  1431.27  1460.91 |
| fast-float           715.54  1200.47  1383.25  1418.76  1464.91  1487.26  1518.44 |
| lexical              691.89   872.27  1363.20  1457.73  1478.24  1497.65  1518.44 |
| from_str             731.25  1108.83  1203.16  1223.76  1237.32  1254.12  1271.07 |
|                                                                                   |
=====================================================================================

Random Uniform

=====================================================================================
|                           uniform (50000, 0.87 MB, f64)                           |
|===================================================================================|
|                                                                                   |
| ns/float                min       5%      25%   median      75%      95%      max |
|-----------------------------------------------------------------------------------|
| fast-float2            9.61     9.78     9.87    10.44    11.03    14.03    25.82 |
| fast-float             9.30     9.38     9.52    10.48    10.66    12.17    18.72 |
| lexical                9.44     9.54    10.01    10.78    11.03    13.55    17.52 |
| from_str              10.67    10.78    10.89    11.05    11.97    14.30    21.30 |
|                                                                                   |
| Mfloat/s                min       5%      25%   median      75%      95%      max |
|-----------------------------------------------------------------------------------|
| fast-float2           38.73    71.34    90.68    95.82   101.32   102.25   104.01 |
| fast-float            53.42    82.22    93.86    95.42   105.04   106.59   107.55 |
| lexical               57.07    73.78    90.65    92.82   100.18   104.82   105.93 |
| from_str              46.95    70.01    83.56    90.51    91.86    92.75    93.69 |
|                                                                                   |
| MB/s                    min       5%      25%   median      75%      95%      max |
|-----------------------------------------------------------------------------------|
| fast-float2          674.92  1243.06  1580.09  1669.72  1765.47  1781.72  1812.48 |
| fast-float           930.83  1432.76  1635.56  1662.71  1830.38  1857.30  1874.08 |
| lexical              994.48  1285.61  1579.52  1617.34  1745.66  1826.54  1845.89 |
| from_str             818.09  1219.91  1455.98  1577.23  1600.70  1616.14  1632.49 |
|                                                                                   |
=====================================================================================
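The three blocks in each table appear to be the same measurements in different units. The sketch below shows the conversions under that assumption, also assuming that "MB" means 10^6 bytes; the helper names are hypothetical. Because throughput is the inverse of latency, the `min` latency column corresponds to the `max` throughput column:

```rust
/// Convert a latency in ns/float to millions of floats per second.
/// 1 second = 1e9 ns, so floats/s = 1e9 / ns and Mfloat/s = 1e3 / ns.
fn mfloats_per_sec(ns_per_float: f64) -> f64 {
    1_000.0 / ns_per_float
}

/// Convert Mfloat/s to MB/s using the dataset header
/// (file size in MB, number of floats).
fn mb_per_sec(mfloats: f64, file_mb: f64, float_count: f64) -> f64 {
    // Average bytes per float in the input file.
    let bytes_per_float = file_mb * 1e6 / float_count;
    mfloats * bytes_per_float
}
```

For canada.txt, `mfloats_per_sec(10.37)` gives roughly 96.4, close to the 96.46 reported as the fast-float2 maximum. The small difference is consistent with the table values being rounded to two decimals.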

Other benchmarks in lexical show similar results (one uses criterion, the other is a standalone executable): performance is practically identical on fast-case datasets, while lexical comes out ahead on near-halfway cases. On the Earth dataset in particular, lexical is a lot faster than fast-float2, which was unexpected, since that dataset consists of very simple floats that are likely to be covered by the fast case (values that can be represented entirely by machine floats).
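For context on the "fast case": the classic version of this shortcut applies when the decimal significand fits in 53 bits and the power of ten is itself exactly representable as an `f64`, so a single multiplication or division yields the correctly rounded result. The sketch below illustrates that general technique only. It is not lexical's or fast-float2's implementation, and `fast_path` and `POW10` are hypothetical names:

```rust
/// Powers of ten that are exactly representable as f64 (10^0 through 10^22).
const POW10: [f64; 23] = [
    1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11,
    1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22,
];

/// Hypothetical fast path: computes `mantissa * 10^exp10` when both operands
/// are exact in f64, and returns `None` when a slower algorithm is required.
fn fast_path(mantissa: u64, exp10: i32) -> Option<f64> {
    if mantissa > (1u64 << 53) || !(-22..=22).contains(&exp10) {
        return None;
    }
    let m = mantissa as f64; // exact, since mantissa <= 2^53
    Some(if exp10 < 0 {
        // One IEEE 754 division of two exact values is correctly rounded.
        m / POW10[(-exp10) as usize]
    } else {
        // Likewise for a single multiplication.
        m * POW10[exp10 as usize]
    })
}
```

Under these assumptions `fast_path(15, -1)` returns `Some(1.5)`, while an input such as `fast_path(1, 23)` returns `None` and would have to fall through to a slower algorithm such as Eisel-Lemire.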

@Alexhuszagh Alexhuszagh merged commit 57efda6 into main Jan 11, 2025
37 checks passed
@Alexhuszagh Alexhuszagh deleted the cold-inline branch January 11, 2025 18:29