Describe the bug
Decimals in scientific notation are frequently written with an (admittedly unnecessary) positive exponent sign, e.g. "3.106e+04". The existing regex allows a negative exponent sign but does not recognize a number with a positive one. As a result, the parser infers the type of any column containing such values as Utf8 instead of float.
As a sanity check, I tried the same thing in DuckDB, and its CSV parser does not make this error.
To Reproduce
Either attempt to infer the schema for a CSV file containing the offending pattern (as I have done in the provided example), or run the existing regex directly against the example offender, "3.106e+04"; it will not match.
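For anyone who wants to see the mismatch in isolation, here is a minimal sketch in Python. Both patterns are simplified stand-ins that I wrote for illustration, not the parser's actual regex: `NEGATIVE_ONLY` assumes an exponent that accepts only an optional "-", and `EITHER_SIGN` is one possible way to widen it. The helper name `looks_like_float` is likewise hypothetical.

```python
import re

# Hypothetical stand-in for the existing pattern (an assumption, not the
# parser's real regex): the exponent may carry an optional "-" only.
NEGATIVE_ONLY = re.compile(r"^-?(\d*\.\d+|\d+\.\d*)([eE]-?\d+)?$")

# Same stand-in with the exponent sign widened to accept "+" or "-".
EITHER_SIGN = re.compile(r"^-?(\d*\.\d+|\d+\.\d*)([eE][-+]?\d+)?$")


def looks_like_float(pattern: re.Pattern, text: str) -> bool:
    """Return True if the whole string matches the given decimal pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["3.106e+04", "3.106e-04", "3.106e04"]:
        print(
            sample,
            looks_like_float(NEGATIVE_ONLY, sample),
            looks_like_float(EITHER_SIGN, sample),
        )
```

With these stand-ins, only the sample with the explicit "+" is rejected by the negative-only pattern, which is the behavior described above.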
Expected behavior
The decimal regex recognizes "3.106e+04" as a float value, not a Utf8 string.
Additional context