DECIMAL regex in csv reader does not accept positive exponent specifier #5648

Closed
jdcasale opened this issue Apr 15, 2024 · 2 comments · Fixed by #5649
Labels
arrow Changes to the arrow crate bug

Comments

@jdcasale
Contributor

jdcasale commented Apr 15, 2024

Describe the bug

Decimals in scientific notation are frequently written with an (admittedly unnecessary) positive exponent specifier, e.g. "3.106e+04". The existing DECIMAL regex accepts a negative exponent specifier but does not match a number with a positive one. As a result, schema inference types any column containing positive exponent specifiers as Utf8 instead of a float type.

As a sanity check, I tried the same input in DuckDB, and its CSV parser handles it correctly.

To Reproduce

Either infer the schema of a CSV file containing the offending pattern (as in the example I provided), or run the existing regex directly against the offending value "3.106e+04": it will not match.
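The mismatch can be reproduced without the `regex` crate. The sketch below is a hand-written, std-only matcher. It assumes the pre-fix DECIMAL pattern has the shape `^-?((\d*\.\d+|\d+\.\d*)([eE]-?\d+)?|\d+([eE]-?\d+))$` and that the fix makes the exponent sign `[-+]?`. The issue does not quote the actual arrow-csv pattern, so treat this grammar as an assumption. `ExponentSign`, `is_decimal` and `infer` are hypothetical names, not arrow-csv APIs, and `infer` is a simplified stand-in for per-field type inference.

```rust
/// Exponent signs the hypothetical matcher accepts after `e`/`E`.
#[derive(Clone, Copy)]
enum ExponentSign {
    MinusOnly,   // assumed pre-fix behaviour: `[eE]-?\d+`
    PlusOrMinus, // behaviour requested in this issue: `[eE][-+]?\d+`
}

/// Number of leading ASCII digits in `s`.
fn leading_digits(s: &[u8]) -> usize {
    s.iter().take_while(|b| b.is_ascii_digit()).count()
}

/// Hand-written matcher for the assumed DECIMAL grammar
/// `^-?((\d*\.\d+|\d+\.\d*)([eE]<sign>\d+)?|\d+([eE]<sign>\d+))$`.
fn is_decimal(field: &str, sign: ExponentSign) -> bool {
    let mut s = field.as_bytes();
    // Optional leading minus on the mantissa.
    if s.first() == Some(&b'-') {
        s = &s[1..];
    }
    // Mantissa: integer digits, optionally followed by '.' and fraction digits.
    let int_len = leading_digits(s);
    s = &s[int_len..];
    let mut has_dot = false;
    let mut frac_len = 0;
    if s.first() == Some(&b'.') {
        has_dot = true;
        s = &s[1..];
        frac_len = leading_digits(s);
        s = &s[frac_len..];
    }
    if int_len + frac_len == 0 {
        return false; // "", "-", "." and "-." are not numbers
    }
    // Optional exponent: 'e'/'E', a sign (which signs depends on `sign`), then digits.
    let has_exp = s.first() == Some(&b'e') || s.first() == Some(&b'E');
    if has_exp {
        s = &s[1..];
        match (s.first().copied(), sign) {
            (Some(b'-'), _) | (Some(b'+'), ExponentSign::PlusOrMinus) => s = &s[1..],
            _ => {} // with MinusOnly a '+' stays in place, so the digit check below fails
        }
        let exp_len = leading_digits(s);
        if exp_len == 0 {
            return false;
        }
        s = &s[exp_len..];
    }
    // Whole field consumed; a bare integer such as "42" needs a dot or an exponent.
    s.is_empty() && (has_dot || has_exp)
}

/// Hypothetical, simplified per-field inference: INTEGER first, then DECIMAL, else Utf8.
fn infer(field: &str, sign: ExponentSign) -> &'static str {
    let unsigned = field.strip_prefix('-').unwrap_or(field);
    if !unsigned.is_empty() && unsigned.bytes().all(|b| b.is_ascii_digit()) {
        "Int64"
    } else if is_decimal(field, sign) {
        "Float64"
    } else {
        "Utf8"
    }
}

fn main() {
    for field in ["3.106e-04", "3.106e+04", "31060"] {
        println!(
            "{}: before = {}, after = {}",
            field,
            infer(field, ExponentSign::MinusOnly),
            infer(field, ExponentSign::PlusOrMinus)
        );
    }
}
```

Running it prints `3.106e+04: before = Utf8, after = Float64`, while `3.106e-04` is `Float64` under both variants. Only the sign accepted after `e`/`E` differs between the two modes.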

Expected behavior

The DECIMAL regex should recognize "3.106e+04" as a float value, not a Utf8 string.
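As one reference point for the expected behavior, Rust's standard library float parser treats the explicit `+` as valid syntax: "3.106e+04" and "3.106e04" parse to the same `f64`. This sketch only illustrates that grammar. The issue does not say which routine arrow-csv uses to parse float fields once a column has been inferred as a float type.

```rust
fn main() {
    // Rust's standard f64 parser accepts an optional '+' or '-' after the exponent marker,
    // so "3.106e+04" and "3.106e04" denote the same value.
    let explicit_plus: f64 = "3.106e+04".parse().expect("explicit '+' exponent should parse");
    let unsigned: f64 = "3.106e04".parse().expect("unsigned exponent should parse");
    let negative: f64 = "3.106e-04".parse().expect("'-' exponent should parse");
    assert_eq!(explicit_plus, unsigned);
    println!("{} {} {}", explicit_plus, unsigned, negative);
}
```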

Additional context

@tustvold
Contributor

It is worth noting that we only added support for parsing scientific notation this morning (#5611).

@tustvold
Contributor

label_issue.py automatically added labels {'arrow'} from #5649

@tustvold tustvold added the arrow Changes to the arrow crate label Apr 17, 2024
Projects
None yet

2 participants