Could we treat all of them as strings without specifying the type?
I have seen a dataset from a Kaggle competition that has the strings "f" or "t" in the first N rows but other strings somewhere after row N. I assume the schema is inferred from the first N rows.
The column does-bruise-or-bleed is inferred as boolean because its first N rows contain only "f" and "t".
ConversionException: Conversion Error: CSV Error on Line: 27848
Original Line: 27846,p,4.24,x,t,n,d,d,,o,5.8,5.61,,t,w,,,f,f,,d,u
Error when converting column "does-bruise-or-bleed". Could not convert string "d" to 'BOOLEAN'
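A toy model of sample-based type inference makes the failure mode above concrete. This is not DuckDB's sniffer. The helper names `infer_type` and `load_column`, the sample sizes, and the rule "only 't'/'f' in the sample means BOOLEAN" are simplifying assumptions chosen to mirror this report. Real readers inspect a configurable sample of rows and the details differ, but the effect is the same: a value that appears only after the sample breaks the type that was guessed.

```python
import csv
import io

# Hypothetical toy sniffer, not DuckDB's algorithm: it only inspects a sample.
BOOL_TOKENS = {"t": True, "f": False}


def infer_type(values, sample_size):
    """Guess BOOLEAN if every sampled value is "t" or "f", otherwise VARCHAR."""
    sample = values[:sample_size]
    if sample and all(v in BOOL_TOKENS for v in sample):
        return "BOOLEAN"
    return "VARCHAR"


def load_column(values, sample_size):
    """Infer a type from the sample, then convert every row to that type."""
    col_type = infer_type(values, sample_size)
    if col_type == "VARCHAR":
        return col_type, list(values)
    converted = []
    for line_no, v in enumerate(values, start=2):  # line 1 is the header
        if v not in BOOL_TOKENS:
            raise ValueError(f"line {line_no}: could not convert string {v!r} to BOOLEAN")
        converted.append(BOOL_TOKENS[v])
    return col_type, converted


# Only "t"/"f" in the first 10 data rows, then a "d" in row 11.
data = "does-bruise-or-bleed\n" + "t\nf\n" * 5 + "d\n"
values = [row[0] for row in list(csv.reader(io.StringIO(data)))[1:]]

print(infer_type(values, sample_size=10))   # BOOLEAN: "d" lies outside the sample
print(infer_type(values, sample_size=100))  # VARCHAR: the sample now includes "d"

try:
    load_column(values, sample_size=10)
except ValueError as err:
    print(err)  # line 12: could not convert string 'd' to BOOLEAN
```

Enlarging the sample helps only if it happens to reach the unusual value; reading every column as a string removes the guess altogether.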
What version of ibis are you using?
9.3.0
What backend(s) are you using, if any?
DuckDB
Relevant log output
No response
Code of Conduct
I agree to follow this project's Code of Conduct
This isn't an ibis bug, nor is it a bug in the DuckDB CSV reader. CSV files aren't typed, so inferring the correct dtype for a given column is always best-effort.
You can either do what you've done above and specify the dtypes for the columns that aren't sniffed the way you want, or force all columns to be read as strings by passing all_varchar=True.
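The second option amounts to a two-step load: keep every field as a string, then convert only the columns whose meaning you actually know. The sketch below uses the standard library rather than ibis or DuckDB, and `read_all_varchar` and `cast_columns` are hypothetical helpers. With the DuckDB backend, the first step corresponds to passing all_varchar=True as suggested above; the explicit conversions would then be applied to the resulting table.

```python
import csv
import io


def read_all_varchar(text):
    """Read every field as a string, so no type is guessed (cf. all_varchar=True)."""
    return list(csv.DictReader(io.StringIO(text)))


def cast_columns(rows, converters):
    """Apply an explicit converter to selected columns; all others stay strings."""
    return [
        {name: converters.get(name, str)(value) for name, value in row.items()}
        for row in rows
    ]


text = "id,does-bruise-or-bleed\n0,f\n1,t\n2,d\n"
# Only "id" is converted; the mixed "f"/"t"/"d" column is left as text.
rows = cast_columns(read_all_varchar(text), {"id": int})
print(rows[2])  # {'id': 2, 'does-bruise-or-bleed': 'd'}
```

Because the types are stated explicitly rather than sniffed, the result no longer depends on which rows a reader happens to sample.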
Thanks for the details; understood that it is not a bug. My one concern is the user experience when switching backends: the same code cannot be reused with another backend.
For text file formats without any standard for how to interpret their values (like CSV, and unlike JSON), this will always be a problem as long as people are using those formats. Humanity is unlikely to ever abandon CSV entirely, so it'll probably always be a problem.
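A quick standard-library comparison shows the difference this comment describes: JSON has separate literals for booleans and strings, while CSV yields only text, so any mapping of "t" to true is the reader's guess. The field names in the sketch are made up for illustration.

```python
import csv
import io
import json

# JSON distinguishes the boolean literal true from the string "t".
record = json.loads('{"flag_bool": true, "flag_text": "t"}')
print(type(record["flag_bool"]).__name__)  # bool
print(type(record["flag_text"]).__name__)  # str

# CSV has no such distinction: every field comes back as a string,
# so deciding that "t" means True is left to whoever reads the file.
row = next(csv.reader(io.StringIO("t,true,1\n")))
print([type(v).__name__ for v in row])  # ['str', 'str', 'str']
```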
What happened?
In this example, it reads col2 as boolean:
output:
It looks like the strings "f" and "t" are treated as false and true for the second column, but not for the first column. I have to do something like this:
output: