Error trying to convert Parquet date into Pandas datatype #181
Comments
Hi @gallamine! I can reproduce the error message in your code (in the newest version of I tried to reproduce with a parquet file:

import pandas as pd
from datetime import datetime
df = pd.DataFrame({"date_col": [datetime.today()]})
df.to_parquet("181.parquet")
import dask.dataframe as dd
from dask_sql import Context
df = dd.read_parquet("181.parquet")
c = Context()
c.create_table("df", df)
print(c.sql("SELECT *, CAST(date_col AS DATE), EXTRACT(DOW FROM date_col) FROM df").compute())

but that works. Relating to your other open issue #179, I just assumed that this issue came up during interaction with hive, and especially during interaction with a partitioned hive table? Looking back at the code of the hive input, I am wondering if the call to

If you want, I can try to reproduce it using my local hive setup and come up with a bugfix. If you already have a setup for testing this (and my assumption is correct that your bug came up during hive usage), I am also happy to work together with you!
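One detail of the reproduction above that may matter: `datetime.today()` stores a full timestamp, not a calendar date. The sketch below only illustrates how pandas infers the two kinds of values differently; it assumes nothing about dask-sql or hive, and the names `ts_df` and `date_df` are made up for the illustration.

```python
from datetime import date, datetime

import pandas as pd

# A datetime object is inferred as a datetime64 column (a timestamp).
ts_df = pd.DataFrame({"date_col": [datetime.today()]})

# A plain date object is kept as a generic object column instead.
date_df = pd.DataFrame({"date_col": [date.today()]})

print(ts_df["date_col"].dtype)
print(date_df["date_col"].dtype)
```

If the original failure depends on a true date column, a reproduction built from `date` values (or from the hive table itself) might be closer to the reported case.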
I'm testing your suggested fix now. I was OOO the past few days and (sadly) not attending the Dask Summit.
When the Parquet file contains a date type, dask_sql will try to convert the corresponding Pandas dataframe column into a date type that Pandas doesn't recognize. The issue seems to arise in https://github.com/nils-braun/dask-sql/blob/main/dask_sql/mappings.py#L273. The conversion results in TypeError: data type 'date' not understood.

This is after PyArrow has decided that the Parquet date type should be a datetime64[ns] type in Pandas.

I'm using Python 3.7, pyarrow==4.0.0 and pandas==1.2.4.
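The failure described above can be shown with pandas alone. This is a minimal sketch, assuming the problematic step amounts to casting a datetime64 column to a dtype named "date"; it does not call dask-sql, and the flooring step at the end is only one possible workaround, not necessarily the fix the project adopted.

```python
import pandas as pd

# A datetime64 column, similar to what pandas receives for the Parquet data.
s = pd.Series(pd.to_datetime(["2021-05-20 13:45:00"]))

try:
    # "date" is not a dtype name that NumPy or pandas recognize.
    s.astype("date")
    cast_error = None
except TypeError as exc:
    cast_error = exc

print(type(cast_error).__name__, cast_error)

# Assumed workaround: stay in datetime64 and drop the time-of-day part.
as_date = s.dt.floor("D")
print(as_date.dtype, as_date.iloc[0])
```

Flooring keeps the column in a dtype pandas understands while still giving day-level values, which is roughly what a SQL DATE needs.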