Using pandas 2.0 causes query_df to return columns with different/unexpected dtypes #165

Open
georgipeev opened this issue Apr 5, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@georgipeev

Describe the bug

Using pandas 2.0 causes query_df to return DataFrames in which the columns corresponding to Date/DateTime/DateTime64 have different dtypes than they do with pandas 1.5.3, as shown in the table below:

Clickhouse column type           | query_df dtype with pandas==1.5.3 | query_df dtype with pandas==2.0
---------------------------------|-----------------------------------|--------------------------------
Date                             | datetime64[ns]                    | datetime64[s]
DateTime                         | datetime64[ns]                    | datetime64[s]
DateTime('America/Chicago')      | datetime64[ns]                    | datetime64[s]
DateTime64(6)                    | datetime64[ns]                    | datetime64[us]
DateTime64(6, 'America/Chicago') | datetime64[ns]                    | datetime64[us]

Steps to reproduce

Create a ClickHouse table with columns of the above types, insert values, and query it using query_df.

Expected behaviour

The dtypes of the DataFrame returned by query_df are consistent regardless of pandas version used.

Configuration

Environment

  • clickhouse-connect version: 0.5.12
  • Python version: 3.9.13
  • pandas versions: 1.5.3 and 2.0
  • Operating system: Red Hat Enterprise Linux 8

ClickHouse server

  • ClickHouse Server version: 22.3.9
@georgipeev georgipeev added the bug Something isn't working label Apr 5, 2023
@genzgd
Collaborator

genzgd commented Apr 5, 2023

We don't currently test or support Pandas 2.0, but in fact the new datatypes are technically more correct.

@genzgd genzgd added enhancement New feature or request and removed bug Something isn't working labels Apr 5, 2023
@georgipeev
Author

I see. Do you plan on modifying the behavior of query_df to return consistent dtypes across different versions of pandas, regardless of whether those dtypes are the old ones or the new (and more correct) ones?

@genzgd
Collaborator

genzgd commented Apr 5, 2023

I'll have to dig into it; I don't know enough about the differences between Pandas versions. My first thought is that all datetime types in Pandas 1.x are given the dtype datetime64[ns] (since nanoseconds is always the granularity of the underlying type of a pandas Timestamp object), and that this might be different in the new Pandas version. In that case I'd be inclined to keep the new and arguably better behavior.

@qqletter

qqletter commented May 1, 2023

> I'll have to dig into it; I don't know enough about the differences between Pandas versions. My first thought is that all datetime types in Pandas 1.x are given the dtype datetime64[ns] (since nanoseconds is always the granularity of the underlying type of a pandas Timestamp object), and that this might be different in the new Pandas version. In that case I'd be inclined to keep the new and arguably better behavior.

The difference between v1 and v2 is described in the pandas 2.0 release notes: https://pandas.pydata.org/docs/whatsnew/v2.0.0.html#construction-with-datetime64-or-timedelta64-dtype-with-unsupported-resolution
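
For anyone who needs the same dtypes under both pandas versions in the meantime, one possible client-side workaround is to cast the datetime columns of the returned DataFrame to a fixed resolution. The sketch below is not part of clickhouse-connect; the helper name normalize_datetime_columns and its unit parameter are made up for illustration, and it assumes unique column names. Under pandas 1.5.3 the cast should be a no-op, since everything is already nanosecond resolution.

```python
import pandas as pd


def normalize_datetime_columns(df: pd.DataFrame, unit: str = "ns") -> pd.DataFrame:
    """Return a copy of df with every datetime column cast to one resolution.

    Hypothetical helper, not a clickhouse-connect API. Units other than
    "ns" require pandas >= 2.0.
    """
    out = df.copy()
    for name in out.columns:
        dtype = out[name].dtype
        if isinstance(dtype, pd.DatetimeTZDtype):
            # Timezone-aware column: keep the timezone, change only the unit.
            target = pd.DatetimeTZDtype(unit=unit, tz=dtype.tz)
            out[name] = out[name].astype(target)
        elif pd.api.types.is_datetime64_dtype(dtype):
            # Naive datetime64 column of any resolution (s, ms, us, ns).
            out[name] = out[name].astype(f"datetime64[{unit}]")
    return out
```

It would be applied to the result of query_df, for example normalize_datetime_columns(client.query_df(...)), where client stands for whatever clickhouse-connect client object you already have. Casting to nanoseconds narrows the representable date range to roughly the years 1677 to 2262, so values outside that range would fail to convert.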
