GH-38034: [Python] DataFrame Interchange Protocol - correct dtype information for categorical columns #38065
LGTM! Just one linter issue reported.
Oh, bummer. Thanks for pinging me Dane!
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit db420c9. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them.
…pe information for categorical columns (apache#38065) ### Rationale for this change See: apache#38034 (comment) ### What changes are included in this PR? The `f_string` for the columns with categorical dtype is now corrected to reflect the type of the indices from the dictionary data type. Bit width has been correct before. From the spec: > For categoricals, the format string describes the type of the categorical in the data buffer. In case of a separate encoding of the categorical (e.g. an integer to string mapping), this can be derived from ``self.describe_categorical``. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: apache#38034 Authored-by: AlenkaF <[email protected]> Signed-off-by: AlenkaF <[email protected]>
Rationale for this change
See: #38034 (comment)
What changes are included in this PR?
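The `f_string` for columns with a categorical dtype is now corrected to reflect the type of the indices of the dictionary data type. The bit width was already correct. From the spec:
> For categoricals, the format string describes the type of the categorical in the data buffer. In case of a separate encoding of the categorical (e.g. an integer to string mapping), this can be derived from ``self.describe_categorical``.
Are these changes tested?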
Yes.
Are there any user-facing changes?
No.
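To illustrate the change described above, here is a minimal, standalone sketch of how the dtype tuple for a categorical column could be built from the type of the dictionary indices. It is not the code from this PR: `INDEX_FORMATS` and `categorical_dtype` are hypothetical names, and the index type is passed as a plain string instead of an Arrow type. The format strings follow the Arrow C Data Interface, and the `DtypeKind` values follow the interchange protocol spec.

```python
from enum import IntEnum


class DtypeKind(IntEnum):
    # Kind codes as defined by the DataFrame interchange protocol spec.
    INT = 0
    UINT = 1
    FLOAT = 2
    BOOL = 20
    STRING = 21
    DATETIME = 22
    CATEGORICAL = 23


# Hypothetical lookup table: index type name -> (Arrow C format string, bit width).
INDEX_FORMATS = {
    "int8": ("c", 8),
    "uint8": ("C", 8),
    "int16": ("s", 16),
    "uint16": ("S", 16),
    "int32": ("i", 32),
    "uint32": ("I", 32),
    "int64": ("l", 64),
    "uint64": ("L", 64),
}


def categorical_dtype(index_type):
    """Return (kind, bit_width, format_string, endianness) for a categorical column.

    The format string describes the indices stored in the data buffer,
    not the values of the dictionary (for example strings).
    """
    f_string, bit_width = INDEX_FORMATS[index_type]
    return (DtypeKind.CATEGORICAL, bit_width, f_string, "=")  # "=" is native endianness


# A dictionary column with int8 indices and string values reports the
# format of the int8 indices.
print(categorical_dtype("int8"))
```

A consumer that needs the category values themselves would still go through `describe_categorical`, as the quoted spec text says.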