Changing types in sink_csv/write_csv/read_csv/scan_csv. #11600
Comments
Intended solution

All of these parameters (except [...]) [...] So the Python signature for these should be [...]

    read_csv("test.csv", separator=b"日")
    # SyntaxError: bytes can only contain ASCII literal characters

In PyO3, the corresponding type is [...]

    fn read_csv(
        separator: &[u8],
        ..
    ) {
        // util that asserts length 1 and extracts the single byte
        let separator: u8 = parse_single_byte_input(separator);
        ...
    }

The Rust core function should accept [...]

    .with_separator(b';')

Current state & next steps

The Rust side already uses [...]

PyO3

The PyO3 bindings are inconsistent. In some places the type is [...] This can be done in a non-breaking way by converting the Python string input to bytes before passing it to PyO3 (we should write a small util for this):

    separator = bytes(separator, "ascii")

Python

Python inputs are now strings. This should be changed to bytes. This would be a breaking change. We can ease the transition by accepting both for a while (if the input is a string, we convert it to bytes using the util specified above).

Impact

The result will be that our type system better reflects that these parameters accept a single byte as input. The only drawback I see is that Python users may be used to specifying string inputs, e.g. [...] Therefore, we could consider simply accepting both string and bytes, and converting to bytes before passing to PyO3.
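The "small util" mentioned above could look roughly like the following. This is only a sketch: the name `normalize_single_byte` and its error messages are hypothetical and not part of the Polars API. It assumes that both `str` and `bytes` inputs are accepted and that the result must be exactly one ASCII byte.

```python
from __future__ import annotations


def normalize_single_byte(value: str | bytes, name: str = "separator") -> bytes:
    """Return `value` as a bytes object of length 1, or raise a clear error.

    Hypothetical helper for illustration only; not an existing Polars function.
    """
    if isinstance(value, str):
        try:
            # Same conversion as bytes(value, "ascii") in the comment above.
            value = value.encode("ascii")
        except UnicodeEncodeError:
            raise ValueError(
                f"`{name}` must be a single ASCII character, got {value!r}"
            ) from None
    elif not isinstance(value, bytes):
        raise TypeError(f"`{name}` must be str or bytes, got {type(value).__name__}")

    if len(value) != 1:
        raise ValueError(f"`{name}` must be exactly one byte, got {len(value)} bytes")
    return value
```

With a helper like this, callers could keep writing `separator=";"` while the binding layer always receives `b";"`, and inputs such as `"日"` or `";;"` fail early with a readable message instead of a low-level conversion error.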
Additional thoughts:
And I'm guessing the same goes for all the others mentioned.
This 😉✅💯 We can handle the conversion seamlessly for the caller here without being unnecessarily nitpicky (while still raising good, clear errors). Callers shouldn't really have to think about what the Rust core is doing with types (vs Python) at such a low level.
I had thought about this as well. While we currently support only single-byte separators, we might want to expand this in the future. At that point, we'd have to accept strings everywhere instead of bytes (both in Python and Rust). And we'd probably have to rename some of our parameters ([...]). But let's worry about that when we get to it?
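To make the single-byte restriction concrete, here is a small sketch (the helper name `utf8_width` is made up for this example) showing how many bytes a candidate separator occupies once encoded as UTF-8. Anything wider than one byte cannot be passed as a single `u8` and would need the string-based handling discussed above.

```python
def utf8_width(sep: str) -> int:
    """Number of bytes `sep` occupies when encoded as UTF-8."""
    return len(sep.encode("utf-8"))


for sep in [";", "\t", "日", "||"]:
    print(repr(sep), utf8_width(sep))
# ';' 1
# '\t' 1
# '日' 3
# '||' 2
```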
@svaningelgem Do you want to finish this one or should I pick this up?
Please feel free. I'm looking into 2 other things right now:
The first one especially is pretty frustrating, so I'm now picking up the 2nd, and afterwards I will start a crash course on Rust so I have a better basis to continue from. Right now it's just endless trial and error...
Checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Issue description
As was being discussed in PR #11583 between @svaningelgem & @stijndegooijer:
This "bug" report is to raise the issue and come to a solution (which might be to not do anything of course 😄).
What should we do about the types of:
My personal opinion is that, on the Polars side, each parameter should accept what it actually is: a single character for the last 3, a single character for the eol_char, and a string for the lineterminator.
So my proposal would be to change these like:
Installed versions
branch: main