-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make format prefix optional for format options in COPY #9723
Conversation
options.insert(renamed_key.to_lowercase(), value_string.to_lowercase()); | ||
} else { | ||
options.insert(key.to_lowercase(), value_string.to_lowercase()); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for @devinjdangelo
In the previous similar PR, we were only renaming keys when the file type matches JSON, CSV or PARQUET.
See https://github.com/apache/arrow-datafusion/pull/9594/files#diff-bb98aeaec478d9576e33911f252745961f57e38d25f6e9d13803894dfbad25d5R876-R881
I decided to not do this now since it'd seem weird if some file types need a format
prefix & others don't.
Let me know if you think differently.
Sidenote: Other file formats don't have any supported format options currently.
1 test is failing on CI for multiple test suites. Wasn't able to reproduce the failure locally. |
I think it is due to the new Rust release, #9724 (which has since been fixed) |
I merged up to main to get the CI passing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @tinfoil-knight -- this PR is most appreciated and seems to do exactly what it says.
I pushed one additional commit to verify that the error message for unknown cases is reasonable and another to change the comments in the test that avoiding the .format
is desired
cc @devinjdangelo @ozankabak @metesynnada -- I think this PR is quite nice and elegantly supports a nicer UX while retaining the unified configuration support
I plan to merge this PR later today unless anyone would like additional time to review |
Thanks again @tinfoil-knight |
// compatibility. | ||
|
||
let renamed_key = format!("format.{}", key); | ||
options.insert(renamed_key.to_lowercase(), value_string.to_lowercase()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @tinfoil-knight @alamb, just would like to know the history of why we would make the value string lower case, I am encountering this case issue in another PR as described https://github.com/apache/datafusion/pull/10792/files#r1632877093. Thanks :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change was originally added in this PR: #9382
It seems like the author of that PR was trying to consolidate case changes (using to_lowercase
) in multiple places to just a single place.
For now, you could add an exclusion for the access token option that is case sensitive in your PR so you can continue your work.
Ideally, we should probably stop lower-casing the options and handle case-sensitivity in the specific reader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having keys lowercased makes sense for standardization purposes, but I'm not sure why we are also lowercasing option values. It may either be because (1) of the same standardization concerns, or (2) by mistake.
The most flexible way would be to make value (not key) standardization logic customizable, with the default being lowercase. If we had that, users like @xinlifoobar would be able to avoid standardization modifications. Other users who want other kinds of standardizations would be able to override it etc.
@tinfoil-knight, would you be interested in thinking about how we can do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened an issue here: #10853
Making the value standardization logic customizable makes sense. Integrations should receive the original value and be able to decide what to do with it.
I'll attempt a fix for the issue this weekend if it hasn't already been picked up by then.
Which issue does this PR close?
Closes #9716 .
Rationale for this change
Before this PR
User has to prefix all format options they pass to the
COPY
command withformat
. For eg:This is repetitive & different from what Datafusion 36.0.0 offered. (See issue description for more details)
After this PR
User can use the format options for the
COPY
command without prefixing them withformat
:What changes are included in this PR?
We're prefixing keys passed in options with
format
if they don't contain a namespace (signified by presence of.
).Are these changes tested?
Yes. New tests are added to verify that options not prefixed by
format
work.Are there any user-facing changes?
Current docs have the legacy format so updates aren't needed.