-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: make dfschema wrap schemaref #9595
refactor: make dfschema wrap schemaref #9595
Conversation
Great job @haohuaijin, glad to see this near the finish line. |
I plan to merge this tomorrow unless anyone else would like time to review. Thanks again everyone |
I found it difficult to cleanup the tuple into DFField struct because there 4 kinds of combination I need to construct. I thought we only need DFFieldRef and DFField for owned. Normal case pub struct DFFieldRef<'a> {
/// Optional qualifier (usually a table or relation name)
qualifier: Option<&'a OwnedTableReference>,
/// Arrow field definition
field: &'a Field,
} Sometimes we need FieldRef to avoid the cost of clone for field. pub struct DFFieldRefWithArc<'a> {
/// Optional qualifier (usually a table or relation name)
qualifier: Option<&'a OwnedTableReference>,
/// Arrow field definition
field: &'a FieldRef,
} It is usually easier to deal with owned dffield sometimes without additional cost, because we can't avoid clone for pub struct DFField {
/// Optional qualifier (usually a table or relation name)
pub qualifier: Option<OwnedTableReference>,
/// Arrow field definition
pub field: FieldRef,
} And of course, we can have field not fieldRef, it seems better not to wrap it with Arc if possible. pub struct DFField {
/// Optional qualifier (usually a table or relation name)
pub qualifier: Option<OwnedTableReference>,
/// Arrow field definition
pub field: Field,
} |
I suggest we want until the imminent 37.0.0 release -- #9697 -- so we have additional time to test this change before releasing it to the larger community |
Since Arrow 37 is now basically released I think this PR is ready to merge. I took the liberty of merging up from main to resolve some conflicts, and once CI is complete I intend to merge this PR Thanks again @haohuaijin |
Great job getting this over the line @haohuaijin |
🚀 |
DFField was removed upstream. Ref: apache/datafusion#9595
The previous implementation relied on `DFField` which was removed upstream. Ref: apache/datafusion#9595
* chore: upgrade datafusion Deps Ref #690 * update concat and concat_ws to use datafusion_functions Moved in apache/datafusion#10089 * feat: upgrade functions.rs Upstream is continuing it's migration to UDFs. Ref apache/datafusion#10098 Ref apache/datafusion#10372 * fix ScalarUDF import * feat: remove deprecated suppors_filter_pushdown and impl supports_filters_pushdown Deprecated function removed in apache/datafusion#9923 * use `unnest_columns_with_options` instead of deprecated `unnest_column_with_option` * remove ScalarFunction wrappers These relied on upstream BuiltinScalarFunction, which are now removed. Ref apache/datafusion#10098 * update dataframe `test_describe` `null_count` was fixed upstream. Ref apache/datafusion#10260 * remove PyDFField and related methods DFField was removed upstream. Ref: apache/datafusion#9595 * bump `datafusion-python` package version to 38.0.0 * re-implement `PyExpr::column_name` The previous implementation relied on `DFField` which was removed upstream. Ref: apache/datafusion#9595
Which issue does this PR close?
Closes #4680
follow on #8905
Rationale for this change
see related issue #4680, #7944 and #8905
See also mailing list discussion: https://lists.apache.org/thread/79rpwd22bncdxbkwlq842rwr485kwwyh
What changes are included in this PR?
change the
DFSchema
and removeDFField
structfrom
to
and update the related API
Are these changes tested?
test with existing tests
Are there any user-facing changes?
New API for create DFSchema
pub fn from_unqualifed_fields(fields: Fields, metadata: HashMap<String, String>)->Result<Self>
pub fn try_from_qualified_schema<'a>(qualifier: impl Into<TableReference<'a>>,schema: &Schema,)->Result<Self>
pub fn new_with_metadata(qualified_fields: Vec<(Option<OwnedTableReference>, Arc<Field>)>,metadata: HashMap<String, String>) -> Result<Self>
pub fn from_field_specific_qualified_schema<'a>(qualifiers: Vec<Option<impl Into<TableReference<'a>>>>, schema: &SchemaRef) -> Result<Self>
New API to find qualified
Field
, the old API only returnField
pub fn qualified_field(&self, i: usize) -> (Option<&OwnedTableReference>, &Field)
: returns an immutable reference of a specificField
and its qualifier by an offsetpub fn qualified_field_with_name(&self, qualifier: Option<&TableReference>, name: &str) -> Result<(Option<&OwnedTableReference>, &Field)>
: find the qualified field with the given namepub fn qualified_fields_with_unqualified_name(&self, name: &str) -> Vec<(Option<&OwnedTableReference>, &Field)>
: Find all fields that match the given name and return them with their qualifierpub fn qualified_field_with_unqualified_name(&self,name: &str) -> Result<(Option<&OwnedTableReference>, &Field)>
: Find the qualified field with the given unqualified namepub fn qualified_field_from_column(&self,column: &Column) -> Result<(Option<&OwnedTableReference>, &Field)>
: Find the field with the given qualified columnpub fn iter(&self) -> impl Iterator<Item = (Option<&OwnedTableReference>, &FieldRef)>
: Iterate over the qualifiers and fields in the DFSchemaOther new API
pub fn columns(&self) -> Vec<Column>
: return allColumn
s for the schemapub fn columns_with_unqualified_name(&self, name: &str) -> Vec<Column>
: find all fields that match the given name and convert to column