36.0.0 (2024-02-16)
Breaking changes:
- Deprecate make_scalar_function #8878 (viirya)
- Change `Accumulator::evaluate` and `Accumulator::state` to take `&mut self` #8925 (alamb)
- Rename `CatalogList` to `CatalogProviderList` #9002 (comphead)
- Remove some recursive cloning from logical planning #9050 (ozankabak)
- Support `FixedSizeList` type coercion #8902 (Weijun-H)
- Add `ColumnarValue::values_to_arrays`, deprecate `columnar_values_to_array` #9114 (alamb)
Performance related:
- Minor: Add new Extended ClickBench benchmark queries #8950 (alamb)
Implemented enhancements:
- feat: support `stride` in `array_slice`, change indexes to be `1` based #8829 (Weijun-H)
- feat: emitting partial join results in `HashJoinStream` #8020 (korowa)
- feat:implement sql style 'ends_with' and 'instr' string function #8862 (zy-kkk)
- feat: Support parquet bloom filter pruning for decimal128 #8930 (Ted-Jiang)
- feat: Disable client console highlight by default #9013 (comphead)
- feat: support the ergonomics of getting list slice with stride #8946 (Weijun-H)
- feat: Parallel Arrow file format reading #8897 (my-vegetable-has-exploded)
- feat: support array_reverse #9023 (Weijun-H)
- feat: issue #8969 adding position function #8988 (Lordworms)
- feat: support `LargeList` in `flatten` #9110 (Weijun-H)
- feat: improve `make_date` performance #9112 (r3stl355)
- feat: add github action to self-assign the issue #9132 (r3stl355)
- feat: add ability to query the remote http(s) location directly in datafusion-cli #9150 (r3stl355)
- feat: implement select directly from s3 and gcs locations in datafusion-cli #9199 (r3stl355)
- feat: support block gzip for streams #9175 (tshauck)
Fixed bugs:
- fix: recursive initialize method #8937 (waynexia)
- fix: common_subexpr_eliminate rule should not apply to short-circuit expression #8928 (haohuaijin)
- fix: issue #8922 make row group test more readable #8941 (Lordworms)
- fix: allow placeholders to be substituted when coercible #8977 (kallisti-dev)
- fix: unambiguously truncate time in date_trunc function #9068 (mhilton)
- fix: schema metadata retrieval when listing parquet table #9134 (brayanjuls)
Documentation updates:
- Prepare 35.0.0-rc1 #8924 (andygrove)
- Update project links #8954 (comphead)
- Document parallelism and thread scheduling in the architecture guide #8986 (alamb)
- chore: fix license badge in README #9008 (suyanhanx)
- docs: fix array_position docs #9003 (tshauck)
- Docs: improve contributor guide to explain how to work with tickets #8999 (alamb)
- Document minimum required rust version #9071 (comphead)
- Minor: Add ParadeDB to the list of users #9018 (alamb)
- Update minimum rust version to 1.72 #8997 (alamb)
- docs: add docs and example showing how to get the expression data type #9118 (r3stl355)
- chore: Fix incorrect comment in substrait consumer #9123 (caicancai)
- Minor: Fix Self referential links in readme #9119 (alamb)
- Examples link in catalogs.rs leads to a 404 #9194 (Omega359)
- Create `datafusion-functions-array` crate and move `ArrayToString` function into it #9113 (alamb)
Merged pull requests:
- Add hash_join_single_partition_threshold_rows config #8720 (maruschin)
- Prepare 35.0.0-rc1 #8924 (andygrove)
- feat: support `stride` in `array_slice`, change indexes to be `1` based #8829 (Weijun-H)
- fix: recursive initialize method #8937 (waynexia)
- Fix expr partial ord test #8908 (mustafasrepo)
- Simplify windows builtin functions return type #8920 (comphead)
- Fix handling of nested leaf columns in parallel parquet writer #8923 (devinjdangelo)
- feat: emitting partial join results in `HashJoinStream` #8020 (korowa)
- fix: common_subexpr_eliminate rule should not apply to short-circuit expression #8928 (haohuaijin)
- Support GroupsAccumulator accumulator for udaf #8892 (guojidan)
- test: Port tests in `partitioned_csv.rs` to sqllogictest #8919 (simicd)
- [CI] Fix RUSTFLAGS #8929 (Jefffrey)
- Minor: Update datafusion-cli README to explain why it is not in the w… #8938 (alamb)
- Add syntax highlight to datafusion-cli #8918 (trungda)
- Update substrait requirement from 0.22.1 to 0.23.0 #8943 (dependabot[bot])
- Deprecate make_scalar_function #8878 (viirya)
- Update project links #8954 (comphead)
- fix: issue #8922 make row group test more readable #8941 (Lordworms)
- feat:implement sql style 'ends_with' and 'instr' string function #8862 (zy-kkk)
- [MINOR]: Extract aggregate topk function to `aggregate_topk.slt` #8948 (mustafasrepo)
- Combine multiple `IN` lists in `ExprSimplifier` #8949 (jayzhan211)
- Fix clippy failures: error: use of deprecated function `functions::make_scalar_function` #8972 (alamb)
- feat: Support parquet bloom filter pruning for decimal128 #8930 (Ted-Jiang)
- [MINOR]: Update create_window_expr to refer only input schema #8945 (mustafasrepo)
- Don't error in simplify_expressions rule #8957 (haohuaijin)
- Use .zip to avoid unwrap #8956 (Luv-Ray)
- Change `Accumulator::evaluate` and `Accumulator::state` to take `&mut self` #8925 (alamb)
- Enhance simplifier by adding Canonicalize #8780 (yyy1000)
- Find the correct fields when using page filter on `struct` fields in parquet #8848 (manoj-inukolunu)
- fix: allow placeholders to be substituted when coercible #8977 (kallisti-dev)
- Minor: improve CatalogProvider documentation with rationale and info about remote catalogs #8968 (alamb)
- Improve to_timestamp docs #8981 (Omega359)
- Add helper function for processing scalar function input #8962 (viirya)
- Fix optimize projections bug #8960 (mustafasrepo)
- NOT operator not return internal error when args are not boolean value #8982 (guojidan)
- Minor: Add new Extended ClickBench benchmark queries #8950 (alamb)
- Minor: Add comments to MSRV CI check to help if it fails #8995 (alamb)
- Minor: Document memory management design on `MemoryPool` #8966 (alamb)
- Fix LEAD/LAG window functions when default value null #8989 (comphead)
- Optimize MIN/MAX when relation is empty #8940 (viirya)
- [task #8203] Port tests in joins.rs to sqllogictest #8996 (Tangruilin)
- [task #8213]Port tests in select.rs to sqllogictest #8967 (Tangruilin)
- test: Port (last) `repartition.rs` query to sqllogictest #8936 (simicd)
- Update to sqlparser `0.42.0` #9000 (alamb)
- [MINOR]: Fix Optimize Projections Bug #8992 (mustafasrepo)
- Make Topk aggregate tests deterministic #8998 (mustafasrepo)
- Add support for Postgres LIKE operators #8894 (gruuya)
- bug: Datafusion doesn't respect case sensitive table references #8964 (xhwhis)
- Document parallelism and thread scheduling in the architecture guide #8986 (alamb)
- Fix None Projections in Projection Pushdown #9005 (berkaysynnada)
- Lead and Lag window functions should support default value with datatype other than Int64 #9001 (viirya)
- chore: fix license badge in README #9008 (suyanhanx)
- Minor: fix: #9010 - Optimizer schema change assert error is incorrect #9012 (curtisleefulton)
- docs: fix array_position docs #9003 (tshauck)
- Rename `CatalogList` to `CatalogProviderList` #9002 (comphead)
- Safeguard against potential inexact row count being smaller than exact null count #9007 (gruuya)
- Recursive CTEs: Stage 3 - add execution support #8840 (matthewgapp)
- sqllogictest: move the creation of the nan_table from Rust to slt #9022 (jonahgao)
- TreeNode refactor code deduplication: Part 3 #8817 (ozankabak)
- feat: Disable client console highlight by default #9013 (comphead)
- [task #8917] Implement information_schema.schemata #8993 (Tangruilin)
- Properly encode STRING_AGG, NTH_VALUE in physical plan protobufs #9027 (scsmithr)
- [task #8201] Port tests in expr.rs to sqllogictest, finish the left c… #9014 (Tangruilin)
- Fix the clippy error of use of deprecated method #9034 (viirya)
- feat: support the ergonomics of getting list slice with stride #8946 (Weijun-H)
- Cache common referred expression at the window input #9009 (mustafasrepo)
- Optimize `COUNT( DISTINCT ...)` for strings (up to 9x faster) #8849 (jayzhan211)
- feat: Parallel Arrow file format reading #8897 (my-vegetable-has-exploded)
- Change remove from swap to shift in index map #9049 (mustafasrepo)
- Relax join keys constraint from Column to any physical expression for physical join operators #8991 (viirya)
- Minor: Improve memory helper trait documentation #9025 (alamb)
- Docs: improve contributor guide to explain how to work with tickets #8999 (alamb)
- fix issue where upper and lower functions only work correctly on ascii character #9054 (Omega359)
- Minor: small updates to bench.sh #9035 (kmitchener)
- Chore: explicitly list out all Expr types in TypeCoercionRewriter::mutate #9038 (guojidan)
- Minor: improve scalar functions document #9029 (Weijun-H)
- [MINOR] Alter a SHJ test for relaxing "on" condition #9065 (metesynnada)
- Remove some recursive cloning from logical planning #9050 (ozankabak)
- minor: remove useless macro #8979 (jackwener)
- Causality Analysis for Builtin Window Functions #9048 (mustafasrepo)
- Minor: add doc examples for RawTableAllocExt #9059 (alamb)
- Update substrait requirement from 0.23.0 to 0.24.0 #9067 (dependabot[bot])
- Remove single_file_output option from FileSinkConfig and Copy statement #9041 (yyy1000)
- Add a make_date function #9040 (Omega359)
- Speedup `DFSchema::merge` using HashSet indices #9020 (simonvandel)
- Document minimum required rust version #9071 (comphead)
- Return proper number of expressions for nth_value_agg #9044 (mustafasrepo)
- ScalarUDF with zero arguments should be provided with one null array as parameter #9031 (viirya)
- Update strum requirement from 0.25.0 to 0.26.1 #9046 (dependabot[bot])
- Create `datafusion-functions` crate, extract encode and decode to #8705 (alamb)
- Add documentation for streaming usecase #9070 (mustafasrepo)
- fix: unambiguously truncate time in date_trunc function #9068 (mhilton)
- feat: support array_reverse #9023 (Weijun-H)
- prettier to_timestamp_invoke #9078 (Tangruilin)
- Handle invalid types for negation #9066 (trungda)
- Minor: reduce unwraps in datetime_expressions.rs #9072 (alamb)
- Remove custom doubling strategy + add examples to `VecAllocEx` #9058 (alamb)
- Split physical_plan_tpch into separate benchmarks #9043 (simonvandel)
- Minor: Add ParadeDB to the list of users #9018 (alamb)
- [MINOR]: Add check for unnecessary projection #9079 (mustafasrepo)
- chore(placeholder): update error message and add tests #9073 (appletreeisyellow)
- refer to #8781, convert the internal_err! in datetime_expression.rs to exec_err! #9083 (Tangruilin)
- Add benchmarks for to_timestamp and make_date functions #9086 (Omega359)
- chore: Clarify ParadeDB branding #9088 (philippemnoel)
- doc: Add example how to include latest datafusion #9076 (comphead)
- Update minimum rust version to 1.72 #8997 (alamb)
- Fix typo in an error message #9099 (AdamGS)
- Update InfluxDB links in Known Users section of documentation #9092 (alamb)
- Support `FixedSizeList` type coercion #8902 (Weijun-H)
- Improve Canonicalize API #8983 (alamb)
- Update env_logger requirement from 0.10 to 0.11 #8944 (dependabot[bot])
- Split count_distinct.rs into separate modules #9087 (alamb)
- Fix update_expr for projection pushdown #9096 (viirya)
- Improve `InListSImplifier` -- add test, commend and avoid clones #8971 (alamb)
- feat: issue #8969 adding position function #8988 (Lordworms)
- Cleanup regex_expressions.rs to remove _regexp_match function #9107 (Omega359)
- Unnest with single expression #9069 (jayzhan211)
- Minor: improve GroupsAccumulator and Accumulator documentation #8963 (alamb)
- move InList related simplify to one place #9037 (guojidan)
- docs: add docs and example showing how to get the expression data type #9118 (r3stl355)
- Add http(s) support to the command line #8753 (kcolford)
- Remove External Table Backwards Compatibility Options #9105 (yyy1000)
- feat: support `LargeList` in `flatten` #9110 (Weijun-H)
- feat: improve `make_date` performance #9112 (r3stl355)
- Refactor min/max value update in Parquet statistics #9120 (Weijun-H)
- chore: Fix incorrect comment in substrait consumer #9123 (caicancai)
- Minor: Fix Self referential links in readme #9119 (alamb)
- Add `ColumnarValue::values_to_arrays`, deprecate `columnar_values_to_array` #9114 (alamb)
- Support Copy with Remote Object Stores in datafusion-cli #9064 (manoj-inukolunu)
- Fix Dockerfile min rust version to 1.72 #9135 (alamb)
- fix: schema metadata retrieval when listing parquet table #9134 (brayanjuls)
- Update parse_protobuf_file_scan_config to remove any partition columns from the file_schema in FileScanConfig #9126 (bcmcmill)
- feat: add github action to self-assign the issue #9132 (r3stl355)
- Fix NULL values in FixedSizeList creation #9141 (Weijun-H)
- Add `FunctionRegistry::register_udaf` and `FunctionRegistry::register_udwf` #9075 (alamb)
- Change ScalarValue::Struct to ArrayRef #7893 (jayzhan211)
- Support join filter for `SortMergeJoin` #9080 (viirya)
- Typo in docstring #9149 (tv42)
- RecordBatchReceiverStreamBuilder: don't stringify errors #9155 (tv42)
- port position test to scalar #9128 (Lordworms)
- Minor: Improve `DataFrame` docs, add examples #9159 (alamb)
- feat: add ability to query the remote http(s) location directly in datafusion-cli #9150 (r3stl355)
- Add `regexp_like`, improve docs and examples for `regexp_match` #9137 (Omega359)
- Partial Sort Plan Implementation #9125 (ahmetenis)
- Update tonic requirement from 0.10 to 0.11 #9176 (dependabot[bot])
- minor: fix error message function naming #9168 (comphead)
- Minor: Update `DataFrame::write_table` docs #9169 (alamb)
- Improve PhysicalExpr documentation #9180 (alamb)
- Fix sphinx warnings #9142 (ongchi)
- Use concat to simplify Nested Scalar creation #9174 (jayzhan211)
- Minor: Remove unecessary map_err #9186 (alamb)
- Add example of using `PruningPredicate` to datafusion-examples #9183 (alamb)
- Use prep_null_mask_filter to handle nulls in selection mask #9163 (viirya)
- [Document] Adding UDF by impl ScalarUDFImpl #9172 (yyy1000)
- Docs: Extend `PruningPredicate` with background and implementation info #9184 (alamb)
- chore: make tokio a workspace dependency #9187 (PsiACE)
- Examples link in catalogs.rs leads to a 404 #9194 (Omega359)
- Add test pipeline for Mac aarch64 #9191 (viirya)
- Add string aggregate grouping fuzz test, add `MemTable::with_sort_exprs` #9190 (alamb)
- Create `datafusion-functions-array` crate and move `ArrayToString` function into it #9113 (alamb)
- Add constant expression support to equivalence properties #9198 (mustafasrepo)
- chore: update tpch-docker docker repository #9204 (pmcgleenon)
- feat: implement select directly from s3 and gcs locations in datafusion-cli #9199 (r3stl355)
- MINOR: Add "fs" feature to "tokio", fix "features" typo. #9210 (mustafasrepo)
- Add `to_char` function implementation using chrono formats #9181 (Omega359)
- Add `SessionContext::read_batches` #9197 (Lordworms)
- feat: support block gzip for streams #9175 (tshauck)
- chore(pruning): Support `IS NOT NULL` predicates in `PruningPredicate` #9208 (appletreeisyellow)
- Add cargo audit CI #9182 (ongchi)
- Move `nullif` and `isnan` to datafusion-functions #9216 (alamb)
- Bugfix - Projection Removal Conditions #9215 (berkaysynnada)
- Partitioning fixes #9207 (esheppa)
- Return an error when a column does not exist in window function #9202 (PhVHoang)
- Revert "chore(pruning): Support `IS NOT NULL` predicates in `PruningPredicate` (#9208)" #9232 (appletreeisyellow)
- Improve documentation on how to build `ScalarValue::Struct` and add `ScalarStructBuilder` #9229 (alamb)
- Minor: improve Display of output ordering of `StreamTableExec` #9225 (mustafasrepo)
- Support compute return types from argument values (not just their DataTypes) #8985 (yyy1000)
- Dont call multiunzip when no stats #9220 (matthewmturner)
- Use setup-macos-aarch64-builder for aarch64 CI pipeline #9242 (viirya)
- GROUP-BY prioritizes input columns in case of ambiguity #9228 (jonahgao)
- Minor: chore: improve catalog test in mod.rs #9244 (caicancai)
- Add example for `ScalarStructBuilder::new_null`, fix display for `null` `ScalarValue::Struct` #9238 (alamb)