All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
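As a rough illustration of how Semantic Versioning applies to the dependency bumps recorded in this file, the sketch below classifies a version change as major, minor, or patch. The helper names `parse_version` and `classify_bump` are hypothetical and not part of this project. The parser only handles plain numeric versions such as `v0.6.3` or `8`, not pre-release or build tags.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse 'v0.6.3', '0.11.8' or 'v8' into a (major, minor, patch) tuple."""
    parts = [int(piece) for piece in text.lstrip("v").split(".")]
    # Pad missing components with zeros so 'v8' becomes (8, 0, 0).
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def classify_bump(old: str, new: str) -> str:
    """Name the most significant version component that changed."""
    old_parts, new_parts = parse_version(old), parse_version(new)
    for label, before, after in zip(("major", "minor", "patch"), old_parts, new_parts):
        if before != after:
            return label
    return "none"


# Version pairs taken from entries in this changelog.
print(classify_bump("v0.6.2", "v0.6.3"))  # ruff
print(classify_bump("0.11.6", "0.11.8"))  # quinn-proto
print(classify_bump("v7.4.4", "v8"))      # pytest
```

Under Semantic Versioning, anything in the `0.y.z` range is considered initial development, so even a patch-level bump of a `0.x` dependency may change behavior.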
In this release, the segment_anything function has been refactored and cleaned up for improved performance and maintainability. The output of segment_anything has also been modified to return the mask and iou_score. Additionally, support for reading CSV files from HTTP sources has been added, along with basic S3 support, enhancing the data ingestion capabilities of the project.
- Update dependency ruff to v0.6.3 by @renovate[bot] in #242
- Refactor and clean segment anything function by @mesejo in #243
- Reading from csv in HTTP, add basic s3 support by @mesejo in #230
- Change output of segment_anything to mask and iou_score by @mesejo in #244
- Bump quinn-proto from 0.11.6 to 0.11.8 by @dependabot[bot] in #249
- Update actions/create-github-app-token action to v1.10.4 by @renovate[bot] in #253
- Bump cryptography from 43.0.0 to 43.0.1 by @dependabot[bot] in #254
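Because #244 changes what segment_anything returns, callers that previously consumed a single value may need to read two fields instead. The sketch below is illustrative only: `segment_anything_stub`, its arguments, and the dictionary shape are hypothetical stand-ins, not the library's actual signature. Only the field names mask and iou_score come from the entry above.

```python
from typing import TypedDict


class Segmentation(TypedDict):
    """Hypothetical result shape; only the two field names come from the changelog."""

    mask: list[list[int]]
    iou_score: float


def segment_anything_stub(image: list[list[int]], threshold: int = 128) -> Segmentation:
    """Stand-in for the real function: thresholds pixels instead of running a model."""
    mask = [[1 if pixel >= threshold else 0 for pixel in row] for row in image]
    # A real model would predict this score; the stub uses a fixed placeholder value.
    return {"mask": mask, "iou_score": 0.5}


def foreground_pixels(result: Segmentation, min_iou: float = 0.0) -> int:
    """Count mask pixels, ignoring results whose iou_score is below min_iou."""
    if result["iou_score"] < min_iou:
        return 0
    return sum(sum(row) for row in result["mask"])


result = segment_anything_stub([[0, 200], [255, 10]])
print(foreground_pixels(result))
```

Code that only needs the mask can keep reading that field, while the score allows low-confidence results to be filtered out.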
This update includes new workflows for testing Snowflake and S3, a dependency update for ruff, and several fixes addressing PyPI release issues, in-memory table registration, and Dask version compatibility.
- Add workflow for testing snowflake by @mesejo in #233
- Add ci workflow for testing s3 by @mesejo in #235
- Update dependency ruff to v0.6.2 by @renovate[bot] in #229
- Issues with release to pypi by @mesejo in #228
- Registration of in-memory tables by @mesejo in #232
- Improve snowflake workflow by @mesejo in #234
- Checkout PR ref by @mesejo in #236
- Fix dask version by @mesejo in #237
This release brings new features, enhancements to existing functionality, dependency updates, and bug fixes:
- New functionality has been added, including a pyarrow-based UDAF, postgres and sqlite readers, image/array manipulation functions, and xgboost prediction functions.
- Existing functionality has been enhanced by wrapping ibis backends, updating dependencies, and improving the build/testing process.
- Numerous dependency updates have been made to keep the library up-to-date.
- Some bug fixes and stability improvements have been implemented as well.
- Add pyarrow udaf based on PyAggregator by @mesejo in #108
- Add unit tests based on workflow diagram by @mesejo in #110
- Add postgres read_parquet by @mesejo in #118
- Add wrapper for snowflake backend by @mesejo in #119
- Add read_sqlite and read_postgres by @mesejo in #120
- Add ibis udf and model registration method by @hussainsultan in #182
- Add udf signature and return a partial with model_name by @hussainsultan in #195
- Add image and array manipulation functions by @mesejo in #181
- Add example predict_xgb.py by @dlovell in #213
- Add connectors for using environment variables or fixed examples server by @dlovell in #217
- Add workflow for testing library only dependencies by @mesejo in #223
- Add duckdb and xgboost as dependencies for examples by @mesejo in #216
- Wrap ibis backends by @mesejo in #115
- Unpin pyarrow version by @mesejo in #121
- Update README by @mesejo in #125
- Use options.backend as ParquetCacheStorage's default backend by @mesejo in #123
- Change to publish on release by @mesejo in #122
- Configure Renovate by @renovate[bot] in #124
- Update dependency black to v24 [security] by @renovate[bot] in #126
- Update dependency pure-eval to v0.2.3 by @renovate[bot] in #130
- Update dependency blackdoc to v0.3.9 by @renovate[bot] in #128
- Update dependency pytest to v7.4.4 by @renovate[bot] in #131
- Update actions/create-github-app-token action to v1.10.3 by @renovate[bot] in #127
- Update dependency connectorx to v0.3.3 by @renovate[bot] in #129
- Update dependency snowflake/snowflake-connector-python to v3.11.0 by @renovate[bot] in #141
- Update dependency importlib-metadata to v8.1.0 by @renovate[bot] in #139
- Update dependency ruff to v0.5.4 by @renovate[bot] in #133
- Update dependency black to v24.4.2 by @renovate[bot] in #136
- Update dependency sqlalchemy to v2.0.31 by @renovate[bot] in #134
- Update codecov/codecov-action action to v4.5.0 by @renovate[bot] in #135
- Update dependency codespell to v2.3.0 by @renovate[bot] in #137
- Update dependency coverage to v7.6.0 by @renovate[bot] in #138
- Update dependency sqlglot to v23.17.0 by @renovate[bot] in #142
- Update dependency pre-commit to v3.7.1 by @renovate[bot] in #140
- Update dependency structlog to v24.4.0 by @renovate[bot] in #143
- Update actions/checkout action to v4 by @renovate[bot] in #148
- Update actions/setup-python action to v5 by @renovate[bot] in #149
- Update dependency datafusion/datafusion to v39 by @renovate[bot] in #150
- Update dependency numpy to v2 by @renovate[bot] in #152
- Update dependency duckdb/duckdb to v1 by @renovate[bot] in #151
- Update dependency pyarrow to v17 by @renovate[bot] in #153
- Disable pip_requirements manager by @mesejo in #163
- Update dependency pytest-cov to v5 by @renovate[bot] in #159
- Update extractions/setup-just action to v2 by @renovate[bot] in #161
- Update github artifact actions to v4 by @renovate[bot] in #162
- Range for datafusion-common by @renovate[bot] in #166
- Update dependency pytest to v8 by @renovate[bot] in #158
- Update dependencies ranges by @mesejo in #172
- Enable plugin development for backends by @mesejo in #132
- Include pre-commit dependencies in renovatebot scan by @mesejo in #176
- Update dependency ruff to v0.5.5 by @renovate[bot] in #174
- Bump object_store from 0.10.1 to 0.10.2 by @dependabot[bot] in #175
- Update dependency pre-commit to v3.8.0 by @renovate[bot] in #178
- Lock file maintenance, update Cargo TOML by @renovate[bot] in #179
- Refactor flake by @dlovell in #180
- Use poetry2nix overlays by @dlovell
- Enable editable install by @dlovell
- Update dependency ruff to v0.5.6 by @renovate[bot] in #183
- Update dependency coverage to v7.6.1 by @renovate[bot] in #187
- Lock file maintenance by @renovate[bot] in #188
- Collapse ifs by @dlovell
- Enable `nix run` to drop into an ipython shell by @dlovell
- Make key_prefix settable in config/CacheStorage by @dlovell in #196
- Update dependency ruff to v0.5.7 by @renovate[bot] in #197
- Bump aiohttp from 3.9.5 to 3.10.2 by @dependabot[bot] in #212
- Lock file maintenance by @renovate[bot] in #207
- Return wrapper with model_name partialized by @hussainsultan
- Update links to data files by @mesejo in #214
- Update dependency ruff to v0.6.0 by @renovate[bot] in #215
- Update gbdt-rs repo url by @mesejo in #220
- Make gbdt-rs dependency unambiguous by @mesejo in #222
- Use postgres.connect_examples() and TemporaryDirectory by @mesejo in #219
- Update dependency ruff to v0.6.1 by @renovate[bot] in #218
- Register cache tables when executing to_pyarrow by @mesejo in #114
- Update dependency fsspec to v2024.6.1 by @renovate[bot] in #144
- Update rust crate pyo3 to 0.21 by @renovate[bot] in #146
- Update tokio-prost monorepo to 0.13.1 by @renovate[bot] in #147
- Update rust crate datafusion range to v40 by @renovate[bot] in #165
- Update rust crate datafusion-* to v40 by @renovate[bot] in #167
- Widen dependency dask range to v2024 by @renovate[bot] in #164
- Enable build on macos by @dlovell
- Conditionally include libiconv in maturinOverride by @dlovell
- Update dependency attrs to v24 by @renovate[bot] in #185
- Return proper type in get_log_path by @dlovell
- Use pandas backend in SourceStorage by @mesejo
- Update rust crate datafusion to v41 by @renovate[bot] in #203
- Remove warnings and deprecated palmerpenguins package by @mesejo in #113
- Remove so that the udf keeps its metadata by @hussainsultan in #198
- @renovate[bot] made their first contribution in #218
- @dependabot[bot] made their first contribution in #212
- Api letsql api methods by @mesejo in #105
- Prepare for release 0.1.4 by @mesejo in #107
- 0.1.4 by @mesejo in #109
- Add docker start to ci-test by @mesejo
- Poetry: add poetry checks to .pre-commit-config.yaml by @dlovell
- Add source cache by default by @mesejo
- Test_cache: add test_parquet_cache_storage by @dlovell
- Add rust files by @dlovell
- Add new cases to DataFusionBackend.register by @dlovell
- Add client tests for new register types by @dlovell
- Add faster function for CachedNode removal by @mesejo
- Add optimizations for predict_xgb in datafusion by @mesejo in #16
- Lint: add args to poetry pre-commit invocation by @dlovell in #20
- Add TableProvider for ibis Table by @mesejo in #21
- Add filter pushdown for ibis.Table TableProvider by @mesejo in #24
- Add .sql implementation by @mesejo in #28
- Add automatic testing for examples dir by @mesejo in #45
- Add docs by @mesejo in #51
- Add better snowflake caching by @dlovell in #49
- Add docs-preview workflow by @mesejo in #54
- Add missing extras to poetry install in docs workflow by @mesejo in #58
- Add start of services to workflow by @mesejo in #59
- Add docs deploy workflow by @mesejo in #55
- Add array functions by @mesejo in #60
- Add registering of arbitrary expressions by @mesejo in #64
- Add generic functions by @mesejo in #66
- Add hashing of duckdb parquet files by @mesejo in #67
- Add numeric functions by @mesejo in #80
- Add `ls` accessor for Expr by @dlovell in #81
- Add greatest and least functions by @mesejo in #98
- Add temporal functions by @mesejo in #99
- Add StructColumn and StructField ops by @mesejo in #102
- Add SnapshotStorage by @dlovell in #103
- Improve performance and ux of predict_xgb by @mesejo in #8
- Fetch only the required features for the model by @mesejo in #9
- Organize the letsql package by @mesejo
- Lint by @dlovell
- Define CacheStorage with deterministic hashing for keys by @mesejo
- Define KEY_PREFIX to identify letsql cache by @dlovell
- Conftest: define expected_tables, enforce test fixture table list by @dlovell
- Lint by @dlovell
- Update poetry.lock by @dlovell
- Enable registration of pyarrow.RecordBatchReader and ir.Expr by @mesejo in #13
- Update CONTRIBUTING.md with instructions to run Postgres by @mesejo
- Register more dask normalize_token types by @dlovell in #17
- Enable flake to work on both linux and macos by @dlovell in #18
- Clean up development and ci/cd workflows by @mesejo in #19
- Temporal readme by @mesejo
- Publish test coverage by @mesejo in #31
- Update project files README, CHANGELOG and pyproject.toml by @mesejo in #30
- Expose TableProvider trait in Python by @mesejo in #29
- Clear warnings, bump up datafusion version to 37.1.0 by @mesejo in #33
- Update ibis version by @mesejo in #34
- Xgboost is being deprecated by @hussainsultan in #40
- Drop connection handling by @mesejo in #36
- Refactor _register_and_transform_cache_tables by @mesejo in #44
- Improve postgres table caching / cache invalidation by @dlovell in #47
- Make engines optional extras by @dlovell in #50
- SourceStorage: special case for cross-source caching by @dlovell in #63
- Problem with multi-engine execution by @mesejo in #70
- Clean test_execute and move tests from test_isolated_execution by @mesejo in #79
- Move cache related tests to test_cache.py by @mesejo in #88
- Give ParquetCacheStorage a default path by @dlovell in #92
- Update to datafusion version 39.0.0 by @mesejo in #97
- Make cache default path configurable by @mesejo in #101
- V0.1.3 by @mesejo in #106
- Filter bug solved by @mesejo
- Set stable ibis dependency by @mesejo
- Failing ci by @mesejo
- Pyproject: specify rev when using git ref, don't use [email protected] by @dlovell
- Pyproject: make pyarrow,datafusion core dependencies by @dlovell
- Run `poetry lock --no-update` by @dlovell
- Use _load_into_cache in _put by @mesejo
- _cached: special case for name == "datafusion" by @dlovell
- ParquetCacheStorage: properly create cache dir by @dlovell
- Local cache with parquet storage by @mesejo
- Fix mac build with missing source files by @hussainsultan
- Allow for multiple execution of letsql tables by @mesejo in #41
- Fix import order using ruff by @mesejo in #37
- Mismatched table names causing table not found error by @mesejo in #43
- Ensure nonnull-ability of columns works by @dlovell in #53
- Explicitly install poetry-plugin-export per warning message by @dlovell in #61
- Update make_native_op to replace tables by @mesejo in #75
- Normalize_memory_databasetable: incrementally tokenize RecordBatchs by @dlovell in #73
- Cannot create table by @mesejo in #74
- Handle case of table names during con.register by @mesejo in #77
- Use sqlglot to generate escaped name string by @dlovell in #85
- Register table on caching nodes by @mesejo in #87
- Ensure snowflake tables have their Namespace bound on creation by @dlovell in #91
- Change name of parameter in replace_table function by @mesejo in #94
- Return native_dts, not sources by @dlovell in #95
- Displace offsets in TimestampBucket by @mesejo in #104
- Pyproject: remove redundant and conflicting dependency specifications by @dlovell
- Remove macos test suite by @mesejo
- Remove optimizer.py by @mesejo in #14
- Remove redundant item setting _sources on registering the cache nodes by @mesejo in #90
- Add missing dependencies by @mesejo
- Add CONTRIBUTING.md
- Address problems with schema
- Nix: add flake.nix and related files by @dlovell
- Add db package for showing predict udf working by @mesejo in #1
- Remove xgboost as dependency by @mesejo
- Add register and client functions
- Add testing of api
- Add isnan/isinf and fix offset
- Add udf support
- Add new string ops, remove typo
- Test array, temporal, string and udf
- Start adding wrapper
- Prepare for release