Releases: dlt-hub/dlt
0.3.12
Core Library
In this version we release two new destinations:
- Add a Weaviate destination by @burnash in #479
  A vector data store: load and query vectorized text data
- Basic AWS Athena support by @sh-rp in #522
  A data lake destination which works together with `filesystem` as a staging destination
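As a rough sketch of how Athena and the `filesystem` staging could be wired together (bucket names, credentials and the exact key layout are placeholders of mine, not from this release; consult the Athena destination docs for the authoritative keys):

```toml
# secrets.toml sketch: Athena queries parquet files that the
# filesystem destination stages on S3 (all values are hypothetical)
[destination.filesystem]
bucket_url = "s3://my-staging-bucket"          # hypothetical bucket

[destination.athena]
query_result_bucket = "s3://my-query-results"  # hypothetical bucket

[destination.athena.credentials]
aws_access_key_id = "<access key>"
aws_secret_access_key = "<secret key>"
region_name = "eu-central-1"
```

A pipeline would then be created with `destination="athena"` and `staging="filesystem"`, so load packages land on the bucket first and Athena reads them from there.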
Apart from that, bug fixes:
- fixes airflow provider init sequence by @rudolfix in #569
- fixes transformer decorator typings by @rudolfix in #554
Docs
- We improved documentation for many verified sources (thx @dat-a-man and @AstrakhantsevaAA )
- updates contribution and readme + small docs fixes by @rudolfix in #553
- Edit weaviate docs by @hsm207 in #566
Full Changelog: 0.3.10...0.3.12
0.3.10
Core Library
- Fix config dataclasses on python 3.11 by @steinitzu in #541
  Now Python 3.11 is fully tested on CI
- removes optional dependencies by @rudolfix in #552
  `sentry-sdk` and several dependencies used by the `dlt deploy` command were moved to extras. Several others (including `fsspec`) have their minimal versions set to earlier releases. The PR above also fixes #539 and #540
Full Changelog: 0.3.9...0.3.10
0.3.9
Bugfix Release
When replace with a staging dataset was used in version 0.3.8, tables with other write dispositions were also truncated (in other words, all the tables in the schema could be truncated). Note that the default replace strategy does not use the staging dataset, so if you didn't explicitly change it, you were not affected.
This release fixes that bug. If you use the replace strategy above, update the library.
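For context, the strategy in question is selected via configuration; a sketch (key names per the full-loading docs, values shown here for illustration):

```toml
# config.toml: only the staging-based strategies were affected by this
# bug; "truncate-and-insert" (the default) does not use the staging dataset
[destination]
replace_strategy = "insert-from-staging"  # or "staging-optimized"
```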
Full Changelog: 0.3.8...0.3.9
0.3.8
Core Library
- use Airflow (and possibly other) schedulers with dlt resources by @rudolfix in #534
  A really cool feature that allows your incremental loading to take date ranges from Airflow schedulers. Do backfilling, incremental loading and rely on Airflow to keep the pipeline state.
- Ignore hints prefixed with 'x-' in table_schema() by @burnash in #525
- Now our CI works correctly from forks! by @steinitzu in #530
Support for unstructured data!
A really cool data source that lets you ask questions about your PDF documents and stores the answers in any of our destinations: going from binary blobs through unstructured.io, vector databases and LLM queries to e.g. duckdb and bigquery. Blobs come from filesystem, google drive or your inbox (also incrementally) by @AstrakhantsevaAA
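Conceptually, taking date ranges from a scheduler means mapping the scheduler-provided interval onto the bounds of an incremental cursor. A minimal stdlib sketch of that idea (function and parameter names are mine, not the dlt API; Airflow exposes the interval as `data_interval_start`/`data_interval_end`):

```python
from datetime import datetime, timezone

def incremental_bounds(data_interval_start: datetime,
                       data_interval_end: datetime) -> dict:
    """Map a scheduler-provided interval to initial/end bounds for an
    incremental cursor, so each scheduled run loads exactly its slice
    and Airflow effectively keeps the pipeline state."""
    return {
        "initial_value": data_interval_start.isoformat(),
        "end_value": data_interval_end.isoformat(),
    }

bounds = incremental_bounds(
    datetime(2023, 8, 1, tzinfo=timezone.utc),
    datetime(2023, 8, 2, tzinfo=timezone.utc),
)
print(bounds["initial_value"])  # 2023-08-01T00:00:00+00:00
```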
0.3.6
Core Library
- fixes lost data and incorrect handling of child tables during `truncate-and-insert` replace by @sh-rp in #499
  This is an important improvement that fixes a few holes in the `truncate-and-insert` replace mode (which was there from the beginning of `dlt`). Now we truncate all the tables before the multithreaded append process starts. We also truncate child tables that could previously be left with stale data. Details: #263 #271
- fixes deploy airflow secrets and makes `toml` the default layout by @rudolfix in #513
- check the required verified source `dlt` version during `dlt init` and warn users by @steinitzu in #514
add schema version to _dlt_loads table by @codingcyclist in #466
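The truncate-before-append ordering above can be illustrated with a toy in-memory "destination" (a sketch of the principle, not dlt internals): wipe every affected table, parents and child tables alike, before any concurrent append starts, so no table keeps old rows.

```python
from concurrent.futures import ThreadPoolExecutor

# toy in-memory destination: table name -> list of rows
tables = {
    "items": [{"id": 0}],
    "items__children": [{"id": 0, "v": "old"}],  # child table
}

def replace_load(new_data: dict) -> None:
    # Step 1: truncate ALL affected tables (including child tables)
    # up front, before any concurrent appends begin.
    for name in new_data:
        tables.setdefault(name, []).clear()
    # Step 2: only then run the multithreaded append phase.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for name, rows in new_data.items():
            pool.submit(tables[name].extend, rows)

replace_load({
    "items": [{"id": 1}],
    "items__children": [{"id": 1, "v": "new"}],
})
print(tables["items__children"])  # [{'id': 1, 'v': 'new'}]
```

Truncating inside the append threads instead would reintroduce the bug: a table not touched by this run, or truncated after another thread appended to it, ends up with lost or stale data.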
Docs
- Add example values to data types docs by @burnash in #516
- adding destination walkthrough by @rudolfix in #520
New Contributors
- @codingcyclist made their first contribution in #466
Full Changelog: 0.3.5...0.3.6
0.3.5
Core Library
- Fix incremental hitting end_value throwing out whole batches by @steinitzu in #495
- replace with staging tables by @sh-rp in #488
  Now the staging dataset may be used to replace tables. You can choose from several replace strategies (https://dlthub.com/docs/general-usage/full-loading), including fully transactional and atomic replacement of a parent and all its child tables, or an optimized one where we use e.g. the ability to clone tables and copy on write in BigQuery and Snowflake.
- detect serverless aws_lambda by @muppinesh in #490
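Backloading in parallel chunks boils down to splitting a date range into disjoint (initial, end) pairs, one per independent pipeline run. A stdlib sketch of that splitting (names are mine, for illustration):

```python
from datetime import datetime, timedelta

def backfill_chunks(start: datetime, end: datetime, step: timedelta):
    """Split [start, end) into (initial_value, end_value) pairs so each
    chunk can be loaded by a separate run without overlap, the pattern
    that end_value support enables."""
    chunks = []
    lo = start
    while lo < end:
        hi = min(lo + step, end)  # last chunk may be shorter
        chunks.append((lo, hi))
        lo = hi
    return chunks

chunks = backfill_chunks(
    datetime(2023, 1, 1), datetime(2023, 1, 10), timedelta(days=3)
)
print(len(chunks))  # 3
```

Because the half-open intervals share no boundary rows, the chunks can run in any order or in parallel.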
Docs
- staging docs update by @rudolfix in #496
- Updates to verified sources by @dat-a-man
New Contributors
- @muppinesh made their first contribution in #490
Full Changelog: 0.3.4...0.3.5
0.3.4
Core Library
- staging for loader files implemented by @sh-rp in #451
- staging for redshift on s3 bucket and json + parquet by @sh-rp in #451
- staging for bigquery on gs bucket and json + parquet by @sh-rp in #451
- staging for snowflake on s3+gs buckets and json + parquet by @sh-rp in #451
- improvements and bugfixes for parquet generation by @rudolfix in #451
- tracks helpers usage and source names by @rudolfix in #497
- Fix: use sets to prevent unnecessary truncate calls by @z3z1ma in #481
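The set-based fix in #481 is the classic order-preserving dedupe: collect truncate targets in a set so each table is truncated at most once even when several resources write to it. A minimal sketch (not the actual dlt code):

```python
def truncate_statements(tables_to_wipe: list) -> list:
    """Emit one TRUNCATE per distinct table, preserving first-seen
    order, instead of one per resource that targets it."""
    seen = set()
    statements = []
    for name in tables_to_wipe:
        if name not in seen:
            seen.add(name)
            statements.append(f"TRUNCATE TABLE {name}")
    return statements

print(truncate_statements(["events", "users", "events"]))
# ['TRUNCATE TABLE events', 'TRUNCATE TABLE users']
```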
Docs
- staging docs update by @sh-rp in #485
- rewritten documentation for destinations by @rudolfix, @AstrakhantsevaAA and @dat-a-man
- adds category pages for sources and destinations by @rudolfix in #486
- Clarifies create-a-pipeline docs by @willi-mueller in #493
New Contributors
- @willi-mueller made their first contribution in #493
Full Changelog: 0.3.3...0.3.4
0.3.3
Core Library
- supports motherduck as a destination by @rudolfix in #460
- dbt 1.5 compatibility, enabled motherduck dbt support by @sh-rp in #475
- add more retry conditions and makes timeouts configurable in dlt requests drop-in replacement by @steinitzu in #477
- end_value support to incremental: backloading in parallel chunks now possible by @steinitzu in #467
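The spirit of the retry changes in #477 is a loop that retries on connection errors and selected status codes, with a configurable attempt count and backoff. A stdlib sketch (the real drop-in wraps `requests`; here `send` stands in for the HTTP call and all names are mine):

```python
import time

def request_with_retry(send, max_attempts=3, backoff=0.0,
                       retry_statuses=(429, 500, 502, 503, 504)):
    """Retry `send` on ConnectionError or retryable status codes,
    sleeping backoff * attempt between tries."""
    for attempt in range(1, max_attempts + 1):
        try:
            status, body = send()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error
        else:
            if status not in retry_statuses or attempt == max_attempts:
                return status, body
        time.sleep(backoff * attempt)

# Fails twice with 503, then succeeds.
responses = iter([(503, ""), (503, ""), (200, "ok")])
print(request_with_retry(lambda: next(responses)))  # (200, 'ok')
```

Making the timeouts and retry conditions configurable (rather than hard-coded) is what lets one drop-in replacement serve very different APIs.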
Docs
- deploy cloud function as webhook by @dat-a-man in #449
- several key sections were updated and refactored by @AstrakhantsevaAA
- destination documentation refactor by @rudolfix in #478
Full Changelog: 0.3.2...0.3.3
0.3.3a0
0.3.2
Core Library
- snowflake destination: we support loading via PUT stage (`parquet` and `jsonl`) and password and key pair authentication by @steinitzu in #414
- parquet files in load packages are supported with pyarrow. The following destinations accept those when loading: bigquery, duckdb, snowflake and filesystem, by @sh-rp in #403
- `dbt-snowflake` supported by the dbt wrapper by @steinitzu in #448
Docs
- Docs: polished reference's docs by @AstrakhantsevaAA in #430
- `dhelp` (AI assistant in docs) enabled by @burnash in #390
- Added deploy with google cloud functions by @dat-a-man in #426
- train-gpt-q&a-blog by @TongHere in #438
- adding the open api spec article by @rahuljo in #442
- Docs/user guide data scientists by @AstrakhantsevaAA in #436
- Docs: airflow intro by @AstrakhantsevaAA in #444
- documents snowflake destination by @rudolfix in #447
- add file formats and fill out the parquet page in docs by @sh-rp in #439
- Added filesystem destination docs by @dat-a-man in #440
Full Changelog: 0.3.1...0.3.2