add fast loading methods for bigquery destination #1037

rudolfix · 2024-02-29T17:54:23Z

Background
The current method, which uses load jobs, is optimized for large loads. However, most load packages are quite small. We can improve loading speed by implementing streaming insert copy jobs. There's a Singer target, https://github.com/z3z1ma/target-bigquery by @z3z1ma, from which we can take code.

We'll implement an unconstrained (schema-less) version of the above when #891 is merged. This issue is about extending the existing destination.

Tasks

- allow the user to select the loading API via destination configuration and per resource/table via the bigquery adapter
- implement the insert API and, optionally, the Storage Write API
- allow both parquet and jsonl files to be loaded this way. consider adding standard file readers for those types (port them from data sink destination #891)
- tests and documentation
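
A minimal sketch of how the first three tasks could fit together, not the implementation that was eventually merged. It assumes a hypothetical per-table override mapping, hypothetical API names (`"load_job"`, `"streaming"`), and an injectable `insert_fn` so the batching logic can be exercised without a BigQuery connection. All function names here are illustrative and are not part of dlt.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Sequence

Row = Dict[str, object]
# Hypothetical callable: takes a table id and a batch of rows and returns
# a sequence of per-row errors (empty when every row was accepted).
InsertFn = Callable[[str, List[Row]], Sequence[dict]]


def resolve_insert_api(
    table: str, default_api: str, per_table: Optional[Dict[str, str]] = None
) -> str:
    """Pick the loading API: a per-table override wins over the destination default."""
    return (per_table or {}).get(table, default_api)


def read_jsonl(lines: Iterable[str]) -> Iterator[Row]:
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Group rows into lists of at most `size` items."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch


def stream_rows(
    table: str, rows: Iterable[Row], insert_fn: InsertFn, batch_size: int = 500
) -> int:
    """Send rows in batches through insert_fn; raise on the first reported error."""
    sent = 0
    for batch in batched(rows, batch_size):
        errors = insert_fn(table, batch)
        if errors:
            raise RuntimeError(f"streaming insert into {table} failed: {errors}")
        sent += len(batch)
    return sent
```

With the `google-cloud-bigquery` client, `insert_fn` could be the bound method `client.insert_rows_json`, which takes a table and a list of JSON-compatible rows and returns a list of row errors. A parquet reader would need to yield the same kind of row dicts to reuse `stream_rows`.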

github-project-automation bot added this to dlt core library Feb 29, 2024

github-project-automation bot moved this to Todo in dlt core library Feb 29, 2024

rudolfix added the community label (This issue came from slack community workspace) Mar 1, 2024

rudolfix mentioned this issue Mar 5, 2024

data sink destination #891

Merged

rudolfix moved this from Todo to In Progress in dlt core library Mar 8, 2024

rudolfix assigned IlyaFaer Mar 8, 2024

IlyaFaer mentioned this issue Mar 21, 2024

feat(bigquery): add streaming inserts support #1123

Merged

rudolfix closed this as completed Apr 8, 2024

github-project-automation bot moved this from In Progress to Done in dlt core library Apr 8, 2024
