data pond: expose readable datasets as dataframes and arrow tables #1507

Merged
119 commits merged on Oct 8, 2024
Changes shown are from the first 30 commits

Commits (119)
af6a40e
add simple ibis helper
sh-rp Jun 19, 2024
3a69ece
start working on dataframe reading interface
sh-rp Jun 20, 2024
4324650
a bit more work
sh-rp Jun 20, 2024
7c960df
first simple implementation
sh-rp Jun 21, 2024
86b89ac
small change
sh-rp Jun 21, 2024
5a8ea54
more work on dataset
sh-rp Jun 21, 2024
36e94af
some work on filesystem destination
sh-rp Jun 24, 2024
20bf9ce
add support for parquet files and compression on jsonl files in files…
sh-rp Jun 26, 2024
6dce626
Merge branch 'devel' into exp/1095-expose-readable-datasets
sh-rp Jul 17, 2024
a0ff55f
fix test after devel merge
sh-rp Jul 17, 2024
c297e96
add nice composable pipeline example
sh-rp Jul 17, 2024
d020403
small updates to demo
sh-rp Jul 18, 2024
5c3db47
Merge branch 'devel' into exp/1095-expose-readable-datasets
sh-rp Aug 6, 2024
79ef7dd
enable tests for all bucket providers
sh-rp Aug 6, 2024
ff40079
fix tests
sh-rp Aug 6, 2024
ac415b9
create views in duckdb filesystem accessor
sh-rp Aug 6, 2024
c92a527
move to relations based interface
sh-rp Aug 6, 2024
13ec73b
add generic duckdb interface to filesystem
sh-rp Aug 6, 2024
46e0226
move code for accessing frames and tables to the cursor and use duckd…
sh-rp Aug 6, 2024
7cf69a7
add native db api cursor fetching to exposed dataset
sh-rp Aug 7, 2024
6ffe302
some small changes
sh-rp Aug 7, 2024
c200262
switch dataaccess pandas to pyarrow
sh-rp Aug 7, 2024
226454f
add native bigquery support for df and arrow tables
sh-rp Aug 7, 2024
3296e63
change iter functions to always expect chunk size (None will default …
sh-rp Aug 7, 2024
6f6500f
add native implementation for databricks
sh-rp Aug 7, 2024
152b788
add dremio native implementation for full frames and tables
sh-rp Aug 7, 2024
6d73bc5
fix filesystem test
sh-rp Aug 7, 2024
bdb39ba
add test for evolving filesystem
sh-rp Aug 7, 2024
3ead92b
fix empty dataframe retrieval
sh-rp Aug 7, 2024
9fcbd00
remove old df test
sh-rp Aug 7, 2024
28ee1c6
clean up interfaces a bit (more to come?)
sh-rp Aug 8, 2024
28cb282
move dataset creation into destination client and clean up interfaces…
sh-rp Aug 8, 2024
77926fa
renames some interfaces and adds brief docstrings
sh-rp Aug 8, 2024
6ef04bc
add filesystem cached duckdb and remove the need to declare needed vi…
sh-rp Aug 8, 2024
ec13b49
fix tests for snowflake
sh-rp Aug 8, 2024
b222d1d
make data set a function
sh-rp Aug 8, 2024
9f0a6a5
fix db-types dependency for bigquery
sh-rp Aug 8, 2024
289b63c
create duckdb based sql client for filesystem
sh-rp Aug 13, 2024
779bca6
fix example pipeline
sh-rp Aug 13, 2024
584ab47
enable filesystem sql client to work on streamlit
sh-rp Aug 13, 2024
6594053
add comments
sh-rp Aug 13, 2024
9e0a61d
rename sql to query
sh-rp Aug 13, 2024
dd47326
fix tests that rely on sql client
sh-rp Aug 13, 2024
9f8f79b
Merge branch 'devel' into exp/1095-expose-readable-datasets
sh-rp Sep 18, 2024
fda1cb5
post merge cleanups
sh-rp Sep 18, 2024
c7a0e05
move imports around a bit
sh-rp Sep 18, 2024
8497036
exclude abfss buckets from test
sh-rp Sep 19, 2024
3dc2c90
add support for arrow schema creation from known dlt schema
sh-rp Aug 13, 2024
d6bec38
re-use sqldatabase code for cursors
sh-rp Sep 19, 2024
62ea3ba
fix bug
sh-rp Sep 19, 2024
3fd4d61
add default columns where needed
sh-rp Sep 19, 2024
eeca4ac
Merge branch 'devel' into exp/1095-expose-readable-datasets
sh-rp Sep 20, 2024
52f8523
add sqlglot to filesystem deps
sh-rp Sep 20, 2024
90c669a
store filesystem tables in correct dataset
sh-rp Sep 20, 2024
7657fb1
move cursor columns location
sh-rp Sep 20, 2024
352b238
fix snowflake and mssql
sh-rp Sep 20, 2024
5fadeeb
Merge branch 'devel' into exp/1095-expose-readable-datasets
sh-rp Sep 20, 2024
9a1752d
clean up compose files a bit
sh-rp Sep 20, 2024
a77192f
fix sqlalchemy
sh-rp Sep 20, 2024
420eaf1
add mysql docker compose file
sh-rp Sep 20, 2024
97e2757
fix linting
sh-rp Sep 20, 2024
df4f6d0
prepare hint checking
sh-rp Sep 20, 2024
6b27b98
disable part of state test
sh-rp Sep 22, 2024
ffba901
enable hint check
sh-rp Sep 23, 2024
fab5232
add column type support for filesystem json
sh-rp Sep 23, 2024
0de4a6c
rename dataset implementation to DBAPI
sh-rp Sep 23, 2024
077a25a
wrap functions in dbapi readable dataset
sh-rp Sep 23, 2024
13a759b
remove example pipeline
sh-rp Sep 23, 2024
10e04d6
rename test_decimal_name
sh-rp Sep 23, 2024
5077ce1
make column code a bit clearer and fix mssql again
sh-rp Sep 23, 2024
1025560
rename df methods to pandas
sh-rp Sep 23, 2024
f8927d3
fix bug in default columns
sh-rp Sep 23, 2024
7fd3c62
fix hints test and columns bug
sh-rp Sep 23, 2024
3a76178
catch mysql error if no rows returned
sh-rp Sep 23, 2024
27104e3
add exceptions for not implemented bucket and filetypes
sh-rp Sep 23, 2024
1c06d11
fix docs
sh-rp Sep 23, 2024
e5b3688
Merge branch 'devel' into exp/1095-expose-readable-datasets
sh-rp Sep 23, 2024
7d09bdb
add config section for getting pipeline clients
sh-rp Sep 26, 2024
dbe4baa
set default dataset in filesystem sqlclient
sh-rp Sep 26, 2024
f4e0099
add config section for sync_destination
sh-rp Sep 26, 2024
80fe898
Merge branch 'devel' into exp/1095-expose-readable-datasets
sh-rp Sep 26, 2024
d698cd5
Merge branch 'devel' into exp/1095-expose-readable-datasets
sh-rp Sep 27, 2024
857803c
rename readablerelation methods
sh-rp Sep 30, 2024
8055529
use more functions of the duckdb sql client in filesystem version
sh-rp Sep 30, 2024
24c7308
update dependencies
sh-rp Sep 30, 2024
76759cf
use active pipeline capabilities if available for arrow table
sh-rp Sep 30, 2024
d3d8381
update types
sh-rp Sep 30, 2024
f9a766d
rename dataset accessor function
sh-rp Sep 30, 2024
b6c7fbc
add test for accessing tables with unqualified table name
sh-rp Sep 30, 2024
86fc914
fix sql client
sh-rp Sep 30, 2024
58380ec
add duckdb native support for azure, s3 and gcs (via s3)
sh-rp Sep 30, 2024
0a24b3a
some typing
sh-rp Sep 30, 2024
bef50d7
add dataframes tests back in
sh-rp Sep 30, 2024
b13e492
add join table and update view tests for filesystem
sh-rp Sep 30, 2024
92ea515
start adding tests for creating views on remote duckdb
sh-rp Sep 30, 2024
e1fa308
fix snippets
sh-rp Sep 30, 2024
a7958d5
fix some dependencies and mssql/synapse tests
sh-rp Sep 30, 2024
ed197ea
fix bigquery dependencies and abfss tests
sh-rp Oct 1, 2024
0ec1656
add tests for adding view to external dbs and persistent secrets
sh-rp Oct 1, 2024
9cd4173
add support for delta tables
sh-rp Oct 1, 2024
7dba771
add duckdb to read interface tests
sh-rp Oct 1, 2024
3e96a6c
fix delta tests
sh-rp Oct 1, 2024
355f5b6
make default secret name derived from bucket url
sh-rp Oct 1, 2024
9002f02
try fix azure tests again
sh-rp Oct 1, 2024
c3050d4
fix df access tests
sh-rp Oct 2, 2024
bbc0525
PR fixes
sh-rp Oct 2, 2024
ef148c3
Merge branch 'devel' into exp/1095-expose-readable-datasets
sh-rp Oct 2, 2024
a99e987
Merge branch 'devel' into exp/1095-expose-readable-datasets
sh-rp Oct 2, 2024
eaf1cd8
correct internal table access
sh-rp Oct 4, 2024
6bb7117
allow datasets without schema
sh-rp Oct 4, 2024
6648b86
skips parametrized queries, skips tables from non-dataset schemas
rudolfix Oct 6, 2024
89a9861
move filesystem specific sql_client tests to correct location and tes…
sh-rp Oct 7, 2024
631d50b
fix sql client tests
sh-rp Oct 7, 2024
8e2e37c
make secret name when dropping optional
sh-rp Oct 7, 2024
dc383fc
fix gs test
sh-rp Oct 7, 2024
41926ae
remove moved filesystem tests from test_read_interfaces
sh-rp Oct 7, 2024
9b8437a
fix sql client tests again... :)
sh-rp Oct 7, 2024
5d14045
clear duckdb secrets
sh-rp Oct 8, 2024
fb9a445
disable secrets deleting for delta tests
sh-rp Oct 8, 2024
109 changes: 109 additions & 0 deletions composable_pipeline_1.py
@@ -0,0 +1,109 @@
"""Example of a composable pipeline"""

import dlt
import os
import random
from dlt.destinations import filesystem, duckdb

# fixtures
customers = [
{"id": 1, "name": "dave"},
{"id": 2, "name": "marcin"},
{"id": 3, "name": "anton"},
{"id": 4, "name": "alena"},
]

products = [
{"name": "apple", "price": 1},
{"name": "pear", "price": 2},
{"name": "banana", "price": 3},
{"name": "schnaps", "price": 10},
]

if __name__ == "__main__":
os.environ["DATA_WRITER__DISABLE_COMPRESSION"] = "True"

#
# 1. let's load some data into a duckdb pipeline (stand-in for a remote location)
#
duck_pipeline = dlt.pipeline(
pipeline_name="warehouse", destination=duckdb(credentials="warehouse.duckdb")
)

@dlt.resource(write_disposition="replace", table_name="customers")
def c():
yield from customers

@dlt.resource(write_disposition="replace", table_name="orders")
def o():
order_no = 0
# every customer orders 4 things every day
for weekday in ["monday", "tuesday", "wednesday"]:
for customer in customers:
for i in range(4):
order_no += 1
product = random.choice(products)
yield {
"order_day": weekday,
"id": order_no,
"customer_id": customer["id"],
"product": product["name"],
"price": product["price"],
}

# run and print result
print("RUNNING WAREHOUSE INGESTION")
print(duck_pipeline.run([c(), o()]))
print(duck_pipeline.dataset.customers.df())
print(duck_pipeline.dataset.orders.df())
print("===========================")

#
# 2. now we want a local snapshot of the customers and all orders placed on tuesday in a data lake
#
lake_pipeline = dlt.pipeline(
pipeline_name="local_lake", destination=filesystem(bucket_url="./local_lake")
)

print("RUNNING LOCAL SNAPSHOT EXTRACTION")
lake_pipeline.run(
duck_pipeline.dataset.customers.iter_df(),
loader_file_format="jsonl",
table_name="customers",
write_disposition="replace",
)
lake_pipeline.run(
duck_pipeline.dataset.sql(
"SELECT * FROM orders WHERE orders.order_day = 'tuesday'"
).iter_df(),
loader_file_format="jsonl",
table_name="orders",
write_disposition="replace",
)

print(lake_pipeline.dataset.customers.df())
print(lake_pipeline.dataset.orders.df())
print("===========================")

#
# 3. now we create a denormalized table locally
#

print("RUNNING DENORMALIZED TABLE EXTRACTION")
denom_pipeline = dlt.pipeline(
pipeline_name="denom_lake", destination=filesystem(bucket_url="./denom_lake")
)

denom_pipeline.run(
lake_pipeline.dataset.sql(
sql=(
"SELECT orders.*, customers.name FROM orders LEFT JOIN customers ON"
" orders.customer_id = customers.id"
),
prepare_tables=["customers", "orders"],
).iter_df(),
loader_file_format="jsonl",
table_name="customers",
write_disposition="replace",
)
print(denom_pipeline.dataset.customers.df())
56 changes: 56 additions & 0 deletions dlt/common/destination/reference.py
@@ -1,6 +1,8 @@
from abc import ABC, abstractmethod
import dataclasses
from importlib import import_module
from contextlib import contextmanager

from types import TracebackType
from typing import (
Callable,
@@ -18,17 +20,23 @@
Any,
TypeVar,
Generic,
Generator,
TYPE_CHECKING,
Protocol,
Tuple,
)
from typing_extensions import Annotated
import datetime # noqa: 251
from copy import deepcopy
import inspect

from dlt.common import logger
from dlt.common.typing import DataFrame, ArrowTable
from dlt.common.configuration.specs.base_configuration import extract_inner_hint
from dlt.common.destination.utils import verify_schema_capabilities
from dlt.common.exceptions import TerminalValueError
from dlt.common.normalizers.naming import NamingConvention

from dlt.common.schema import Schema, TTableSchema, TSchemaTables
from dlt.common.schema.utils import (
get_file_format,
@@ -51,6 +59,7 @@
from dlt.common.storages.load_storage import ParsedLoadJobFileName
from dlt.common.storages.load_package import LoadJobInfo, TPipelineStateDoc


TLoaderReplaceStrategy = Literal["truncate-and-insert", "insert-from-staging", "staging-optimized"]
TDestinationConfig = TypeVar("TDestinationConfig", bound="DestinationClientConfiguration")
TDestinationClient = TypeVar("TDestinationClient", bound="JobClientBase")
@@ -561,6 +570,53 @@ def should_truncate_table_before_load_on_staging_destination(self, table: TTable
return True


class SupportsDataAccess(Protocol):
"""Add support accessing data items"""

def df(self, chunk_size: int = None, **kwargs: Any) -> Optional[DataFrame]:
Collaborator: if we have iter, we do not need to chunk, right? Maybe limit would make sense? Or max_rows? We are changing the semantics of the old method, which could also be used to iterate, but I'm OK with it.

Collaborator (author): this is a shorthand for getting a dataframe in one call; if you add a chunk_size, only a chunk of that size will be collected.
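
For illustration, a minimal usage sketch of this shorthand (hypothetical pipeline and table names, mirroring the example pipeline in this PR):

# sketch only: assumes a pipeline that has already loaded a "customers" table
import dlt

pipeline = dlt.pipeline(pipeline_name="warehouse", destination="duckdb")

full_df = pipeline.dataset.customers.df()  # whole result as a single data frame
first_rows = pipeline.dataset.customers.df(chunk_size=500)  # only a chunk of 500 rows is collected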

"""Fetches the results as data frame. For large queries the results may be chunked

Fetches the results into a data frame. The default implementation uses helpers in `pandas.io.sql` to generate Pandas data frame.
This function will try to use native data frame generation for particular destination. For `BigQuery`: `QueryJob.to_dataframe` is used.
For `duckdb`: `DuckDBPyConnection.df'

Args:
chunk_size (int, optional): Will chunk the results into several data frames. Defaults to None
**kwargs (Any): Additional parameters which will be passed to native data frame generation function.

Returns:
Optional[DataFrame]: A data frame with query results. If chunk_size > 0, None will be returned if there is no more data in results
"""
...

def arrow(self, *, chunk_size: int = None) -> Optional[ArrowTable]: ...

def iter_df(self, chunk_size: int) -> Generator[DataFrame, None, None]: ...

def iter_arrow(self, chunk_size: int) -> Generator[ArrowTable, None, None]: ...

def fetchall(self) -> List[Tuple[Any, ...]]: ...
Collaborator: So I'm thinking about this interface all the time as well :) Maybe let's just expose cursor() here? It is a proper DB API cursor (which we have already implemented).
Also: our implementation proxies all unknown methods to the underlying native cursor, did you notice?

So all methods below would be available via cursor.

Collaborator (author): I think I would keep it this way. The user may use the cursor too if they like, but this possibility to iterate without having to open your own context manager is quite nice from the user perspective, I think.
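
A short sketch of the two styles being weighed here, reusing the hypothetical pipeline from the previous sketch:

# sketch only: shorthand fetches on the relation vs. an explicit cursor
orders = pipeline.dataset.sql("SELECT * FROM orders")

rows = orders.fetchall()  # one call, no context manager needed

with orders.cursor() as cursor:  # proper DB API cursor for finer control
    while batch := cursor.fetchmany(1000):
        print(len(batch))  # process each batch of up to 1000 rows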


def fetchmany(self, chunk_size: int) -> List[Tuple[Any, ...]]: ...

def iter_fetchmany(self, chunk_size: int) -> Generator[List[Tuple[Any, ...]], Any, Any]: ...

def fetchone(self) -> Optional[Tuple[Any, ...]]: ...


class SupportsRelationshipAccess(ABC):
"""Add support for accessing a cursor for a given relationship or query"""

@abstractmethod
def cursor_for_relation(
self,
*,
table: str = None,
sql: str = None,
prepare_tables: List[str] = None,
) -> ContextManager[SupportsDataAccess]: ...


# TODO: type Destination properly
TDestinationReferenceArg = Union[
str, "Destination[Any, Any]", Callable[..., "Destination[Any, Any]"], None
16 changes: 16 additions & 0 deletions dlt/common/typing.py
@@ -77,6 +77,22 @@
REPattern = _REPattern
PathLike = os.PathLike


try:
Collaborator: I'd love to keep such stuff out of the common:

  1. either you abstract those interfaces into protocols
  2. or import from helpers

Collaborator (author): I moved it around a bit. We have the data access stuff in the destination, and IMHO it would be nice to have the correct typings there.

from pandas import DataFrame
except ImportError:
DataFrame: Type[Any] = None # type: ignore

try:
from pyarrow import Table as ArrowTable
except ImportError:
ArrowTable: Type[Any] = None # type: ignore

try:
from duckdb import DuckDBPyConnection
except ImportError:
DuckDBPyConnection: Type[Any] = None # type: ignore

AnyType: TypeAlias = Any
NoneType = type(None)
DictStrAny: TypeAlias = Dict[str, Any]
128 changes: 128 additions & 0 deletions dlt/dataset.py
@@ -0,0 +1,128 @@
from typing import cast, Any, TYPE_CHECKING, Generator, List, Tuple, Optional
Collaborator: my take: move it into dlt/destinations; that will remove a lot of the ugly inner imports

Collaborator (author): this is moved


from contextlib import contextmanager

from dlt.common.destination.reference import SupportsRelationshipAccess, SupportsDataAccess

from dlt.common.typing import DataFrame, ArrowTable


class Relation:
def __init__(
self, *, pipeline: Any, table: str = None, sql: str = None, prepare_tables: List[str] = None
) -> None:
"""Create a lazy evaluated relation to for the dataset of a pipeline"""
from dlt.pipeline import Pipeline

self.pipeline: Pipeline = cast(Pipeline, pipeline)
self.prepare_tables = prepare_tables
self.sql = sql
self.table = table

@contextmanager
def _client(self) -> Generator[SupportsRelationshipAccess, Any, Any]:
from dlt.destinations.job_client_impl import SqlJobClientBase
from dlt.destinations.fs_client import FSClientBase

client = self.pipeline.destination_client()

if isinstance(client, SqlJobClientBase):
with client.sql_client as sql_client:
yield sql_client
return

if isinstance(client, FSClientBase):
yield client
return

raise Exception(
f"Destination {client.config.destination_type} does not support data access via"
" dataset."
)

@contextmanager
def cursor(self) -> Generator[SupportsDataAccess, Any, Any]:
Collaborator: OK already implemented

"""Gets a DBApiCursor for the current relation"""
with self._client() as client:
with client.cursor_for_relation(
sql=self.sql, table=self.table, prepare_tables=self.prepare_tables
) as cursor:
yield cursor

def df(
self,
*,
chunk_size: int = None,
) -> DataFrame:
"""Get first batch of table as dataframe"""
with self.cursor() as cursor:
return cursor.df(chunk_size=chunk_size)

def arrow(
self,
*,
chunk_size: int = None,
) -> ArrowTable:
"""Get first batch of table as arrow table"""
with self.cursor() as cursor:
return cursor.arrow(chunk_size=chunk_size)

def iter_df(
self,
*,
chunk_size: int,
) -> Generator[DataFrame, None, None]:
"""iterates over the whole table in dataframes of the given chunk_size, chunk_size of -1 will return the full table in the first batch"""
with self.cursor() as cursor:
yield from cursor.iter_df(
chunk_size=chunk_size,
)

def iter_arrow(
self,
*,
chunk_size: int,
) -> Generator[ArrowTable, None, None]:
"""iterates over the whole table in arrow tables of the given chunk_size, chunk_size of -1 will return the full table in the first batch"""
with self.cursor() as cursor:
yield from cursor.iter_arrow(
chunk_size=chunk_size,
)

def fetchall(self) -> List[Tuple[Any, ...]]:
with self.cursor() as cursor:
return cursor.fetchall()

def fetchmany(self, chunk_size: int) -> List[Tuple[Any, ...]]:
with self.cursor() as cursor:
return cursor.fetchmany(chunk_size)

def iter_fetchmany(self, chunk_size: int) -> Generator[List[Tuple[Any, ...]], Any, Any]:
with self.cursor() as cursor:
yield from cursor.iter_fetchmany(
chunk_size=chunk_size,
)

def fetchone(self) -> Optional[Tuple[Any, ...]]:
with self.cursor() as cursor:
return cursor.fetchone()
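
A sketch of chunked iteration over a relation, again with the hypothetical pipeline and table names from the earlier sketches:

# sketch only: stream a large table in chunks instead of materializing it at once
for chunk_df in pipeline.dataset.orders.iter_df(chunk_size=10_000):
    print(len(chunk_df))  # each chunk is a pandas DataFrame with up to 10_000 rows

# chunk_size=-1 yields the full table as the first (and only) batch
full_table = next(pipeline.dataset.orders.iter_arrow(chunk_size=-1))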


class Dataset:
"""Access to dataframes and arrowtables in the destination dataset"""

def __init__(self, pipeline: Any) -> None:
from dlt.pipeline import Pipeline

self.pipeline: Pipeline = cast(Pipeline, pipeline)

def sql(self, sql: str, prepare_tables: List[str] = None) -> Relation:
return Relation(pipeline=self.pipeline, sql=sql, prepare_tables=prepare_tables)

def __getitem__(self, table: str) -> Relation:
"""access of table via dict notation"""
return Relation(pipeline=self.pipeline, table=table)

def __getattr__(self, table: str) -> Relation:
"""access of table via property notation"""
return Relation(pipeline=self.pipeline, table=table)
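
A brief sketch of the three access paths Dataset exposes (attribute, item, and sql), with hypothetical table names:

# sketch only: equivalent ways to obtain a Relation from a pipeline's dataset
customers_rel = pipeline.dataset.customers     # attribute access (__getattr__)
customers_rel = pipeline.dataset["customers"]  # dict access (__getitem__)
tuesday_orders = pipeline.dataset.sql("SELECT * FROM orders WHERE order_day = 'tuesday'")

print(customers_rel.df())
print(tuesday_orders.arrow())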
21 changes: 19 additions & 2 deletions dlt/destinations/fs_client.py
@@ -1,10 +1,14 @@
from typing import Iterable, cast, Any, List, Literal

import gzip
from typing import Iterable, cast, Any, List
from abc import ABC, abstractmethod
from fsspec import AbstractFileSystem

from dlt.common.typing import DuckDBPyConnection
from dlt.common.destination.reference import SupportsRelationshipAccess


class FSClientBase(ABC):
class FSClientBase(SupportsRelationshipAccess, ABC):
fs_client: AbstractFileSystem

@property
@@ -55,3 +59,16 @@ def read_text(
path, mode="rt", compression=compression, encoding=encoding, newline=newline
) as f:
return cast(str, f.read())

@abstractmethod
def get_duckdb(
self,
tables: List[str],
db: DuckDBPyConnection = None,
table_type: Literal["view", "table"] = "view",
) -> DuckDBPyConnection:
"""
Returns an in-memory duckdb instance with the given tables loaded as views or tables.
Can also take an existing duckdb connection to add tables from the filesystem.
"""
pass
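
A hedged sketch of how get_duckdb might be used per the docstring above; the client handle and table names are hypothetical:

# sketch only: expose filesystem tables as duckdb views and query them
fs_client = pipeline.destination_client()  # assumed here to be a filesystem client (FSClientBase)
con = fs_client.get_duckdb(tables=["customers", "orders"])
print(con.sql("SELECT count(*) FROM customers").fetchall())

# or attach the tables to an existing duckdb connection instead
import duckdb
existing = duckdb.connect("analysis.duckdb")
fs_client.get_duckdb(tables=["orders"], db=existing, table_type="table")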