Add capability to create sql resources from queries #1997
Comments
@erik-meko
import dlt
import sqlalchemy as sa

def load_sql_data() -> None:
    """Load data from a sql database with sqlalchemy; for more complex examples use our sql_database source"""
    # Use any SQL database supported by SQLAlchemy, below we use a public
    # MySQL instance to get data.
    # NOTE: you'll need to install pymysql with `pip install pymysql`
    # NOTE: loading data from public mysql instance may take several seconds
    engine = sa.create_engine("mysql+pymysql://rfamro@mysql-rfam-public.ebi.ac.uk:4497/Rfam")
    with engine.connect() as conn:
        # Select genome table, stream data in batches of 100 elements
        query = "SELECT * FROM genome LIMIT 1000"
        rows = conn.execution_options(yield_per=100).exec_driver_sql(query)

        pipeline = dlt.pipeline(
            pipeline_name="from_database",
            destination="duckdb",
            dataset_name="genome_data",
        )

        # Convert the rows into dictionaries on the fly with a map function
        load_info = pipeline.run(
            map(lambda row: dict(row._mapping), rows), table_name="genome"
        )
        print(load_info)  # noqa: T201

Tell me if any of those solves your problem. We'll keep this ticket in the pipeline anyway, because the things above are less convenient than a sql query.
I am not allowed to modify the source, and unfortunately I wasn't successful in my attempts to apply a query_adapter_callback (I failed to understand how to modify the schema). Alternative 3 looks simple enough; however, I dropped the PoC after trying to implement another pipeline in self-hosted Airflow with Azure Key Vault as the secrets backend. During the run, an api pipeline apparently looks for a lot of Airflow variables. Each check triggered a failing remote call to the Azure Key Vault, producing ~25 lines of stack trace in the log per call and many hundreds of lines of stack trace in the Airflow log in total. Troubleshooting it would have been a nightmare, and since this part wasn't well documented, I shut down the PoC.
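For reference, a minimal sketch of a query_adapter_callback, based on my reading of the dlt sql_database docs: the callback receives the SQLAlchemy select statement and the reflected table and returns a (possibly modified) select. The table and column names below are hypothetical, and the exact hook signature may vary between dlt versions.

import sqlalchemy as sa
from dlt.sources.sql_database import sql_database

def query_adapter_callback(query: sa.Select, table: sa.Table) -> sa.Select:
    # Hypothetical filter: restrict the "orders" table to a single customer
    if table.name == "orders":
        return query.where(table.c.customer_id == 121)
    # Leave all other tables untouched
    return query

source = sql_database(
    query_adapter_callback=query_adapter_callback
).with_resources("orders")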
Sorry to hear that.
Actually, free text sql support is coming here: #2070
Fixed in 1.4.1.
Feature description
Create a sql resource in dlt.sources.sql_database that allows creating models/tables with sql. It could be named sql_query and work like sql_table, except that instead of creating a resource from a single table, it can use multiple database objects as input, preferably defined using a sql statement. This is useful when the source database is not in the consumer's control, so it is not possible to create a static view.
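To make the proposal concrete, a purely hypothetical sketch of how such a resource might be invoked, mirroring the existing sql_table signature; sql_query does not exist in dlt, and every parameter shown is an assumption:

import dlt
from dlt.sources.sql_database import sql_query  # hypothetical, does not exist

orders = sql_query(
    credentials="mysql+pymysql://user@host:3306/db",  # placeholder connection
    query="SELECT o.*, c.updated_at FROM orders o JOIN order_changelog c ON c.order_id = o.id",
    table_name="orders",  # name of the destination table
)
pipeline = dlt.pipeline(pipeline_name="from_query", destination="duckdb")
pipeline.run(orders)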
Are you a dlt user?
I'd consider using dlt, but it's lacking a feature I need.
Use case
I have a need to load incremental tables where the cursor is in a second table, i.e. I would like to do something like the query sketched below.
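A hedged illustration of the kind of statement meant here, with made-up table and column names: the rows come from one table while the incremental cursor lives in a separate change-log table.

SELECT o.*, c.updated_at AS cursor_value
FROM orders AS o
JOIN order_changelog AS c ON c.order_id = o.id
WHERE c.updated_at > :last_cursor_value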
Proposed solution
Add a new "resource type" in sql_database that allows using more ephemeral entities such as select statements (or temp tables).
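Until such a resource exists, a minimal sketch of approximating it today by wrapping the free-form query from the use case above in a plain dlt resource; the connection string, table names, and columns are placeholders, not part of any real schema:

import dlt
import sqlalchemy as sa

@dlt.resource(table_name="orders", write_disposition="append")
def orders_query(
    cursor=dlt.sources.incremental("cursor_value", initial_value=None),
):
    engine = sa.create_engine("mysql+pymysql://user@host:3306/db")  # placeholder
    query = sa.text(
        "SELECT o.*, c.updated_at AS cursor_value "
        "FROM orders AS o "
        "JOIN order_changelog AS c ON c.order_id = o.id "
        "WHERE :last_value IS NULL OR c.updated_at > :last_value"
    )
    with engine.connect() as conn:
        # Stream results and only fetch rows past the last seen cursor value
        rows = conn.execution_options(yield_per=100).execute(
            query, {"last_value": cursor.last_value}
        )
        for row in rows:
            yield dict(row._mapping)

pipeline = dlt.pipeline(pipeline_name="orders_pipeline", destination="duckdb")
pipeline.run(orders_query)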
Related issues
No response