
Layout config is not respected in filesystem destination when using an sql_database source #2107

Open
trymzet opened this issue Nov 28, 2024 · 3 comments
Labels
question Further information is requested


@trymzet
Contributor

trymzet commented Nov 28, 2024

dlt version

1.4.0

Describe the problem

When using the filesystem destination with, e.g., the following layout config:

[destination.filesystem]
bucket_url = ""
layout = "{schema_name}/{table_name}/{load_id}.{file_id}.{ext}"

The data is still loaded into {schema_name}/sql_database/{table_name}/{load_id}.{file_id}.{ext} (notice that a hardcoded sql_database directory is unexpectedly inserted by dlt).

Expected behavior

I think:

  1. This name should be controllable (e.g., if I have multiple SQL databases, I want to use a specific database name instead of the generic "sql_database"). This might be part of the broader issue that dlt currently does not support per-database configuration for sql_database sources (you either have one sql_database config or per-pipeline configs).
  2. This behavior should be documented in the layout docs.
  3. Alternatively, the layout should be applied exactly as specified in the config.
@sh-rp
Collaborator

sh-rp commented Dec 11, 2024

We should have a look at this, but the destination is completely independent of the source. @trymzet are you sure that this does not happen if you use some other source?

@trymzet
Contributor Author

trymzet commented Dec 13, 2024

@sh-rp I'm using the sql_table source now, and the layout produced is <pipeline_name>_dataset/{table_name}/{load_id}.{file_id}.{ext}. I guess schema_name is <pipeline_name>_dataset by default? If so, this one would align with the layout.

BTW, for sql_database, it might be possible to control this directory's name as suggested in #2114 (comment) (still to be tested).

@sh-rp
Collaborator

sh-rp commented Dec 16, 2024

The first folder in the tree is your dataset name. You can set it with dataset_name="something" when constructing your pipeline; if you do not provide one, it is generated from the pipeline name. You can read more about datasets and destinations in our docs to understand how this works.

In the example in your first post, the dataset name comes first (you can't control this), and the "sql_database" part that follows is the schema name, which you inserted yourself via {schema_name} in the layout. If you don't want that folder, remove the {schema_name} part from the layout.
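The explanation above can be sketched in plain Python. `resolve_path` is a hypothetical helper, not part of dlt's API. It only mimics the behaviour described in this thread. The dataset folder always comes first: either the dataset_name passed when constructing the pipeline, or, as an assumption based on the sql_table observation above, `<pipeline_name>_dataset`. The configured layout is then applied below that folder. The table name, load id and file id in the example are made-up values.

```python
# Hypothetical sketch: mimics the path composition described in this thread.
# It is NOT dlt's implementation; `resolve_path` is an illustrative name.
from typing import Optional


def resolve_path(
    layout: str,
    pipeline_name: str,
    schema_name: str,
    table_name: str,
    load_id: str,
    file_id: str,
    ext: str,
    dataset_name: Optional[str] = None,
) -> str:
    # Assumption: without an explicit dataset_name, the dataset folder
    # defaults to "<pipeline_name>_dataset" (as observed in the thread).
    dataset = dataset_name or f"{pipeline_name}_dataset"
    # The layout only describes the path *below* the dataset folder.
    # str.format ignores keyword arguments the layout does not use.
    relative = layout.format(
        schema_name=schema_name,
        table_name=table_name,
        load_id=load_id,
        file_id=file_id,
        ext=ext,
    )
    return f"{dataset}/{relative}"


# Original layout: {schema_name} resolves to "sql_database" for this source.
original = "{schema_name}/{table_name}/{load_id}.{file_id}.{ext}"
print(resolve_path(original, "my_pipeline", "sql_database", "orders",
                   "1732790000.123", "abc", "parquet"))
# my_pipeline_dataset/sql_database/orders/1732790000.123.abc.parquet

# Layout without {schema_name}, plus an explicit dataset_name.
trimmed = "{table_name}/{load_id}.{file_id}.{ext}"
print(resolve_path(trimmed, "my_pipeline", "sql_database", "orders",
                   "1732790000.123", "abc", "parquet", dataset_name="crm"))
# crm/orders/1732790000.123.abc.parquet
```

With the original layout, the extra "sql_database" folder comes from the {schema_name} placeholder, not from a hardcoded path. Dropping the placeholder removes it, and dataset_name controls the top-level folder.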

sh-rp self-assigned this Dec 16, 2024
sh-rp added the question (Further information is requested) label Dec 16, 2024
Projects
Status: Done