
chore: cleanup and refactor requirements #66

Open
lawrenceadams opened this issue Sep 29, 2024 · 3 comments · May be fixed by #67

Comments

@lawrenceadams
Collaborator

At present requirements.txt has 63 dependencies of unknown origin (i.e. some are from dbt-core / dbt-postgres / dbt-duckdb / pre-commit / black etc...); likely from repeated pip freeze > requirements.txt runs.

Although this is not too many in the grand scheme of things, I think it will cause growing problems for several reasons:

  • the number of dependencies will spiral as we use more tools and support more databases (more database drivers/adapters, linters, etc.), with no record of which dependency is associated with which tool
    • how often will we realistically want to have multiple database environments installed in one sitting?
  • if we want to support things like Codespaces or CI/CD, we may want a lighter-weight set of requirements instead of installing all drivers for all databases when we'll likely only use one (duckdb) - plus a fully deterministic install to guarantee reproducibility

To resolve this I propose we split our requirements into multiple requirements files.

For example:

dbt-synthea/
└── requirements/
    ├── dev-tools.in
    ├── duckdb.in
    ├── postgres.in
    ├── snowflake.in
    ├── duckdb.txt
    ├── postgres.txt
    └── snowflake.txt
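
Each `.in` file would list only the direct, top-level dependencies, leaving the compiled `.txt` lockfile to record the full pinned tree. A sketch of what the `.in` files might contain (the specific package choices are illustrative, not a final list):

```
# requirements/duckdb.in - direct dependencies only
dbt-core
dbt-duckdb

# requirements/dev-tools.in - linting/formatting/hooks
pre-commit
black
```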

The downside of this is added complexity, but in exchange we get easier maintenance, and it becomes obvious which tool is responsible for each dependency.

There are multiple tools available for this, for example pip-tools or the more modern/faster uv.
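
With either tool the workflow is the same: edit the `.in` file by hand, then compile it into a fully pinned `.txt` lockfile. A sketch, assuming the `requirements/` layout above (the exact filenames are illustrative):

```shell
# pip-tools: compile direct deps into a fully pinned lockfile
pip-compile requirements/duckdb.in -o requirements/duckdb.txt

# or the uv equivalent (same output format, much faster)
uv pip compile requirements/duckdb.in -o requirements/duckdb.txt

# install exactly what the lockfile says, and nothing else
pip-sync requirements/duckdb.txt      # pip-tools
uv pip sync requirements/duckdb.txt   # uv
```

Re-running the compile step after editing an `.in` file updates the pins deterministically, which is what gives CI (or a devcontainer) a reproducible install.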

@adambouras
Collaborator

Hi @lawrenceadams ,

How about poetry? Could you share some pros and cons of using poetry?

Thank you,
Adam

@lawrenceadams
Collaborator Author

Hey @adambouras - I considered this, however in my view:

Pros:

  • lock file generation (essentially what generating xyz.txt files is doing - other tools also have this feature)

Cons:

  • poetry is focused around managing Python projects
    • Do we want to set up a pyproject.toml file? This is not a Python project that we're going to release on PyPI - as far as I can tell, the user experience will be cloning the project and then running pip install xyz, not running pip install dbt-synthea. This project uses a lot of Python packages, but is not itself a Python package.
  • Poetry is quite complicated - especially for what we want

I'm a big fan of poetry and using proper project setups, but I don't think it really fits in this instance. Either way - I think this needs to be cleaned up at some point! Not that fussy on one approach/tool over the other, but I think we can simplify how we approach this.

Arguably, we don't need these files at all - we only really need dbt-duckdb or dbt-postgres installed, plus some linting/pre-commit tools. But splitting them out makes the project more approachable for beginners and allows CI/devcontainers/etc. to be set up.

Do you have any strong views? Happy to discuss!

@lawrenceadams lawrenceadams linked a pull request Sep 29, 2024 that will close this issue
@adambouras
Collaborator

adambouras commented Sep 29, 2024 via email
