
chore: cleanup and refactor requirements #66

Open
lawrenceadams opened this issue Sep 29, 2024 · 3 comments · May be fixed by #67

Comments

@lawrenceadams
Collaborator

At present requirements.txt has 63 dependencies of unknown origin (i.e. some are from dbt-core / dbt-postgres / dbt-duckdb / pre-commit / black etc...); likely from repeated pip freeze > requirements.txt runs.

Although this is not too many in the grand scheme of things, I think it will cause growing problems for several reasons:

  • the number of dependencies will spiral as we use more tools and support more databases (more database drivers/adapters, linters, etc.), with no record of which dependency is associated with which tool
    • how often will we realistically want to have multiple database environments installed in one sitting?
  • if we want to support things like Codespaces or CI/CD, we may want a lighter-weight set of requirements instead of installing all drivers for all databases when we'll likely only use one (duckdb) - plus a fully deterministic install to guarantee reproducibility

To resolve this I propose we split our requirements into multiple requirements files.

For example:

dbt-synthea/
└── requirements/
    ├── dev-tools.in
    ├── duckdb.in
    ├── postgres.in
    ├── snowflake.in
    ├── duckdb.txt
    ├── postgres.txt
    └── snowflake.txt
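
Each `.in` file would list only the direct, top-level dependencies, leaving the compiled `.txt` lockfile to record the full pinned tree. A sketch of what the `.in` files might contain (the specific package choices are illustrative, not a final list):

```
# requirements/duckdb.in - direct dependencies only
dbt-core
dbt-duckdb

# requirements/dev-tools.in - linting/formatting/hooks
pre-commit
black
```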

The downside of this is added complexity, but in exchange we get easier maintenance, and it becomes obvious which tool is responsible for each dependency.

There are multiple tools available for this, for example pip-tools or the more modern/faster uv.
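
With either tool the workflow is the same: edit the `.in` file by hand, then compile it into a fully pinned `.txt` lockfile. A sketch, assuming the `requirements/` layout above (the exact filenames are illustrative):

```shell
# pip-tools: compile direct deps into a fully pinned lockfile
pip-compile requirements/duckdb.in -o requirements/duckdb.txt

# or the uv equivalent (same output format, much faster)
uv pip compile requirements/duckdb.in -o requirements/duckdb.txt

# install exactly what the lockfile says, and nothing else
pip-sync requirements/duckdb.txt      # pip-tools
uv pip sync requirements/duckdb.txt   # uv
```

Re-running the compile step after editing an `.in` file updates the pins deterministically, which is what gives CI (or a devcontainer) a reproducible install.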

@adambouras
Collaborator

Hi @lawrenceadams ,

How about poetry? Could you share some pros and cons of using poetry?

Thank you,
Adam

@lawrenceadams
Collaborator Author

Hey @adambouras - I considered this, however in my view:

Pros:

  • lock file generation (essentially what generating xyz.txt files is doing - other tools also have this feature)

Cons:

  • poetry is focused around managing Python projects
    • Do we want to set up a pyproject.toml file? This is not a Python project that we're going to release on PyPI - as far as I can tell, the user experience will be cloning the project and then running pip install xyz, not running pip install dbt-synthea. This project uses a lot of Python packages, but is not itself a Python package.
  • Poetry is quite complicated - especially for what we want

I'm a big fan of poetry and using proper project setups, but I don't think it really fits in this instance. Either way - I think this needs to be cleaned up at some point! Not that fussy on one approach/tool over the other, but I think we can simplify how we approach this.

Arguably, we don't need these files at all - we only really need dbt-duckdb or dbt-postgres installed, plus some linting/pre-commit tools. But splitting them out makes the project more approachable for beginners and allows CI/devcontainers/etc. to be set up.

Do you have any strong views? Happy to discuss!

@lawrenceadams lawrenceadams linked a pull request Sep 29, 2024 that will close this issue
@adambouras
Collaborator

adambouras commented Sep 29, 2024 via email
