Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Devcontainer support including Duckdb bootstrapped with minimal synthetic data #57

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

vvcb
Copy link
Collaborator

@vvcb vvcb commented Sep 20, 2024

As discussed at the dbt sync community call on 2024-09-20, this PR adds devcontainer support to the project.

Key features:

  • Python 3.12 with minimal requirements pre-installed for dbt to work with DuckDb.
  • Key VS Code extensions to support development including dbt-power-user extension (user may need to set preferred Python environment to ./dbt-env/bin/python3 to allow the extension to work - this requires a longer-term fix where the extension recognises this automatically.
  • dbt dependencies and pre-commit are installed at container creation time
  • DuckDb database created in ./data/ with minimal synthetic data loaded from seed files.

The user will have to run dbt run to create the rest of the models.

@vvcb vvcb added the enhancement New feature or request label Sep 20, 2024
Copy link
Collaborator

@lawrenceadams lawrenceadams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!! Feel free to ignore suggestions - but I think we can simplify some of the settings/extensions used here. Happy to debate

// "forwardPorts": [],
// Use 'postCreateCommand' to run commands after the container is created.
"postCreateCommand": "bash ./.devcontainer/scripts/postCreate.sh",
"postStartCommand": "bash ./.devcontainer/scripts/postStart.sh",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This postStart script is not needed

Suggested change
"postStartCommand": "bash ./.devcontainer/scripts/postStart.sh",

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly - and this may be a relic from a fix to get devcontainers working for several projects on my machine (Windows with Docker Desktop using WSL2). If someone can confirm that git does not throw permission errors without that line, please feel free to remove it. It is borrowed from another devcontainer that does a bunch of stuff after starting.

"dbt.dbtPythonPathOverride": "./dbt-env/bin/python3"
}
}
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid json

Suggested change
},
}

Comment on lines +1 to +5
dbt-core==1.8.2
dbt-duckdb==1.8.1
pre-commit==3.7.1
black
python-dotenv
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need a requirements.txt file for this (if we only need to install dbt-duckdb)

Suggested change
dbt-core==1.8.2
dbt-duckdb==1.8.1
pre-commit==3.7.1
black
python-dotenv

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure of the purpose of the root requirements.txt too? but I think thats for a different PR!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A devcontainer is for a consistent development experience - and black and pre-commit are the bare minimum and ensure that code contributions meet a minimum standard. We should have sqlfluff included explicitly in there too - I think it gets installed anyway as part of dbt.

I can see that the pre-commit -config.yaml file is very minimal - and can be improved as well. Possibly a subset of what is here -> https://github.com/vvcb/dbt-synthea/blob/vc/databricks/.pre-commit-config.yaml

Typically, one wouldn't need a separate requirements file within the devcontainer and the project requirements.txt should be sufficient. The new PR could address this with a single requirements.txt file for the project with database-specific dependencies and dependencies required only for development (such as black, pre-commit, etc.) specified as optional dependencies. This should allow the user to install the project with pip install dbt_synthea[postgres,dev] for instance.

Development dependencies are going to be different from dependencies required for just running the code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your explanation.

Ah that makes more sense, I trimmed out some of it as it wasn't referenced anywhere else and didn't see why it would be needed ~ it's probably fine to leave it in for now and can have another PR to tighten up precommit actions

Comment on lines +1 to +4
# Set up git
git config --global --add safe.directory /workspaces/dbt-synthea
git config --global init.defaultBranch main

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to be corrected, but I think some of this is unnecessary - but maybe is required to defend against WSL2 instances (as I know git can get funny with Windows' ACLs)? Can leave in but I think not needed!

Regardless, the default branch doesn't need to be changed as the project already exists

Suggested change
# Set up git
git config --global --add safe.directory /workspaces/dbt-synthea
git config --global init.defaultBranch main
# Set up git
git config --global --add safe.directory /workspaces/dbt-synthea
git config --global init.defaultBranch main

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree regarding default branch. But, the add safe.directory directive is required on when running this on WSL2.

.devcontainer/scripts/postCreate.sh Show resolved Hide resolved
Comment on lines +1 to +2
git config --global --add safe.directory /workspaces/dbt-synthea

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be run on every postStart (if it has already been done onCreate), if at all

Suggested change
git config --global --add safe.directory /workspaces/dbt-synthea

Comment on lines 6 to +8
"python.terminal.activateEnvInCurrentTerminal": true,
"python.defaultInterpreterPath": "./dbt-env/bin/python3",
"dbt.dbtPythonPathOverride": "./dbt-env/bin/python3",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"python.terminal.activateEnvInCurrentTerminal": true,
"python.defaultInterpreterPath": "./dbt-env/bin/python3",
"dbt.dbtPythonPathOverride": "./dbt-env/bin/python3",

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah 😄 I should have scrolled here before responding to the previous comment. Agree with this change - the user should be responsible for managing the environment rather than making this part of the codebase. But, this needs to be discussed with the others as it will change the behaviour of vscode for users.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to bump this! Nice catch

Comment on lines +17 to +27
"extensions": [
"davidanson.vscode-markdownlint",
"editorconfig.editorconfig",
"mads-hartmann.bash-ide-vscode",
"mechatroner.rainbow-csv",
"ms-python.black-formatter",
"ms-python.python",
"ms-python.vscode-pylance",
"njpwerner.autodocstring",
"innoverio.vscode-dbt-power-user"
],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these are unused

Suggested change
"extensions": [
"davidanson.vscode-markdownlint",
"editorconfig.editorconfig",
"mads-hartmann.bash-ide-vscode",
"mechatroner.rainbow-csv",
"ms-python.black-formatter",
"ms-python.python",
"ms-python.vscode-pylance",
"njpwerner.autodocstring",
"innoverio.vscode-dbt-power-user"
],
"extensions": [
"davidanson.vscode-markdownlint",
"mechatroner.rainbow-csv",
"ms-python.python",
"ms-python.vscode-pylance",
"njpwerner.autodocstring",
"innoverio.vscode-dbt-power-user"
],

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you want to get rid of black?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot there were 3 python files!

Comment on lines +35 to +36
"python.defaultInterpreterPath": "./dbt-env/bin/python3",
"dbt.dbtPythonPathOverride": "./dbt-env/bin/python3"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can trim down on unnecessary venv configs

Suggested change
"python.defaultInterpreterPath": "./dbt-env/bin/python3",
"dbt.dbtPythonPathOverride": "./dbt-env/bin/python3"

@vvcb
Copy link
Collaborator Author

vvcb commented Sep 22, 2024

@lawrenceadams , thank you for reviewing this. Agree with most things - but this was to get the devcontainer working with the existing codebase in preparation for the OHDSI symposium software demo to allow users to spin this up in their own GitHub codespaces.

Disagree with a few. Very happy for any of this to be modified as long as it works for everyone.

@lawrenceadams
Copy link
Collaborator

lawrenceadams commented Sep 23, 2024

In all honesty is good to go for all intents and purposes, a lot of these issues are a reflection of python being annoying to manage packages with IMHO!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants