-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Devcontainer support including Duckdb bootstrapped with minimal synthetic data #57
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,44 @@ | ||||||||||||||||||||||||||||||||||||||||
// For format details, see https://aka.ms/devcontainer.json. For config options, see the | ||||||||||||||||||||||||||||||||||||||||
// README at: https://github.com/devcontainers/templates/tree/main/src/python | ||||||||||||||||||||||||||||||||||||||||
{ | ||||||||||||||||||||||||||||||||||||||||
"name": "Python 3", | ||||||||||||||||||||||||||||||||||||||||
// Or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile | ||||||||||||||||||||||||||||||||||||||||
"image": "mcr.microsoft.com/devcontainers/python:3.12", | ||||||||||||||||||||||||||||||||||||||||
// Features to add to the dev container. More info: https://containers.dev/features. | ||||||||||||||||||||||||||||||||||||||||
// "features": {}, | ||||||||||||||||||||||||||||||||||||||||
// Use 'forwardPorts' to make a list of ports inside the container available locally. | ||||||||||||||||||||||||||||||||||||||||
// "forwardPorts": [], | ||||||||||||||||||||||||||||||||||||||||
// Use 'postCreateCommand' to run commands after the container is created. | ||||||||||||||||||||||||||||||||||||||||
"postCreateCommand": "bash ./.devcontainer/scripts/postCreate.sh", | ||||||||||||||||||||||||||||||||||||||||
"postStartCommand": "bash ./.devcontainer/scripts/postStart.sh", | ||||||||||||||||||||||||||||||||||||||||
// "mounts": [], | ||||||||||||||||||||||||||||||||||||||||
"customizations": { | ||||||||||||||||||||||||||||||||||||||||
"vscode": { | ||||||||||||||||||||||||||||||||||||||||
"extensions": [ | ||||||||||||||||||||||||||||||||||||||||
"davidanson.vscode-markdownlint", | ||||||||||||||||||||||||||||||||||||||||
"editorconfig.editorconfig", | ||||||||||||||||||||||||||||||||||||||||
"mads-hartmann.bash-ide-vscode", | ||||||||||||||||||||||||||||||||||||||||
"mechatroner.rainbow-csv", | ||||||||||||||||||||||||||||||||||||||||
"ms-python.black-formatter", | ||||||||||||||||||||||||||||||||||||||||
"ms-python.python", | ||||||||||||||||||||||||||||||||||||||||
"ms-python.vscode-pylance", | ||||||||||||||||||||||||||||||||||||||||
"njpwerner.autodocstring", | ||||||||||||||||||||||||||||||||||||||||
"innoverio.vscode-dbt-power-user" | ||||||||||||||||||||||||||||||||||||||||
], | ||||||||||||||||||||||||||||||||||||||||
Comment on lines
+17
to
+27
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Some of these are unused
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are you sure you want to get rid of black? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I forgot there were 3 python files! |
||||||||||||||||||||||||||||||||||||||||
"settings": { | ||||||||||||||||||||||||||||||||||||||||
"python.formatting.provider": "black", | ||||||||||||||||||||||||||||||||||||||||
"python.analysis.completeFunctionParens": true, | ||||||||||||||||||||||||||||||||||||||||
"[python]": { | ||||||||||||||||||||||||||||||||||||||||
"editor.defaultFormatter": "ms-python.black-formatter", | ||||||||||||||||||||||||||||||||||||||||
"editor.formatOnSave": true | ||||||||||||||||||||||||||||||||||||||||
}, | ||||||||||||||||||||||||||||||||||||||||
"python.defaultInterpreterPath": "./dbt-env/bin/python3", | ||||||||||||||||||||||||||||||||||||||||
"dbt.dbtPythonPathOverride": "./dbt-env/bin/python3" | ||||||||||||||||||||||||||||||||||||||||
Comment on lines
+35
to
+36
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can trim down on unnecessary venv configs
Suggested change
|
||||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||||
}, | ||||||||||||||||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Valid json
Suggested change
|
||||||||||||||||||||||||||||||||||||||||
// Configure tool-specific properties. | ||||||||||||||||||||||||||||||||||||||||
// "customizations": {}, | ||||||||||||||||||||||||||||||||||||||||
// Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root. | ||||||||||||||||||||||||||||||||||||||||
// "remoteUser": "root" | ||||||||||||||||||||||||||||||||||||||||
} |
Original file line number | Diff line number | Diff line change | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,5 @@ | ||||||||||||
dbt-core==1.8.2 | ||||||||||||
dbt-duckdb==1.8.1 | ||||||||||||
pre-commit==3.7.1 | ||||||||||||
black | ||||||||||||
python-dotenv | ||||||||||||
Comment on lines
+1
to
+5
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we don't need a requirements.txt file for this (if we only need to install
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am unsure of the purpose of the root There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A devcontainer is for a consistent development experience - and I can see that the Typically, one wouldn't need a separate requirements file within the devcontainer and the project Development dependencies are going to be different from dependencies required for just running the code. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you for your explanation. Ah that makes more sense, I trimmed out some of it as it wasn't referenced anywhere else and didn't see why it would be needed ~ it's probably fine to leave it in for now and can have another PR to tighten up precommit actions There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can/should this be updated now that #67 is merged in? |
Original file line number | Diff line number | Diff line change | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,40 @@ | ||||||||||||||||
# Set up git | ||||||||||||||||
git config --global --add safe.directory /workspaces/dbt-synthea | ||||||||||||||||
git config --global init.defaultBranch main | ||||||||||||||||
|
||||||||||||||||
Comment on lines
+1
to
+4
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Happy to be corrected, but I think some of this is unnecessary - but maybe is required to defend against WSL2 instances (as I know git can get funny with Windows' ACLs)? Can leave in but I think not needed! Regardless, the default branch doesn't need to be changed as the project already exists
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree regarding default branch. But, the add safe.directory directive is required on when running this on WSL2. |
||||||||||||||||
# Install requirements | ||||||||||||||||
cd /workspaces/dbt-synthea | ||||||||||||||||
python -m venv dbt-env | ||||||||||||||||
source dbt-env/bin/activate | ||||||||||||||||
pip install -r .devcontainer/scripts/minimal_requirements.txt | ||||||||||||||||
lawrenceadams marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||
|
||||||||||||||||
# Setup pre-commit | ||||||||||||||||
pre-commit install | ||||||||||||||||
|
||||||||||||||||
# Setup bash history search | ||||||||||||||||
cat >> ~/.inputrc <<'EOF' | ||||||||||||||||
"\e[A": history-search-backward | ||||||||||||||||
"\e[B": history-search-forward | ||||||||||||||||
EOF | ||||||||||||||||
|
||||||||||||||||
|
||||||||||||||||
|
||||||||||||||||
# Setup dbt profile | ||||||||||||||||
mkdir /home/vscode/.dbt | ||||||||||||||||
cat >> /home/vscode/.dbt/profiles.yml <<'EOF' | ||||||||||||||||
synthea_omop_etl: | ||||||||||||||||
target: dev | ||||||||||||||||
outputs: | ||||||||||||||||
dev: | ||||||||||||||||
type: duckdb | ||||||||||||||||
path: ./data/synthea_omop_etl.duckdb | ||||||||||||||||
schema: dbt_synthea_dev | ||||||||||||||||
EOF | ||||||||||||||||
|
||||||||||||||||
# Setup dbt | ||||||||||||||||
|
||||||||||||||||
echo "Setting up duckdb synthetic data and dbt" | ||||||||||||||||
mkdir ./data | ||||||||||||||||
|
||||||||||||||||
dbt deps | ||||||||||||||||
dbt seed |
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
@@ -0,0 +1,2 @@ | ||||
git config --global --add safe.directory /workspaces/dbt-synthea | ||||
|
||||
Comment on lines
+1
to
+2
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think this needs to be run on every postStart (if it has already been done onCreate), if at all
Suggested change
|
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -5,6 +5,7 @@ | |||||||
}, | ||||||||
"python.terminal.activateEnvInCurrentTerminal": true, | ||||||||
"python.defaultInterpreterPath": "./dbt-env/bin/python3", | ||||||||
"dbt.dbtPythonPathOverride": "./dbt-env/bin/python3", | ||||||||
Comment on lines
6
to
+8
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hah 😄 I should have scrolled here before responding to the previous comment. Agree with this change - the user should be responsible for managing the environment rather than making this part of the codebase. But, this needs to be discussed with the others as it will change the behaviour of vscode for users. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. so, i had set it up this way so that folks using VS Code would get their virtualenv activated each time they open the repo, without having to remember to activate it. if there's a way to instruct people to set this up without committing it to the repo, i'm open to doing that instead - is there? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Setting up and activating an environment should be the user's responsibility and outside of source control. What if someone wants to use I would propose moving this to the documentation in README and modifying my devcontainer hack to get rid of setting up a virtual environment within a devcontainer - which as @lawrenceadams rightly pointed out, is redundant. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In this case should we add .vscode to .gitignore? (and then maybe we could add this settings json file to an extras folder so users can easily copy it to their local if they want to use it without thinking too much about it) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My preference is to keep .vscode under source control to share settings (eg. spaces Vs tabs, task configuration, etc.) - there is a good discussion here - https://stackoverflow.com/questions/32964920/should-i-commit-the-vscode-folder-to-source-control However, environment activation steps can simply go in README rather than in code. I would expect the user to be able to setup a virtual environment if they are planning to use DuckDb, Postgres and dbt. If not, good time to learn anyway. I will update this PR (with rebase). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds good to me! Please go ahead and include that change in this PR :) |
||||||||
"dbt.queryLimit": 500, | ||||||||
"yaml.schemas": { | ||||||||
"https://raw.githubusercontent.com/dbt-labs/dbt-jsonschema/main/schemas/dbt_yml_files.json": [ | ||||||||
|
@@ -25,4 +26,4 @@ | |||||||
"packages.yml" | ||||||||
] | ||||||||
}, | ||||||||
} | ||||||||
} |
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Forgot to bump this! Nice catch |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
packages: | ||
- package: dbt-labs/dbt_utils | ||
version: 1.2.0 | ||
sha1_hash: d4f259856543b0ef301e0b3b0bbc94ccb6b12a54 | ||
version: 1.3.0 | ||
sha1_hash: 226ae69cdfbc9367e2aa2c472b01f99dbce11de0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This postStart script is not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly - and this may be a relic from a fix to get devcontainers working for several projects on my machine (Windows with Docker Desktop using WSL2). If someone can confirm that git does not throw permission errors without that line, please feel free to remove it. It is borrowed from another devcontainer that does a bunch of stuff after starting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The postStart.sh is needed in order to install the duckdb pieces.
I'm working off that shell script so that these happen after the Setup dbt section
I haven't added fancy logic to detect which OS.
My assumption is that for Github Codespaces that it is good enough for demoing for the symposium.
A longer term solution will be needed.
`
Setup dbt
echo "Setting up duckdb synthetic data directory"
mkdir ./data
echo "Installing Duckdb in Github Codespaces which is Ubuntu based OS"
wget https://github.com/duckdb/duckdb/releases/download/v1.0.0/duckdb_cli-linux-amd64.zip
unzip duckdb_cli-linux-amd64.zip
echo "Initialize the duckdb file and exit duckdb"
/workspaces/dbt-synthea/duckdb ./data/synthea_omop_etl.duckdb -s .quit
echo "Debugging dbt to check the connection to duckdb"
dbt debug
echo "Configure dbt dependent Python packages"
dbt deps
echo "Seed dbt duckdb with data"
dbt seed
echo "Compile and Build the dbt project"
dbt compile
dbt build
`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update - My comment above is obviated by Lawrences awesome work to reconcile the python modules so that dbt debug checks both the connector to duckdb and initializes the file specified in the project yaml file.