This is a sample project for Databricks, generated via cookiecutter.
To use this project, you need Python 3.X and pip or conda for package management.
For local development, install the unit-test requirements and the project itself in editable mode:
pip install -r unit-requirements.txt
pip install -e .
For local unit testing, use pytest:
pytest tests/unit --cov
For an integration test on an interactive cluster, use the following command:
dbx execute --cluster-name=<name of interactive cluster> --job=dbx-sample-sample-integration-test
For a test on an automated job cluster, deploy the job files and then launch:
dbx deploy --jobs=dbx-sample-sample-integration-test --files-only
dbx launch --job=dbx-sample-sample-integration-test --as-run-submit --trace
- dbx expects that the cluster for interactive execution supports the %pip and %conda magic commands.
- Please configure your job in the conf/deployment.yml file.
- To execute the code interactively, provide either --cluster-id or --cluster-name:
dbx execute \
--cluster-name="<some-cluster-name>" \
--job=job-name
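The choice between --cluster-id and --cluster-name can be sketched as a small Python helper that assembles the command line. This is only an illustration: `build_execute_command` is a hypothetical name and not part of dbx, and treating the two flags as mutually exclusive is an assumption made for this sketch.

```python
from typing import List, Optional


def build_execute_command(
    job: str,
    cluster_id: Optional[str] = None,
    cluster_name: Optional[str] = None,
) -> List[str]:
    """Assemble a `dbx execute` argument list (hypothetical helper, not part of dbx)."""
    # Assumption: exactly one of the two cluster selectors is supplied.
    if (cluster_id is None) == (cluster_name is None):
        raise ValueError("provide exactly one of cluster_id or cluster_name")

    command = ["dbx", "execute"]
    if cluster_id is not None:
        command.append(f"--cluster-id={cluster_id}")
    else:
        command.append(f"--cluster-name={cluster_name}")
    command.append(f"--job={job}")
    return command
```

The returned list could be handed to `subprocess.run` on a machine where dbx is installed and configured.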
Multiple users can also use the same cluster for development. Libraries are isolated per execution context.
The next step is to configure your deployment objects. To keep this process easy and flexible, the configuration is written in YAML.
By default, the deployment configuration is stored in conf/deployment.yml.
To deploy only the files without overriding the job definitions, run:
dbx deploy --files-only
To launch the file-based deployment:
dbx launch --as-run-submit --trace
This type of deployment is handy when working in different branches, because it does not affect the main job definition.
To deploy files and update the job definitions:
dbx deploy
To launch the deployed job:
dbx launch --job=<job-name>
This type of deployment should mainly be used from the CI pipeline, in an automated way, during a new release.
Please set the following secrets or environment variables for your CI provider:
DATABRICKS_HOST
DATABRICKS_TOKEN
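As a rough illustration, a CI step could verify that both variables are present before any dbx command runs. The sketch below is written under that assumption; `missing_variables` is a hypothetical helper name and is not provided by dbx or by this project.

```python
import os
from typing import List, Mapping

# The two variable names come from the list above.
REQUIRED_VARIABLES = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_variables(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]
```

A pipeline script might call `missing_variables()` first and stop with a clear message if the returned list is not empty.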
- To trigger the CI pipeline, push your code to the repository. If the CI provider is configured correctly, the push triggers the general testing pipeline.
- To trigger the release pipeline, get the current version from the dbx_sample/__init__.py file and tag the current code version:
git tag -a v<your-project-version> -m "Release tag for version <your-project-version>"
git push origin --tags
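The tagging step can be sketched in Python as well. This assumes that dbx_sample/__init__.py stores the version in a line such as `__version__ = "0.1.0"`, which the text above does not confirm; `read_version` and `tag_commands` are hypothetical helper names used only for illustration.

```python
import re
from typing import List

# Assumption: the version is stored as  __version__ = "<version>"  in __init__.py.
VERSION_PATTERN = re.compile(r"""^__version__\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def read_version(init_source: str) -> str:
    """Extract the version string from the text of an __init__.py file."""
    match = VERSION_PATTERN.search(init_source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def tag_commands(version: str) -> List[List[str]]:
    """Build the two git commands shown above as argument lists."""
    tag = f"v{version}"
    return [
        ["git", "tag", "-a", tag, "-m", f"Release tag for version {version}"],
        ["git", "push", "origin", "--tags"],
    ]
```

Reading the file contents with `pathlib.Path("dbx_sample/__init__.py").read_text()` and passing each returned list to `subprocess.run` would reproduce the manual steps.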