This repository hosts the dbt project for Open edX Aspects. It contains the SQL files and configuration that control the ClickHouse reporting schema, data transformations, tests, and documentation that drive Aspects reports. This project is still in its infancy, and more documentation and examples will be added as we approach the Aspects v1 launch.
A dbt docs site with the details of this project, built from the main branch, is located here: https://openedx.github.io/aspects-dbt/
pip install -r requirements.txt
will install dbt and other required packages.
dbt deps
will install the dbt packages defined in packages.yml (this step needs to be run when initializing your project or when changing the package list).
dbt run
will compile and create the models defined in the "aspects" dbt project. By default, dbt will look in the xapi schema to find source tables. The XAPI_SCHEMA environment variable can be used to specify a different schema.
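The steps above can be sketched as a single shell function. This is only an illustration, not part of the project: the function name build_aspects_models is hypothetical, and so is any schema name you pass to it.

```shell
# Sketch only: build_aspects_models is a hypothetical helper, not shipped with this repo.
# The body runs in a subshell (note the parentheses), so exporting XAPI_SCHEMA
# here does not change the calling shell's environment.
build_aspects_models() (
  # Optional first argument: the schema that holds the source tables.
  # When omitted, XAPI_SCHEMA is left as-is and dbt falls back to its default (xapi).
  if [ -n "${1:-}" ]; then
    export XAPI_SCHEMA="$1"
  fi

  # Install Python requirements, then the dbt packages, then build the models.
  # Each step only runs if the previous one succeeded.
  pip install -r requirements.txt &&
    dbt deps &&
    dbt run
)
```

Calling build_aspects_models with no argument uses the default schema; calling it as build_aspects_models my_xapi (where my_xapi is a made-up schema name) points dbt at that schema for the run.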
As of dbt v1.8 (https://docs.getdbt.com/reference/resource-properties/unit-tests), models can be tested with unit tests in addition to the existing data tests. Unit tests validate the SQL model logic by building the models from a known-good dataset and comparing the results to a provided 'expected' dataset. This is especially useful when updating a model, to confirm that its output has not changed.
The unit_tests.yaml file in each model directory contains any tests for the models in that directory.
The unit-test-seeds directory contains all seed data CSV files. There is one file for each base table (event_sink & xapi) and each 'expected' dataset.
dbt test
will only run data & generic tests (NOT unit tests). This is the default mode.
dbt test --selector unit_tests
will run all unit tests.
These require tables to be seeded first. To do this, add 'unit-test-seeds' to seed-paths: in dbt_project.yml and run dbt seed --full-refresh && dbt run --full-refresh.
dbt test --selector all_tests
will run all data/generic/unit tests.
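As a rough sketch, the unit test workflow described above could be wrapped in one shell function. The function name run_unit_tests is hypothetical, and the sketch assumes 'unit-test-seeds' has already been added to seed-paths: in dbt_project.yml.

```shell
# Sketch only: run_unit_tests is a hypothetical helper, not shipped with this repo.
# Assumes 'unit-test-seeds' is already listed under seed-paths: in dbt_project.yml.
run_unit_tests() (
  # Load the seed CSV files, rebuild the models on top of them,
  # then run only the unit tests. A failure at any step stops the chain.
  dbt seed --full-refresh &&
    dbt run --full-refresh &&
    dbt test --selector unit_tests
)
```

To run the data, generic, and unit tests together, the last command can be swapped for dbt test --selector all_tests.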
dbt does not automatically remove models that have been deleted from this project. As we remove models, we will add them to a macro that can be run manually to clean up relations that are no longer needed. This helps prevent stale materialized views from breaking when schemas change, and avoids unnecessary writes to tables that aren't used.
If you need a removed model for custom reporting, you should move that model to the system you use to manage your custom schema (such as your own dbt package) instead of letting the old version remain. This will let you explicitly upgrade it as necessary.
dbt run-operation remove_deprecated_models
will drop the relations, and
dbt -d run-operation remove_deprecated_models
will drop them with debug information showing the commands that are run.
If you're having trouble, we have discussion forums at https://discuss.openedx.org where you can connect with others in the community.
Our real-time conversations are on Slack. You can request a Slack invitation, then join our community Slack workspace.
For anything non-trivial, the best path is to open an issue in this repository with as many details about the issue you are facing as you can provide.
https://github.com/openedx/aspects-dbt/issues
For more information about these options, see the Getting Help page.
Please see LICENSE for details.
Contributions are very welcome. Please read How To Contribute for details.
This project is currently accepting all types of contributions: bug fixes, security fixes, maintenance work, and new features. However, please discuss any new feature idea with the maintainers before beginning development, to maximize the chances of your change being accepted. You can start a conversation by creating a new issue on this repo summarizing your idea.
Make sure to format the SQL models via make format before submitting a pull request.
All community members are expected to follow the Open edX Code of Conduct.
The assigned maintainers for this component and other project details may be found in Backstage. Backstage pulls this data from the catalog-info.yaml file in this repo.
Please do not report security issues in public. Please email [email protected].