-
Notifications
You must be signed in to change notification settings - Fork 279
Adding in NoMissingUnitTest linter rule #5294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
yaml_parser = YAML(typ="safe") | ||
for test_file in test_dir.rglob("*.yaml"): | ||
try: | ||
test_data = yaml_parser.load(test_file) or {} | ||
except Exception: | ||
# Skip files with Jinja templating or other parse errors | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid this is going to be very expensive. For every model, we're rglobing the test directory and then loading every test, just to see if the model's name appears in one of them. I don't think this overhead will be acceptable in practice.
We should be smarter about this. For example, one idea is to register the models that have tests associated with them at load time, so that this done in a single pass– the existing one done to load tests. Then, implementing a rule like this becomes a simple lookup in a set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, thanks for checking this out, I agree, my knowledge of the repo is lacking here, couple questions:
- What do you mean by load time, what triggers that?
2a. Is there a certain place in the codebase where storing this would be best? I can start looking around.
2b. Best approach for storing it? Is there something similar already happening or would this be an entirely new initiative
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem!
- The loading stage corresponds to the code path where SQLMesh loads a project's assets (models, tests, etc). You can find the relevant logic within
sqlmesh/core/loader.py
– check outGenericContext.load
too.
2a. One idea is to load tests eagerly, similar to other assets, so that you know all tests that exist as early as possible and can extract the information of what models have tests at linting time.
2b. We'd want fast lookup, so either dict or set
@VaggelisD why don't we eagerly load all tests at load time? It appears that we only do it in GenericContext.test
today, lazily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After an initial scan here's my code plan:
- sqlmesh.core.test.discovery.py has ModelTestMetadata - implement a method to pull the model name out of the test body
- sqlmesh.core.loader.py has Loader.load_model_tests - doesn't look like its implemented yet, so flesh that out and then pass the ModelTestMetadata classes to the project. I can then implement another method that creates a Set of the model names by looping through once.
- Make sure sqlmesh.core.context.py GenericContext.load has the Set implementation as well since that's new
- From there I can access the create Set method from context in the linter rule to easily compare model names against.
Idea is to load the metadata classes (not sure about ModelTest, I assume ModelTestMetadata is the way to go here) into the project and then I can create a method that returns a set of the model names from the Metadata classes that I can use to compare in the linter from the context
Thoughts?
I'm having trouble fleshing out the test due to the path having to point to the example project. Since I have to trigger the pytest from the root it is trying to find the /tests directory in the root rather than the example project root. Any help on resolving the path issue would be appreciated so I can finish the test. (EDIT: Figured out how to switch the contexts, let me know if there's anything that needs adjustment!)