
Conversation


@cmgoffena13 commented Sep 4, 2025

I'm having trouble fleshing out the test because the path has to point to the example project. Since I have to trigger pytest from the repo root, it tries to find the /tests directory in the root rather than in the example project root. Any help resolving the path issue would be appreciated so I can finish the test. (EDIT: Figured out how to switch the contexts; let me know if anything needs adjustment!)
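For anyone hitting the same issue, here is a minimal sketch of one way to switch contexts, using pytest's built-in `monkeypatch` fixture (the `example_project` path is hypothetical, not this repo's actual layout, and this may not be the approach used in the PR):

```python
from pathlib import Path


def test_against_example_project(monkeypatch):
    # Hypothetical location of the example project relative to this test file.
    project_root = Path(__file__).parent / "example_project"

    # Change the working directory for the duration of this test, so that
    # relative paths like "tests/" resolve against the example project
    # instead of the repo root pytest was launched from.
    monkeypatch.chdir(project_root)

    # ... exercise the code under test here ...
```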


CLAassistant commented Sep 4, 2025

CLA assistant check
All committers have signed the CLA.

Comment on lines +144 to +150
```python
yaml_parser = YAML(typ="safe")
for test_file in test_dir.rglob("*.yaml"):
    try:
        test_data = yaml_parser.load(test_file) or {}
    except Exception:
        # Skip files with Jinja templating or other parse errors
        continue
```
Contributor commented:

I'm afraid this is going to be very expensive. For every model, we're rglobbing the test directory and then loading every test, just to see if the model's name appears in one of them. I don't think this overhead will be acceptable in practice.

We should be smarter about this. For example, one idea is to register the models that have tests associated with them at load time, so that this is done in a single pass: the existing one used to load tests. Then, implementing a rule like this becomes a simple lookup in a set.
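To illustrate the difference, here's a rough sketch (these helpers are illustrative, not actual SQLMesh internals):

```python
from pathlib import Path


# Shape of the approach under review: re-scan and re-parse the test
# directory for every model, which is O(models x test files).
def has_test_slow(model_name: str, test_dir: Path) -> bool:
    for test_file in test_dir.rglob("*.yaml"):
        if model_name in test_file.read_text():
            return True
    return False


# Suggested shape: build a set of tested model names once, during the
# single pass that already loads tests; each rule check is then O(1).
def has_test_fast(model_name: str, models_with_tests: set[str]) -> bool:
    return model_name in models_with_tests
```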

@cmgoffena13 (author) commented:

Hey, thanks for checking this out. I agree; my knowledge of the repo is lacking here. A couple of questions:

1. What do you mean by load time? What triggers it?
2a. Is there a particular place in the codebase where storing this would be best? I can start looking around.
2b. What's the best approach for storing it? Is there something similar already happening, or would this be an entirely new initiative?

Contributor commented:

No problem!

1. The loading stage corresponds to the code path where SQLMesh loads a project's assets (models, tests, etc.). You can find the relevant logic in sqlmesh/core/loader.py; check out GenericContext.load too.
2a. One idea is to load tests eagerly, similar to other assets, so that you know all the tests that exist as early as possible and can extract which models have tests at linting time (see the sketch below).
2b. We'd want fast lookups, so either a dict or a set.
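Roughly, the shape could look like this (a hypothetical sketch; only GenericContext.load, GenericContext.test, and load_model_tests are names from the discussion here):

```python
class GenericContext:
    def load(self) -> "GenericContext":
        # Existing behavior: models, macros, and other assets load here.
        self._load_models_and_other_assets()  # hypothetical helper

        # Hypothetical eager step: load test metadata alongside the other
        # assets instead of lazily inside GenericContext.test.
        self._model_test_metadata = self._loader.load_model_tests()

        # Fast-lookup structure for the linter: which models have tests.
        self._models_with_tests = {
            meta.model_name for meta in self._model_test_metadata
        }
        return self
```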

@VaggelisD why don't we eagerly load all tests at load time? It appears that we only do it in GenericContext.test today, lazily.

@cmgoffena13 (author) commented:

After an initial scan, here's my plan:

1. sqlmesh/core/test/discovery.py has ModelTestMetadata; implement a method there to pull the model name out of the test body.
2. sqlmesh/core/loader.py has Loader.load_model_tests, which doesn't look like it's implemented yet, so flesh that out and pass the ModelTestMetadata objects to the project. I can then implement another method that creates a set of the model names in a single pass.
3. Make sure GenericContext.load in sqlmesh/core/context.py includes the set as well, since that's new.
4. From there, the linter rule can call the set-creating method from the context to easily compare model names.

The idea is to load the metadata objects (not sure about ModelTest; I assume ModelTestMetadata is the way to go here) into the project, then add a method that returns a set of model names from those metadata objects, which the linter rule can use for comparisons via the context.
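A rough sketch of steps 1 and 2, assuming the test body is the parsed YAML dict (the "model" key and everything beyond the ModelTestMetadata name are guesses):

```python
import typing as t


class ModelTestMetadata:
    def __init__(self, body: t.Dict[str, t.Any]) -> None:
        self.body = body

    @property
    def model_name(self) -> t.Optional[str]:
        # Assumption: each test body references the model under test by
        # name; the exact key is a guess here.
        return self.body.get("model")


def tested_model_names(
    metadata: t.Iterable[ModelTestMetadata],
) -> t.Set[str]:
    # Single pass over all loaded test metadata (step 2 of the plan).
    return {m.model_name for m in metadata if m.model_name}
```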

Thoughts?
