Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement first end-to-end client/server flow #23

Merged
merged 29 commits into from
Dec 11, 2024
Merged

Conversation

lmazz1-dbt
Copy link
Collaborator

@lmazz1-dbt lmazz1-dbt commented Dec 6, 2024

Context

Changes proposed in this pull request

  • Added httpx to the project
  • Created _handler.py which implements the interface for all client functions/methods to talk to the API
  • Created common objects for the resolution graph
  • Created a standard and more compact way of representing hashes as strings
  • Implement the end-to-end flow for requesting the resolution graph in this new client-server world
  • Tested all of the above, including adding a dependency on VCRpy
  • Dockerised API, with dev mode (hot reload), and run it for testing locally and through GitHub workflow
  • Rationalised the way environment variables are set: now they are only loaded using dotenv, and read from a committed test.env

Guidance to review

  • Settled on HTTPX in the end rather than requests, as HTTPX comes with FastAPI[standard] anyway
  • I have done quite a lot of renaming of "models" to "resolutions" and similar, but I tried to do it only when writing new code or referencing new code. In this way I'm minimising the changes of this PR to stay in scope, and also for the moment keeping the "Model" entity name as an implementation detail of the adapter. Let's continue the refactor in a different ticket (this one potentially).
  • I've created a client folder. I think (though it is debatable) that it would be nice to have three top-level folders: client, server and common. However, this is useful even in a temporary capacity to separate the client functions we have migrated to this new client/server world
    • Side note: There are a number of things in common that shouldn't be. Exceptions are an example - I think we need different exception types between client and server because as we disentangle those errors will be quite different. Let's again pick this up later.
  • Previously we added datasets as well as "models of type dataset" to the "model subgraph" returned as a rustworkx object. I thought that didn't actually fit within the resolution graph and I changed it. I also changed a bug in the previous logic constructing the "model subgraph" that would have inaccurately labelled e.g. human resolutions as "models". Finally, I changed the query to only return direct edges, not all closure table entries
  • To test _handler.py after having looked into a few options, I'm using VCR cassettes. Essentially, only the first time a test is run, is the API actually hit, and the result stored in a fixture. If the API endpoint changes, the fixture needs to be manually deleted. Once we write more complex tests of this kind, we'll need to work out what happens when the client request changes, but I believe it will also try to hit the API again in that case
    • However, what happens if tests aren't run locally, and the first time a cassette is created is through the GitHub test pipeline? I conjecture that if the test was going to pass, it's still going to pass but without adding to the repo the fixture which I guess is fine? The fixture is going to be created the next time all tests are run (which PR authors are requested to do anyway, and check the checkbox in the PR template)

Checklist:

  • My code follows the style guidelines of this project
  • New and existing unit tests pass locally with my changes

@lmazz1-dbt lmazz1-dbt marked this pull request as ready for review December 7, 2024 20:21
Copy link
Collaborator

@wpfl-dbt wpfl-dbt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love matchbox.common.graph. I've been thinking that we should describe the graph view that all backends represent views onto somewhere, because I think it's a useful conceptual introduction, and vital for anyone writing a backend. This is a great move in the direction of codifying the abstract model.

I trust you on VCR as a good solution for testing requests. I'm not going to look any further into it unless you think I should.

Why aren't you finishing the job with the Models* table name purge?

I'm not wholly convinced by test.env. It's also my experience that having sample.env and test.env be different is useful: sample.env can showcase features, whereas test.env has a job to do (see my sample one in the indexing branch).

docker-compose.yml Show resolved Hide resolved
.github/workflows/pytest.yml Outdated Show resolved Hide resolved
test/client/test_handler.py Outdated Show resolved Hide resolved
test/client/test_handler.py Outdated Show resolved Hide resolved
src/matchbox/client/_handler.py Show resolved Hide resolved
src/matchbox/common/hash.py Show resolved Hide resolved
src/matchbox/server/Dockerfile Show resolved Hide resolved
src/matchbox/server/postgresql/adapter.py Show resolved Hide resolved
src/matchbox/server/postgresql/orm.py Show resolved Hide resolved
test.env Outdated Show resolved Hide resolved
Copy link
Collaborator

@wpfl-dbt wpfl-dbt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy based on our discussion that the full resolution rename happens in the next PR.

@lmazz1-dbt lmazz1-dbt merged commit 2eac995 into main Dec 11, 2024
5 checks passed
@lmazz1-dbt lmazz1-dbt deleted the feature/mb51 branch December 11, 2024 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants