Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration tests for Similarity Search API. #450

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Conversation

kol3x
Copy link
Contributor

@kol3x kol3x commented Jun 4, 2024

Closes #430

Hello! I am glad to take part in your challenge program. I started with the easiest one, but I plan to try harder ones later.

My process was:

  • Read the CloudFlare documentation
  • Looked throw the Similarity Search code to understand how it works
  • Started building tests

I ended up making two new tests out of an authentication test that was already there, to cover all cases. Plus I added tests for validation and functionality.

One more thing to consider is that we are currently using vitest 1.2, but the docs mention that they only support 1.5.
image
I haven't upgraded the package, as I wanted to consult first.

@kol3x
Copy link
Contributor Author

kol3x commented Jun 6, 2024

@Lavriz
Let me know your thoughts on this

@Lavriz
Copy link

Lavriz commented Jun 10, 2024

@Lavriz Let me know your thoughts on this

hey! sorry for the delay, you're absolutely right on upgrading the version! thanks for noticing it

@kol3x
Copy link
Contributor Author

kol3x commented Jun 10, 2024

@Lavriz Let me know your thoughts on this

hey! sorry for the delay, you're absolutely right on upgrading the version! thanks for noticing it

I included the vitest upgrade, let me know if there's anything else needed

@kol3x kol3x marked this pull request as draft June 25, 2024 16:50
@kol3x
Copy link
Contributor Author

kol3x commented Jun 25, 2024

Putting this on draft for now, as Evgeny pointed out that additional tests for vectorization and vector DB querying should be added. I've researched Cloudflare docs on how to properly test auxiliary workers, but no success yet.

…or vectorization and db querying, refactered existing tests
@kol3x kol3x marked this pull request as ready for review June 26, 2024 11:23
@kol3x
Copy link
Contributor Author

kol3x commented Jun 26, 2024

  • Added tests for vectorization and DB querying
  • Added a vectorized sample for response to be tested against, generated by the same model that's used in the service
  • Removed test, that was supposed to cover case where there is no API key set in environment. I am not sure if this is correct, but I couldn't figure out a way to unset environment variable for this particular test, so it always failed

@kol3x kol3x marked this pull request as draft June 27, 2024 09:14
@kol3x
Copy link
Contributor Author

kol3x commented Jun 27, 2024

Figured out how workers mocks work, now tests actually pass...
image

@kol3x kol3x marked this pull request as ready for review June 27, 2024 09:32
tools/similarity_search/test/index.spec.ts Outdated Show resolved Hide resolved
tools/similarity_search/test/index.spec.ts Outdated Show resolved Hide resolved
Comment on lines 63 to 98
describe("Functionality", () => {
it("runs AI model and gets vectorized string back", async () => {
const response = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
text: ["Sample text"],
})
expect(response).toHaveProperty("data")
})

it("queries vector database and gets proper response", async () => {
const response = await env.VECTORIZE_INDEX.query([1, 2, 3], {
namespace: "test-namespace",
topK: 1,
})
expect(response).toHaveProperty("matches")
expect(response.matches.length).toBeGreaterThan(0)
})

it("returns similarity score for valid requests", async () => {
const response = await SELF.fetch("https://example.com/", {
method: "POST",
headers: {
"Content-Type": "application/json",
"X-API-Key": "test-api-key",
},
body: JSON.stringify({
text: "Sample text",
namespace: "test-namespace",
}),
})

expect(response.status).toBe(200)
const jsonResponse: SimilarityCheckResponse = await response.json()
expect(jsonResponse).toHaveProperty("similarity_score")
expect(typeof jsonResponse.similarity_score).toBe("number")
})
})
Copy link
Contributor

@evgenydmitriev evgenydmitriev Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look like unit tests to me. For example, you can't trigger Worker AI yourself - this function is an internal component of the worker you are testing. Same for querying vectorize. The only trigger at your disposal is user's request. You can, however, assert all other worker interactions, including worker AI being called and vector DB being queried.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look like unit tests to me.

That's what I thought initially when you told me that these services behavior should be tested too. But now I see what you are talking about.

I changed it so now we are just looking at what AI and DB are called with and what they return during normal request to a service, instead of calling them directly. I don't know if this is sufficient, or the abstraction has to be even deeper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Similarity Search API Integration Testing
4 participants