
Implement Inference Caching in the Backend Instead of Frontend #56

Open · 2 tasks

MaxenceGui opened this issue Feb 9, 2024 · 1 comment

@MaxenceGui
Issue description

The goal is to add caching capability to the backend so that the outcome of an inference request made on an image is stored. If a user runs different models on the same image, each inference request should be executed just once: switching back to a model that has already produced an inference on the chosen image should return the outcome saved in the cache.

Steps to Reproduce

  1. Send a request to the /inf route twice (a client-side sketch follows this list)
  2. The first time, the system returns the inference result from the pipeline
  3. The second time, the system returns the result stored in the cache
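For illustration only, the two calls could look like this from a client, assuming the backend runs locally and /inf accepts an image upload together with a pipeline name; the payload shape and URL below are assumptions, not the actual API contract:

# Hypothetical client-side illustration of the two calls above; the payload
# shape ("image" file field, "model_name" form field) and the local URL are
# assumptions, not the actual API contract.
import requests

url = "http://localhost:8080/inf"  # assumed local backend address

with open("seed.png", "rb") as f:
    image = f.read()

payload = {"model_name": "some_pipeline"}  # hypothetical pipeline name
first = requests.post(url, files={"image": image}, data=payload)   # runs the pipeline
second = requests.post(url, files={"image": image}, data=payload)  # should come from the cache
assert first.json() == second.json()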

Expected Behavior

If the cache does not contain a result for the chosen image and pipeline, the system should invoke the necessary model(s) to produce the inference result. On the other hand, if the cache already holds a result for that image and pipeline, the system should simply return it.

Current Behavior

The model(s) are called at every inference request since the inference result is not cached.

Possible Solution

This issue references the following comment:

I'm not exactly sure how it works on the frontend, but there is already caching functionality that keeps track of the result:

scores: [],
classifications: [],
boxes: [],
annotated: false,
imageDims: [],
overlapping: [],
overlappingIndices: [],

As of now, a new inference would overwrite what is in there. Maybe we could add another parameter, keyed by model, that would contain the scores, classifications, boxes, and everything else related to that model's result?

model: {
  scores: [],
  ...
}

function loadResultToCache

For the backend

To implement the same idea in the backend, we would need to keep track of the image passed in the inference request, pair it with the pipeline, and keep the result when it is returned. Then, if the same image is requested again with the same pipeline, we send back the result from the cache, as shown in the diagram and the sketch below.

---
title: Inference caching flow
---
flowchart TD

image[Send image to inference request]
check(backend checks if the image already exists <br> in the cache and if the pipeline name is the same)
inference[image not in cache <br> or pipeline not called yet]
cache[image in cache <br> and pipeline already called]
inf_res[send back result from inference]
cache_res[send back result stored in cache]

image -- from frontend --> check
check --> inference & cache
inference --> inf_res
cache --> cache_res

Originally posted by @MaxenceGui in ai-cfia/nachet-frontend#96 (comment)

Additional Context

  • There is already caching functionality in the frontend to display the result

Tasks

  • Implement caching functionality to store the inference result
  • Implement a check in inference_request that returns the cached result when one exists (see the sketch below)
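A rough sketch of where that check could live, assuming a Quart/Flask-style inference_request handler behind /inf; the way the image and pipeline name are read from the request, and the run_pipeline helper, are assumptions (this reuses _inference_cache from the sketch above).

import base64
import hashlib

from quart import request, jsonify  # assumed framework

async def inference_request():
    data = await request.get_json()                  # assumed request format
    image_bytes = base64.b64decode(data["image"])    # assumed field name
    pipeline_name = data["model_name"]               # assumed field name

    key = (hashlib.sha256(image_bytes).hexdigest(), pipeline_name)
    if key in _inference_cache:                      # task 2: return the cached result
        return jsonify(_inference_cache[key])

    result = await run_pipeline(image_bytes, pipeline_name)
    _inference_cache[key] = result                   # task 1: store the result
    return jsonify(result)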
@MaxenceGui MaxenceGui self-assigned this Feb 9, 2024
@MaxenceGui MaxenceGui added this to Nachet Feb 9, 2024
@rngadam

rngadam commented Feb 12, 2024

There should be some discussion in the spec about the pipeline and the underlying model versions. If a model version in the pipeline is updated, it should invalidate previous calls. There should perhaps also be a mechanism to invalidate the cache (from the frontend?).

https://martinfowler.com/bliki/TwoHardThings.html
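One way to capture that, sketched as an assumption rather than a decided design, is to fold the model versions into the cache key, so that bumping a version in the pipeline naturally invalidates the older entries; the pipeline dictionary shape used here is hypothetical.

import hashlib

def cache_key(image_bytes: bytes, pipeline: dict) -> tuple[str, str]:
    # Hypothetical pipeline shape:
    # {"name": "...", "models": [{"name": "...", "version": "..."}, ...]}
    versions = ",".join(f"{m['name']}@{m['version']}" for m in pipeline["models"])
    return (hashlib.sha256(image_bytes).hexdigest(), f"{pipeline['name']}|{versions}")

A frontend-triggered invalidation could then be as simple as dropping every cache entry whose key starts with a given image hash.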

@MaxenceGui MaxenceGui moved this to Todo in Nachet Feb 20, 2024
@MaxenceGui MaxenceGui modified the milestone: M2(2024 March) Feb 27, 2024
@MaxenceGui MaxenceGui removed their assignment Mar 12, 2024