[card-cache-service] optimize caching
# Core Changes:
- Created a background service that polls all the cards/data updates for individual tasks.
- The background service runs one process per taskspec for a limited amount of time, configured via environment variables.
- The service is launched when cards are requested.
- Cards are read directly from the cache, and reads are "best-effort".
- API routes that get a card or list cards wait asynchronously until the cache is updated.
- The new optimization requires the MF GUI to be up to date with the new server.
- Uses a new, optimized mf client.
- The Metaflow UI, which polls for new cards on a best-effort basis every 0.5 seconds, works best with the new server.
- Added async routines that clean up the cache and remove completed async processes.
- Removed dead code that is no longer used.

# Why not use the existing cache client:
- The existing cache client loads the entire `Task` / `Card` object into memory and then returns the HTML/data from it.
- This is inefficient because loading the `Card` object performs datastore list calls, which are slow.
- Once the path to the cards/data updates has been resolved, fetching the actual object is very fast.
- For example, listing cards takes ~1-2 seconds, but fetching a card once its path is resolved takes ~10 milliseconds.
- The current cache actions are "stateless": once an action completes, its state is lost when the next action is called.
- This stateless design is a poor fit for cards, whose data may change frequently while their paths do not.
- The new cache service resolves the object paths once and then keeps refreshing the objects until the background process finishes execution.
- This approach improves latency drastically, because requests are served from the cache instead of repeating the slow list calls.

# Configuration Options:
- `CARD_CACHE_PROCESS_NO_CARD_WAIT_TIME`: How long the process waits for a card to become available before it exits.
- `CARD_CACHE_PROCESS_MAX_UPTIME`: The maximum duration the process runs.
- `CARD_CACHE_CARD_LIST_POLLING_FREQUENCY`: How frequently the process polls (lists) for new cards.
- `CARD_CACHE_CARD_UPDATE_POLLING_FREQUENCY`: How frequently the process polls for the card HTML content.
- `CARD_CACHE_DATA_UPDATE_POLLING_FREQUENCY`: How frequently the process polls for data updates.
- `CARD_CACHE_DISK_CLEANUP_INTERVAL`: The interval at which cards stored in the cache are cleaned up.
- `CARD_API_HTML_WAIT_TIME`: The maximum time the card HTML retrieval API busy-waits for the card to be ready before timing out and returning a null response.
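The per-taskspec polling process described above can be sketched roughly as follows. This is a minimal sketch under several assumptions: an asyncio coroutine stands in for the background process, `list_card_paths` and `read_card` are hypothetical callables wrapping the slow listing and the fast fetch, the "frequency" variables are treated as seconds between polls, and the default values are placeholders rather than the service's real defaults. Data updates (`CARD_CACHE_DATA_UPDATE_POLLING_FREQUENCY`) would follow the same pattern and are left out.

```python
import asyncio
import os
import time
from typing import Awaitable, Callable, Dict, List

# Placeholder defaults (assumed to be seconds); the commit names these variables but not their values.
NO_CARD_WAIT_TIME = float(os.environ.get("CARD_CACHE_PROCESS_NO_CARD_WAIT_TIME", "10"))
MAX_UPTIME = float(os.environ.get("CARD_CACHE_PROCESS_MAX_UPTIME", "120"))
LIST_POLL = float(os.environ.get("CARD_CACHE_CARD_LIST_POLLING_FREQUENCY", "5"))
UPDATE_POLL = float(os.environ.get("CARD_CACHE_CARD_UPDATE_POLLING_FREQUENCY", "1"))


async def run_card_cache_process(
    list_card_paths: Callable[[], Awaitable[List[str]]],  # hypothetical: slow datastore listing
    read_card: Callable[[str], Awaitable[str]],  # hypothetical: fast fetch of one resolved path
    cache: Dict[str, str],
    no_card_wait_time: float = NO_CARD_WAIT_TIME,
    max_uptime: float = MAX_UPTIME,
    list_poll: float = LIST_POLL,
    update_poll: float = UPDATE_POLL,
) -> None:
    """Poll cards for one taskspec until max uptime, or until no card appears in time."""
    start = time.monotonic()
    paths: List[str] = []
    last_list = float("-inf")  # forces a listing on the first iteration
    while True:
        now = time.monotonic()
        elapsed = now - start
        if elapsed >= max_uptime:
            return  # hard cap on the process lifetime
        if not paths and elapsed >= no_card_wait_time:
            return  # no card showed up in time; give up
        if now - last_list >= list_poll:
            # Expensive: (re)resolve card paths only at the slower listing interval.
            paths = await list_card_paths()
            last_list = now
        for path in paths:
            # Cheap: refresh the content of paths that are already resolved.
            cache[path] = await read_card(path)
        await asyncio.sleep(update_poll)
```

Keeping `paths` across loop iterations is what makes this process stateful, unlike the existing cache actions: the slow listing runs at most once every `list_poll` seconds, while the cheap reads of already-resolved paths run every `update_poll` seconds.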
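On the API side, the bounded wait governed by `CARD_API_HTML_WAIT_TIME` can be illustrated with a short helper. `wait_for_card_html`, the dict-based cache, the 5-second default, and the 0.1-second poll interval are all hypothetical choices for this sketch; it returns `None` on timeout to mirror the null response described in the configuration list.

```python
import asyncio
import os
from typing import Dict, Optional

# Placeholder default (assumed seconds); the real default is not stated in the commit.
API_HTML_WAIT_TIME = float(os.environ.get("CARD_API_HTML_WAIT_TIME", "5"))


async def wait_for_card_html(
    cache: Dict[str, str],
    path: str,
    timeout: float = API_HTML_WAIT_TIME,
    poll_interval: float = 0.1,
) -> Optional[str]:
    """Poll the cache until the card HTML appears or the timeout elapses."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        html = cache.get(path)
        if html is not None:
            return html  # the background process has populated the card
        if loop.time() >= deadline:
            return None  # timed out: caller responds with a null payload
        await asyncio.sleep(poll_interval)  # yield so other requests keep running
```

Because the wait uses `asyncio.sleep`, a request that is waiting on a card does not block the event loop, so the background cache processes and other API requests keep making progress in the meantime.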