Better documentation around how unloading is triggered #511

dsgibbons · 2024-06-05T02:36:14Z

When loading some models, I receive the WARN log: Memory over-allocation due to under-prediction of model size... (which stems from here) followed by the INFO log: Eviction triggered for model ... (I couldn't find exactly where this comes from). This unloading happens despite it being the only model on a large machine with 64GB RAM, 40GB VRAM and all of the k8s resource limits being set to max.

I've tried to piece together how to avoid this from various GitHub issues (e.g., this one) but would really appreciate some clear documentation around how unloading is triggered in ModelMesh. Even variables such as MODELSIZE_MULTIPLIER as referenced by this reply aren't properly documented, and I can't find where they are used in either the modelmesh or the modelmesh-serving source code.

Could the documentation please be updated to formally describe how models are prioritized and subsequently unloaded with more discussion around the various configurations that we can alter on a per runtime/per isvc basis? I'm happy to contribute by helping to update the documentation, but I don't fully understand the underlying design decisions.


dsgibbons · 2024-06-05T04:13:26Z

For some additional context, I'm using the Python backend in Triton. An example model that triggers unloading has custom dependencies via conda pack and has a file tree like so:

├── 1/
│   ├── model.py
│   └── model.pkl (approx 100MiB)
├── config.pbtxt
└── conda_env.tar.gz (approx 3GiB)

I'm not sure whether using models in this way messes with how ModelMesh computes usage.
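One rough way to sanity-check a size estimate for a model laid out like this is to total the bytes on disk. The sketch below is only an illustration: `dir_size_bytes` and `to_mib` are hypothetical helper names, and nothing in this thread confirms that ModelMesh derives its prediction from on-disk size. The on-disk total is also not the same as runtime memory use, since an archive such as `conda_env.tar.gz` takes up more space once unpacked.

```python
import os


def dir_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so a linked file is not counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def to_mib(n_bytes):
    """Convert a byte count to MiB (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)
```

Calling `to_mib(dir_size_bytes("path/to/model"))` on the tree above would give a figure a little over 3 GiB, dominated by the conda environment archive.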

GolanLevy · 2024-06-09T08:15:13Z

Hi @dsgibbons,
Please see my reply in kserve/modelmesh#82 (comment); it might help you with the documentation on how modelmesh decides to load/unload models.

Maybe the DEFAULT_MODELSIZE property can help you, especially if most of your models are of the same size.
DEFAULT_MODELSIZE is used to estimate the model size when nothing is known about the model type before it is loaded.
According to the code documentation:

// conservative "default" model size,
// such that "most" models are smaller than this

Since most of our models have the same size, setting it to the correct value eliminated the WARN log you are seeing and helped modelmesh make better model allocation decisions.
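To make the effect of a too-small default concrete, here is a toy sketch of size accounting in a fixed-capacity cache. It is an assumption-laden illustration, not ModelMesh's implementation: the class `ToyModelCache`, its event names, the oldest-first eviction rule, and the unitless sizes are all hypothetical. It only shows the general idea that space is reserved from a default estimate, the estimate is corrected after loading, and a correction upward can force an unplanned eviction.

```python
from collections import OrderedDict


class ToyModelCache:
    """Toy fixed-capacity cache with size accounting (NOT ModelMesh code)."""

    def __init__(self, capacity, default_model_size):
        self.capacity = capacity
        self.default_model_size = default_model_size
        self.sizes = OrderedDict()  # model_id -> accounted size, oldest first
        self.events = []            # (event_name, model_id) tuples

    def used(self):
        return sum(self.sizes.values())

    def _evict_until_fits(self, keep):
        # Evict the oldest models (never `keep`) until usage fits capacity.
        while self.used() > self.capacity:
            victim = next((m for m in self.sizes if m != keep), None)
            if victim is None:
                break  # nothing else left to evict
            del self.sizes[victim]
            self.events.append(("evicted", victim))

    def load(self, model_id, measured_size):
        # Before loading, reserve space using the default estimate.
        self.sizes[model_id] = self.default_model_size
        self._evict_until_fits(keep=model_id)
        # After loading, the measured size replaces the estimate.
        if measured_size > self.default_model_size:
            self.events.append(("under-predicted", model_id))
        self.sizes[model_id] = measured_size
        self._evict_until_fits(keep=model_id)


# Three models of size 4 in a cache of capacity 10, default estimate 2.
cache = ToyModelCache(capacity=10, default_model_size=2)
for name in ("a", "b", "c"):
    cache.load(name, measured_size=4)
print(cache.events)
```

With a default of 2, every load is flagged as under-predicted and the eviction of `a` happens only after `c` has already been loaded. With a default of 4, the same sequence produces no under-prediction events, and `a` is evicted up front when space for `c` is reserved.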

dsgibbons · 2024-06-10T12:24:11Z

Thank you for linking your reply @GolanLevy. I'd still love to see some formal documentation for this, as it seems like critical information that shouldn't require trawling through the issue tracker. I'll see how I go this week. I hope I'll eventually understand ModelMesh well enough to submit a PR to address this issue.
