Model characteristics dataset #3
Comments
I can work on this issue. Is there anyone else who would like to work with me? Is there any information already available so that I can better understand what is expected here?
A first thing would be to define a file format, to store information about models. We can start simple with that (example only):
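A minimal sketch of what such a file could look like, assuming JSON with illustrative field names (parameter counts in billions; the exact format and fields are still to be decided):

```json
{
  "models": [
    {
      "provider": "mistralai",
      "name": "open-mixtral-8x7b",
      "total_parameters": 46.7,
      "active_parameters": 12.9,
      "source": "official model announcement"
    },
    {
      "provider": "openai",
      "name": "gpt-4",
      "total_parameters": null,
      "active_parameters": null,
      "source": "unknown, to be estimated (see issue #1)"
    }
  ]
}
```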
We need to take into account that we will probably include other data in the future, like min/max parameter values for proprietary models. We should also have a clear scope of how we can collect this data automatically. Probably someone else on the internet has already worked on that?
Some clarifications after a quick meeting with @AndreaLeylavergne. We will start with the main LLM providers (aligned with what we have already implemented in the package), so we will focus on OpenAI, Mistral AI and Anthropic first. We need to report LLMs in the following spreadsheet: model repository. Column description:
The model name should be the same as the one defined in the provider's API documentation:
To find popular models and some assessments of their architecture, we can use this database.
Description
To compute the impacts of a query, we need some characteristics of the model that was used. In particular, for LLMs we need the total parameter count.
Solution
A CSV or JSON file to store all known models and metadata such as the total parameter count.
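As an illustration of how the package could consume such a file, here is a minimal Python sketch assuming the CSV variant; the file name, column names, and function are hypothetical, not the package's actual API:

```python
import csv
from pathlib import Path


def load_models(path: str = "models.csv") -> dict:
    """Load model metadata keyed by (provider, model name)."""
    models = {}
    with Path(path).open(newline="") as f:
        for row in csv.DictReader(f):
            models[(row["provider"], row["name"])] = {
                "total_parameters": float(row["total_parameters"]),
                # Fall back to the total count when no active count is given
                "active_parameters": float(row["active_parameters"] or row["total_parameters"]),
                "source": row.get("source", ""),
            }
    return models


# Usage (model names must match the provider API documentation):
# models = load_models()
# params = models[("mistralai", "open-mixtral-8x7b")]["active_parameters"]
```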
Considerations
Proprietary models
In many cases, we don't know the underlying architecture of proprietary models, so we will need to guesstimate it (see issue #1 for OpenAI). The estimation can be based on the performance achieved by these models on various leaderboards compared to open-weight models. It is crucial to keep the source of each assessment because it strongly influences the computed impacts.
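As a purely illustrative example of this approach: if a proprietary model scores close to an open-weight ~70B model on the benchmarks we choose, we could record ~70B as a guesstimated parameter count, along with the benchmark, comparison model and date as the source of that assessment.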
Total parameters vs active parameters
In the case of mixture-of-experts models, we can define the active parameter count as the sum of all parameters actually used to run the computation (example with Mixtral).
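For instance, Mixtral 8x7B is reported to have roughly 46.7B total parameters, but since only 2 of its 8 experts are used per token, only about 12.9B parameters are active for each forward pass; the active count is the relevant figure for the compute cost of a query.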