The explicit model enumeration and model info data is certainly not ideal. #5433

The explicit model enumeration and model info data is certainly not ideal.

We could move some of this model data to an external file and allow reading
a model config file of some sort to make it easier to integrate new models.
I think Anders points out the larger, more difficult issues in making this
more general, though.

-Ambrose
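
For what it's worth, a minimal sketch of the external-config idea could look like the following. The file name "model_info.json" and the ModelData fields below are assumptions for illustration, not the actual BIG-bench definitions:

```python
# Hypothetical sketch of reading model metadata from an external config file
# instead of hard-coding it; field names and defaults are assumptions.
import dataclasses
import json


@dataclasses.dataclass
class ModelData:
    model_family: str = "unknown"
    total_params: int = 0
    training_steps: int = 0
    description: str = ""


def load_model_info(path: str = "model_info.json") -> dict:
    """Read per-model metadata from an external JSON file."""
    with open(path) as f:
        raw = json.load(f)
    return {name: ModelData(**fields) for name, fields in raw.items()}
```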

On Mon, Jun 6, 2022 at 1:31 AM Anders Andreassen wrote:

Unfortunately, there are some technical limitations to using any HF model.
In my experience, HF does not provide any uniform interface for scoring or
generating text across arbitrary models.
There are some subclasses that do have (mostly) uniform interfaces
(something like transformers.AutoModelWithLMHead
https://huggingface.co/transformers/v3.0.2/model_doc/auto.html#automodelwithlmhead),
but even in those cases there is lots of inconsistent behavior. This is
especially subtle if one wants to support batched evaluations (which is
necessary if one wants to evaluate full tasks or bigbench-lite). There is
no uniform standard for how to deal with padding and batching for different
model implementations, and I've even seen inconsistencies between the TF,
PyTorch, and Flax implementations of the same model. For example, GPT-2
PyTorch supports batch generation, but GPT-2 TF does not (it does not fail,
but gives nonsense answers because it does not handle padding tokens properly).
I've voiced this concern to some of the HF/Transformers people, so they
should be aware of this issue. Ultimately this would have to be solved by
HF and not us.
It is possible to write custom logic that allows for evaluation of
specific models, but unfortunately doing this for all HF models is not
feasible at the moment.
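
To make the padding issue concrete, here is roughly what batched generation looks like with the PyTorch GPT-2 implementation. This is a sketch of one working configuration, not a general recipe; as noted above, other backends and model families need different handling:

```python
# Minimal sketch of the padding setup batched generation needs for PyTorch GPT-2.
# GPT-2 ships without a pad token, so the caller must configure padding explicitly.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token   # GPT-2 defines no pad token
tokenizer.padding_side = "left"             # left-pad so generation continues from real context
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompts = ["The capital of France is", "Two plus two equals"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
outputs = model.generate(
    **inputs,                               # input_ids + attention_mask
    max_new_tokens=10,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```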

That being said, the gpt-neo implementation might be close enough to the
other HF GPT implementations such that it might not be too hard to add
support for that model. So if they want to submit a PR adding this support,
that would be great!

On Mon, Jun 6, 2022 at 1:49 AM Guy Gur-Ari wrote:

+Ambrose Slone, +Anders Andreassen, please correct me if I am wrong here. I think
the reason only some HF models are supported is that we made all the
information about those models (number of training steps, number of
parameters etc.) available explicitly under MODEL_INFO. As far as I know
there is no technical limitation to supporting other HF models, and I think
it should be easy to have a placeholder ModelData to be used when this
information is not available. Maybe the easiest way to do this would be to
turn MODEL_INFO into a collections.defaultdict() which returns this
placeholder MODEL_DATA by default. There would still be a bit of work
connecting each model name to the HuggingFace model class and tokenizer in
MODEL_CLASSES.
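
A rough sketch of that defaultdict approach follows; the ModelData fields and placeholder values here are assumptions for illustration, and the real MODEL_INFO entries carry more metadata:

```python
# Sketch of turning MODEL_INFO into a defaultdict that falls back to a
# placeholder ModelData when a model's metadata is not available.
import collections
import dataclasses


@dataclasses.dataclass
class ModelData:
    total_params: int = 0
    training_steps: int = 0
    description: str = "no metadata available"


PLACEHOLDER_MODEL_DATA = ModelData()

MODEL_INFO = collections.defaultdict(
    lambda: PLACEHOLDER_MODEL_DATA,
    {
        # ... existing explicit entries, e.g. "gpt2": ModelData(...), ...
    },
)

# Unknown model names now resolve to the placeholder instead of raising KeyError:
info = MODEL_INFO["EleutherAI/gpt-neo-1.3B"]
```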

Best,
Guy

On Sat, Jun 4, 2022 at 10:12 AM Stella Biderman wrote:

The info here
https://github.com/google/BIG-bench/blob/main/docs/doc.md#testing-and-submitting
and here
google/BIG-bench#830 (comment)
leads me to believe that the codebase supports HuggingFace transformer
models, but I tried running python bigbench/evaluate_task.py --models
EleutherAI/gpt-neo-1.3B --task bbq_lite and got invalid model:
EleutherAI/gpt-neo-1.3B, valid models: ['gpt2', 'gpt2-medium',
'gpt2-large', 'gpt2-xl', 'openai-gpt'].

Does that mean that I need to manually edit the benchmark to include
each model I care about in order to run HF models other than the
pre-specified ones? It seems that it would take a substantial re-design
(something I started and abandoned out of frustration in the past) to run
anything other than the handful of pre-approved models. Is this deliberate?
Am I missing something?
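
For reference, the manual edit being asked about amounts to registering the new model name alongside the existing GPT-2 entries. A hypothetical sketch follows; the dictionary layout is assumed for illustration and may differ from the real MODEL_CLASSES in the bigbench codebase:

```python
# Hypothetical sketch of registering gpt-neo next to the existing GPT-2 entries;
# the dict layout is an assumption, not copied from the repo.
import transformers

MODEL_CLASSES = {
    "gpt2": {
        "lm": transformers.GPT2LMHeadModel,
        "tokenizer": transformers.GPT2Tokenizer,
    },
    # New entry: GPT-Neo reuses the GPT-2 tokenizer.
    "EleutherAI/gpt-neo-1.3B": {
        "lm": transformers.GPTNeoForCausalLM,
        "tokenizer": transformers.GPT2Tokenizer,
    },
}
```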




Originally posted by @guygurari in google/BIG-bench#835 (comment)
