
Add GPT-2 Model and its Variants #354

Merged (14 commits, Sep 16, 2022)
Conversation

@abheesht17 (Collaborator) commented Sep 13, 2022

Partially resolves #337

@mattdangerw (Member)

This looks great! Will take a deeper look, but you can do two things for now:

  1. Comment with a pointer to wherever you are grabbing the hyperparameters from for the custom sizes. Does OpenAI provide this anywhere directly?
  2. Update your Colab from the issue to actually use the GPT-2 symbols from this branch. That would be a nice validation that we have fully recreated the correct graph here.

@abheesht17 (Collaborator, Author) commented Sep 14, 2022

@mattdangerw,

> This looks great! Will take a deeper look, but you can do two things for now:
>
>   1. Comment with a pointer to wherever you are grabbing the hyperparameters from for the custom sizes. Does OpenAI provide this anywhere directly?
>   2. Update your Colab from the issue to actually use the GPT-2 symbols from this branch. That would be a nice validation that we have fully recreated the correct graph here.
  1. They have a config file named hparams.json in each checkpoint directory. The links follow this format:
    https://openaipublic.blob.core.windows.net/gpt-2/models/{num_params}/hparams.json.

    If you want to double-check, please visit these links:
    GPT-2 Small: https://openaipublic.blob.core.windows.net/gpt-2/models/124M/hparams.json
    GPT-2 Medium: https://openaipublic.blob.core.windows.net/gpt-2/models/355M/hparams.json
    GPT-2 Large: https://openaipublic.blob.core.windows.net/gpt-2/models/774M/hparams.json
    GPT-2 Extra Large: https://openaipublic.blob.core.windows.net/gpt-2/models/1558M/hparams.json

    intermediate_dim is always 4 * hidden_dim.

  2. Sure, will do this now!

Edit: Here is the updated notebook: https://colab.research.google.com/drive/1vI3dDmFEFOQM0GbDBfVf6sanATjkwV9X?usp=sharing.
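The hparams-to-constructor mapping described above can be sketched as follows. The hparams values below are the published ones for the 124M ("small") checkpoint; the output key names mirror KerasNLP-style constructor arguments and are illustrative, not the PR's exact API.

```python
def hparams_to_config(hparams):
    """Map OpenAI GPT-2 hparams.json fields to model constructor kwargs."""
    return {
        "vocabulary_size": hparams["n_vocab"],
        "num_layers": hparams["n_layer"],
        "num_heads": hparams["n_head"],
        "hidden_dim": hparams["n_embd"],
        # Not stored in hparams.json: GPT-2 always uses a 4x MLP expansion.
        "intermediate_dim": 4 * hparams["n_embd"],
        "max_sequence_length": hparams["n_ctx"],
    }

# Published hparams for the 124M checkpoint (GPT-2 Small).
gpt2_small = {"n_vocab": 50257, "n_ctx": 1024, "n_embd": 768,
              "n_head": 12, "n_layer": 12}

config = hparams_to_config(gpt2_small)
print(config["intermediate_dim"])  # 3072
```

The same function applies unchanged to the 355M, 774M, and 1558M hparams.json files, since only the stored values differ.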

@mattdangerw (Member) left a comment

Overall LGTM, just a few minor comments.



class Gpt2Custom(keras.Model):
    """Generative Pretrained Transformer-2 (GPT-2) network.
@mattdangerw (Member):
I think we should get out of the habit of listing the full model name in the first line. That line should be short, scannable, and informative.

"GPT-2 core network with custom hyperparameters." or something like that.

@abheesht17 (Collaborator, Author):
Cool! I have shifted the full name below.

Resolved review threads on keras_nlp/models/gpt2.py (one outdated).
@jbischof (Contributor) left a comment

Great work, thank you!

One request: please move your style changes to other files into a separate PR.

Resolved review threads: keras_nlp/models/bert.py (outdated), keras_nlp/models/gpt2.py (one outdated), keras_nlp/models/roberta.py (outdated).
@mattdangerw (Member)

Also, just a heads-up: I will be pushing a name-change cleanup that will probably require updating any checkpoint conversion scripts using TransformerDecoder. But it's just reworking the layer names.

#353

@abheesht17 (Collaborator, Author)

> Also, just a heads-up: I will be pushing a name-change cleanup that will probably require updating any checkpoint conversion scripts using TransformerDecoder. But it's just reworking the layer names.

@mattdangerw, updated: https://colab.research.google.com/drive/1vI3dDmFEFOQM0GbDBfVf6sanATjkwV9X?usp=sharing!

@abheesht17 (Collaborator, Author)

Please hold off on merging this; I'll validate checkpoint conversion for all four sizes.

@abheesht17 (Collaborator, Author)

> Please hold off on merging this; I'll validate checkpoint conversion for all four sizes.

Good to go! :)
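Checkpoint-conversion validation of the kind mentioned here is typically a numerical comparison: feed the same tokens to the reference model and the converted model, then assert the outputs agree within a tolerance. A minimal sketch of that check, with stand-in arrays instead of the PR's actual conversion scripts:

```python
import numpy as np

def outputs_match(ref_outputs, converted_outputs, atol=1e-4):
    """Numerical equivalence check for a converted checkpoint's outputs."""
    ref = np.asarray(ref_outputs, dtype=np.float32)
    conv = np.asarray(converted_outputs, dtype=np.float32)
    if ref.shape != conv.shape:
        return False
    return bool(np.allclose(ref, conv, atol=atol))

# Stand-in outputs: in a real script these would be logits from the
# original TF checkpoint and from the converted KerasNLP model.
reference = np.array([[0.12, -1.30, 2.05]])
converted = reference + 1e-6  # tiny float drift from re-serialization

print(outputs_match(reference, converted))  # True
```

Running this once per size (124M, 355M, 774M, 1558M) gives the "all four sizes" validation described in the comment above.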

@mattdangerw mattdangerw merged commit 3df60d6 into keras-team:master Sep 16, 2022
Merging this pull request may close: Investigate recreating a GPT-2 forward pass with KerasNLP.

3 participants