
Add GPT-2 Model and its Variants #354

Merged (14 commits, Sep 16, 2022)
Conversation

@abheesht17 (Collaborator) commented Sep 13, 2022

Partially resolves #337

@mattdangerw (Member)

This looks great! Will take a deeper look, but you can do two things for now:

  1. Comment with a pointer to wherever you are grabbing the hyperparameters from for the custom sizes. Does OpenAI provide this anywhere directly?
  2. Update your Colab from the issue to actually use the GPT-2 symbols from this branch. That would be a nice validation that we have fully recreated the correct graph here.

@abheesht17 (Collaborator, Author) commented Sep 14, 2022

@mattdangerw,

> This looks great! Will take a deeper look, but you can do two things for now:
>
>   1. Comment with a pointer to wherever you are grabbing the hyperparameters from for the custom sizes. Does OpenAI provide this anywhere directly?
>   2. Update your Colab from the issue to actually use the GPT-2 symbols from this branch. That would be a nice validation that we have fully recreated the correct graph here.
  1. They have a config file named hparams.json in each checkpoint directory. The links follow this format:
    https://openaipublic.blob.core.windows.net/gpt-2/models/{num_params}/hparams.json.

    If you want to double-check, please visit these links:
    GPT-2 Small: https://openaipublic.blob.core.windows.net/gpt-2/models/124M/hparams.json
    GPT-2 Medium: https://openaipublic.blob.core.windows.net/gpt-2/models/355M/hparams.json
    GPT-2 Large: https://openaipublic.blob.core.windows.net/gpt-2/models/774M/hparams.json
    GPT-2 Extra Large: https://openaipublic.blob.core.windows.net/gpt-2/models/1558M/hparams.json

    intermediate_dim is always 4 * hidden_dim.

  2. Sure, will do this now!

Edit: Here is the updated notebook: https://colab.research.google.com/drive/1vI3dDmFEFOQM0GbDBfVf6sanATjkwV9X?usp=sharing.
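The hparams-to-constructor mapping described above can be sketched as follows. The hparams values below are the published ones for the 124M ("small") checkpoint; the output key names mirror KerasNLP-style constructor arguments and are illustrative, not the PR's exact API.

```python
def hparams_to_config(hparams):
    """Map OpenAI GPT-2 hparams.json fields to model constructor kwargs."""
    return {
        "vocabulary_size": hparams["n_vocab"],
        "num_layers": hparams["n_layer"],
        "num_heads": hparams["n_head"],
        "hidden_dim": hparams["n_embd"],
        # Not stored in hparams.json: GPT-2 always uses a 4x MLP expansion.
        "intermediate_dim": 4 * hparams["n_embd"],
        "max_sequence_length": hparams["n_ctx"],
    }

# Published hparams for the 124M checkpoint (GPT-2 Small).
gpt2_small = {"n_vocab": 50257, "n_ctx": 1024, "n_embd": 768,
              "n_head": 12, "n_layer": 12}

config = hparams_to_config(gpt2_small)
print(config["intermediate_dim"])  # 3072
```

The same function applies unchanged to the 355M, 774M, and 1558M hparams.json files, since only the stored values differ.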

@mattdangerw (Member) left a comment

Overall LGTM, just a few minor comments.



class Gpt2Custom(keras.Model):
    """Generative Pretrained Transformer-2 (GPT-2) network.
@mattdangerw (Member):
I think we should get out of the habit of listing the full model name in the first line. That line should be short, scannable, and informative.

"GPT-2 core network with custom hyperparameters." or something like that.

@abheesht17 (Collaborator, Author):
Cool! I have shifted the full name below.

Resolved review threads on keras_nlp/models/gpt2.py (one outdated).
@jbischof (Contributor) left a comment

Great work, thank you!

One request: please move your style changes to other files into a separate PR.

Resolved review threads: keras_nlp/models/bert.py (outdated), keras_nlp/models/gpt2.py (one outdated), keras_nlp/models/roberta.py (outdated).
@mattdangerw (Member)

Also, just a heads-up: I will be pushing a name-change cleanup that will probably require updating any checkpoint conversion scripts using TransformerDecoder. But it's just reworking the layer names.

#353

@abheesht17 (Collaborator, Author)

> Also, just a heads-up: I will be pushing a name-change cleanup that will probably require updating any checkpoint conversion scripts using TransformerDecoder. But it's just reworking the layer names.

@mattdangerw, updated: https://colab.research.google.com/drive/1vI3dDmFEFOQM0GbDBfVf6sanATjkwV9X?usp=sharing!

@abheesht17 (Collaborator, Author)

Please hold off on merging this; I'll validate checkpoint conversion for all four sizes.

@abheesht17 (Collaborator, Author)

> Please hold off on merging this; I'll validate checkpoint conversion for all four sizes.

Good to go! :)
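Checkpoint-conversion validation of the kind mentioned here is typically a numerical comparison: feed the same tokens to the reference model and the converted model, then assert the outputs agree within a tolerance. A minimal sketch of that check, with stand-in arrays instead of the PR's actual conversion scripts:

```python
import numpy as np

def outputs_match(ref_outputs, converted_outputs, atol=1e-4):
    """Numerical equivalence check for a converted checkpoint's outputs."""
    ref = np.asarray(ref_outputs, dtype=np.float32)
    conv = np.asarray(converted_outputs, dtype=np.float32)
    if ref.shape != conv.shape:
        return False
    return bool(np.allclose(ref, conv, atol=atol))

# Stand-in outputs: in a real script these would be logits from the
# original TF checkpoint and from the converted KerasNLP model.
reference = np.array([[0.12, -1.30, 2.05]])
converted = reference + 1e-6  # tiny float drift from re-serialization

print(outputs_match(reference, converted))  # True
```

Running this once per size (124M, 355M, 774M, 1558M) gives the "all four sizes" validation described in the comment above.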

@mattdangerw mattdangerw merged commit 3df60d6 into keras-team:master Sep 16, 2022
Merging this pull request may close: Investigate recreating a GPT-2 forward pass with KerasNLP.

3 participants