Support GPT-J #154
Comments
The above is an app. Do you have a link to the model implementation and params from HF? :)
Oh sorry! Yes, this one is the model: https://huggingface.co/togethercomputer/GPT-JT-6B-v1. Does it have similarities with other models that I could try to implement or even use directly? I would have no idea where to start, but I'm willing to help.
Ah, nice! We support GPT-2, so maybe that can be used as a building block? Or at least you can compare the Python implementation of GPT-2 with our GPT-2 and then do the same to implement your own. :)
The reference hf/transformers implementation of GPT-J is here. For the most part, the implementation should be similar to any other text model we have, like GPT-2. From a brief look, I think we may need to adjust/extend our attention implementation to support the rotary position embedding, but it's fine to modify the current code as necessary, and we can find the best way to make it configurable later.
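For anyone new to rotary position embeddings, here is a minimal sketch of the idea in plain Python. It is not this project's code and not the hf/transformers implementation; the function names, the `rotary_dim` parameter, and the default `base` of 10000 are assumptions for illustration. It rotates consecutive pairs of a query or key vector by an angle that depends on the token position, and leaves any dimensions past `rotary_dim` untouched (my understanding is that GPT-J rotates only part of each attention head, but check the reference implementation for the exact layout).

```python
import math


def rotary_embed(vec, position, rotary_dim=None, base=10000.0):
    """Apply a rotary position embedding to one query/key vector.

    Hypothetical helper for illustration only. Consecutive pairs
    (vec[0], vec[1]), (vec[2], vec[3]), ... are rotated by an angle
    proportional to `position`; dimensions at index >= rotary_dim
    are passed through unchanged.
    """
    dim = len(vec) if rotary_dim is None else rotary_dim
    if dim % 2 != 0 or dim > len(vec):
        raise ValueError("rotary_dim must be even and at most len(vec)")

    out = list(vec)
    for i in range(0, dim, 2):
        # Each pair gets its own frequency; later pairs rotate more slowly.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out


def dot(a, b):
    """Plain dot product, standing in for one attention score."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0, 4.0]
    k = [0.5, -1.0, 2.0, 0.25]
    # The score depends only on the offset between the two positions,
    # so these two values should agree up to floating-point error.
    print(dot(rotary_embed(q, 5), rotary_embed(k, 3)))
    print(dot(rotary_embed(q, 7), rotary_embed(k, 5)))
```

The useful property is that the attention score between a rotated query and a rotated key depends only on how far apart the two positions are, which is why the rotation is applied inside attention rather than added to the input embeddings as in GPT-2.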
Working on this, but it is gonna take a while 'cause I am new to transformers.
Any plan on adding support for togethercomputer/GPT-JT (https://huggingface.co/spaces/togethercomputer/GPT-JT)?
Seems like the closest alternative to GPT-3. What do you think? I would love to help, but I don't know where to start.