Running Example on Free T4 GPU through Google Colab #1905
I can confirm this. Here is the full report with the error message as well:
Can confirm this. @winglian could you kindly take a look and update the colab? Thanks!
If anyone finds this thread looking to fix the error with installing Axolotl on Colab: I was getting an error about conflicting required versions of torch. This happens because Axolotl's setup pins the already-installed version of torch as the required version, while xformers requires a different one. If you first run `pip install torch==2.3.1` before running the Axolotl install command `pip install -e git+https://github.com/axolotl-ai-cloud/axolotl#egg=axolotl`, then it installs correctly on my system.
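Put together as Colab cells, the sequence above would look like this (just the two commands from this comment, with Colab's `!` shell prefix):

```bash
# Pin torch first so Axolotl's setup keeps a version compatible with xformers
!pip install torch==2.3.1

# Then install Axolotl itself from GitHub
!pip install -e git+https://github.com/axolotl-ai-cloud/axolotl#egg=axolotl
```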
Thanks for the suggestion @Peter-Devine, but if you try running this in Colab, it still doesn't work yet.
I have got it working using these packages:

```bash
! pip install torch==2.3.1
! pip install -e git+https://github.com/axolotl-ai-cloud/axolotl#egg=axolotl
! pip install -q datasets evaluate trl
! pip install torchvision==0.18.1 torchaudio==2.3.1
! pip install flash-attn
```

and running using this command:

```bash
! accelerate launch -m axolotl.cli.train /content/training.yaml
```

My environment is L4 (High memory). Can you not reproduce my results with these installs/commands?
I was unable to reproduce on Colab with your commands, but I did manage to get it working with these:
Hmmm, I guess that different usages require different installations. I have the code working, but I am not performing QLoRA, so I don't require PEFT etc.

For the sake of clarity, here is a fully reproducible example Colab of training that is currently working for me: https://colab.research.google.com/drive/1yBxLqCVZP4IMqMQmgeCL4Dqpj_fr2pCV?usp=sharing

If you could post a similar notebook, I think that would be helpful to other users of Axolotl. Thanks!
Here's the working Colab example from my end (as of October 10, 2024): https://colab.research.google.com/drive/1YSCRpWWqGdGXjPlsja1c3rJzePo8VNN-?usp=sharing Good idea to post the whole thing!
Please check that this issue hasn't been reported before.
Expected Behavior
The setup (the first 2 cells) in the notebook should install the environment. With that installation in place, all subsequent code should run, successfully recreating the training and inference steps of LLMs through Axolotl. https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/colab-notebooks/colab-axolotl-example.ipynb
Current behaviour
The first cell runs successfully. The second cell never completes, and even if we try to change some of the code, the `accelerate` command doesn't work; it hangs at flash attention. Here's the tail of the installation output from the cell:
Steps to reproduce
Enter axolotl-ai-cloud for organization and select axolotl for repository.
Config yaml
Possible solution
This might be because the T4 GPU is not supported by this library. Here's the documentation about which GPUs are supported: https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#nvidia-cuda-support
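As a quick sanity check (assuming torch is already installed in the Colab runtime), you can print the GPU's compute capability; FlashAttention requires Ampere (compute capability 8.0) or newer, while the T4 is a Turing card and reports 7.5:

```bash
# On a T4 this prints (7, 5); FlashAttention needs (8, 0) or higher
!python -c "import torch; print(torch.cuda.get_device_capability())"
```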
I used the instructions in the README to install and train TinyLlama using Axolotl instead. I also changed the config to set the flash attention layer to false. This still installs flash-attn somehow, but we get a

RuntimeError: FlashAttention only supports Ampere GPUs or newer

error at runtime instead. This cell will install it:
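The contents of that cell weren't captured above; a minimal sketch of a README-style install (the exact commands are an assumption, but the `[flash-attn]` extra would explain why flash-attn gets installed even with the config option turned off):

```bash
!git clone https://github.com/axolotl-ai-cloud/axolotl
%cd axolotl
# The flash-attn extra pulls the package in regardless of the yaml setting
!pip install packaging ninja
!pip install -e '.[flash-attn,deepspeed]'
```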
Then in the config.yaml:
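That snippet also wasn't captured; Axolotl's config key for this is `flash_attention`, so the change presumably looked like:

```yaml
# Disable FlashAttention since the T4 (Turing) is unsupported
flash_attention: false
```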
This does run on the free T4 GPU but still takes hours to finish and may need a different config.
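Purely as an illustration of the knobs that drive runtime (the values below are assumptions, not something tested on a T4), a different config would mostly mean adjusting fields like:

```yaml
# Illustrative values only -- not tuned for the T4
sequence_len: 1024              # shorter sequences cut memory use and step time
micro_batch_size: 1
gradient_accumulation_steps: 4  # keeps the effective batch size up
num_epochs: 1                   # fewer epochs for a quicker run
```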
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
main
Acknowledgements