DIFF ONLY - main sync #9

Closed
wants to merge 89 commits
Commits
e299312
optionally save the final FSDP model as a sharded state dict (#1828)
winglian Aug 19, 2024
5aac4bc
fix: dont change quant storage dtype in case of fsdp (#1837)
xgal Aug 20, 2024
649c19a
pretrain: fix with sample_packing=false (#1841)
tmm1 Aug 21, 2024
9f91724
feat: add jamba chat_template (#1843)
xgal Aug 21, 2024
f07802f
examples: fix tiny-llama pretrain yml syntax (#1840)
tmm1 Aug 21, 2024
957c956
rename jamba example (#1846) [skip ci]
xgal Aug 22, 2024
c3fc529
numpy 2.1.0 was released, but incompatible with numba (#1849) [skip ci]
winglian Aug 22, 2024
5b0b774
ensure that the bias is also in the correct dtype (#1848) [skip ci]
winglian Aug 22, 2024
9caa3eb
make the train_on_eos default to true so all eos tokens are treated t…
winglian Aug 22, 2024
7ed92e6
fix: prompt phi (#1845) [skip ci]
JohanWork Aug 22, 2024
de4ea2d
docs: minor syntax highlight fix (#1839)
tmm1 Aug 22, 2024
2f8037f
ensure that the hftrainer deepspeed config is set before the trainer …
winglian Aug 22, 2024
dcbff16
run nightly ci builds against upstream main (#1851)
winglian Aug 22, 2024
b33dc07
rename nightly test and add badge (#1853)
winglian Aug 22, 2024
fefa95e
most model types now support flash attention 2 regardless of multipac…
winglian Aug 22, 2024
328fd4b
add axolotl community license (#1862)
winglian Aug 23, 2024
e8ff5d5
don't mess with bnb since it needs compiled wheels (#1859)
winglian Aug 23, 2024
1f686c5
Liger Kernel integration (#1861)
winglian Aug 23, 2024
da0d581
add liger example (#1864)
winglian Aug 23, 2024
810ecd4
add liger to readme (#1865)
winglian Aug 23, 2024
77a4b9c
change up import to prevent AttributeError (#1863)
winglian Aug 23, 2024
22f4eaf
simplify logic (#1856)
winglian Aug 24, 2024
f245964
better handling of llama-3 tool role (#1782)
winglian Aug 25, 2024
8e29bde
Spectrum plugin (#1866)
winglian Aug 25, 2024
6819c12
update spectrum authors (#1869)
winglian Aug 26, 2024
2dac1ed
Fix `drop_long_seq` bug due to truncation in prompt tokenization stra…
chiwanpark Aug 26, 2024
17af1d7
clear cuda cache to help with memory leak/creep (#1858)
winglian Aug 26, 2024
f6362d2
Add Liger Kernel support for Qwen2 (#1871)
chiwanpark Aug 27, 2024
1e43660
Sample pack trust remote code v2 (#1873)
winglian Aug 27, 2024
159b8b9
monkey-patch transformers to simplify monkey-patching modeling code (…
tmm1 Aug 28, 2024
c1a61ae
fix liger plugin load issues (#1876)
tmm1 Aug 28, 2024
7037e3c
deepseekv2 liger support (#1878)
tmm1 Aug 28, 2024
e3a3845
Add liger kernel to features (#1881) [skip ci]
ByronHsu Aug 29, 2024
ce33e1e
pin liger-kernel to latest 0.2.1 (#1882) [skip ci]
winglian Aug 30, 2024
15408d0
Update supported models for Liger Kernel (#1875)
DocShotgun Sep 1, 2024
3c6b9ed
run pytests with varied pytorch versions too (#1883)
winglian Sep 1, 2024
bdab3ec
Fix RMSNorm monkey patch for Gemma models (#1886)
chiwanpark Sep 1, 2024
0aeb277
add e2e smoke tests for llama liger integration (#1884)
winglian Sep 1, 2024
4e5400c
support for auto_find_batch_size when packing (#1885)
winglian Sep 4, 2024
dca1fe4
fix optimizer + fsdp combination in example (#1893)
winglian Sep 4, 2024
f18f426
Docs for AMD-based HPC systems (#1891)
tijmen Sep 5, 2024
93b769a
lint fix and update gha regex (#1899)
winglian Sep 5, 2024
ab461d8
Fix documentation for pre-tokenized dataset (#1894)
alpayariyak Sep 5, 2024
6e35468
fix zero3 integration (#1897)
winglian Sep 5, 2024
3853ab7
bump accelerate to 0.34.2 (#1901)
winglian Sep 7, 2024
5c42f11
remove dynamic module loader monkeypatch as this was fixed upstream (…
winglian Sep 14, 2024
7b9f669
Trigger the original tokenization behavior when no advanced turn sett…
fozziethebeat Sep 14, 2024
d7eea2f
validation fixes 20240923 (#1925)
winglian Sep 24, 2024
b98d7d7
update upstream deps versions and replace lora+ (#1928)
winglian Sep 26, 2024
61aa291
fix for empty lora+ lr embedding (#1932)
winglian Sep 27, 2024
8443310
bump transformers to 4.45.1 (#1936)
winglian Sep 30, 2024
e1915f5
Multimodal Vision Llama - rudimentary support (#1940)
winglian Oct 3, 2024
4ca0a47
add 2.4.1 to base models (#1953)
winglian Oct 9, 2024
e8d3da0
upgrade pytorch from 2.4.0 => 2.4.1 (#1950)
winglian Oct 9, 2024
a560593
fix(log): update perplexity log to clarify from eval split (#1952) [s…
NanoCode012 Oct 9, 2024
dee7723
fix type annotations (#1941) [skip ci]
bxptr Oct 9, 2024
6d3caad
Comet integration (#1939)
Lothiraldan Oct 9, 2024
979534c
add mistral templates (#1927)
pandora-s-git Oct 10, 2024
8159cbd
lm_eval harness post train (#1926)
winglian Oct 10, 2024
2fbc6b0
Axo logo new (#1956)
winglian Oct 10, 2024
e73b8df
Add Support for `revision` Dataset Parameter to specify reading from …
thomascleberg Oct 11, 2024
922db77
Add MLFlow run name option in config (#1961)
awhazell Oct 11, 2024
7688385
add warning that sharegpt will be deprecated (#1957)
winglian Oct 11, 2024
df359c8
Handle image input as string paths for MMLMs (#1958)
afrizalhasbi Oct 11, 2024
09bf1ce
update hf deps (#1964)
winglian Oct 12, 2024
d20b48a
only install torchao for torch versions >= 2.4.0 (#1963)
winglian Oct 13, 2024
31591bd
Fixing Validation - Mistral Templates (#1962)
pandora-s-git Oct 13, 2024
ac128b7
fix: update eval causal lm metrics to add perplexity (#1951) [skip ci]
NanoCode012 Oct 13, 2024
1834cdc
Add support for qwen 2.5 chat template (#1934)
amazingvince Oct 13, 2024
cd2d89f
wip add new proposed message structure (#1904)
winglian Oct 13, 2024
68b1369
Reward model (#1879)
winglian Oct 13, 2024
ec4272c
add ds zero3 to multigpu biweekly tests (#1900)
winglian Oct 13, 2024
335027f
upgrade accelerate to 1.0.1 (#1969)
winglian Oct 14, 2024
6d9a3c4
examples: Fix config llama3 (#1833) [skip ci]
JohanWork Oct 14, 2024
54673fd
also debug if other debug args are set (#1977)
winglian Oct 17, 2024
f62e237
memoize dataset length for eval sample packing (#1974)
bursteratom Oct 17, 2024
67f744d
add pytorch 2.5.0 base images (#1979)
winglian Oct 18, 2024
e12a213
first pass at pytorch 2.5.0 support (#1982)
winglian Oct 21, 2024
955cca4
don't explicitly set cpu pytorch version (#1986)
winglian Oct 21, 2024
5c629ee
use torch 2.4.1 images as latest now that torch 2.5.0 is out (#1987)
winglian Oct 21, 2024
9bd5f7d
Log checkpoints as mlflow artifacts (#1976)
awhazell Oct 22, 2024
718cfb2
revert image tagged as main-latest (#1990)
winglian Oct 22, 2024
1d6a5e2
Refactor func load_model to class ModelLoader (#1909)
MengqingCao Oct 25, 2024
2501c1a
Fix: Gradient Accumulation issue (#1980)
NanoCode012 Oct 25, 2024
d3c45d2
fix zero3 (#1994)
winglian Oct 28, 2024
e1e0556
add option for resizing embeddings when adding new tokens (#2000)
winglian Oct 28, 2024
bfc77b0
Feat: Add support for tokenizer’s or custom jinja chat_template (#1970)
NanoCode012 Oct 29, 2024
107b67b
Hardware requirements (#1997) [skip ci]
OliverKunc Oct 29, 2024
8c3a727
feat: update yml chat_template to specify dataset field (#2001) [skip…
NanoCode012 Oct 29, 2024
14 changes: 13 additions & 1 deletion .github/workflows/base.yml
Original file line number Diff line number Diff line change
@@ -28,7 +28,19 @@ jobs:
cuda_version: 12.4.1
cudnn_version: ""
python_version: "3.11"
pytorch: 2.4.0
pytorch: 2.4.1
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
- cuda: "124"
cuda_version: 12.4.1
cudnn_version: ""
python_version: "3.11"
pytorch: 2.4.1
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
- cuda: "124"
cuda_version: 12.4.1
cudnn_version: ""
python_version: "3.11"
pytorch: 2.5.0
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
steps:
- name: Checkout
2 changes: 1 addition & 1 deletion .github/workflows/lint.yml
@@ -6,7 +6,7 @@ on:
- '**.py'
- 'requirements.txt'
- '.github/workflows/*.yml'
- "*.md"
- "*.[q]md"
- "examples/**/*.y[a]?ml"
workflow_dispatch:

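The `*.[q]md` pattern in the lint.yml change above uses a one-character class, so it matches `.qmd` files only — plain `.md` files no longer trigger the lint workflow. A quick sketch of that glob behavior, using POSIX `case` patterns as a stand-in for GitHub's path-filter syntax (the file names are illustrative, and GitHub's matcher may differ slightly from the shell's):

```shell
# Which paths would trigger the lint workflow under the new '*.[q]md' filter?
for path in README.md docs/config.qmd src/train.py; do
  case "$path" in
    *.[q]md) echo "match: $path" ;;   # only .qmd suffixes land here
    *)       echo "skip:  $path" ;;
  esac
done
```

Note that in `case` patterns `*` also matches `/`, so nested paths like `docs/config.qmd` match without a `**` prefix.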
14 changes: 12 additions & 2 deletions .github/workflows/main.yml
@@ -27,7 +27,12 @@ jobs:
- cuda: 124
cuda_version: 12.4.1
python_version: "3.11"
pytorch: 2.4.0
pytorch: 2.4.1
axolotl_extras:
- cuda: 124
cuda_version: 12.4.1
python_version: "3.11"
pytorch: 2.5.0
axolotl_extras:
runs-on: axolotl-gpu-runner
steps:
@@ -84,7 +89,12 @@ jobs:
- cuda: 124
cuda_version: 12.4.1
python_version: "3.11"
pytorch: 2.4.0
pytorch: 2.4.1
axolotl_extras:
- cuda: 124
cuda_version: 12.4.1
python_version: "3.11"
pytorch: 2.5.0
axolotl_extras:
runs-on: axolotl-gpu-runner
steps:
18 changes: 18 additions & 0 deletions .github/workflows/multi-gpu-e2e.yml
@@ -1,6 +1,9 @@
name: docker-multigpu-tests-biweekly

on:
pull_request:
paths:
- 'tests/e2e/multigpu/*.py'
workflow_dispatch:
schedule:
- cron: '0 0 * * 1,4' # Runs at 00:00 UTC every monday & thursday
@@ -18,6 +21,20 @@ jobs:
pytorch: 2.3.1
axolotl_extras:
num_gpus: 2
- cuda: 124
cuda_version: 12.4.1
python_version: "3.11"
pytorch: 2.4.1
axolotl_extras:
num_gpus: 2
nightly_build: "true"
- cuda: 124
cuda_version: 12.4.1
python_version: "3.11"
pytorch: 2.5.0
axolotl_extras:
num_gpus: 2
nightly_build: "true"
runs-on: [self-hosted, modal]
timeout-minutes: 120
steps:
@@ -39,6 +56,7 @@ jobs:
echo "AXOLOTL_EXTRAS=${{ matrix.axolotl_extras}}" >> $GITHUB_ENV
echo "CUDA=${{ matrix.cuda }}" >> $GITHUB_ENV
echo "N_GPUS=${{ matrix.num_gpus }}" >> $GITHUB_ENV
echo "NIGHTLY_BUILD=${{ matrix.nightly_build }}" >> $GITHUB_ENV
- name: Run tests job on Modal
run: |
modal run cicd.multigpu
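The env-var step added above appends `KEY=value` lines to the file named by `$GITHUB_ENV`; GitHub Actions then loads those lines into the environment of every subsequent step in the job. A minimal local sketch of that handoff (the temp file and values here are illustrative, not part of the runner):

```shell
# Simulate the $GITHUB_ENV handoff between two workflow steps.
GITHUB_ENV=$(mktemp)

# "Step 1": append KEY=value lines, as the workflow step above does.
echo "N_GPUS=2" >> "$GITHUB_ENV"
echo "NIGHTLY_BUILD=true" >> "$GITHUB_ENV"

# "Step 2": the runner exports the accumulated lines before the next step runs.
set -a; . "$GITHUB_ENV"; set +a
echo "gpus=$N_GPUS nightly=$NIGHTLY_BUILD"

rm -f "$GITHUB_ENV"
```

This is why later steps (and the Modal invocation) can read `NIGHTLY_BUILD` without it being passed explicitly.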
14 changes: 12 additions & 2 deletions .github/workflows/nightlies.yml
@@ -26,7 +26,12 @@ jobs:
- cuda: 124
cuda_version: 12.4.1
python_version: "3.11"
pytorch: 2.4.0
pytorch: 2.4.1
axolotl_extras:
- cuda: 124
cuda_version: 12.4.1
python_version: "3.11"
pytorch: 2.5.0
axolotl_extras:
runs-on: axolotl-gpu-runner
steps:
@@ -83,7 +88,12 @@ jobs:
- cuda: 124
cuda_version: 12.4.1
python_version: "3.11"
pytorch: 2.4.0
pytorch: 2.4.1
axolotl_extras:
- cuda: 124
cuda_version: 12.4.1
python_version: "3.11"
pytorch: 2.5.0
axolotl_extras:
runs-on: axolotl-gpu-runner
steps:
2 changes: 1 addition & 1 deletion .github/workflows/pypi.yml
@@ -27,7 +27,7 @@ jobs:
run: |
pip3 install wheel packaging
pip3 install -e .
pip3 install -r requirements-tests.txt
pip3 install -r requirements-dev.txt -r requirements-tests.txt

- name: Extract tag name
id: tag
128 changes: 128 additions & 0 deletions .github/workflows/tests-nightly.yml
@@ -0,0 +1,128 @@
name: Tests Nightly against upstream main
on:
workflow_dispatch:
schedule:
- cron: '0 0 * * *' # Runs at 00:00 UTC every day

jobs:
pre-commit:
name: pre-commit
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-python@v4
with:
python-version: "3.10"
cache: 'pip' # caching pip dependencies
- uses: pre-commit/[email protected]
env:
SKIP: no-commit-to-branch

pytest:
name: PyTest
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python_version: ["3.10", "3.11"]
pytorch_version: ["2.3.1", "2.4.1", "2.5.0"]
timeout-minutes: 20

steps:
- name: Check out repository code
uses: actions/checkout@v3

- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python_version }}
cache: 'pip' # caching pip dependencies

- name: Install PyTorch
run: |
pip3 install torch==${{ matrix.pytorch_version }} --index-url https://download.pytorch.org/whl/cpu

- name: Update requirements.txt
run: |
sed -i 's#^transformers.*#transformers @ git+https://github.com/huggingface/transformers.git@main#' requirements.txt
sed -i 's#^peft.*#peft @ git+https://github.com/huggingface/peft.git@main#' requirements.txt
sed -i 's#^accelerate.*#accelerate @ git+https://github.com/huggingface/accelerate.git@main#' requirements.txt
sed -i 's#^trl.*#trl @ git+https://github.com/huggingface/trl.git@main#' requirements.txt

- name: Install dependencies
run: |
pip3 install --upgrade pip
pip3 install --upgrade packaging
pip3 install -U -e .
pip3 install -r requirements-dev.txt -r requirements-tests.txt

- name: Run tests
run: |
pytest --ignore=tests/e2e/ tests/

- name: cleanup pip cache
run: |
find "$(pip cache dir)/http-v2" -type f -mtime +14 -exec rm {} \;

docker-e2e-tests:
if: github.repository_owner == 'axolotl-ai-cloud'
# this job needs to be run on self-hosted GPU runners...
runs-on: [self-hosted, modal]
timeout-minutes: 60
needs: [pre-commit, pytest]

strategy:
fail-fast: false
matrix:
include:
- cuda: 121
cuda_version: 12.1.1
python_version: "3.10"
pytorch: 2.3.1
num_gpus: 1
axolotl_extras: mamba-ssm
nightly_build: "true"
- cuda: 121
cuda_version: 12.1.1
python_version: "3.11"
pytorch: 2.3.1
num_gpus: 1
axolotl_extras: mamba-ssm
nightly_build: "true"
- cuda: 124
cuda_version: 12.4.1
python_version: "3.11"
pytorch: 2.4.1
num_gpus: 1
axolotl_extras:
nightly_build: "true"
- cuda: 124
cuda_version: 12.4.1
python_version: "3.11"
pytorch: 2.5.0
num_gpus: 1
axolotl_extras:
nightly_build: "true"
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install Python
uses: actions/setup-python@v5
with:
python-version: "3.10"
- name: Install Modal
run: |
python -m pip install --upgrade pip
pip install modal==0.63.64 jinja2
- name: Update env vars
run: |
echo "BASE_TAG=main-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}" >> $GITHUB_ENV
echo "PYTORCH_VERSION=${{ matrix.pytorch}}" >> $GITHUB_ENV
echo "AXOLOTL_ARGS=${{ matrix.axolotl_args}}" >> $GITHUB_ENV
echo "AXOLOTL_EXTRAS=${{ matrix.axolotl_extras}}" >> $GITHUB_ENV
echo "CUDA=${{ matrix.cuda }}" >> $GITHUB_ENV
echo "N_GPUS=${{ matrix.num_gpus }}" >> $GITHUB_ENV
echo "NIGHTLY_BUILD=${{ matrix.nightly_build }}" >> $GITHUB_ENV
- name: Run tests job on Modal
run: |
modal run cicd.tests
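The "Update requirements.txt" step in the new nightly workflow swaps pinned releases for upstream `main` via `sed`. A sketch of its effect on a throwaway requirements file (the version pins below are illustrative, not taken from the repo):

```shell
# Rewrite release pins to git+main URLs, as the nightly workflow does.
tmp=$(mktemp -d)
cat > "$tmp/requirements.txt" <<'EOF'
transformers==4.45.1
peft==0.13.0
numpy>=1.24.0
EOF

# '#' as the sed delimiter avoids escaping the slashes in the URLs.
sed -i 's#^transformers.*#transformers @ git+https://github.com/huggingface/transformers.git@main#' "$tmp/requirements.txt"
sed -i 's#^peft.*#peft @ git+https://github.com/huggingface/peft.git@main#' "$tmp/requirements.txt"

cat "$tmp/requirements.txt"   # lines without a matching prefix (numpy) are untouched
rm -rf "$tmp"
```

Note `sed -i` with no suffix argument is GNU sed behavior (fine on the ubuntu runners); BSD/macOS sed would need `-i ''`.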
25 changes: 20 additions & 5 deletions .github/workflows/tests.yml
@@ -36,6 +36,7 @@ jobs:
fail-fast: false
matrix:
python_version: ["3.10", "3.11"]
pytorch_version: ["2.3.1", "2.4.1", "2.5.0"]
timeout-minutes: 20

steps:
@@ -48,12 +49,20 @@
python-version: ${{ matrix.python_version }}
cache: 'pip' # caching pip dependencies

- name: Install dependencies
- name: upgrade pip
run: |
pip3 install --upgrade pip
pip3 install --upgrade packaging
pip3 install --upgrade packaging setuptools wheel

- name: Install PyTorch
run: |
pip3 install torch==${{ matrix.pytorch_version }}

- name: Install dependencies
run: |
pip3 show torch
pip3 install -U -e .
pip3 install -r requirements-tests.txt
pip3 install -r requirements-dev.txt -r requirements-tests.txt

- name: Run tests
run: |
@@ -67,7 +76,7 @@
if: github.repository_owner == 'axolotl-ai-cloud'
# this job needs to be run on self-hosted GPU runners...
runs-on: [self-hosted, modal]
timeout-minutes: 60
timeout-minutes: 90
needs: [pre-commit, pytest]

strategy:
@@ -89,7 +98,13 @@
- cuda: 124
cuda_version: 12.4.1
python_version: "3.11"
pytorch: 2.4.0
pytorch: 2.4.1
num_gpus: 1
axolotl_extras:
- cuda: 124
cuda_version: 12.4.1
python_version: "3.11"
pytorch: 2.5.0
num_gpus: 1
axolotl_extras:
steps:
2 changes: 1 addition & 1 deletion .isort.cfg
@@ -1,3 +1,3 @@
[settings]
profile=black
known_third_party=wandb
known_third_party=wandb,comet_ml
3 changes: 3 additions & 0 deletions .mypy.ini
@@ -11,6 +11,9 @@ ignore_errors = True
[mypy-axolotl.models.mixtral.*]
ignore_errors = True

[mypy-axolotl.integrations.liger.models.*]
ignore_errors = True

[mypy-axolotl.models.phi.*]
ignore_errors = True
