
DIFF ONLY - main sync #9

Closed · wants to merge 89 commits

Conversation

chiragjn
Member

No description provided.

winglian and others added 30 commits August 19, 2024 14:59
…-ai-cloud#1828)

* efficiently save very large llms when using FSDP

* fix parsing and index of sharded chunks

* only save fsdp on main process

* debugging for rename

* save sharded state dict

* remove unused new param

* get state dict directly

* tweak acc merge fsdp to shard the weight files

* sharded_state_dict alongside save_safetensors seems to hang on checkpoint save
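For context on the FSDP save path these commits touch, here is a minimal sketch of gathering a full state dict on rank 0 with torch's FSDP APIs; it is illustrative, not axolotl's exact implementation:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullStateDictConfig,
    FullyShardedDataParallel as FSDP,
    StateDictType,
)

def save_fsdp_model(model: FSDP, path: str) -> None:
    # Gather the full state dict, offloaded to CPU and materialized on
    # rank 0 only, to avoid OOM when the model is very large.
    cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
    with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
        state_dict = model.state_dict()
    if dist.get_rank() == 0:  # only save on the main process
        torch.save(state_dict, path)
```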
…d#1837)

* fix: don't change quant storage dtype in case of fsdp

* fix black

---------

Co-authored-by: Gal Cohen <[email protected]>
* feat: add jamba chat_template

* fix: black

* feat: jamba fsdp+qlora

---------

Co-authored-by: Gal Cohen <[email protected]>
* rename jamba example

* feat: change readme

---------

Co-authored-by: Gal Cohen <[email protected]>
) [skip ci]

* ensure that the bias is also in the correct dtype

* add nightly for dpo-qlora-fsdp
* correcting the phi system prompt

* phi test

* update

* add test
* run nightly ci builds against upstream main

* add test badges

* run the multigpu tests against nightly main builds too
* add initial plugin support w Liger kernel patches

* integrate the input args classes

* fix liger plugin and dynamic configuration class

* drop untrainable samples and refactor config plugins integration

* fix incorrect inputs and circular imports

* fix bool comparison

* fix for dropping untrainable tokens

* fix licensing so liger integration is Apache 2.0

* add jamba support

* pylint ignore
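The Liger plugin commits above patch model kernels before load; a hedged sketch using liger-kernel's public patch API (the surrounding wiring is an assumption, not axolotl's plugin code):

```python
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM

# Patch llama modeling code in place before the model is instantiated.
apply_liger_kernel_to_llama(
    rope=True,           # fused rotary position embeddings
    rms_norm=True,       # fused RMSNorm
    swiglu=True,         # fused SwiGLU MLP
    cross_entropy=True,  # memory-efficient cross-entropy loss
)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
```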
* add liger to readme

* updates from PR feedback
* change up import to prevent AttributeError

* tweak patching check for updated upstream
* clear cuda cache to help with memory leak/creep

* reverse order of gc
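A small sketch of the cache-clearing pattern above: run Python GC first so freed tensors are actually released, then empty the CUDA caching allocator:

```python
import gc
import torch

def clear_memory() -> None:
    gc.collect()               # release dangling Python references first
    torch.cuda.empty_cache()   # then return cached blocks to the driver
```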
* fix the multipack patch for remote code models

* add deepseek v2 lite example w fsdp
…xolotl-ai-cloud#1877)

* monkey-patch transformers so that monkey-patched modeling code doesn't get overwritten

* unnecessary now

* add comment
thomascleberg and others added 29 commits October 11, 2024 13:32
…Huggingface Dataset Revision (axolotl-ai-cloud#1912)

* Add support for `revision` dataset parameter

* only use revision on hf hub backed datasets

* use revision tied to head

* set download to use revision

* feat: add config to model validator class

* feat: add revision config to RL and tests for it

---------

Co-authored-by: Wing Lian <[email protected]>
Co-authored-by: NanoCode012 <[email protected]>
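A minimal sketch of the `revision` dataset parameter added above; the dataset name is a placeholder:

```python
from datasets import load_dataset

ds = load_dataset(
    "org/some-dataset",  # placeholder Hub dataset id
    revision="main",     # branch, tag, or commit sha to pin the data to
)
```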
* add warning that sharegpt will be deprecated

* add helper script for chat_templates and document deprecation

* Update src/axolotl/prompt_strategies/sharegpt.py

Co-authored-by: NanoCode012 <[email protected]>

---------

Co-authored-by: NanoCode012 <[email protected]>
* Update mm_chat.py

Handle images passed as string paths

* chore: lint

---------

Co-authored-by: Wing Lian <[email protected]>
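For the string-image fix above, a hedged sketch of coercing path strings to PIL images (the helper name is illustrative, not mm_chat.py's actual code):

```python
from PIL import Image

def ensure_pil_image(image):
    # Multimodal datasets may carry images as filesystem paths.
    if isinstance(image, str):
        return Image.open(image)
    return image
```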
* update hf deps

* remove deprecated set_caching_enabled
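`set_caching_enabled` was deprecated in the datasets library; a sketch of the replacement calls:

```python
import datasets

datasets.disable_caching()  # replaces set_caching_enabled(False)
datasets.enable_caching()   # replaces set_caching_enabled(True)
```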
* wip add new proposed message structure

* tokenization

* wip

* wip transform builder

* wip make the chat dataset loadable

* wip chatml + llama 3 new chat objects

* chore: lint

* chore: lint

* fix tokenization

* remove dacite dependency since we're using pydantic now

* fix handling when already correctly split in messages

* make sure to remove chat features from tokenized ds

* move chat to be a input transform for messages

* make sure llama3 has the bos token

* remove non-working special token code

* fix messages strat loader
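These commits describe moving the chat message structure to pydantic; a hedged sketch of what such models can look like (class and field names are assumptions, not axolotl's):

```python
from typing import List, Literal
from pydantic import BaseModel

class MessageContent(BaseModel):
    type: Literal["text"] = "text"
    value: str

class Message(BaseModel):
    role: Literal["system", "user", "assistant"]
    content: List[MessageContent]
    train: bool = True  # whether these tokens contribute to the loss

class Chat(BaseModel):
    messages: List[Message]
```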
* add ds zero3 to multigpu biweekly tests

* fix for upstream api change

* use updated accelerate and fix deepspeed tests

* stringify the Path, and run multigpu tests if the multigpu tests change for a PR

* use correct json rather than yaml

* revert accelerate for deepspeed
* wip on multimodal sample packing support

* wip on multimodal packing support

* llama-1b-yml

* setup logging for test

* yml

* yml

* yml

* fix for __len__ for eval sample packing

* reverted irrelevant changes

* reformatted, reverted log message

* reverted unnecessary changes

* added e2e multigpu testing for eval sample packing

* formatting

* fixed e2e test_eval params

* fix test_eval e2e multigpu

* fix test_eval e2e multigpu

* Update tests/e2e/multigpu/test_eval.py

Co-authored-by: Wing Lian <[email protected]>

* Update tests/e2e/multigpu/test_eval.py

Co-authored-by: Wing Lian <[email protected]>

---------

Co-authored-by: Wing Lian <[email protected]>
* add pytorch 2.5.0 base images

* make sure num examples for debug is zero and fix comparison
* first pass at pytorch 2.5.0 support

* attempt to install causal_conv1d with mamba

* gracefully handle missing xformers

* fix import

* fix incorrect version, add 2.5.0

* increase tests timeout
use a constraint file
use min version of xformers
don't install autoawq with pytorch 2.5.0
debugging for errors
upgrade pip first
fix action yml
add back try/except
retry w/o constraint
use --no-build-isolation
show torch version
install setuptools and wheel
add back try/except
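The "gracefully handle missing xformers" change above can be sketched as a standard optional-import guard (illustrative, not axolotl's exact code):

```python
try:
    import xformers  # noqa: F401
    HAS_XFORMERS = True
except ImportError:
    HAS_XFORMERS = False  # fall back to non-xformers attention paths
```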
* Ensure hf_mlflow_log_artifact config var is set in env

* Add transformer MLflowCallback to callbacks list when mlflow enabled

* Test hf_mlflow_log_artifacts is set correctly

* Test mlflow not being used by default
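The MLflow commits above wire transformers' MLflowCallback in; a hedged sketch of the env-var plumbing (the config attribute name and helper are assumptions):

```python
import os
from transformers.integrations import MLflowCallback

def setup_mlflow_callbacks(cfg, callbacks: list) -> None:
    # MLflowCallback reads HF_MLFLOW_LOG_ARTIFACTS from the environment.
    if getattr(cfg, "hf_mlflow_log_artifacts", False):
        os.environ["HF_MLFLOW_LOG_ARTIFACTS"] = "true"
    callbacks.append(MLflowCallback)
```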
* feat: support new arg num_items_in_batch

* use kwargs to manage extra unknown kwargs for now

* upgrade against upstream transformers main

* make sure trl is on latest too

* fix for upgraded trl

* fix: handle trl and transformer signature change

* feat: update trl to handle transformer signature

* RewardDataCollatorWithPadding no longer has max_length

* handle updated signature for tokenizer vs processor class

* invert logic for tokenizer vs processor class

* processing_class, not processor class

* also handle processing class in dpo

* handle model name w model card creation

* upgrade transformers and add a loss check test

* fix install of tbparse requirements

* make sure to add tbparse to req

* feat: revert kwarg to positional kwarg to be explicit

---------

Co-authored-by: Wing Lian <[email protected]>
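The signature change these commits adapt to is transformers' Trainer.compute_loss gaining a num_items_in_batch argument (transformers >= 4.46). A hedged sketch of a compatible override:

```python
from transformers import Trainer

class PatchedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        # num_items_in_batch allows correct loss normalization when
        # gradient accumulation splits a batch across steps.
        return super().compute_loss(
            model,
            inputs,
            return_outputs=return_outputs,
            num_items_in_batch=num_items_in_batch,
        )
```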
…-cloud#2000)

* add option for resizing embeddings when adding new tokens

* let's just be opinionated about this setting and set it to False
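A minimal sketch of the embedding-resize option described above; the new token and the gating flag are placeholders mirroring the opt-in config option, not axolotl's actual names:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

resize_token_embeddings = False  # the opinionated default mentioned above
num_added = tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|tool|>"]}  # placeholder new token
)
if num_added and resize_token_embeddings:
    model.resize_token_embeddings(len(tokenizer))
```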
…otl-ai-cloud#1970)

* Allow using tokenizer's default chat template with fallbacks

Summary of changes:

1. Adds `tokenizer_default` as an option for `chat_template` in the
   `chat_template` prompt strategy, allowing use of the chat template
   from the tokenizer's config
2. Allows falling back to chat templates available in axolotl if the
   tokenizer does not have a chat template
3. Adds a mistral chat template which supports a system message, taken
   from https://github.com/chujiezheng/chat_templates/blob/main/chat_templates/mistral-instruct.jinja
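A hedged sketch of that lookup order (helper name and wiring are illustrative, not axolotl's implementation):

```python
def resolve_chat_template(choice, tokenizer, builtin_templates, fallback=None):
    if choice == "tokenizer_default":
        if tokenizer.chat_template:  # template shipped in the tokenizer config
            return tokenizer.chat_template
        if fallback:                 # e.g. fall back to axolotl's "chatml"
            return builtin_templates[fallback]
        raise ValueError("tokenizer has no chat template and no fallback set")
    return builtin_templates[choice]
```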

---

Why?

Many popular models are not trained with the chatml format. As a result,
for the model to correctly learn chatml we have to turn on train_on_inputs,
which requires more compute and time. If we can use the model's
already-learned chat template, we only need to learn the output tokens.

---

Todo:

- Write tests

* Add tests

* Fix lint and bug post merge from main

* Add option `chat_template_jinja` to provide a jinja template

* remove custom mistral template

* Address review comments and add docs

* Update docs/dataset-formats/conversation.qmd

Co-authored-by: NanoCode012 <[email protected]>

* fix: set default to tokenizer template

* Merge branch 'main' into cj_tokenizer_default_prompt_template

* chore: remove redundant function

* fix: re-arrange enum declaration position

* fix: refactor artifact left from main merge

* feat(doc): updated config with chat template options and clarified examples

* chore: clarify doc

* chore: added example for non-default template

* chore: refactor

* fix: test

* fix: config being dropped and unittest to catch that

* chore: lint

* chore: skip duplicate

* fix: rename var after merge

* feat: add test for levy's dpo case

* fix: remove default setting on edge case where chat template overridden in dataset section

* feat: handle sharegpt deprecation better in docs

* feat: add example using fallback

* feat: handles chat_template requiring specific user/assistant order

* fix: update test based on new defaults

* fix: imported name incorrectly updated on merge

* chore: lint

* fix: update dummy message to prevent potential overlap with real content

* fix(doc): formatting

* fix: update bradleyterry to use new chat_template

---------

Co-authored-by: Chirag Jain <[email protected]>
* Hardware requirements

axolotl-ai-cloud#1992

* Update README.md

---------

Co-authored-by: Wing Lian <[email protected]>
…loud#2001) [skip ci]

* feat: update yml chat_template to specify dataset field

* feat: replace sharegpt references with chat_template
chiragjn closed this Oct 29, 2024