
DIFF ONLY - main sync #9

Closed · wants to merge 89 commits

Conversation

chiragjn
Member

No description provided.

winglian and others added 30 commits August 19, 2024 14:59
…-ai-cloud#1828)

* efficiently save very large llms when using FSDP

* fix parsing and index of sharded chunks

* only save fsdp on main process

* debugging for rename

* save sharded state dict

* remove unused new param

* get state dict directly

* tweak acc merge fsdp to shard the weight files

* sharded_state_dict alongside save_safetensors seems to hang on checkpoint save
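For context on the FSDP save path these commits touch, here is a minimal sketch of gathering a full state dict on rank 0 with torch's FSDP APIs; it is illustrative, not axolotl's exact implementation:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullStateDictConfig,
    FullyShardedDataParallel as FSDP,
    StateDictType,
)

def save_fsdp_model(model: FSDP, path: str) -> None:
    # Gather the full state dict, offloaded to CPU and materialized on
    # rank 0 only, to avoid OOM when the model is very large.
    cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
    with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
        state_dict = model.state_dict()
    if dist.get_rank() == 0:  # only save on the main process
        torch.save(state_dict, path)
```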
…d#1837)

* fix: don't change quant storage dtype in case of fsdp

* fix black

---------

Co-authored-by: Gal Cohen <[email protected]>
* feat: add jamba chat_template

* fix: black

* feat: jamba fsdp+qlora

---------

Co-authored-by: Gal Cohen <[email protected]>
* rename jamba example

* feat: change readme

---------

Co-authored-by: Gal Cohen <[email protected]>
) [skip ci]

* ensure that the bias is also in the correct dtype

* add nightly for dpo-qlora-fsdp
* correcting the phi system prompt

* phi test

* update

* add test
* run nightly ci builds against upstream main

* add test badges

* run the multigpu tests against nightly main builds too
* add initial plugin support w Liger kernel patches

* integrate the input args classes

* fix liger plugin and dynamic configuration class

* drop untrainable samples and refactor config plugins integration

* fix incorrect inputs and circular imports

* fix bool comparison

* fix for dropping untrainable tokens

* fix licensing so liger integration is Apache 2.0

* add jamba support

* pylint ignore
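The Liger plugin commits above patch model kernels before load; a hedged sketch using liger-kernel's public patch API (the surrounding wiring is an assumption, not axolotl's plugin code):

```python
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM

# Patch llama modeling code in place before the model is instantiated.
apply_liger_kernel_to_llama(
    rope=True,           # fused rotary position embeddings
    rms_norm=True,       # fused RMSNorm
    swiglu=True,         # fused SwiGLU MLP
    cross_entropy=True,  # memory-efficient cross-entropy loss
)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
```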
* add liger to readme

* updates from PR feedback
* change up import to prevent AttributeError

* tweak patching check for updated upstream
* clear cuda cache to help with memory leak/creep

* reverse order of gc
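A small sketch of the cache-clearing pattern above: run Python GC first so freed tensors are actually released, then empty the CUDA caching allocator:

```python
import gc
import torch

def clear_memory() -> None:
    gc.collect()               # release dangling Python references first
    torch.cuda.empty_cache()   # then return cached blocks to the driver
```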
* fix the multipack patch for remote code models

* add deepseek v2 lite example w fsdp
…xolotl-ai-cloud#1877)

* monkey-patch transformers so that monkey-patched modeling code doesn't get overwritten

* unnecessary now

* add comment
thomascleberg and others added 29 commits October 11, 2024 13:32
…Huggingface Dataset Revision (axolotl-ai-cloud#1912)

* Add support for `revision` dataset parameter

* only use revision on hf hub backed datasets

* use revision tied to head

* set download to use revision

* feat: add config to model validator class

* feat: add revision config to RL and tests for it

---------

Co-authored-by: Wing Lian <[email protected]>
Co-authored-by: NanoCode012 <[email protected]>
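A minimal sketch of the `revision` dataset parameter added above; the dataset name is a placeholder:

```python
from datasets import load_dataset

ds = load_dataset(
    "org/some-dataset",  # placeholder Hub dataset id
    revision="main",     # branch, tag, or commit sha to pin the data to
)
```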
* add warning that sharegpt will be deprecated

* add helper script for chat_templates and document deprecation

* Update src/axolotl/prompt_strategies/sharegpt.py

Co-authored-by: NanoCode012 <[email protected]>

---------

Co-authored-by: NanoCode012 <[email protected]>
* Update mm_chat.py

Handle images passed as string paths

* chore: lint

---------

Co-authored-by: Wing Lian <[email protected]>
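For the string-image fix above, a hedged sketch of coercing path strings to PIL images (the helper name is illustrative, not mm_chat.py's actual code):

```python
from PIL import Image

def ensure_pil_image(image):
    # Multimodal datasets may carry images as filesystem paths.
    if isinstance(image, str):
        return Image.open(image)
    return image
```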
* update hf deps

* remove deprecated set_caching_enabled
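`set_caching_enabled` was deprecated in the datasets library; a sketch of the replacement calls:

```python
import datasets

datasets.disable_caching()  # replaces set_caching_enabled(False)
datasets.enable_caching()   # replaces set_caching_enabled(True)
```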
* wip add new proposed message structure

* tokenization

* wip

* wip transform builder

* wip make the chat dataset loadable

* wip chatml + llama 3 new chat objects

* chore: lint

* chore: lint

* fix tokenization

* remove dacite dependency since we're using pydantic now

* fix handling when already correctly split in messages

* make sure to remove chat features from tokenized ds

* move chat to be a input transform for messages

* make sure llama3 has the bos token

* remove non-working special token code

* fix messages strat loader
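These commits describe moving the chat message structure to pydantic; a hedged sketch of what such models can look like (class and field names are assumptions, not axolotl's):

```python
from typing import List, Literal
from pydantic import BaseModel

class MessageContent(BaseModel):
    type: Literal["text"] = "text"
    value: str

class Message(BaseModel):
    role: Literal["system", "user", "assistant"]
    content: List[MessageContent]
    train: bool = True  # whether these tokens contribute to the loss

class Chat(BaseModel):
    messages: List[Message]
```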
* add ds zero3 to multigpu biweekly tests

* fix for upstream api change

* use updated accelerate and fix deepspeed tests

* stringify the Path, and run multigpu tests if the multigpu tests change for a PR

* use correct json rather than yaml

* revert accelerate for deepspeed
* wip on multimodal sample packing support

* wip on multimodal packing support

* llama-1b-yml

* setup logging for test

* yml

* yml

* yml

* fix for __len__ for eval sample packing

* reverted irrelevant changes

* reformatted, reverted log message

* reverted unnecessary changes

* added e2e multigpu testing for eval sample packing

* formatting

* fixed e2e test_eval params

* fix test_eval e2e multigpu

* fix test_eval e2e multigpu

* Update tests/e2e/multigpu/test_eval.py

Co-authored-by: Wing Lian <[email protected]>

* Update tests/e2e/multigpu/test_eval.py

Co-authored-by: Wing Lian <[email protected]>

---------

Co-authored-by: Wing Lian <[email protected]>
* add pytorch 2.5.0 base images

* make sure num examples for debug is zero and fix comparison
* first pass at pytorch 2.5.0 support

* attempt to install causal_conv1d with mamba

* gracefully handle missing xformers

* fix import

* fix incorrect version, add 2.5.0

* increase tests timeout
use a constraint file
use min version of xformers
don't install autoawq with pytorch 2.5.0
debugging for errors
upgrade pip first
fix action yml
add back try/except
retry w/o constraint
use --no-build-isolation
show torch version
install setuptools and wheel
add back try/except
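The "gracefully handle missing xformers" change above can be sketched as a standard optional-import guard (illustrative, not axolotl's exact code):

```python
try:
    import xformers  # noqa: F401
    HAS_XFORMERS = True
except ImportError:
    HAS_XFORMERS = False  # fall back to non-xformers attention paths
```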
* Ensure hf_mlflow_log_artifact config var is set in env

* Add transformer MLflowCallback to callbacks list when mlflow enabled

* Test hf_mlflow_log_artifacts is set correctly

* Test mlflow not being used by default
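The MLflow commits above wire transformers' MLflowCallback in; a hedged sketch of the env-var plumbing (the config attribute name and helper are assumptions):

```python
import os
from transformers.integrations import MLflowCallback

def setup_mlflow_callbacks(cfg, callbacks: list) -> None:
    # MLflowCallback reads HF_MLFLOW_LOG_ARTIFACTS from the environment.
    if getattr(cfg, "hf_mlflow_log_artifacts", False):
        os.environ["HF_MLFLOW_LOG_ARTIFACTS"] = "true"
    callbacks.append(MLflowCallback)
```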
* feat: support new arg num_items_in_batch

* use kwargs to manage extra unknown kwargs for now

* upgrade against upstream transformers main

* make sure trl is on latest too

* fix for upgraded trl

* fix: handle trl and transformer signature change

* feat: update trl to handle transformer signature

* RewardDataCollatorWithPadding no longer has max_length

* handle updated signature for tokenizer vs processor class

* invert logic for tokenizer vs processor class

* processing_class, not processor class

* also handle processing class in dpo

* handle model name w model card creation

* upgrade transformers and add a loss check test

* fix install of tbparse requirements

* make sure to add tbparse to req

* feat: revert kwarg to positional kwarg to be explicit

---------

Co-authored-by: Wing Lian <[email protected]>
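The signature change these commits adapt to is transformers' Trainer.compute_loss gaining a num_items_in_batch argument (transformers >= 4.46). A hedged sketch of a compatible override:

```python
from transformers import Trainer

class PatchedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        # num_items_in_batch allows correct loss normalization when
        # gradient accumulation splits a batch across steps.
        return super().compute_loss(
            model,
            inputs,
            return_outputs=return_outputs,
            num_items_in_batch=num_items_in_batch,
        )
```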
…-cloud#2000)

* add option for resizing embeddings when adding new tokens

* let's just be opinionated about this setting and set it to False
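A minimal sketch of the embedding-resize option described above; the new token and the gating flag are placeholders mirroring the opt-in config option, not axolotl's actual names:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

resize_token_embeddings = False  # the opinionated default mentioned above
num_added = tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|tool|>"]}  # placeholder new token
)
if num_added and resize_token_embeddings:
    model.resize_token_embeddings(len(tokenizer))
```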
…otl-ai-cloud#1970)

* Allow using tokenizer's default chat template with fallbacks

Summary of changes:

1. Adds `tokenizer_default` as an option for `chat_template` in the
   `chat_template` prompt strategy, allowing use of the chat template
   from the tokenizer's config
2. Allows falling back to chat templates available in axolotl if the
   tokenizer does not have a chat template
3. Adds a mistral chat template which supports a system message, taken
   from https://github.com/chujiezheng/chat_templates/blob/main/chat_templates/mistral-instruct.jinja
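A hedged sketch of that lookup order (helper name and wiring are illustrative, not axolotl's implementation):

```python
def resolve_chat_template(choice, tokenizer, builtin_templates, fallback=None):
    if choice == "tokenizer_default":
        if tokenizer.chat_template:  # template shipped in the tokenizer config
            return tokenizer.chat_template
        if fallback:                 # e.g. fall back to axolotl's "chatml"
            return builtin_templates[fallback]
        raise ValueError("tokenizer has no chat template and no fallback set")
    return builtin_templates[choice]
```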

---

Why?

Many popular models are not trained with the chatml format. As a result,
for the model to correctly learn chatml we have to turn on train_on_inputs,
which requires more compute and time. If we can use the model's
already-learned chat template, we only need to learn the output tokens.

---

Todo:

- Write tests

* Add tests

* Fix lint and bug post merge from main

* Add option `chat_template_jinja` to provide a jinja template

* remove custom mistral template

* Address review comments and add docs

* Update docs/dataset-formats/conversation.qmd

Co-authored-by: NanoCode012 <[email protected]>

* fix: set default to tokenizer template

* Merge branch 'main' into cj_tokenizer_default_prompt_template

* chore: remove redundant function

* fix: re-arrange enum declaration position

* fix: refactor artifact left from main merge

* feat(doc): updated config with chat template options and clarified examples

* chore: clarify doc

* chore: added example for non-default template

* chore: refactor

* fix: test

* fix: config being dropped and unittest to catch that

* chore: lint

* chore: skip duplicate

* fix: rename var after merge

* feat: add test for levy's dpo case

* fix: remove default setting on edge case where chat template overridden in dataset section

* feat: handle sharegpt deprecation better in docs

* feat: add example using fallback

* feat: handles chat_template requiring specific user/assistant order

* fix: update test based on new defaults

* fix: imported name incorrectly updated on merge

* chore: lint

* fix: update dummy message to prevent potential overlap with real content

* fix(doc): formatting

* fix: update bradleyterry to use new chat_template

---------

Co-authored-by: Chirag Jain <[email protected]>
* Hardware requirements

axolotl-ai-cloud#1992

* Update README.md

---------

Co-authored-by: Wing Lian <[email protected]>
…loud#2001) [skip ci]

* feat: update yml chat_template to specify dataset field

* feat: replace sharegpt references with chat_template
chiragjn closed this Oct 29, 2024