forked from huggingface/transformers
V4.40 release IFU #36
Open
Cemberk wants to merge 821 commits into main_old from v4.40-release
Conversation
Remove unused code
…gingface#29605) * Update docstring for RMSNorm * Update cache_params object to correct MambaCache type * Update docstrings and type info * Pass through use_cache * ruff * Reformat with 119 char limit per line (thanks Arthur) * Pass through use_cache specifically to the backbone rather than all keyword arguments * Update src/transformers/models/mamba/modeling_mamba.py * Update src/transformers/models/mamba/modeling_mamba.py * Update src/transformers/models/mamba/modeling_mamba.py Co-authored-by: Arthur <[email protected]> * Update src/transformers/models/mamba/modeling_mamba.py Co-authored-by: Arthur <[email protected]> * Update tab * Update src/transformers/models/mamba/modeling_mamba.py * Update src/transformers/models/mamba/modeling_mamba.py Co-authored-by: Arthur <[email protected]> --------- Co-authored-by: Arthur <[email protected]>
* Initial commit (still lots of unfinished bits) * (Still untested) add safetensors sharding to save_pretrained * Fix safetensors saving, update default shard size to match PT * Add proper loading of TF-format safetensors * Revert default size in case that changes things * Fix incorrect index name * Update loading priority * Update tests * Make the tests a little more stringent * Expand tests * Add sharded cross-test * Fix argument name * One more test fix * Adding mlx to the list of allowed formats * Remove irrelevant block for safetensors * Refactor warning logging into a separate function * Remove unused skip_logger_warnings arg * Update src/transformers/modeling_tf_utils.py Co-authored-by: amyeroberts <[email protected]> * Move function def --------- Co-authored-by: amyeroberts <[email protected]>
* Add correct batched handling for apply_chat_template * Fix warning method * Add error for incompatible options * expand tests * Add a skip for markuplm * Add skips for other layout models * Skip for LayoutLMv2 * Slightly update the warning message * Update src/transformers/tokenization_utils_base.py Co-authored-by: Arthur <[email protected]> * Update src/transformers/tokenization_utils_base.py Co-authored-by: Arthur <[email protected]> * Update src/transformers/tokenization_utils_base.py Co-authored-by: Arthur <[email protected]> * Update src/transformers/tokenization_utils_base.py Co-authored-by: Arthur <[email protected]> * Update src/transformers/tokenization_utils_base.py Co-authored-by: Arthur <[email protected]> * Update src/transformers/tokenization_utils_base.py Co-authored-by: Arthur <[email protected]> * typo fix * Update docstring for conversation kwarg * Update return docstring * Remove the warning, improve error message * Update src/transformers/tokenization_utils_base.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/tokenization_utils_base.py Co-authored-by: amyeroberts <[email protected]> * Update tests/test_tokenization_common.py Co-authored-by: amyeroberts <[email protected]> * Update tests/test_tokenization_common.py Co-authored-by: amyeroberts <[email protected]> * Remove return_dict=None * Fix up some merge cruft * More merge cruft * Add another skip * Add another skip --------- Co-authored-by: Arthur <[email protected]> Co-authored-by: amyeroberts <[email protected]>
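For context, the batched handling described above lets `apply_chat_template` accept a list of conversations rather than a single one. A minimal sketch of the usage (the checkpoint name and messages are illustrative, not taken from this PR):

```python
from transformers import AutoTokenizer

# Any chat-template-capable checkpoint works here; this one is just an example.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

conversations = [
    [{"role": "user", "content": "Hello!"}],
    [{"role": "user", "content": "Summarize this PR in one line."}],
]

# With tokenize=False, one templated string is returned per conversation.
texts = tokenizer.apply_chat_template(conversations, tokenize=False)
```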
* First draft * Fix tests, add docs * Improve docstrings * Fix test * Address comments * Address comments * Remove vocab_size attribute * Remove batch_size * Address comment * Add image processor tests * Support fx * Update docstring * Add support for 34b * Convert 34b model * Add integration tests * Update checkpoints * Convert vicuna-13b, remove doc tests * Remove script * Remove file * Address comments * Improve docstrings * Deprecate vocab_size * Remove aspect_ratio_setting * Address comments * Update READMEs * Add tips about chat templates * Fix tests * Deprecate vocab_size safely * Update tests --------- Co-authored-by: Amy Roberts <[email protected]>
* Update test reqs * Clean
update Co-authored-by: ydshieh <[email protected]>
…for `load_in_4bit` and `load_in_8bit` (huggingface#29761) * added safety checkers for load_in_4bit and load_in_8bit on init, as well as their setters * Update src/transformers/utils/quantization_config.py typo correction for load_in_8bit setter checks Co-authored-by: Younes Belkada <[email protected]> --------- Co-authored-by: Younes Belkada <[email protected]>
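A minimal sketch of the kind of mutual-exclusion check this commit describes (the function name is hypothetical and the body is simplified, not the exact upstream code):

```python
def _validate_bits_and_bytes_flags(load_in_4bit: bool, load_in_8bit: bool) -> None:
    # 4-bit and 8-bit loading are mutually exclusive quantization modes.
    if load_in_4bit and load_in_8bit:
        raise ValueError("load_in_4bit and load_in_8bit cannot both be True")
```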
…9753) * attempt to fix * the actual fix that works with compilation! * this? * temporary update * nit? * dispatch to memory efficient? * update both models that have static cache support * fix copies fix compile * make sure fix * fix cohere and gemma * fix beams? * nit * slipped through the cracks * nit * nits * update * fix-copies * skip failing tests * nits
…ce#29767) [docs] Remove a redundant "and" from custom_tools.md
Update quantization_config.py: fixed a typo for clarity and correctness (changed "input time" to "input type").
* Calculating box_bias at the start once, then reusing it at inference * Updating the compute_box_bias function for backwards compatibility * Caching compute_box_bias function * Bug fix * Update owlv2 accordingly to ensure repo consistency * Co-authored by: nvbinh15 <[email protected]> * Fixup changes * Made copied code consistent * Co-authored by: nvbinh15 <[email protected]> --------- Co-authored-by: Nguyen Van Binh <> Co-authored-by: Nguyen Van Binh <[email protected]>
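The caching change above amounts to computing the box bias once and reusing it on every inference call. A simplified, hypothetical illustration of the pattern with `functools.lru_cache` (the real OWLv2 bias computation differs):

```python
from functools import lru_cache

import torch


@lru_cache(maxsize=2)
def compute_box_bias(num_patches: int) -> torch.Tensor:
    # Placeholder math: the point is that repeated calls with the same
    # num_patches return the cached tensor instead of recomputing it.
    coords = torch.linspace(0.0, 1.0, steps=num_patches)
    return torch.log(coords.clamp(min=1e-4))
```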
Fixes
```
File "/nix/store/rv8xdwghdad9jv2w86b8g08kan9l6ksm-python3.11-transformers-4.38.2/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 987, in <module>
    class AutoConfig:
File "/nix/store/rv8xdwghdad9jv2w86b8g08kan9l6ksm-python3.11-transformers-4.38.2/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 1011, in AutoConfig
    @replace_list_option_in_docstrings()
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/rv8xdwghdad9jv2w86b8g08kan9l6ksm-python3.11-transformers-4.38.2/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 966, in docstring_decorator
    lines = docstrings.split("\n")
            ^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'
```
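The failure above comes from calling `.split()` on a class whose `__doc__` is `None` (for example when Python strips docstrings). A minimal sketch of the kind of guard that avoids it (simplified; not the exact upstream patch):

```python
def docstring_decorator(fn):
    # Guard against __doc__ being None (e.g. under `python -OO`).
    docstrings = fn.__doc__ or ""
    lines = docstrings.split("\n")
    fn.__doc__ = "\n".join(lines)
    return fn
```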
…e#29636) fix issue with logit processor in beam search in Flax
…ngface#29764) * update * update --------- Co-authored-by: ydshieh <[email protected]>
* patch llava-next * styling * styling
* Cast bfloat16 to float32 for Numpy conversions * Add test
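NumPy has no native bfloat16 dtype, so bf16 tensors have to be upcast before conversion. A short sketch of the pattern this commit adds:

```python
import torch

t = torch.ones(2, 2, dtype=torch.bfloat16)
arr = t.to(torch.float32).numpy()  # calling t.numpy() directly raises on bfloat16
```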
* Remove deprecations * Clean
* Add deterministic config * Add note on slowdown * English fails me again
feat: add support for torch_dtype Co-authored-by: Jacky Lee <[email protected]>
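As a general illustration of passing `torch_dtype` through to model loading (the checkpoint name is illustrative, and the exact call site this commit touches may differ):

```python
import torch
from transformers import AutoModelForCausalLM

# torch_dtype controls the dtype the weights are loaded in.
model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.float16)
```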
…ace#29663) * always convert the mask * rebase and fix copies
* prepend "bos" to blip generation * minor changes * Update src/transformers/models/blip_2/modeling_blip_2.py Co-authored-by: Joao Gante <[email protected]> * Update src/transformers/models/instructblip/modeling_instructblip.py Co-authored-by: amyeroberts <[email protected]> * add generation tester mixin --------- Co-authored-by: Joao Gante <[email protected]> Co-authored-by: amyeroberts <[email protected]>
…ngface#29680) * change in-place -> out-of-place * add tests * add more tests * naming consistency * fix doctest * forgot min-length processors * empty * Revert "fix doctest" This reverts commit 4772768. * revert change in docstring * Update tests/generation/test_logits_process.py Co-authored-by: amyeroberts <[email protected]> * Update tests/generation/test_logits_process.py Co-authored-by: amyeroberts <[email protected]> --------- Co-authored-by: amyeroberts <[email protected]>
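The in-place to out-of-place change means a logits processor returns a fresh scores tensor instead of mutating its input. A hypothetical processor showing the pattern (not one of the upstream classes):

```python
import torch


class ClampLogitsProcessor:
    """Illustrative only: clamps logits to a minimum value, out of place."""

    def __init__(self, min_value: float):
        self.min_value = min_value

    def __call__(self, input_ids: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
        # clamp() allocates a new tensor; the caller's `scores` is left untouched.
        return scores.clamp(min=self.min_value)
```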
…gingface#29771) * update quality check * make it nice * update * let's make sure it runs and we have the logs actually * update workflow * nits
…IterableDataset. Issue 29678 (huggingface#29738) * Fixed typehint for train_dataset param in Trainer.__init__(). Added IterableDataset option. * make fixup
* Add create token type ids to CodeGenTokenizer * Fix inconsistent length of token type ids * Format source codes * Fix inconsistent order of methods * Update docstring * add test_tokenizer_integration test * Format source codes * Add `copied from` comment to CodeGenTokenizerFast * Add doc of create_token_type_ids_from_sequences * Make return_token_type_ids False by default * Make test_tokenizer_integration as slow test * Add return_token_type_ids to tokenizer init arg * Add test for tokenizer's init return_token_type_ids * Format source codes
* Add evaluation loop container for interm. results * Add tests for EvalLoopContainer * Formatting * Fix padding_index in test and typo * Move EvalLoopContainer to pr_utils to avoid additional imports * Fix `eval_do_concat_batches` arg description * Fix EvalLoopContainer import
* [DO NOT MERGE] Testing tokenizers 0.19.0rc0 * Accounting for the breaking change. * Ruff. * Upgrading to tokenizers `0.19` (new release with prepend_scheme fixed and new surface for BPE tiktoken bug).
* Add OLMo using add-new-model-like with Llama * Fix incorrect tokenizer for OLMo * Copy-paste relevant OLMo methods and their imports * Add OLMo config * Modify OLMo config to follow HF conventions * Remove unneeded Llama code from OLMo model * Add ability for OLMo model to output attentions * Add OLMoPreTrainedModel and OLMoModel * Add OLMoForCausalLM * Minor fixes to OLMo model for style and missing functions * Implement OLMo tokenizer * Implement OLMo to HF conversion script * Add tests for OLMo model * Add tests for OLMo fast tokenizer * Add auto-generated dummy objects * Remove unimplemented OLMo classes from auto and init classes and re-format * Add README and associated auto-generated files * Use OLMo names for common properties * Run make fixup * Remove `|` from OLMo typing * Remove unneeded tokenization_olmo.py * Revert model, config and converter to add-new-model-like Llama * Move logic for adding bos/eos token into GPTNeoxTokenizerFast * Change OLMoConfig defaults to match OLMo-7B * Use GPTNeoXToknizerFast in OLMo tokenizer tests * Modify auto-generated OLMoModelTests to work for OLMo * Add non-parametric layer norm OLMoLayerNorm * Update weight conversion script for OLMo * Fix __init__ and auto structure for OLMo * Fix errors from make fixup * Remove OLMoTokenizerFast from documentation * Add missing 'Copied from' for OLMoModel._update_causal_mask * Run make fix-copies * Rearrange string replacements in OLMoForCausalLM Copied from * Move OLMo and Llama CausalLM.forward example into global constants * Fix OLMO_GENERATION_EXAMPLE doc string typo * Add option for qkv clipping to OLMo * Rearrange OLMoConfig kwargs in convert_olmo_weights_to_hf * Add clip_qkv to OLMoConfig in convert_olmo_weights_to_hf * Fix OLMo tokenization bug using conversion script * Keep model in full precision after conversion * Do not add eos token automatically * Update references to OLMo model in HF Hub * Do not add eos token during encoding by default * Fix Llama generation example * Run make fixup * OLMo 7B integration test fix * Remove unneeded special case for OLMoConfig * OLMo 7B Twin 2T integration test fix * Fix test_model_7b_greedy_generation * Remove test_compile_static_cache * Fix OLMo and Llama generation example * Run make fixup * Revert "OLMo 7B integration test fix" This reverts commit 4df56a4. * Revert "OLMo 7B Twin 2T integration test fix" This reverts commit 9ff65a4. * Ungate 7B integration tests and fix greedy generation test * Add retries for flaky test_eager_matches_sdpa_generate * Fix output of doc example for OLMoForCausalLM.forward * Downsize OLMo doc test for OLMoForCausalLM.forward to 1B model * Try fix incorrect characters in OLMoForCausalLM.forward doct test * Try fix incorrect characters in OLMoForCausalLM.forward doc test using end quotes * Remove pretraining_tp from OLMo config and model * Add missing 'Copied from' instances * Remove unneeded causal_mask from OLMoModel * Revert Llama changes * Ignore copy for OLMoForCausalLM.forward * Change 'OLMo' to 'Olmo' in classes * Move minimal OLMo tokenization tests to model tests * Add missed 'Copied from' for repeat_kv
* tentatively re-enable FA2 + SDPA * better comment * _ignore_causal_mask_sdpa as staticmethod * type hints * use past_seen_tokens instead * enable copied from for sdpa * ruff * llama simplifications on review * remove unnecessary self.is_causal check * fix copies * cleaning * precise message * better doc * add test * simplify * Update src/transformers/models/llama/modeling_llama.py Co-authored-by: Arthur <[email protected]> * Update src/transformers/models/llama/modeling_llama.py Co-authored-by: Arthur <[email protected]> * Update src/transformers/models/llama/modeling_llama.py Co-authored-by: Arthur <[email protected]> * style --------- Co-authored-by: Arthur <[email protected]>
* Added flash attention 2. * Fixes. * Fix inheritance. * Fixed init. * Remove stuff. * Added documentation. * Add FA2 to M2M100 documentation. * Add test. * Fixed documentation. * Update src/transformers/models/m2m_100/modeling_m2m_100.py Co-authored-by: Younes Belkada <[email protected]> * Update docs/source/en/model_doc/nllb.md Co-authored-by: amyeroberts <[email protected]> * Fixed variable name. --------- Co-authored-by: Younes Belkada <[email protected]> Co-authored-by: amyeroberts <[email protected]>
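With FA2 wired in, the model can be loaded with the flash-attention-2 backend (requires the `flash-attn` package and a supported GPU; the checkpoint name is just an example):

```python
import torch
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/nllb-200-distilled-600M",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
)
```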
* Fix multiline processing * Update test for token2json
* fix * fix --------- Co-authored-by: ydshieh <[email protected]>
* Add jamba arch * apply "make fix-copies" changes * fix link to model in JambaConfig docstring * Add n_ctx in modeling file because repo-consistency wants that * Add jamba to flash attention and sdpa documentation * mamba dt_proj quant fix now works for LoRA as well * override test_left_padding_compatibility and use a more permissive tolerance. left padding numerical difference are accentuated by mamba layers * add jamba to tokenization auto * fix comments of shape (PR #24 in the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24) * simple PR fixes * remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer * remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (huggingface/peft#1530) * Add copied comment on JambaMLP (it's the same as MixtralMLP) * remove padding_mask warnings. It's not supported anymore * fix docstring. Float instead of int * A few more minor PR fixes * (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass * Return None attention weights from mamba layers. Append to all attentions only if not None. * remove some leftover jamba archive lists * Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel * no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers * Add Jamba paper on READMEs * (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes) * Add copied from comment * remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms * clearer docstring for _convert_to_standard_cache * style fixes * Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code tp use it. 
Also small change in low memory beam search decoding path to support this new int value in model_inputs * rename test so it still overrides what its meant to override * draft * oups * nit * remove more complexe logic * fix names used in config * fix fix fix * style * fix some more failing tests * generate did not init the cache 🙃 * more small nits * typo * config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes * fix init of pkv with torch.tensor() * empty tensor * fix some init issues * stupid changes required by generate because it does not even support it's own DynamicCache class * more fixes * fix general assisted gen cache_position bug * tests passing * Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py * fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache * no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore * fix docstrings and typehints for past_key_values * style fixes * fix docs * change typehint due to copy from Mixtral * forgot import * import order * Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward) * Add integration test with tiny tandom Jamba model on hub * fix flash attention cache shapes * bring back forgotten hidden states * rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model * align integration test after modeling fixes * bugfix - mamba can use precomputed states only of forward pass is on a single token * bugfix - mamba can use precomputed states only if they match the batch size * typo * remove making _prepare_4d_causal_attention_mask a leaf function * stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly --------- Co-authored-by: Arthur Zucker <[email protected]> Co-authored-by: Joao Gante <[email protected]>
atol for sliding window test
* Switch to non-persistent buffer * fix device mismatch issue due to cache * style
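A buffer registered with `persistent=False` still moves with the module across devices but is excluded from the `state_dict`. A small sketch of the idea (the module and math are illustrative):

```python
import torch
from torch import nn


class RotaryCache(nn.Module):
    def __init__(self, dim: int = 8):
        super().__init__()
        inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim))
        # Not saved to the state_dict, but follows .to(device)/.cuda() calls.
        self.register_buffer("inv_freq", inv_freq, persistent=False)
```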
…0314) * Revert "Re-enable SDPA's FA2 path (huggingface#30070)" This reverts commit 05bdef1. * Revert "Fix quality Olmo + SDPA (huggingface#30302)" This reverts commit ec92f98.
* overlooked * style * cleaner
* wip * fix __init__.py * add docs * Apply suggestions from code review Co-authored-by: Arthur <[email protected]> * address comments 1 * work on make fixup * pass configs down * add sdpa attention * remove DbrxBlock * add to configuration_auto * docstring now passes formatting test * fix style * update READMEs * add dbrx to modeling_auto * make fix-copies generated this * add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP * config docstring passes formatting test * rename moe_loss_weight to router_aux_loss_coef * add to flash-attn documentation * fix model-path in tests * Explicitly make `"suli"` the default `ffn_act_fn` Co-authored-by: Wing Lian <[email protected]> * default to using router_aux_loss_coef over ffn_config[moe_loss_weight] * fix _flash_attn_uses_top_left_mask and is_causal * fix tests path * don't use token type IDs * follow Llama and remove token_type_ids from test * init ConfigTester differently so tests pass * remove multiple choice test * remove question + answer test * remove sequence classification test * remove token classification test * copy Llama tests and remove token_type_ids from test inputs * do not test pruning or headmasking; style code * add _tied_weights_keys parameter to pass test * add type hints * fix type check * update config tester * remove masked_lm test * remove encoder tests * initialize DbrxModelTester with correct params * style * torch_dtype does not rely on torch * run make fixup, fix-copies * use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py * add copyright info * fix imports and DbrxRotaryEmbedding * update DbrxModel docstring * use copies * change model path in docstring * use config in DbrxFFN * fix flashattention2, sdpaattention * input config to DbrXAttention, DbrxNormAttentionNorm * more fixes * fix * fix again! * add informative comment * fix ruff? 
* remove print statement + style * change doc-test * fix doc-test * fix docstring * delete commented out text * make defaults match dbrx-instruct * replace `router_aux_loss_coef` with `moe_loss_weight` * is_decoder=True * remove is_decoder from configtester * implement sdpa properly * make is_decoder pass tests * start on the GenerationTesterMixin tests * add dbrx to sdpa documentation * skip weight typing test * style * initialize smaller model Co-authored-by: Matt <[email protected]> * Add DBRX to toctree * skip test_new_cache_format * make config defaults smaller again * add pad_token_id * remove pad_token_id from config * Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP * Update src/transformers/models/dbrx/__init__.py Co-authored-by: Arthur <[email protected]> * Update src/transformers/models/dbrx/modeling_dbrx.py Co-authored-by: Arthur <[email protected]> * Update docs/source/en/model_doc/dbrx.md Co-authored-by: Matt <[email protected]> * Update src/transformers/models/dbrx/configuration_dbrx.py Co-authored-by: Arthur <[email protected]> * Update docs/source/en/model_doc/dbrx.md Co-authored-by: Arthur <[email protected]> * fix typo * Apply suggestions from code review Co-authored-by: Arthur <[email protected]> * update docs, fix configuration_auto.py * address pr comments * remove is_decoder flag * slice * fix requires grad * remove grad * disconnect differently * remove grad * enable grads * patch * detach expert * nissan al ghaib * Update modeling_dbrx.py * Update src/transformers/models/dbrx/modeling_dbrx.py Co-authored-by: Matt <[email protected]> * replace "Gemma" with "Dbrx" * remove # type: ignore * don't hardcode vocab_size * remove ToDo * Re-add removed idefics2 line * Update test to use tiny-random! * Remove TODO * Remove one more case of loading the entire dbrx-instruct in the tests * Update src/transformers/models/dbrx/modeling_dbrx.py Co-authored-by: amyeroberts <[email protected]> * address some comments * small model * add dbrx to tokenization_auto * More docstrings with add_start_docstrings * Dbrx for now * add PipelineTesterMixin * Update src/transformers/models/dbrx/configuration_dbrx.py Co-authored-by: amyeroberts <[email protected]> * remove flash-attn2 import error * fix docstring Co-authored-by: amyeroberts <[email protected]> * add useage example * put on one line Co-authored-by: amyeroberts <[email protected]> * fix ffn_act_fn Co-authored-by: amyeroberts <[email protected]> * change "dbrx" to "DBRX" for display purposes. * fix __init__.py? * fix __init__.py * fix README * return the aux_loss * remove extra spaces * fix configuration_auto.py * fix format in tokenization_auto * remove new line * add more useage examples --------- Co-authored-by: Abhi Venigalla <[email protected]> Co-authored-by: Arthur <[email protected]> Co-authored-by: Eitan Turok <[email protected]> Co-authored-by: Eitan Turok <[email protected]> Co-authored-by: Wing Lian <[email protected]> Co-authored-by: Eitan Turok <[email protected]> Co-authored-by: Matt <[email protected]> Co-authored-by: Matt <[email protected]> Co-authored-by: Your Name <[email protected]> Co-authored-by: Mihir Patel <[email protected]> Co-authored-by: amyeroberts <[email protected]>
… + revert huggingface#30070 at the same time (huggingface#30317) * Update awq.py * style * revert felix PR * fix * add felix comments
Cemberk requested review from AdrianAbeyta, amathews-amd, lcskrishna and gargrahul on May 29, 2024 at 18:12
cc: @AdrianAbeyta @Cemberk, please run the huggingface bert, bart, roberta, deberta, distilbert, gpt2, and gpt-neo DLM models (training performance only) before and after this PR and post the delta.
Hi @Cemberk, can you make sure the following PR makes it into the IFU?
IFU for version 4.40 upstream