
chore(deps): update dependency transformers to v4.44.2 #164

Open

renovate[bot] wants to merge 1 commit into master from renovate/transformers-4.x

Conversation

renovate[bot] (Contributor) commented on Jun 27, 2024

This PR contains the following updates:

Package: transformers
Change: ==4.41.2 -> ==4.44.2

Release Notes

huggingface/transformers (transformers)

v4.44.2

Compare Source

Patch release v4.44.2: mostly fixes for two regressions that were not caught, one for Jamba and one for processors!

v4.44.1: Patch release v4.44.1

Compare Source

Here are the different fixes: mostly the Gemma2 context length, nits here and there, and generation issues.

Full Changelog: huggingface/transformers@v4.44.0...v4.44.1

v4.44.0

Compare Source

Release v4.44.0: End to end compile generation!!! Gemma2 (with assisted decoding), Codestral (Mistral for code), Nemotron, Efficient SFT training, CPU Offloaded KVCache, torch export for static cache

This release comes a bit early in our cycle because we wanted to ship important and requested models along with improved performance for everyone!

All of these are included with examples in the awesome https://github.com/huggingface/local-gemma repository! 🎈 We tried to share examples of what is now possible with all the shipped features! Kudos to @​gante, @​sanchit-gandhi and @​xenova

💥 End-to-end generation compile

Generate: end-to-end compilation #​30788 by @​gante: model.generate now supports compiling! There are a few limitations, but here is a small snippet:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import copy

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")

# compile generate
compiled_generate = torch.compile(model.generate, fullgraph=True, mode="reduce-overhead")

# compiled generate does NOT accept parameterization except a) model inputs b) a generation config
generation_config = copy.deepcopy(model.generation_config)
generation_config.pad_token_id = model.config.eos_token_id

model_inputs = tokenizer(["Write a poem about the market crashing in summer"], return_tensors="pt")
model_inputs = model_inputs.to(model.device)
output_compiled = compiled_generate(**model_inputs, generation_config=generation_config)
print(output_compiled)

⚡ 3 to 5x compile speedup (compilation time 👀 not runtime)

  • 3-5x faster torch.compile forward compilation for autoregressive decoder models #​32227 by @​fxmarty. As documented on the PR, this makes the whole generation a lot faster when you re-use the cache! You can see this when you run model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True); see the sketch below.
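To make that concrete, here is a minimal sketch of the forward-compilation pattern (the checkpoint, prompt, and generation settings are illustrative placeholders, not taken from the release notes):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "meta-llama/Meta-Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.bfloat16, device_map="auto")

# compile only the forward pass; generate() then re-uses the compiled graph at every decoding step
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

model_inputs = tokenizer(["The theory of relativity states"], return_tensors="pt").to(model.device)
# a static cache keeps tensor shapes fixed, so the compiled graph is not re-traced between steps
outputs = model.generate(**model_inputs, cache_implementation="static", max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))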

🪶 Offloaded KV cache: offload the cache to CPU when you are GPU poooooor 🚀

  • Offloaded KV Cache #​31325 by @​n17s: you just have to set cache_implementation="offloaded" when calling generate, or pass a GenerationConfig like this:

from transformers import GenerationConfig

gen_config = GenerationConfig(
    cache_implementation="offloaded",
    # other generation options, e.g. num_beams=4, num_beam_groups=2, num_return_sequences=4,
    # diversity_penalty=1.0, max_new_tokens=50, early_stopping=True
)
outputs = model.generate(inputs["input_ids"], generation_config=gen_config)

📦 Torch export for static cache

The PyTorch team gave us a great gift: you can now use torch.export in a way that is directly compatible with ExecuTorch! Find examples here.

This also unlocks support for prompt reuse:

import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

device = "cuda"
ckpt = "meta-llama/Meta-Llama-3.1-8B-Instruct"

INITIAL_PROMPT = "From now on, you are going to answer all my questions with historical details. Make sure to always add a bit of french here and there, for style."

model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.float16)
model.to(device)
tokenizer = AutoTokenizer.from_pretrained(ckpt)

# run the shared prefix once and keep its KV cache for re-use
prompt_cache = DynamicCache()
inputs = tokenizer(INITIAL_PROMPT, return_tensors="pt").to(device)
prompt_cache = model(**inputs, past_key_values=prompt_cache).past_key_values

prompt = "Why are french people obsessed with french?"
new_inputs = tokenizer(INITIAL_PROMPT + prompt, return_tensors="pt").to(device)
# deepcopy so each generation starts from the pristine prefix cache
past_key_values = copy.deepcopy(prompt_cache)
outputs = model.generate(**new_inputs, past_key_values=past_key_values, max_new_tokens=20)
response = tokenizer.batch_decode(outputs)[0]
print(response)

prompt = "What is the best city to swim in?"
new_inputs = tokenizer(INITIAL_PROMPT + prompt, return_tensors="pt").to(device)
outputs = model.generate(**new_inputs, past_key_values=copy.deepcopy(prompt_cache), max_new_tokens=20)
response = tokenizer.batch_decode(outputs)[0]
print(response)

Gemma2: assisted decoding

Gemma 2: support assisted generation #​32357 by @​gante

We now have a 2B Gemma 2 model -- a perfect sidekick for the 27B with assisted generation. We've enabled assisted generation in gemma 2, with a caveat: assisted generation currently requires the use of a windowless cache (as opposed to the default cache for gemma 2), so you might observe some output mismatch on long sequences. Read more about it here.

# transformers assisted generation reference:
# https://huggingface.co/docs/transformers/main/en/llm_optims#speculative-decoding
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# we DON'T recommend using the 9b model with the 2b model as its assistant
assistant_model_name = 'google/gemma-2-2b-it'
reference_model_name = 'google/gemma-2-27b-it'

tokenizer = AutoTokenizer.from_pretrained(reference_model_name)
model = AutoModelForCausalLM.from_pretrained(
   reference_model_name, device_map='auto', torch_dtype=torch.bfloat16
)
assistant_model = AutoModelForCausalLM.from_pretrained(
   assistant_model_name, device_map='auto', torch_dtype=torch.bfloat16
)

model_inputs = tokenizer("Einstein's theory of relativity states", return_tensors="pt").to(model.device)
generation_options = {
   "assistant_model": assistant_model,
   "do_sample": True,
   "temperature": 0.7,
   "max_new_tokens": 64,
}

outputs = model.generate(**model_inputs, **generation_options)
tokenizer.batch_decode(outputs, skip_special_tokens=True)

Nemotron support


Nemotron-4-340B-Instruct is a large language model (LLM) that can be used as part of a synthetic data generation pipeline to create training data that helps researchers and developers build their own LLMs. It is a fine-tuned version of the Nemotron-4-340B-Base model, optimized for English-based single and multi-turn chat use-cases. It supports a context length of 4,096 tokens.

The conversion script should be able to cover both Minitron and Nemotron; thanks and kudos to @​suiyoubi.
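As a minimal loading sketch (the checkpoint id below is a hypothetical placeholder for whichever converted Nemotron or Minitron weights you are using; it is not taken from the release notes):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "path-or-hub-id-of-converted-nemotron"  # hypothetical, replace with your converted checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer("Synthetic data generation is useful because", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])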

Codestral support


Codestral is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash. It also performs well on more specific ones like Swift and Fortran. This broad language base ensures Codestral can assist developers in various coding environments and projects.

Codestral saves developers time and effort: it can complete coding functions, write tests, and complete any partial code using a fill-in-the-middle mechanism. Interacting with Codestral will help level up the developer’s coding game and reduce the risk of errors and bugs.

Its Mamba2 architecture was a bit of a pain, notably removing all the einops, but we hope we made it better for everyone!
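As a minimal usage sketch, assuming the Mamba-2-based Codestral weights are available in a transformers-compatible format under the Hub id shown (treat that id as an assumption):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "mistralai/Mamba-Codestral-7B-v0.1"  # assumed Hub id for Codestral Mamba
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.bfloat16, device_map="auto")

# plain completion; fill-in-the-middle additionally needs the model's FIM control tokens
inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])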

Breaking changes:

We removed the chat templates from the code; they should all be on the Hub!
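Since templates now live on the Hub alongside the tokenizer, chat formatting keeps working the same way; a minimal sketch (the checkpoint is just an example):

from transformers import AutoTokenizer

# the chat template is fetched from the model repository on the Hub, not from the transformers code
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

messages = [{"role": "user", "content": "Write a haiku about dependency updates."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)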

Long-form decoding for whisper, even faster:

Our great @​sanchit-gandhi worked on porting the recent compile upgrades to long-form decoding in:

  • [whisper] compile compatibility with long-form decoding #​31772

What's Changed

New Contributors

Full Changelog: huggingface/transformers@v4.43.4...v4.44.0

v4.43.4: Patch Release

Compare Source

Patch Release v4.43.4

There was a mix-up; the deepspeed issues are now properly patched with:

🤗 Enjoy holidays

v4.43.3: Patch deepspeed

Compare Source

Patch release v4.43.3:
We still saw some bugs so @​zucchini-nlp added:

Other fixes:

  • [whisper] fix short-form output type #​32178 by @​sanchit-gandhi, which fixes the short-audio temperature fallback!
  • [BigBird Pegasus] set _supports_param_buffer_assignment to False #​32222 by @​kashif, mostly related to the new super-fast init; some models have to get this set to False. If you see weird behavior, look for that 😉

v4.43.2: Patch release

Compare Source

  • Fix float8_e4m3fn in modeling_utils (#​32193)
  • Fix resize embedding with Deepspeed (#​32192)
  • let's not warn when someone is running a forward (#​32176)
  • RoPE: relaxed rope validation (#​32182)

v4.43.1: Patch release

Compare Source

v4.43.0: Llama 3.1, Chameleon, ZoeDepth, Hiera

Compare Source

Llama

The Llama 3.1 models are released by Meta and come in three flavours: 8B, 70B, and 405B.

To get an overview of Llama 3.1, please visit the Hugging Face announcement blog post.

We are releasing a repository of llama recipes to showcase usage for inference, as well as total and partial fine-tuning of the different variants.
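A minimal inference sketch (the instruct checkpoint id and prompt are illustrative):

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print(pipe("Summarize the theory of relativity in one sentence.", max_new_tokens=40))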


Chameleon

The Chameleon model was proposed in Chameleon: Mixed-Modal Early-Fusion Foundation Models by the Meta AI Chameleon Team. Chameleon is a vision-language model that uses vector quantization to tokenize images, which enables the model to generate multimodal output. The model takes images and text as input, including in an interleaved format, and generates textual responses.
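A minimal sketch of image-plus-text input (the checkpoint id and image URL are illustrative assumptions):

import requests
import torch
from PIL import Image
from transformers import ChameleonForConditionalGeneration, ChameleonProcessor

ckpt = "facebook/chameleon-7b"  # assumed checkpoint id
processor = ChameleonProcessor.from_pretrained(ckpt)
model = ChameleonForConditionalGeneration.from_pretrained(ckpt, torch_dtype=torch.bfloat16, device_map="auto")

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
prompt = "What do you see in this image?<image>"  # <image> marks where the image tokens go

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device, dtype=torch.bfloat16)
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0], skip_special_tokens=True))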

ZoeDepth

The ZoeDepth model was proposed in ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth by Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, Matthias Müller. ZoeDepth extends the DPT framework for metric (also called absolute) depth estimation. ZoeDepth is pre-trained on 12 datasets using relative depth and fine-tuned on two domains (NYU and KITTI) using metric depth. A lightweight head is used with a novel bin adjustment design called metric bins module for each domain. During inference, each input image is automatically routed to the appropriate head using a latent classifier.
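A minimal sketch via the depth-estimation pipeline (the checkpoint id and image URL are illustrative assumptions):

import requests
from PIL import Image
from transformers import pipeline

depth_estimator = pipeline("depth-estimation", model="Intel/zoedepth-nyu-kitti")  # assumed checkpoint id

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
result = depth_estimator(image)
result["depth"].save("depth.png")  # the predicted depth map is returned as a PIL image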

Hiera

Hiera was proposed in Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles by Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, and Christoph Feichtenhofer.

The paper introduces “Hiera,” a hierarchical Vision Transformer that simplifies the architecture of modern hierarchical vision transformers by removing unnecessary components without compromising on accuracy or efficiency. Unlike traditional transformers that add complex vision-specific components to improve supervised classification performance, Hiera demonstrates that such additions, often termed “bells-and-whistles,” are not essential for high accuracy. By leveraging a strong visual pretext task (MAE) for pretraining, Hiera retains simplicity and achieves superior accuracy and speed both in inference and training across various image and video recognition tasks. The approach suggests that spatial biases required for vision tasks can be effectively learned through proper pretraining, eliminating the need for added architectural complexity.
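A minimal image-classification sketch (the checkpoint id is an assumption about the Hub naming of the ImageNet-1k fine-tuned weights):

import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

ckpt = "facebook/hiera-tiny-224-in1k-hf"  # assumed checkpoint id
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForImageClassification.from_pretrained(ckpt)

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
inputs = processor(images=image, return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])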

Agents

Our ReactAgent has a specific way to return its final output: it calls the tool final_answer, added to the user-defined toolbox upon agent initialization, with the answer as the tool argument. We found that even for a one-shot agent like CodeAgent, using a specific final_answer tool helps the llm_engine figure out what to return, so we generalized the final_answer tool to all agents.

Now if your code-based agent (like ReactCodeAgent) defines a function at step 1, it will remember the function definition indefinitely. This means your agent can create its own tools for later re-use!

This is a transformative PR: it allows the agent to regularly run a specific step for planning its actions in advance. This gets activated if you set an int for planning_interval upon agent initialization. At step 0, a first plan is made. At later steps (e.g. steps 3, 6, and 9 if you set planning_interval=3), this plan is updated by the agent depending on the history of previous steps. More details soon!
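A minimal sketch of enabling periodic planning (the llm_engine stub is a hypothetical placeholder; in practice you would pass a real engine, for example one backed by the Inference API):

from transformers.agents import ReactCodeAgent

def llm_engine(messages, stop_sequences=None):
    # hypothetical stub: plug a real LLM call in here and return its text response
    raise NotImplementedError

# the agent makes a first plan at step 0 and refreshes it every 3 steps (steps 3, 6, 9, ...)
agent = ReactCodeAgent(tools=[], llm_engine=llm_engine, planning_interval=3)
# agent.run("How many seconds would a leopard at full speed take to cross Pont des Arts?")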

Notable changes to the codebase

A significant RoPE refactor was done to make it model-agnostic and more easily adaptable to any architecture.
It is only applied to Llama for now but will be applied to all models using RoPE over the coming days.
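For context, RoPE behaviour is driven from the model config; a minimal sketch of a scaled-RoPE configuration (the exact keys accepted can vary by model and version, so treat this as illustrative):

from transformers import LlamaConfig

# after the refactor, the rope_scaling dict selects the RoPE variant via rope_type
config = LlamaConfig(rope_scaling={"rope_type": "linear", "factor": 2.0})
print(config.rope_scaling)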

Breaking changes

TextGenerationPipeline and tokenizer kwargs

🚨🚨 This change makes TextGenerationPipeline rely on the tokenizer's defaults when these flags are left unset. Some models using TextGenerationPipeline previously did not add a <bos> token by default, which (negatively) impacted their performance; in practice, this is a breaking change.

Example of a script changed as a result of this PR:

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b-it", torch_dtype=torch.bfloat16, device_map="auto")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("Foo bar"))
  • 🚨🚨 TextGenerationPipeline: rely on the tokenizer default kwargs by @​gante in #​31747

Bugfixes and improvements


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.


codecov bot commented Jun 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.68%. Comparing base (755bf9c) to head (4b394cd).

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #164   +/-   ##
=======================================
  Coverage   71.68%   71.68%           
=======================================
  Files          15       15           
  Lines         226      226           
=======================================
  Hits          162      162           
  Misses         64       64           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 400b600 to 8c26689 Compare June 27, 2024 19:28
@renovate renovate bot changed the title chore(deps): update dependency transformers to v4.42.0 chore(deps): update dependency transformers to v4.42.1 Jun 27, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 8c26689 to dc2fe5d Compare June 28, 2024 07:44
@renovate renovate bot changed the title chore(deps): update dependency transformers to v4.42.1 chore(deps): update dependency transformers to v4.42.2 Jun 28, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from dc2fe5d to 0dd9007 Compare June 28, 2024 16:03
@renovate renovate bot changed the title chore(deps): update dependency transformers to v4.42.2 chore(deps): update dependency transformers to v4.42.3 Jun 28, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 0dd9007 to 295a997 Compare July 11, 2024 17:34
@renovate renovate bot changed the title chore(deps): update dependency transformers to v4.42.3 chore(deps): update dependency transformers to v4.42.4 Jul 11, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 295a997 to 68c5cd4 Compare July 23, 2024 15:19
@renovate renovate bot changed the title chore(deps): update dependency transformers to v4.42.4 chore(deps): update dependency transformers to v4.43.0 Jul 23, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 68c5cd4 to 9adc4cb Compare July 23, 2024 19:12
@renovate renovate bot changed the title chore(deps): update dependency transformers to v4.43.0 chore(deps): update dependency transformers to v4.43.1 Jul 23, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 9adc4cb to 5399c0f Compare July 24, 2024 16:05
@renovate renovate bot changed the title chore(deps): update dependency transformers to v4.43.1 chore(deps): update dependency transformers to v4.43.2 Jul 24, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 5399c0f to 4026344 Compare July 26, 2024 16:09
@renovate renovate bot changed the title chore(deps): update dependency transformers to v4.43.2 chore(deps): update dependency transformers to v4.43.3 Jul 26, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 4026344 to e2aa96f Compare August 5, 2024 12:01
@renovate renovate bot changed the title chore(deps): update dependency transformers to v4.43.3 chore(deps): update dependency transformers to v4.43.4 Aug 5, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from e2aa96f to ac28889 Compare August 6, 2024 19:37
@renovate renovate bot changed the title chore(deps): update dependency transformers to v4.43.4 chore(deps): update dependency transformers to v4.44.0 Aug 6, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from ac28889 to 118e96c Compare August 20, 2024 19:13
@renovate renovate bot changed the title chore(deps): update dependency transformers to v4.44.0 chore(deps): update dependency transformers to v4.44.1 Aug 20, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 118e96c to 4b394cd Compare August 22, 2024 18:58
@renovate renovate bot changed the title chore(deps): update dependency transformers to v4.44.1 chore(deps): update dependency transformers to v4.44.2 Aug 22, 2024