

@Lucas-Fernandes-Martins

As noted in the main README, Gemma 3 models are not yet supported by ART, because Gemma does not accept the enable_prefix_caching parameter.

To solve this issue, I've introduced the following changes in get_model_config.py:

use_gemma_config = config.get("use_gemma_config", False)

if use_gemma_config:
    # Gemma 3 rejects enable_prefix_caching, so it is omitted here.
    init_args = InitArgs(
        model_name=base_model,
        max_seq_length=32768,
        load_in_4bit=True,  # False for LoRA 16bit
        fast_inference=True,  # Enable vLLM fast inference
        # vLLM args
        disable_log_stats=False,
        gpu_memory_utilization=(
            0.79 if enable_sleep_mode else 0.55
        ),  # Reduce if out of memory
        max_lora_rank=8,
        use_async=True,
    )
else:
    init_args = InitArgs(
        model_name=base_model,
        max_seq_length=32768,
        load_in_4bit=True,  # False for LoRA 16bit
        fast_inference=True,  # Enable vLLM fast inference
        # vLLM args
        disable_log_stats=False,
        enable_prefix_caching=True,
        gpu_memory_utilization=(
            0.79 if enable_sleep_mode else 0.55
        ),  # Reduce if out of memory
        max_lora_rank=8,
        use_async=True,
    )

I believe this solves the problem: users can then set use_gemma_config to prevent enable_prefix_caching from being added to the argument list.
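As a side note, the InitArgs call could also be written once by building the shared kwargs and adding enable_prefix_caching conditionally. This is just a sketch of an alternative shape, reusing InitArgs, base_model, enable_sleep_mode, and use_gemma_config from the diff above:

init_kwargs = dict(
    model_name=base_model,
    max_seq_length=32768,
    load_in_4bit=True,  # False for LoRA 16bit
    fast_inference=True,  # Enable vLLM fast inference
    disable_log_stats=False,
    gpu_memory_utilization=0.79 if enable_sleep_mode else 0.55,
    max_lora_rank=8,
    use_async=True,
)
if not use_gemma_config:
    # Gemma 3 rejects this vLLM argument, so only pass it for other models.
    init_kwargs["enable_prefix_caching"] = True
init_args = InitArgs(**init_kwargs)

Either shape should behave the same; the dict version just avoids keeping the two branches in sync.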

Let me know if this is not correct or requires adaptations.

Thank you very much :)

@Lucas-Fernandes-Martins changed the title from "gemma 3 fix" to "Gemma 3 fix" on Jul 15, 2025
@corbt requested a review from bradhilton on Jul 16, 2025
@corbt
Contributor

corbt commented Jul 16, 2025

Very cool! @bradhilton can you take a look at this one?

@bradhilton
Collaborator

@Lucas-Fernandes-Martins have you been able to test this? Does it work?

@Lucas-Fernandes-Martins
Author

Hi @corbt and @bradhilton, thank you for your message!

Unfortunately, I spent today doing additional testing and found something concerning about the solution I proposed.

While the change solves the enable_prefix_caching issue, another one appears (which I somehow failed to notice yesterday):

AttributeError: 'Gemma3ForCausalLM' object has no attribute 'vllm_engine'

This seems closely linked to this open issue in Unsloth.
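From what I can tell, Unsloth only attaches a vllm_engine attribute to models it can serve with fast inference, so the lookup fails for Gemma 3. A purely illustrative sketch of the failure mode (not ART's actual code):

engine = getattr(model, "vllm_engine", None)
if engine is None:
    # Unsloth's Gemma3ForCausalLM has no vllm_engine yet, so any code
    # path that assumes fast_inference support fails at this point.
    raise RuntimeError("Model was loaded without a vLLM engine")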

Also, when I try to disable vLLM altogether, I get:

     63             ctx = zmq.Context(async_ctx)
     64 
---> 65         Which previously had to be::
     66 
     67             ctx = zmq.Context.shadow(async_ctx.underlying)

zmq/backend/cython/context.pyx in zmq.backend.cython.context.Context.__init__()
TypeError: an integer is required
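If I read the traceback right, zmq.Context(x) treats its positional argument as the integer io_threads count, hence "an integer is required" when it receives an async context object; the docstring excerpt in the traceback shows the shadow-context pattern pyzmq expects instead. A minimal sketch, assuming pyzmq with asyncio support:

import zmq
import zmq.asyncio

async_ctx = zmq.asyncio.Context()
# zmq.Context(async_ctx) fails: the positional argument is io_threads (an int).
ctx = zmq.Context.shadow(async_ctx.underlying)  # wrap the existing context instead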

I apologize for opening the pull request so soon; I got carried away once the initial enable_prefix_caching issue was solved. If you feel it is appropriate, I'll close the pull request, do more investigation, and try to solve the problem.

I've seen some folks in the community mention that Gemma 3 would be very useful to have in ART, especially due to its multilingual capabilities, so I'll do my best to solve this.

Either way, thank you for the help :)

@bradhilton
Collaborator

Thank you @Lucas-Fernandes-Martins for your investigation. I am afraid that adding Gemma 3 support will likely be tricky.

@corbt
Contributor

corbt commented Jul 17, 2025

@Lucas-Fernandes-Martins it would be great to get Gemma 3 in! Definitely update this PR if you get to a working solution.

@Lucas-Fernandes-Martins
Author

Hi, thank you for your patience. After a few days of investigation, it seems the main issue is that Unsloth's Gemma 3 doesn't support vLLM. However, I've heard from the Unsloth community that vLLM support for Gemma 3 will be released soon (maybe even next week).

Once this happens, I'll test ART to see if it now works and keep you folks in the loop!

Thanks again :)

@bradhilton
Collaborator

Thank you @Lucas-Fernandes-Martins for investigating!
