feat: support Nemotron using dual prompt strategy #1165

Closed
wants to merge 9 commits into from

Pouyanpi
Collaborator

@Pouyanpi Pouyanpi commented May 1, 2025

Nemotron Dual Mode requires Dual-Prompt Strategy: Reasoning Mode vs Standard Mode

Overview

This PR implements a dual-prompt strategy for Nemotron models in NeMo-Guardrails, allowing users to control the level of reasoning in model outputs by switching between reasoning mode and standard mode.

Key Changes

  • Created nemotron_reasoning.yml with message-based prompts containing "detailed thinking on" system messages
  • Created nemotron_standard.yml with message-based prompts without detailed thinking
  • Added comprehensive tests to verify prompt selection behavior
  • Added documentation in both prompt directory and example configs

Why

Different tasks benefit from different reasoning approaches:

  • Complex problem-solving and step-by-step analysis benefit from detailed reasoning
  • Simple conversational tasks are more efficient with direct responses
  • Users need the flexibility to choose based on their specific use case

Implementation Details

  • Reasoning Mode: When prompting_mode: reasoning is set, uses prompts with detailed thinking

    • Tasks like generate_bot_message and generate_value have two system messages
    • Tasks like generate_user_intent and generate_next_steps have a simpler format
  • Standard Mode: When using any other mode, uses prompts without detailed thinking

    • Still uses message-based format (not content-based)
    • More concise, direct responses without extensive reasoning
    • Default behavior if no prompting_mode is specified
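
The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `PromptTemplate`, `PROMPTS`, `select_prompt`, and the `<task instructions>` placeholder are hypothetical names introduced here, and the real prompts live in nemotron_reasoning.yml and nemotron_standard.yml.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# System message that switches Nemotron into reasoning mode (from this PR).
DETAILED_THINKING_ON = {"role": "system", "content": "detailed thinking on"}


@dataclass
class PromptTemplate:
    """Hypothetical stand-in for one message-based prompt entry."""
    task: str
    mode: str  # "reasoning" or "standard"
    messages: List[Dict[str, str]]


# Hypothetical registry: one entry per (task, mode) pair.
PROMPTS = [
    PromptTemplate(
        task="generate_bot_message",
        mode="reasoning",
        # Two system messages: the reasoning switch, then the task instructions.
        messages=[
            DETAILED_THINKING_ON,
            {"role": "system", "content": "<task instructions>"},
        ],
    ),
    PromptTemplate(
        task="generate_bot_message",
        mode="standard",
        # Still message-based, but without the reasoning switch.
        messages=[{"role": "system", "content": "<task instructions>"}],
    ),
]


def select_prompt(task: str, prompting_mode: Optional[str] = None) -> PromptTemplate:
    """Pick the reasoning prompt only when prompting_mode is exactly "reasoning".

    Any other value, including None, falls back to the standard prompt.
    """
    mode = "reasoning" if prompting_mode == "reasoning" else "standard"
    for prompt in PROMPTS:
        if prompt.task == task and prompt.mode == mode:
            return prompt
    raise KeyError(f"no prompt registered for task={task!r}, mode={mode!r}")
```

In this sketch, an unset or unrecognized prompting_mode resolves to the standard prompt, which mirrors the default behavior listed above.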

How to Test

  1. Use the example configs in examples/configs/nemotron/:

    • reasoning-mode - activates detailed thinking
    • normal-mode - uses standard approach without detailed thinking
  2. Run the test suite:

    pytest tests/test_nemotron_prompt_modes.py -v

Example Usage

# For reasoning mode
config = RailsConfig.from_path("examples/configs/nemotron/reasoning-mode")
rails = LLMRails(config)

# For standard mode
config = RailsConfig.from_path("examples/configs/nemotron/normal-mode")
rails = LLMRails(config)

Documentation

A minimal README.md with usage examples and implementation details

@Pouyanpi Pouyanpi force-pushed the feat/nemotron-support branch from 1c03446 to fa3419b Compare May 14, 2025 16:32
@codecov-commenter

codecov-commenter commented May 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.64%. Comparing base (36d625e) to head (03c9f8b).
Report is 3 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1165      +/-   ##
===========================================
+ Coverage    68.43%   68.64%   +0.20%     
===========================================
  Files          161      161              
  Lines        15943    15970      +27     
===========================================
+ Hits         10910    10962      +52     
+ Misses        5033     5008      -25     
Flag Coverage Δ
python 68.64% <100.00%> (+0.20%) ⬆️

Files with missing lines Coverage Δ
nemoguardrails/rails/llm/llmrails.py 87.21% <100.00%> (+0.09%) ⬆️

... and 2 files with indirect coverage changes

@Pouyanpi Pouyanpi requested review from trebedea and cparisien May 14, 2025 17:04
@Pouyanpi Pouyanpi marked this pull request as ready for review May 14, 2025 17:04
@Pouyanpi
Collaborator Author

@mikemckiernan

Nemotron (and its family) is a hybrid reasoning model. The user can enable reasoning mode with a specific system message:

        {
            "role": "system",
            "content": "detailed thinking on",
        }

The question is how to enable reasoning mode for all the tasks involved in guardrails.

  • The user should set prompting_mode: "reasoning" in config.yml.

Without this, Nemotron runs in normal mode and produces no reasoning traces. Please have a look at the example configs.
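
For illustration, a reasoning-mode config.yml might look roughly like the YAML held in the string below. This is a sketch under assumptions: the engine and model values are copied from the config snippet reviewed later in this thread, `type: main` is assumed, and `uses_reasoning_mode` is a hypothetical helper that only demonstrates which key matters. See the example configs in the PR for the real files.

```python
# Assumed contents of a reasoning-mode config.yml (illustrative only).
REASONING_CONFIG_YAML = """\
models:
  - type: main
    engine: nim
    model: nvidia/llama-3.1-nemotron-ultra-253b-v1

prompting_mode: "reasoning"
"""


def uses_reasoning_mode(yaml_text: str) -> bool:
    """Naive line-based check for prompting_mode: "reasoning".

    A real loader would parse the YAML; this hypothetical helper avoids
    third-party dependencies and just inspects key/value pairs per line.
    """
    for line in yaml_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "prompting_mode":
            return value.strip().strip("\"'") == "reasoning"
    # No prompting_mode key means the default (standard) mode applies.
    return False
```

Such a string could presumably be passed to RailsConfig.from_content(yaml_content=...) instead of loading a directory with RailsConfig.from_path.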

The user can use Nemotron in normal mode (without prompting_mode set to "reasoning") and still do something like:

rails.generate(
    messages=[
        {
            "role": "system",
            "content": "detailed thinking on",
        },
        {
            "role": "user",
            "content": "what can you do?",
        },
    ]
)

Expected output

{"role": "assistant",
 "content": "<think>\nOkay, let's see. The user just asked \"what can you do?\" again. The previous conversation shows that the bot already explained its capabilities once. The user's intent here is probably asking for more details or a repetition. But since the bot has already provided the capabilities response, maybe the bot should ask how it can assist further instead of repeating the same information.\n\nLooking at the bot intents, there's an example where the bot uses \"ask how can i assist you further\". So the correct response would be to prompt the user for more specific help rather than just repeating the capabilities. That makes sense because the user might be looking for examples or needs to be guided on how to ask for something specific. The bot should be proactive in offering assistance. Therefore, the appropriate bot message here is \"How can I assist you further?\".\n</think>How can I assist you further?"}

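Since the reasoning trace arrives inline in the content field, a caller may want to separate it from the final answer. The sketch below is one possible way to do that, assuming the trace is wrapped in a single <think>...</think> block as in the output above; `split_reasoning` is a hypothetical helper, not part of NeMo-Guardrails.

```python
import re
from typing import Tuple

# Matches the first <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content: str) -> Tuple[str, str]:
    """Return (reasoning_trace, answer) for an assistant message.

    If no <think> block is present, the trace is empty and the whole
    content is treated as the answer.
    """
    match = THINK_BLOCK.search(content)
    if match is None:
        return "", content.strip()
    reasoning = match.group(1).strip()
    # Drop the matched block and keep whatever surrounds it as the answer.
    answer = (content[: match.start()] + content[match.end():]).strip()
    return reasoning, answer
```

Applied to the expected output above, the second element of the returned tuple would be the final bot message without the trace.
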
@Pouyanpi Pouyanpi self-assigned this May 14, 2025
@Pouyanpi Pouyanpi added this to the v0.14.0 milestone May 14, 2025
Collaborator

@cparisien cparisien left a comment

This approach looks good. Let's get a documentation update happening ASAP.

@Pouyanpi Pouyanpi added the enhancement New feature or request label May 14, 2025
Collaborator

@trebedea trebedea left a comment

Looks good, we just need to update the docs to mention the nemotron models that support detailed reasoning.

Pouyanpi added 2 commits May 15, 2025 11:46
update

readme

fix

update
engine: nim
model: nvidia/llama-3.1-nemotron-ultra-253b-v1
reasoning_config:
  remove_reasoning_traces: True
Collaborator Author

we don't need this, do we?

Documentation preview

https://nvidia.github.io/NeMo-Guardrails/review/pr-1165

@Pouyanpi Pouyanpi requested a review from trebedea May 15, 2025 11:44
@Pouyanpi Pouyanpi changed the title feat: add nemotron model to all tasks feat: support Nemotron using dual prompt strategy May 15, 2025
@Pouyanpi
Copy link
Collaborator Author

close for future

@Pouyanpi Pouyanpi closed this May 15, 2025
Labels
enhancement New feature or request
4 participants