

@KutalVolkan KutalVolkan commented May 29, 2025

Thank you @eugeniavkim for working with me on the concept and design of this agentic multi-agent red teaming pipeline!

Please review when you have a chance. Feedback, suggestions for further modularity, or requests for sample agent system prompts are welcome :)


Overview

This draft PR introduces a new, flexible multi-agent system (MAS) pipeline for red teaming LLMs.
The MASChatTarget class composes any number of agents (e.g., recon, strategy, red-team) into an ordered chain, each with its own system prompt and context.

Key features:

  • Generic agent_chain architecture supports chains of two or more agents in any order.
  • Agents receive the full conversation history plus optional per-role context.
  • No support yet for tool/function calls; the pipeline is prompt-based only.
  • The example orchestration (in use_msa_chat_target.py) demonstrates usage with strategy and red-team agents, and can be extended with recon or other agent roles.
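The chained design described above can be sketched roughly as follows. This is a minimal illustration, not the actual MASChatTarget implementation: the constructor signature, the AgentConfig dataclass, and the _call_llm placeholder are all assumptions made for the example.

```python
# Hypothetical sketch of a MASChatTarget-style agent chain.
from dataclasses import dataclass, field


@dataclass
class AgentConfig:
    """One agent in the chain: a name, a system prompt, optional context."""
    name: str
    system_prompt: str
    context: str = ""


@dataclass
class MASChatTarget:
    """Passes each user message through the agent chain in order."""
    agent_chain: list[AgentConfig]
    history: list[dict] = field(default_factory=list)

    def send_prompt(self, user_message: str) -> str:
        message = user_message
        self.history.append({"role": "user", "content": message})
        for agent in self.agent_chain:
            # Each agent sees the running output plus its own context;
            # _call_llm stands in for the underlying chat-completion call.
            message = self._call_llm(agent, message)
            self.history.append({"role": agent.name, "content": message})
        return message

    def _call_llm(self, agent: AgentConfig, message: str) -> str:
        # Placeholder: a real implementation would call the chat model here.
        return f"[{agent.name}] {message}"


chain = MASChatTarget(agent_chain=[
    AgentConfig("recon", "Gather context about the target."),
    AgentConfig("strategy", "Plan the attack approach."),
    AgentConfig("red_team", "Craft the adversarial prompt."),
])
# Prints: [red_team] [strategy] [recon] objective: test refusal behavior
print(chain.send_prompt("objective: test refusal behavior"))
```

Because the chain is just an ordered list, reordering agents or inserting a new role is a one-line change to agent_chain.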

[Image: Visualization of the MASChatTarget agent pipeline]
Note: The class name is now MASChatTarget (previously MoAChatTarget).

@KutalVolkan KutalVolkan marked this pull request as ready for review May 29, 2025 16:55
@KutalVolkan

Hello Roman, hey Eugenia :)

I need to check the memory in DuckDB. I’m a bit unsure about when adversarial_chat is considered a user and when it is considered an assistant in the context of the RedTeamingOrchestrator.


romanlutz commented May 31, 2025

In the current setup, PyRIT is always the user and any target has the assistant role. There's one conversation with adversarial_chat, one with objective_target, and one with the scoring_target (assuming LLM scorers).

In your case this gets complicated because it's not just a single message and response but a chain. We may have to introduce a new role or just call all of the agents "assistant" and differentiate in a different way. Is it fair to assume that it's like this:

User -> agent 1 -> agent 2 -> ... -> agent n -> user

Or does user talk to agent 1, agent 1 to agent 2, etc and it's actually just n-1 separate conversations of 2 participants? In the latter case, it makes sense to call all of them separate conversations with user/assistant roles. In the former case I'm really not sure.

@rlundeen2 or @bashirpartovi may have thoughts.


KutalVolkan commented May 31, 2025

In the current setup, PyRIT is always the user and any target has the assistant role. There's one conversation with adversarial_chat, one with objective_target, and one with the scoring_target (assuming LLM scorers).

In your case this gets complicated because it's not just a single message and response but a chain. We may have to introduce a new role or just call all of the agents "assistant" and differentiate in a different way. Is it fair to assume that it's like this:

User -> agent 1 -> agent 2 -> ... -> agent n -> user

Or does user talk to agent 1, agent 1 to agent 2, etc and it's actually just n-1 separate conversations of 2 participants? In the latter case, it makes sense to call all of them separate conversations with user/assistant roles. In the former case I'm really not sure.

@rlundeen2 or @bashirpartovi may have thoughts.

Thanks, Roman.

Right now, our MAS pipeline implements a single linear chain where user input is passed through each agent in sequence. For PyRIT compatibility, all agents are currently marked as "assistant" (except the initial and target responses, which are "user") when persisted to memory. We track the true MAS agent roles only in our internal _history, not in prompt_metadata. I could add each MAS agent’s role to prompt_metadata in the stored PromptRequestPiece. Would that be a good fit for auditability and downstream analysis?
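The prompt_metadata idea above might look like the sketch below. The field names mirror PyRIT's PromptRequestPiece but are illustrative only; a plain dict stands in for the real class, and the "mas_agent_role" key is an assumption for this example.

```python
# Sketch: persist every chained agent as "assistant" for PyRIT
# compatibility, while keeping the true MAS role in prompt_metadata
# for auditability and downstream analysis.

def to_memory_piece(mas_role: str, content: str) -> dict:
    """Map a MAS agent role onto the user/assistant roles PyRIT expects.

    Only the initial input and target responses are stored as "user";
    every intermediate agent in the chain is stored as "assistant".
    """
    pyrit_role = "user" if mas_role in ("initial", "target") else "assistant"
    return {
        "role": pyrit_role,
        "original_value": content,
        # The true MAS role survives here even though "role" is flattened.
        "prompt_metadata": {"mas_agent_role": mas_role},
    }


piece = to_memory_piece("strategy", "Plan: probe the refusal boundary.")
print(piece["role"], piece["prompt_metadata"]["mas_agent_role"])
```

With this shape, a memory query can still reconstruct which chained agent produced each piece, even though all of them share the "assistant" role.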

@romanlutz

Potentially. I am curious if @rlundeen2 has thoughts.

@KutalVolkan

Hi @romanlutz @eugeniavkim @rlundeen2,

Quick note: Our multi-agent orchestrator currently only supports prompt-based chaining, with no dynamic tools or data fetches yet (all logic is handled purely through prompt engineering and agent system-prompt YAMLs). If, say, a recon agent could trigger real actions (like a web search) and pass the results downstream, we’d get much more adaptive and realistic attack flows.

One idea: agent emits an action keyword, the orchestrator intercepts that, runs the tool, and injects the results back into the agent chain. Is this kind of dynamic action pipeline already on your radar, or do you have any early design thoughts? I’ll spend more time thinking about implementation options, but probably not until next week.

@KutalVolkan KutalVolkan changed the title [DRAFT] FEAT: Add MASChatTarget Generic Multi-Agent Pipeline for Red Teaming FEAT: Add MASChatTarget Generic Multi-Agent Pipeline for Red Teaming Jul 14, 2025
@KutalVolkan KutalVolkan changed the title FEAT: Add MASChatTarget Generic Multi-Agent Pipeline for Red Teaming FEAT: Add Multi-Agent Orchestrator Generic Multi-Agent Pipeline for Red Teaming Jul 14, 2025

KutalVolkan commented Jul 15, 2025

Update: Exploring dynamic tool use

Agents can emit an action keyword (e.g., "action": "web_search"); the orchestrator detects this, runs the corresponding tool (a Python function), and injects the results into the agent context for downstream agents. For quick PoCs (outside PyRIT), I usually call the OpenAI function-calling API for simple web-search actions, or sometimes just use Python’s requests plus BeautifulSoup to scrape and parse web content and let the LLM process it. This works for fast demos, but for longer-term, production-grade features, using vendor APIs may be more robust.
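The interception loop described above might be sketched like this. The JSON "action" convention, the tool registry, and the stub web_search function are assumptions from the discussion, not an existing PyRIT feature.

```python
# Minimal sketch of an action-keyword loop: an agent emits a JSON action,
# the orchestrator intercepts it, runs the matching tool, and feeds the
# result back into the chain for downstream agents.
import json
from typing import Callable

# Registry mapping action keywords to plain Python tool functions.
TOOLS: dict[str, Callable[[str], str]] = {
    "web_search": lambda query: f"search results for: {query}",  # stub tool
}


def run_agent_step(agent_output: str) -> str:
    """If the agent emitted an action, execute the tool and return its
    result for injection into the next agent's context; otherwise pass
    the output through unchanged."""
    try:
        payload = json.loads(agent_output)
    except json.JSONDecodeError:
        return agent_output  # plain text, no action requested
    if not isinstance(payload, dict):
        return agent_output  # valid JSON but not an action object
    action = payload.get("action")
    if action in TOOLS:
        return TOOLS[action](payload.get("input", ""))
    return agent_output


# An agent asks for a web search; the orchestrator runs the stub tool.
result = run_agent_step('{"action": "web_search", "input": "PyRIT"}')
print(result)  # search results for: PyRIT
```

Swapping the stub in TOOLS for a real function-calling or scraping implementation would not change the interception logic, which is what makes the keyword approach easy to prototype.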

If anyone has a preferred pattern for tool execution, or opinions on whether to always use LLM-native tool calls or custom Python logic, let me know :)

Current approach: see the related issue and pointers from @romanlutz in #1006.

@KutalVolkan

I'm currently working on refactoring the multi-agent orchestrator to align with the new MultiTurnAttackStrategy interface. Expected completion: September 29, 2025.
