Phi4 Agent demo for AIPC #2961

Merged · 2 commits · May 26, 2025

48 changes: 48 additions & 0 deletions supplementary_materials/phi4-agent/README.md
# Phi-4-mini Agent on AI-PC with OpenVINO™ & 🤗 smolagents
In this notebook we will show you how to deploy a [Phi-4-mini](https://huggingface.co/microsoft/Phi-4-mini-instruct) agent, equipped with tools and MCP, that runs fully locally on your Intel<sup>®</sup> Core™ Ultra laptop with [OpenVINO™](https://docs.openvino.ai/2025/index.html) and [🤗 smolagents](https://github.com/huggingface/smolagents).

An LLM-based agent is a program whose workflow is controlled by the LLM. In this notebook we build a multi-step agent: given a task, the LLM decides which action to take at each step, and when to terminate and return a final answer to the user.
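
For intuition, below is a minimal, self-contained sketch of that decide-act loop. The names (`Decision`, `decide`) are hypothetical and only illustrate the idea; this is not the actual smolagents implementation.
```python
from dataclasses import dataclass, field

@dataclass
class Decision:
    """What the LLM returns at each step: either a final answer or a tool call."""
    is_final_answer: bool
    answer: str = ""
    tool_name: str = ""
    arguments: dict = field(default_factory=dict)

def run_agent(decide, tools, task, max_steps=5):
    """Run the decide-act loop. `decide` stands in for the LLM;
    `tools` maps tool names to plain Python callables."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        decision = decide("\n".join(history))
        if decision.is_final_answer:  # the LLM chose to terminate
            return decision.answer
        # Otherwise, execute the chosen tool and feed the observation back
        result = tools[decision.tool_name](**decision.arguments)
        history.append(f"Observation from {decision.tool_name}: {result}")
    return "Step limit reached without a final answer."
```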

We will demonstrate how, with the help of a few simple tools such as web search and a code-completion tool (also based on Phi-4-mini), the agent can code new tools and then use them to, for example, summarize information into a PPT presentation.
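
For reference, a custom smolagents tool is simply a type-annotated, documented function wrapped in the `@tool` decorator. Here is a toy example (not one of the tools used in this demo):
```python
from smolagents import tool

@tool
def word_count(text: str) -> int:
    """Counts the number of words in a piece of text.

    Args:
        text: The text whose words should be counted.
    """
    return len(text.split())
```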

Furthermore, we will demonstrate how, using a YouTube transcript MCP server, we can chat with YouTube videos without worrying about the prompt exploding to thousands of tokens, while everything runs locally and fast.

## Prerequisites
Create a new Python environment for this notebook, activate it, and make sure you have `git` installed. For example:
```cmd
conda create -n phi4-demo python=3.11
conda activate phi4-demo
```

Then, let's install all the packages we will need:
```cmd
pip install -r requirements.txt
```

For this notebook we use a modified version of smolagents, so we will need to clone the repository, apply our patch, and install it from source.
```cmd
git clone https://github.com/huggingface/smolagents && cd smolagents
git checkout v1.14.0 && git apply ..\phi4_smolagents.patch
pip install .[mcp]
cd ..
```
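
You can sanity-check the patched install (assuming `smolagents` exposes `__version__`, as recent releases do):
```cmd
python -c "import smolagents; print(smolagents.__version__)"
```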

> [!NOTE]
> To have the most up-to-date experience with OpenVINO you can also install the nightly versions:
> ```cmd
> pip install --pre -U openvino --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
> pip install git+https://github.com/huggingface/optimum.git
> pip install git+https://github.com/huggingface/optimum-intel.git
> ```

## Prepare Phi-4-mini model for inference
We will use `optimum-cli` to convert the Phi-4-mini model from the Hugging Face Hub to OpenVINO format and quantize its weights to INT8:
```cmd
optimum-cli export openvino --model microsoft/Phi-4-mini-instruct --task text-generation-with-past --weight-format int8 phi-4-mini-instruct-int8-ov
```
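
Optionally, you can smoke-test the exported model before running the agent. A minimal sketch, assuming the export above completed into the `phi-4-mini-instruct-int8-ov` directory:
```python
# Load the exported OpenVINO model with optimum-intel and generate a few tokens
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_dir = "phi-4-mini-instruct-int8-ov"  # output of the export command above
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = OVModelForCausalLM.from_pretrained(model_dir)

inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```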

## Demo
Now we are ready to run our demo:
```cmd
python phi4_agent.py
```
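
The script accepts two optional flags defined in `phi4_agent.py`, `--model_path` (path to the converted model) and `--backend` (`ov-optimum` or `torch`), so an equivalent explicit invocation is:
```cmd
python phi4_agent.py --model_path phi-4-mini-instruct-int8-ov --backend ov-optimum
```
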
62 changes: 62 additions & 0 deletions supplementary_materials/phi4-agent/phi4_agent.py
import argparse

from tools import CodeCompletionTool, YoutubeTranscriptRetriever, build_presentation
from mcp import StdioServerParameters

from smolagents import TransformersModel, ToolCallingAgent, GradioUI
from smolagents.mcp_client import MCPClient


def add_arguments(parser):
    parser.add_argument(
        "--model_path",
        type=str,
        default="phi-4-mini-instruct-int8-ov",
        help="Path to the OpenVINO model directory.",
    )
    parser.add_argument(
        "--backend",
        type=str,
        default="ov-optimum",
        choices=["ov-optimum", "torch"],
        help="Backend to use for the model.",
    )
    return parser


def main(model_path, backend="ov-optimum"):
    # Initialize the Phi-4-mini model
    model = TransformersModel(
        model_id=model_path, max_new_tokens=1024, backend=backend)

    server_params = StdioServerParameters(
        command="python",
        args=["-m", "duckduckgo_mcp_server.server"]
    )
    tools = [build_presentation, CodeCompletionTool(model.model, model.tokenizer, max_new_tokens=1024)]

    # We initialize the MCP client in a try block to ensure we disconnect
    # from the MCP servers if there's an error
    mcp_client = None
    yt_transcript_retriever = None
    try:
        # Start the MCP server
        mcp_client = MCPClient(server_params)
        tools.extend(mcp_client.get_tools())
        # The YouTube tool also uses an MCP server
        yt_transcript_retriever = YoutubeTranscriptRetriever(device='GPU')
        tools.append(yt_transcript_retriever)

        # After initializing the tools, we can initialize our agent
        agent = ToolCallingAgent(tools=tools, model=model, add_base_tools=False, max_steps=5)

        # Now we can start our Gradio demo; you can also run a single task via agent.run()
        GradioUI(agent).launch(inbrowser=True, inline=False)
    finally:
        # Guard against errors raised before the clients were created
        if mcp_client is not None:
            mcp_client.disconnect()
        if yt_transcript_retriever is not None:
            yt_transcript_retriever.disconnect()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run the Phi-4 agent.")
    parser = add_arguments(parser)
    args = parser.parse_args()
    main(**vars(args))