LLao1 is a sophisticated, open-source AI reasoning agent designed to tackle complex, multi-step problems. It leverages Large Language Models (LLMs) via Ollama, coupled with powerful tools like code execution, web search, and web page content fetching. The core principle behind LLao1 is transparent, step-by-step reasoning, making it easy to understand the process behind every answer. This agent is not just a black box; it demonstrates a deliberate and auditable thought process.
LLao1 excels at tasks that require:
- Complex reasoning: Breaking down problems into manageable steps.
- Tool usage: Utilizing external tools for enhanced capabilities.
- Multi-modality: Processing text and images in tandem.
- Transparency: Providing clear, explainable reasoning.
- Iterative exploration: Actively considers multiple answers and alternative solution paths.
- Self-correction: When a line of reasoning proves incorrect, the agent revisits earlier steps with a different approach.
- Best practices: Built on established patterns for LLM agent development rather than blindly following a fixed list of instructions.
- Step-by-Step Reasoning: LLao1 decomposes complex problems into logical steps, each with a title, content, and a decision about the next action.
- Tool Integration: Supports code execution (`code_executor`), web searching (`web_search`), and web page content fetching (`fetch_page_content`).
- Multi-Modal Support: Can process both text and image inputs, making it suitable for various applications.
- JSON-Based Communication: Leverages JSON for structured communication between reasoning steps and tool interactions.
- Ollama Integration: Seamlessly integrates with local Ollama installations for privacy and speed.
- Streamlit UI: Provides an interactive web interface for easy interaction and visualization of reasoning steps.
- Export Functionality: Allows exporting the entire reasoning process as a JSON file.
- Error Handling: Robust error handling at every stage, including the LLM, tool execution, and image processing.
- Configurable Settings: Allows for user customization of model, thinking tokens, and temperature via the UI and constants.
- Context-Aware: Maintains context across conversations.
- Real-time Updates: Reasoning steps appear dynamically as they are generated.
LLao1's architecture is modular and designed for extensibility:
- User Interface (`llao1/ui/app.py`): A Streamlit application that provides the user interface for inputting prompts, viewing results, configuring settings, and exporting the reasoning process.
- Reasoning Core (`llao1/core/reasoning.py`): The main logic for generating reasoning steps.
  - It takes a user prompt, system prompt, and previous messages, and generates step-by-step reasoning using the specified LLM.
  - It decides when to use the available tools and keeps track of previous conversations.
  - It uses `make_ollama_api_call` to communicate with the LLM and parses the response.
- LLM Interface (`llao1/core/llm_interface.py`): Manages communication with the Ollama API.
  - Handles retries and errors during API calls to ensure reliability.
  - Manages JSON-formatted responses from the LLM.
- Tools (`llao1/core/tools.py`): Implements the tools the LLM can use.
  - `execute_code`: Executes Python code in a subprocess.
  - `web_search`: Performs web searches using the Exa API (requires an API key).
  - `fetch_page_content`: Retrieves web page content based on IDs from web search results.
- Prompts (`llao1/core/prompts.py`): Defines the system prompt used to guide the LLM's reasoning behavior.
  - It emphasizes step-by-step explanations and the use of tools when necessary.
- Image Utils (`llao1/models/image_utils.py`): Provides functionality for encoding images into base64 strings, enabling multi-modal input to the LLM.
- Configuration (`llao1/utils/config.py`): Stores default values for LLM and reasoning parameters, such as thinking tokens and the default model.
- Export (`llao1/utils/export.py`): Provides the functionality to export the reasoning process to JSON format.
- The `generate_reasoning_steps` function in `llao1/core/reasoning.py` is the heart of the reasoning engine.
- It maintains conversational context by managing a list of messages (`messages`) exchanged between user and assistant.
- It handles multi-modal input, encoding images to base64 if needed.
- For each reasoning step, it calls the LLM via `make_ollama_api_call` and parses the response.
- It handles tool calls and adds their results to the context.
- It expects a JSON-formatted response from the LLM with 'title', 'content', and 'next_action' keys, and optionally 'tool', 'tool_input', and 'tool_result' keys (see the example step after this list).
- It yields the reasoning steps incrementally and also reports the total execution time and tokens used.
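To make the expected structure concrete, here is an illustrative step as a Python dict. The field values and the "continue" / "final_answer" convention for `next_action` are assumptions for illustration, not taken from the code:

```python
# Illustrative only: one reasoning step in the JSON structure described above.
# Values and the "continue"/"final_answer" convention are assumptions.
example_step = {
    "title": "Compute the square root of 256",
    "content": "I need sqrt(256) before dividing, so I will run a quick calculation.",
    "next_action": "continue",            # assumed sentinel; a final step would end the loop
    "tool": "execute_code",               # optional: which tool to invoke
    "tool_input": "import math\nprint(math.sqrt(256))",
    "tool_result": "16.0",                # filled in after the tool runs
}
```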
- Code Execution (`execute_code`): Executes Python code in a sandboxed subprocess using `subprocess.run`. The subprocess runs with `capture_output=True` to capture stdout and stderr, and with a 5-second timeout via `timeout=5`. A custom `PYTHONPATH` is set to allow imports from the current working directory (see the sketch after this list).
- Web Search (`web_search`): Leverages the `exa-py` library to perform searches with highlights. It uses `exa.search_and_contents` with `type="auto"` and `use_autoprompt=True` to get accurate results, returning the ID, title, and text of each search result. The `num_results` parameter lets the user specify how many results to return.
- Page Content Fetching (`fetch_page_content`): Uses the `exa-py` library to fetch page contents for a list of IDs returned from `web_search`, via `exa.get_contents` with `text=True`, allowing the agent to check the most up-to-date information. It formats the response with the title and text content of each page.
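Below is a minimal sketch of how the first two tools might look, based only on the behavior described above. The function signatures and result formatting are assumptions; the Exa calls (`Exa`, `search_and_contents`) follow the `exa-py` client API.

```python
# Sketch only: approximates execute_code and web_search as described above.
import os
import subprocess
import sys

from exa_py import Exa

exa = Exa(os.environ["EXA_API_KEY"])  # web search requires the EXA_API_KEY env var


def execute_code(code: str) -> str:
    """Run Python code in a subprocess, capturing output with a 5-second timeout."""
    env = {**os.environ, "PYTHONPATH": os.getcwd()}  # allow imports from the CWD
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=5,
            env=env,
        )
        return result.stdout if result.returncode == 0 else result.stderr
    except subprocess.TimeoutExpired:
        return "Error: code execution timed out after 5 seconds."


def web_search(query: str, num_results: int = 5) -> str:
    """Search the web via Exa, returning the id, title, and text of each result."""
    response = exa.search_and_contents(
        query,
        type="auto",
        use_autoprompt=True,
        num_results=num_results,
        text=True,
        highlights=True,
    )
    return "\n\n".join(f"[{r.id}] {r.title}\n{r.text}" for r in response.results)
```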
- The `make_ollama_api_call` function in `llao1/core/llm_interface.py` manages interactions with Ollama using `ollama.chat` (see the sketch after this list).
- It retries the API call (up to 3 attempts) using a `for` loop with `time.sleep(1)` between attempts to improve reliability.
- It increases the token budget by 100 whenever a tool is called, to improve precision.
- It decodes responses with `json.loads`; if decoding fails, an error message is generated instead.
- The system prompt in `llao1/core/prompts.py` enforces the desired reasoning behavior and enables tool usage.
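A rough sketch of that retry loop follows, assuming the `ollama` Python client's `chat` function; the option names used for the token budget and sampling (`num_predict`, `temperature`) and the error payload are assumptions:

```python
# Sketch only: the retry pattern described above, not the actual implementation.
import json
import time

import ollama


def make_ollama_api_call(messages, model, max_tokens, temperature):
    error = {"title": "Error", "content": "LLM call failed.", "next_action": "final_answer"}
    for attempt in range(3):  # up to 3 attempts
        try:
            response = ollama.chat(
                model=model,
                messages=messages,
                format="json",  # request a JSON-formatted reply
                options={"num_predict": max_tokens, "temperature": temperature},
            )
            return json.loads(response["message"]["content"])
        except json.JSONDecodeError:
            error["content"] = "Failed to parse JSON from the model response."
        except Exception as exc:
            error["content"] = f"Ollama API call failed: {exc}"
        time.sleep(1)  # brief pause before retrying
    return error
```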
- The UI is built using Streamlit, a Python framework for creating interactive web applications (a minimal UI sketch follows this list).
- The main application logic lives in `llao1/ui/app.py`, which uses Streamlit's functions for layout, session-state management, and UI elements.
- Layout: The UI is split into a sidebar (for settings) and a main panel (for the user prompt and reasoning steps).
- Settings: The sidebar provides input fields for configuring the LLM: `st.number_input` for adjusting `thinking_tokens` (the length of each reasoning step), `st.text_input` for specifying the Ollama model, `st.slider` for adjusting the LLM `temperature` (how random the responses will be), and `st.file_uploader` for uploading image files in PNG, JPG, or JPEG format.
- Session State: Streamlit's session state (`st.session_state`) maintains state across interactions, namely `steps`, `error`, and `messages`. It is crucial for keeping context in multi-turn conversations.
- User Prompt: An `st.text_area` element takes the user's query.
- Real-Time Display: The reasoning steps are rendered by `display_steps` and updated dynamically inside an `st.empty` container as they become available. Each step is shown in an `st.expander`, with the title visible and the full content collapsible so steps can be hidden when desired. The `display_steps` function in `llao1/ui/components.py` structures and formats each step's content; when a tool is used, the tool, its input, and its result are displayed. The final answer's title and content are shown without an expander, and the thinking time is shown for each step and in the sidebar.
- Error Handling: An `st.empty` element displays errors prominently, and errors are tracked in session state so that state is not lost when one occurs.
- Image Handling: Uploaded images are saved to a temporary file via `tempfile.NamedTemporaryFile`; `save_image_from_upload` uses `PIL` to open and save the file, converting it to `.jpg`. The temporary file is deleted once processing is complete.
- Export Functionality: An `st.download_button` allows exporting the data to a JSON file using `llao1.utils.export.export_data`.
- Feedback: An `st.empty` container shows a "Thinking" message while the reasoning engine is working; it disappears once the final answer is generated.
- Metrics: The UI computes the total time and tokens consumed during reasoning and displays them in the sidebar (`st.sidebar`).
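A stripped-down sketch of that wiring is shown below; widget labels, default values, and the run loop are simplified stand-ins rather than the actual `llao1/ui/app.py`:

```python
# Sketch only: a simplified version of the Streamlit layout described above.
import streamlit as st

st.sidebar.header("Settings")
thinking_tokens = st.sidebar.number_input("Thinking tokens", min_value=64, value=512)
model = st.sidebar.text_input("Ollama Model", value="llama3.2-vision")
temperature = st.sidebar.slider("Temperature", 0.0, 1.0, 0.2)
image_file = st.sidebar.file_uploader("Image", type=["png", "jpg", "jpeg"])

# Persist steps, errors, and conversation history across Streamlit reruns.
for key, default in (("steps", []), ("error", None), ("messages", [])):
    st.session_state.setdefault(key, default)

prompt = st.text_area("Enter your query:")
status = st.empty()  # holds the "Thinking..." message and error output

if st.button("Run") and prompt:
    status.info("Thinking...")
    # The real app streams steps from generate_reasoning_steps(...) here,
    # appending each one to st.session_state.steps as it arrives.
    status.empty()

for i, step in enumerate(st.session_state.steps, start=1):
    with st.expander(f"Step {i}: {step['title']}"):
        st.write(step["content"])
```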
- The `encode_image_base64` function in `llao1/models/image_utils.py` handles image encoding to base64 for multi-modal LLMs. It uses the `PIL` library to open the image, saves it to a `BytesIO` buffer in JPEG format, and then base64-encodes the buffer. It handles errors if the file cannot be encoded or does not exist (sketched below).
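A minimal sketch of that encoding step, assuming only the behavior described above (the real function's error handling may differ):

```python
# Sketch only: base64-encode an image via PIL and BytesIO, as described above.
import base64
from io import BytesIO

from PIL import Image


def encode_image_base64(image_path: str) -> str:
    """Open an image, re-save it as JPEG in memory, and return a base64 string."""
    try:
        with Image.open(image_path) as img:
            buffer = BytesIO()
            img.convert("RGB").save(buffer, format="JPEG")  # JPEG has no alpha channel
            return base64.b64encode(buffer.getvalue()).decode("utf-8")
    except (FileNotFoundError, OSError) as exc:
        raise ValueError(f"Could not encode image '{image_path}': {exc}") from exc
```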
- `llao1/utils/config.py` defines `DEFAULT_THINKING_TOKENS` and `DEFAULT_MODEL` (a hypothetical example is shown below).
- The `EXA_API_KEY` environment variable is used to configure the Exa API client for web searching, when an API key is available.
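For orientation, the configuration module might look roughly like this; the default values shown are placeholders, since the README does not state them:

```python
# Hypothetical sketch of llao1/utils/config.py; default values are placeholders.
import os

DEFAULT_THINKING_TOKENS = 512                # placeholder: per-step token budget
DEFAULT_MODEL = "llama3.2-vision"            # placeholder: the recommended Ollama model

EXA_API_KEY = os.environ.get("EXA_API_KEY")  # optional, enables web search
```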
- The project incorporates robust error handling at various points, including image processing, tool execution, API calls, and JSON decoding (see the pattern sketched below).
- Errors are caught with `try/except` blocks and displayed in the Streamlit UI via `st.error`.
- Logs are printed with Python's `print` function to give context to errors during debugging.
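The general pattern, sketched with a simplified call into the reasoning core (the real arguments to `generate_reasoning_steps` also include the system prompt and prior messages):

```python
# Sketch only: the catch-display-log pattern described above.
import streamlit as st

try:
    for step in generate_reasoning_steps(prompt):  # reasoning core from llao1/core/reasoning.py
        st.session_state.steps.append(step)
except Exception as exc:
    print(f"[LLao1] reasoning failed: {exc}")      # console log for debugging context
    st.session_state.error = str(exc)              # keep the error in session state
    st.error(f"An error occurred: {exc}")          # show it prominently in the UI
```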
- Python 3.10+: Ensure you have Python 3.10 or higher installed.
- Ollama: Install Ollama and have an LLM model downloaded; `llama3.2-vision` is recommended. See Ollama's website for installation instructions.
- Exa API Key (optional): To use the web search functionality, obtain an API key from Exa and set it as the `EXA_API_KEY` environment variable.
- Pip: Ensure you have pip installed.
- Clone the repository: `git clone https://github.com/himudigonda/LLao1` and `cd LLao1`
- Install dependencies: `pip install -r requirements.txt`
- Export `EXA_API_KEY` and `HUGGINGFACE_API_KEY`: `export EXA_API_KEY="thisisafakekeyuseyourkey"` and `export HUGGINGFACE_API_KEY="thisisnotthekeylol"`
- Make `run.sh` executable: `chmod +x ./run.sh`
- Run `run.sh`: `./run.sh`
- Access via browser: Open your web browser and go to the URL that Streamlit provides.
- Enter your query in the text box.
- (Optional) Upload an image.
- Configure settings (thinking tokens, model name, temperature).
- The reasoning steps will be displayed in real-time using collapsible expander components.
- Use the "Export Steps" button to download the reasoning process as a JSON file.
- Complex Calculation: "What is the result of (145 * 23) divided by the square root of 256?"
- Web Research: "What are the most recent news about AI?"
- Web Research with Specific Results: "Search for Python tutorials about asyncio, give me 3 results"
- Web Research and Page Content Analysis: "Search for the latest news about autonomous vehicles. Get the page content for the first and second search result."
- Code Execution: "Execute the following Python code and show the output: print(2**10 + 2048)"
- Multi-Modal Task: "Describe what is in this image" (with an uploaded image)
Contributions are welcome! Please submit a pull request with your proposed changes.
This project is licensed under the MIT License.