
Thinking Agent


Overview

AgentThink is a systematic evaluation framework that automatically rates overthinking behavior in large language models. The framework focuses on detecting when models prefer their internal reasoning chain over interacting with the environment, a critical issue in agentic tasks.

The framework evaluates three key aspects of overthinking:

  1. Analysis Paralysis: When models focus on heavy planning instead of interacting with the environment
  2. Rogue Actions: When models generate multiple actions without waiting for environment feedback
  3. Premature Disengagement: When models conclude tasks without proper environment verification

Getting Started

First, clone the repository and install the required packages:

git clone https://github.com/AlexCuadron/AgentThink.git
cd AgentThink
pip install -r requirements.txt

The framework consists of two main components:

  1. format_message.py: Processes and formats interaction logs into a standardized format
  2. analyze_agent_think.py: Analyzes the formatted interactions and produces overthinking scores

Configuration

The framework uses a config.toml file to configure the LLM settings:

[llm]
model = "claude-3-5-sonnet-20241022"
api_key = ""  # Set via environment variable LLM_API_KEY
temperature = 0.0
max_output_tokens = 4096
num_retries = 3
retry_min_wait = 4
retry_max_wait = 10
retry_multiplier = 2
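
As a minimal sketch (not the framework's own loader), this file can be read with Python's standard tomllib, falling back to the LLM_API_KEY environment variable when api_key is left blank:

import os
import tomllib  # standard library in Python 3.11+; use the third-party "tomli" package on older versions

# Read config.toml and pick up the [llm] table
with open("config.toml", "rb") as f:
    llm_cfg = tomllib.load(f)["llm"]

# api_key is intentionally blank in the file, so fall back to the environment variable
llm_cfg["api_key"] = llm_cfg["api_key"] or os.environ.get("LLM_API_KEY", "")

print(llm_cfg["model"], llm_cfg["temperature"])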

Evaluation

The evaluation process follows these steps:

  1. Data Collection: Gather interaction logs from models performing agentic tasks
  2. Message Formatting: Use format_message.py to standardize the interaction format
  3. Analysis: Run analyze_agent_think.py to evaluate overthinking behaviors
  4. Scoring: Generate a score (0-10) for each interaction (see the sketch below):
    • 0-3: the model consistently interacts with the environment
    • 4-7: the model sometimes relies on internal reasoning instead of interacting
    • 8-10: the model relies almost entirely on internal reasoning
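
These bands can be summarized with a small helper; this is purely illustrative and not part of the framework's code:

# Illustrative only: map a 0-10 overthinking score to the bands described above
def overthinking_band(score: int) -> str:
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score <= 3:
        return "always interacting with the environment"
    if score <= 7:
        return "sometimes relies on internal reasoning"
    return "completely relies on internal reasoning"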

Usage

To analyze a set of interactions:

# Import the framework's helpers (exact module layout may differ; check the repository's scripts)
from analyze_agent_think import load_config, LLM, analyze_responses

# Load configuration from config.toml and initialize the LLM client
config = load_config()
llm = LLM(config)

# Analyze every interaction in the directory; set iteration_number to restrict scoring to a single iteration
analyze_responses("path/to/interactions", iteration_number=None)

Results

Our framework has been used to analyze 4,018 trajectories from various models performing software engineering tasks. Key findings from our research:

  1. Performance Impact:
    • Higher overthinking scores strongly correlate with decreased performance
    • Selecting solutions with lower overthinking scores improves model performance by ~30%
    • Computational costs can be reduced by 43% through overthinking mitigation
  2. Model Behavior Analysis:
    • Reasoning models exhibit stronger tendencies toward overthinking than non-reasoning models
    • Three main patterns were identified:
      • Analysis Paralysis: models focus on planning instead of acting
      • Rogue Actions: models execute multiple actions without waiting for environment feedback
      • Premature Disengagement: models conclude tasks without proper verification
  3. Mitigation Strategies:
    • Native function-calling capabilities can help reduce overthinking
    • Selective reinforcement learning shows promise in mitigating overthinking tendencies
    • Simply selecting the candidate solution with the lowest overthinking score yields significant improvements (a sketch of this selection follows below)
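
This last selection strategy can be sketched in a few lines; the data layout here is an assumption for illustration, not the paper's exact procedure:

# Given several candidate solutions for the same task, each with an overthinking
# score from the framework, keep the one that overthinks the least.
def select_solution(candidates: list[tuple[str, float]]) -> str:
    """candidates: (solution, overthinking_score) pairs; returns the chosen solution."""
    if not candidates:
        raise ValueError("no candidate solutions provided")
    return min(candidates, key=lambda pair: pair[1])[0]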

Citation

If you find this work useful, please cite our paper:

@misc{cuadron2025dangeroverthinkingexaminingreasoningaction,
      title={The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks}, 
      author={Alejandro Cuadron and Dacheng Li and Wenjie Ma and Xingyao Wang and Yichuan Wang and Siyuan Zhuang and Shu Liu and Luis Gaspar Schroeder and Tian Xia and Huanzhi Mao and Nicholas Thumiger and Aditya Desai and Ion Stoica and Ana Klimovic and Graham Neubig and Joseph E. Gonzalez},
      year={2025},
      eprint={2502.08235},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2502.08235}, 
}
