Add Agentic Reinforcement Learning (RL) Support to strands-agents #597
sharabhshukla
started this conversation in
Ideas
Replies: 2 comments 1 reply
-
Hi there,
1 reply
-
This may be related, #609
0 replies
-
I'd like to propose adding agentic reinforcement learning (RL) support to the strands-agents framework. This would allow agents to adapt over time via feedback, optimizing their decision-making and task execution policies based on reward signals — not just static prompting or rule-based control.
By integrating reinforcement learning mechanisms (e.g. PPO, DPO), strands-agents could support training loops where agents learn from interaction, user feedback, or downstream outcomes — especially useful in production AWS environments.
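To make the "learn from interaction or user feedback" idea concrete, here is a minimal sketch of a reward-collection loop. Everything here is illustrative: the `Episode`/`FeedbackBuffer` classes and the toy reward function are assumptions for discussion, not part of any existing strands-agents API.

```python
# Hypothetical reward-feedback buffer; names are illustrative only and
# do not exist in strands-agents today.
from dataclasses import dataclass, field


@dataclass
class Episode:
    """One interaction plus the reward signal assigned to it."""
    prompt: str
    response: str
    reward: float


@dataclass
class FeedbackBuffer:
    """Collects (prompt, response, reward) triples for a later RL update."""
    episodes: list = field(default_factory=list)

    def record(self, prompt: str, response: str, reward_fn) -> float:
        reward = reward_fn(prompt, response)
        self.episodes.append(Episode(prompt, response, reward))
        return reward


def length_penalty_reward(prompt: str, response: str) -> float:
    # Toy reward: prefer concise responses (illustrative only).
    return 1.0 / (1.0 + len(response.split()))


buffer = FeedbackBuffer()
r = buffer.record("What is 2+2?", "4", length_penalty_reward)
```

A buffer like this is the natural boundary between the agent runtime (which produces interactions) and an offline or online policy-update step (PPO, DPO, or simple preference collection).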
The strands-agents project is a powerful foundation for building autonomous LLM agents that interact with tools, memories, and APIs. However, the current system is prompt-centric and static: agents are programmed via tool definitions and planning logic, but don't yet have the ability to:
- Improve themselves via reward-based feedback
- Learn from failures or corrections over time
- Optimize behaviors for specific long-term objectives

Adding RL-based learning capabilities would unlock a new class of adaptive, continuously improving agents, aligned with real-world goals.
A plug-and-play reinforcement learning interface, compatible with existing `AgentController`-based agents:

```python
RLTrainer(agent, environment, reward_fn).train()
```
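To ground the discussion, here is one possible shape for that interface. Only the `RLTrainer(agent, environment, reward_fn).train()` call comes from the proposal above; the `Environment` protocol, method signatures, and rollout logic are assumptions, and a real trainer would actually update the agent's policy (e.g. via PPO or DPO) rather than just collect rewards.

```python
# Sketch of the proposed RLTrainer interface; everything beyond the
# RLTrainer(agent, environment, reward_fn).train() shape is hypothetical.
from typing import Callable, Protocol


class Environment(Protocol):
    def reset(self) -> str:
        """Start an episode and return the initial observation."""
        ...

    def step(self, action: str) -> tuple[str, bool]:
        """Apply an action; return (next_observation, done)."""
        ...


class RLTrainer:
    def __init__(
        self,
        agent: Callable[[str], str],
        environment: Environment,
        reward_fn: Callable[[str, str], float],
    ):
        self.agent = agent
        self.environment = environment
        self.reward_fn = reward_fn

    def train(self, episodes: int = 1) -> list[float]:
        """Roll out episodes and collect per-episode rewards.

        A real implementation would use these rollouts to update the
        agent's policy; this sketch only gathers the reward signal.
        """
        totals = []
        for _ in range(episodes):
            obs = self.environment.reset()
            done = False
            total = 0.0
            while not done:
                action = self.agent(obs)
                total += self.reward_fn(obs, action)
                obs, done = self.environment.step(action)
            totals.append(total)
        return totals
```

Keeping the trainer behind a small protocol like this would let a `strands-agents-rl` module evolve independently of the core agent runtime.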
I'd love to hear community thoughts on this:
- Does agentic RL align with the vision for strands-agents?
- Would a module like strands-agents-rl make sense as a first step?
- Is there existing internal work on integrating RL-based learning into AWS agent frameworks?
- Would the community be interested in collaborating on a prototype?