Agent that thinks more thoroughly about question and considers possible outcomes #47
Walkthrough
The update enhances the prediction market agents by introducing benchmarking, subquestion handling, and deployment strategies. New tools and utilities improve decision-making by incorporating detailed subquestion analysis and outcome probability management.
Note that the notebooks were added to help the discussion and will not be merged.
Nice work. This looks like fun to build!
I think this approach is good for improving one of the areas that Martin has mentioned - that the prediction is consistent with predictions for the rest of the probability space.
But I don't think it helps with improving the 'depth' of reasoning. I suspect the prediction for each sub-outcome would involve similarly shallow reasoning (as with our existing agents, like the evo agent). Martin was also interested in the idea of getting the agent to reason more deeply about the question by generating sub-questions whose answers the main question depends on. For example, for "Will Carlos Alcaraz win the Miami Open by 5 April 2024?", the agent would ask about the probabilities of winning the quarter-final and semi-final, the win rate of Alcaraz against the other finalist, etc., and then combine these probabilities in a kind of Bayesian way.
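To make the sub-question idea concrete, here is a minimal sketch of one way to combine the answers, assuming the sub-questions form a chain in which each probability is conditional on the earlier steps having succeeded. The `SubQuestion` class, the `combine_sequential` helper, and all the numbers are hypothetical illustrations, not part of this PR.

```python
from dataclasses import dataclass
from math import prod
from typing import Sequence


@dataclass(frozen=True)
class SubQuestion:
    """Hypothetical container for one step the main question depends on."""
    text: str
    # P(this step happens | all earlier steps happened)
    probability: float


def combine_sequential(subquestions: Sequence[SubQuestion]) -> float:
    """Chain rule: multiply the conditional probabilities of each step."""
    for sq in subquestions:
        if not 0.0 <= sq.probability <= 1.0:
            raise ValueError(f"Probability out of range for: {sq.text}")
    return prod(sq.probability for sq in subquestions)


# Made-up numbers, for illustration only.
steps = [
    SubQuestion("Wins the quarter-final?", 0.8),
    SubQuestion("Wins the semi-final, given he won the quarter-final?", 0.7),
    SubQuestion("Wins the final, given he reached it?", 0.6),
]
p_yes = combine_sequential(steps)
print(f"Estimated p_yes: {p_yes:.3f}")
```

This only covers the simple sequential case; sub-questions that are alternatives rather than steps would need a different combination rule.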
Luckily I think these two improvements can be made together, so not saying it should be one or the other. But I think worth thinking about this other type of enhancement at this point.
Also, curious to know, what is the token cost per prediction that you're seeing?
…utcomes # Conflicts: # poetry.lock
I also added a benchmark script to the agent; an excerpt can be found below.
Comparison Report
Market Results
Agent Results
Summary Statistics
Markets
Expected value
Actionable comments posted: 4
Review Status
Configuration used: CodeRabbit UI
Files ignored due to path filters (2)
- poetry.lock is excluded by !**/*.lock, !**/*.lock
- pyproject.toml is excluded by !**/*.toml
Files selected for processing (8)
- crewai_multiple_agent.ipynb (1 hunks)
- prediction_market_agent/agents/crewai_subsequential_agent/benchmark.py (1 hunks)
- prediction_market_agent/agents/crewai_subsequential_agent/crewai_agent_subquestions.py (1 hunks)
- prediction_market_agent/agents/crewai_subsequential_agent/deploy.py (1 hunks)
- prediction_market_agent/agents/crewai_subsequential_agent/prompts.py (1 hunks)
- prediction_market_agent/agents/known_outcome_agent/benchmark.py (1 hunks)
- prediction_market_agent/agents/known_outcome_agent/deploy.py (2 hunks)
- prediction_market_agent/agents/utils.py (1 hunks)
Files not summarized due to errors (1)
- crewai_multiple_agent.ipynb: Error: Message exceeds token limit
Files skipped from review due to trivial changes (1)
- prediction_market_agent/agents/known_outcome_agent/benchmark.py
Additional comments not posted (30)
prediction_market_agent/agents/crewai_subsequential_agent/deploy.py (3)
15-18: Ensure the model version (gpt-3.5-turbo) is up-to-date and aligns with the project's requirements for AI models.
44-48: The method calculate_bet_amount only supports xDai markets. Ensure that this limitation is documented and consider implementing support for additional currencies if required by the project.
51-56: The main block uses hard-coded values for deployment parameters. Consider externalizing these values to configuration files or environment variables for better maintainability.
prediction_market_agent/agents/known_outcome_agent/deploy.py (4)
23-23: The import of market_is_saturated from utils is a good practice for code reusability. Ensure that the moved function is no longer used within this file to avoid redundancy.
1-1: The use of # type: ignore at the top of the file suggests there might be type hinting issues. Ensure that all type hints are correct and consider removing this directive if it's no longer necessary.
1-4: 📝 NOTE: This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [37-89]
The method answer_binary_market has a complex logic for determining the market answer. Ensure that this logic is thoroughly tested, especially the error handling and the fallback to None when an answer cannot be determined.
1-4: 📝 NOTE: This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [94-137]
The main block for deploying the agent contains hard-coded values and paths. Consider externalizing these to configuration files or environment variables for better maintainability and flexibility.
prediction_market_agent/agents/crewai_subsequential_agent/prompts.py (2)
1-98: Ensure that the prompts and expected output formats are aligned with the requirements of the CrewAI framework and are correctly formatted for the LLM to understand. Pay special attention to placeholders like [SCENARIO] and {scenario} to ensure they are used consistently and correctly.
1-98: Consider adding more examples to the prompts to cover a wider range of scenarios and improve the LLM's understanding of the task.
prediction_market_agent/agents/crewai_subsequential_agent/benchmark.py (2)
1-159: Ensure that the benchmarking script accurately reflects the performance of the CrewAI agent by verifying the correctness of the market building, prediction generation, and the final assertion on the mean-squared-error for p_yes.
1-159: Consider adding documentation or comments explaining the benchmarking process, especially the significance of the mean-squared-error assertion and how the benchmark results should be interpreted.
prediction_market_agent/agents/crewai_subsequential_agent/crewai_agent_subquestions.py (3)
25-152: Ensure that the CrewAIAgentSubquestions class and its methods are well-documented, especially the interaction between tasks, agents, and crews within the CrewAI framework. This will help future developers understand and maintain the code.
25-152: Consider adding error handling for the CrewAI framework interactions, especially for cases where tasks fail or return unexpected results. This will improve the robustness of the agent's decision-making process.
25-152: Verify that the asynchronous execution of tasks (async_execution=True) is correctly managed and that the results are correctly aggregated before making a final decision. This is crucial for the accuracy of the agent's predictions.
crewai_multiple_agent.ipynb (16)
4-15: Imports are correctly organized and necessary for the notebook's functionality.
19-37: Loading environment variables using load_dotenv() is a secure practice for configuration.
40-47: Ensure SerperDevTool is effectively utilized in tasks where appropriate.
51-74: Consider adding comments to explain the purpose and functionality of each agent for clarity, especially if the notebook is intended for educational purposes or wider distribution.
51-115: Clarify the use of tools in the research_task definition. If SerperDevTool or other tools are intended to be used, consider uncommenting and properly integrating them.
126-130: Consider including other agents (analyst, writer) and their respective tasks in report_crew if applicable to the simulation's goals, to fully utilize the multi-agent system.
150-152: Consider enhancing result handling for clarity and context, especially if the notebook is intended for production or broader educational use.
185-230: Ensure the alternative approach for breaking down scenarios into possible outcomes is consistently integrated with the rest of the notebook's logic and objectives.
286-287: Consider providing additional examples or explanations to further showcase the alternative approach, especially if the notebook is intended for educational purposes.
426-440: Ensure that the tools assigned to agents, such as search_tool, are utilized effectively in their tasks to fully leverage the capabilities of the multi-agent system.
458-512: Review the verbose logging level in the Crew definition to ensure it's appropriate for the intended use case, as it may produce extensive output that could overwhelm users or obscure important information.
865-871: Enhance the result handling for clarity and context, especially if the notebook is intended for production or broader educational use. Consider using more structured output or visualizations to present the results.
891-896: Add comments to explain the purpose and functionality of the result handling and condition evaluation for clarity, especially if the notebook is intended for educational purposes or wider distribution.
908-908: Ensure that the report's conclusions are based on accurate and up-to-date information, especially if the notebook's analysis is used for decision-making or educational purposes.
925-932: Ensure that the outlined improvements for sentence generation and script execution are implemented systematically and tested thoroughly to enhance the notebook's functionality and accuracy.
940-969: Consider adding more examples or explanations to further illustrate the approach to analyzing prediction market questions, especially if the notebook is intended for a broad audience.
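Two of the comments above suggest moving hard-coded deployment parameters into environment variables. A minimal sketch of that pattern follows, assuming hypothetical variable names (`AGENT_MODEL`, `AGENT_SLEEP_SECONDS`, `AGENT_TIMEOUT_SECONDS`) and placeholder defaults; none of these names or values come from the PR, apart from the gpt-3.5-turbo model name mentioned in the review.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DeployConfig:
    """Hypothetical bundle of deployment parameters."""
    model: str
    sleep_time_seconds: int
    timeout_seconds: int


def load_deploy_config(env: Optional[Mapping[str, str]] = None) -> DeployConfig:
    """Read settings from the environment, falling back to placeholder defaults."""
    source = os.environ if env is None else env
    return DeployConfig(
        model=source.get("AGENT_MODEL", "gpt-3.5-turbo"),
        # Default numbers below are placeholders, not values from the PR.
        sleep_time_seconds=int(source.get("AGENT_SLEEP_SECONDS", "600")),
        timeout_seconds=int(source.get("AGENT_TIMEOUT_SECONDS", "180")),
    )


if __name__ == "__main__":
    print(load_deploy_config())
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test without touching the real environment.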
It seems the benchmark was run on only 1 market, which is not very useful; I guess at least 50 would be nice.
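As a rough illustration of why a single market says little, the sketch below computes the mean squared error of p_yes against reference values together with its standard error. The function name and the idea of comparing against a list of reference values are assumptions for illustration; the actual benchmark script may be structured differently.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Optional, Sequence, Tuple


def mse_with_standard_error(
    predicted: Sequence[float], reference: Sequence[float]
) -> Tuple[float, Optional[float]]:
    """Return (MSE, standard error of the MSE); the error is None for < 2 markets."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("Need two non-empty sequences of equal length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    mse = mean(squared_errors)
    if len(squared_errors) < 2:
        # With one market there is no spread to estimate uncertainty from.
        return mse, None
    return mse, stdev(squared_errors) / sqrt(len(squared_errors))


# Made-up numbers, for illustration only.
print(mse_with_standard_error([0.5], [1.0]))
print(mse_with_standard_error([0.9, 0.2, 0.6, 0.4], [1.0, 0.0, 1.0, 0.0]))
```

With independent markets, the standard error shrinks roughly with the square root of the number of markets, which is the usual argument for benchmarking on dozens of markets rather than one.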
…utcomes # Conflicts: # poetry.lock
Actionable comments posted: 1
Review Status
Configuration used: CodeRabbit UI
Files ignored due to path filters (1)
- pyproject.toml is excluded by !**/*.toml
Files selected for processing (6)
- prediction_market_agent/agents/crewai_subsequential_agent/benchmark.py (1 hunks)
- prediction_market_agent/agents/crewai_subsequential_agent/crewai_agent_subquestions.py (1 hunks)
- prediction_market_agent/agents/crewai_subsequential_agent/deploy.py (1 hunks)
- prediction_market_agent/agents/crewai_subsequential_agent/prompts.py (1 hunks)
- prediction_market_agent/agents/known_outcome_agent/deploy.py (2 hunks)
- prediction_market_agent/tools/crewai_tools.py (1 hunks)
Files skipped from review as they are similar to previous changes (5)
- prediction_market_agent/agents/crewai_subsequential_agent/benchmark.py
- prediction_market_agent/agents/crewai_subsequential_agent/crewai_agent_subquestions.py
- prediction_market_agent/agents/crewai_subsequential_agent/deploy.py
- prediction_market_agent/agents/crewai_subsequential_agent/prompts.py
- prediction_market_agent/agents/known_outcome_agent/deploy.py
Additional comments not posted (1)
prediction_market_agent/tools/crewai_tools.py (1)
9-11: LGTM! The TavilyDevToolSchema class correctly uses Pydantic for input validation, ensuring that search_query is mandatory.
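For readers unfamiliar with the pattern, here is a minimal sketch of a Pydantic schema with a required field. Only the class name and the search_query field name are taken from the review comment; the field description and the `validate_search_input` helper are hypothetical, and the real class in crewai_tools.py may differ.

```python
from pydantic import BaseModel, Field, ValidationError


class TavilyDevToolSchema(BaseModel):
    """Tool input; search_query has no default, so Pydantic treats it as required."""
    search_query: str = Field(..., description="Search query to send to the search API")


def validate_search_input(payload: dict) -> str:
    """Hypothetical helper: return the validated query or raise ValueError."""
    try:
        return TavilyDevToolSchema(**payload).search_query
    except ValidationError as exc:
        raise ValueError(f"Invalid tool input: {exc}") from exc


if __name__ == "__main__":
    print(validate_search_input({"search_query": "Miami Open 2024 results"}))
```

Calling the helper with an empty dictionary raises `ValueError`, because the missing field fails validation before any search is attempted.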
Actionable comments posted: 2
Actionable comments posted: 2
Here are a few benchmarks for 20 markets and 50 markets.
20 markets
Comparison Report
Market Results
Agent Results
Summary Statistics
Markets
Expected value
50 markets
Comparison Report
Market Results
Agent Results
Summary Statistics
Markets
Expected value
Closes #40
Summary by CodeRabbit
New Features
Refactor
- Moved the market_is_saturated function to a utility module for better reusability.