feat(bench): evaluate review writing in ReviewBench #917

Merged
merged 22 commits into from
Jan 1, 2025
1 change: 1 addition & 0 deletions .gitignore
@@ -185,6 +185,7 @@ research_bench/data/arxiv_ai_papers/output_with_references.json
research_bench/data/arxiv_ai_papers/paper_info.json
research_bench/crossbench/*.json
research_bench/mlbench/*.json
research_bench/iclrbench/*.json
research_bench/profile_dbs/*
research_bench/results/*
research_bench/profile_dbs_old
10 changes: 10 additions & 0 deletions README-CN.md
@@ -114,3 +114,13 @@ pre-commit install
</picture>
</a>
</p>

## ResearchBench

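To run the ResearchBench experiments, execute the `research_bench/run_review_eval.sh` script. You can adjust the parameters in the script; in particular, set `INPUT_PATH` to your actual input path.

If you get an error saying that the `openreview` module cannot be found, install the OpenReview Python client with `pip install openreview-py`. If you run into problems related to `requests`, pin it to version `2.26`: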

```bash
pip install requests==2.26
```
10 changes: 10 additions & 0 deletions README.md
@@ -121,3 +121,13 @@ Check the github action result to make sure all tests pass. If not, fix the erro
</picture>
</a>
</p>

## ResearchBench

To run the ResearchBench experiments, execute the `research_bench/run_review_eval.sh` script. Adjust the parameters in the script as needed; in particular, set `INPUT_PATH` to your actual input path.

If you get an error saying that the `openreview` module cannot be found, install the OpenReview Python client with `pip install openreview-py`. If you run into problems related to `requests`, pin it to version `2.26`:

```bash
pip install requests==2.26
```
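The two troubleshooting steps above can be checked before launching the script. The following is a minimal sketch of a hypothetical helper (`check_review_eval_env` is not part of the repository); it assumes the OpenReview client is importable as `openreview` and treats any installed `requests` whose major.minor version differs from 2.26 as a mismatch.

```python
import importlib.util
from importlib import metadata


def check_review_eval_env(required_requests: str = "2.26") -> list[str]:
    """Return human-readable problems; an empty list means the checks passed.

    Hypothetical helper: the module name and version pin mirror the README advice.
    """
    problems: list[str] = []

    # `openreview` is the import name of the OpenReview Python client.
    if importlib.util.find_spec("openreview") is None:
        problems.append("module 'openreview' not found; install the OpenReview client")

    # Compare the installed requests version against the pinned major.minor.
    try:
        installed = metadata.version("requests")
    except metadata.PackageNotFoundError:
        problems.append("requests is not installed")
    else:
        if installed.split(".")[:2] != required_requests.split(".")[:2]:
            problems.append(
                f"requests {installed} installed; README suggests {required_requests}"
            )
    return problems


if __name__ == "__main__":
    issues = check_review_eval_env()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("environment looks ready for run_review_eval.sh")
```

Comparing only major.minor accepts any 2.26.x patch release, which matches the README's "version `2.26`" wording without being stricter than it.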
58 changes: 0 additions & 58 deletions configs/agent_prompt/write_metareview_decision.yaml

This file was deleted.

49 changes: 0 additions & 49 deletions configs/agent_prompt/write_metareview_ethical.yaml

This file was deleted.

41 changes: 7 additions & 34 deletions configs/agent_prompt/write_metareview_strength.yaml
@@ -1,44 +1,17 @@
-fewshot_examples:
-- "Here is the proposal: We present a novel deep learning architecture, TransformerX, for natural language processing tasks. Our model achieves state-of-the-art performance on multiple benchmarks while requiring significantly less computational resources than existing models.
-
-  Here are the reviews:
-  Reviewer 1 (Score: 8/10): The paper presents an innovative approach to efficient NLP modeling. The results are impressive, showing both performance gains and reduced computational requirements. However, the theoretical analysis could be more rigorous.
-
-  Reviewer 2 (Score: 9/10): This is a strong paper with clear contributions. The TransformerX architecture is well-designed and the extensive experiments demonstrate its effectiveness. The paper could benefit from more ablation studies.
-
-  Here is the summary of the reviews: Both reviewers acknowledge the novelty and effectiveness of the proposed TransformerX architecture, with minor suggestions for improvement.
-
-  Please begin writing the strength of the submission based on the review."
-
-- "Strength of the submission: The submission presents a strong, innovative approach to NLP modeling with clear empirical advantages and thorough evaluation, making it a valuable contribution to the field."
-
-- "Here is the proposal: Our paper introduces a novel graph neural network algorithm, GraphFusion, for multi-modal data integration in bioinformatics. We demonstrate its effectiveness in predicting protein-protein interactions and drug-target affinities, outperforming existing methods on several benchmark datasets.
-
-  Here are the reviews:
-  Reviewer 1 (Score: 7/10): The paper presents an interesting approach to multi-modal data integration. The results on protein-protein interaction prediction are promising. However, the comparison with some recent methods is missing, and the scalability of the approach needs more discussion.
-
-  Reviewer 2 (Score: 8/10): This is a solid contribution to bioinformatics and graph neural networks. The GraphFusion algorithm is well-designed and the experiments are comprehensive. The paper would benefit from a more in-depth analysis of the model's interpretability.
-
-  Here is the summary of the reviews: Both reviewers recognize the value of the GraphFusion algorithm for multi-modal data integration in bioinformatics, with suggestions for additional comparisons and analyses.
-
-  Please begin writing the strength of the submission based on the review."
-
-- "Strength of the submission: The submission presents a novel and effective approach to multi-modal data integration in bioinformatics, with clear empirical advantages, comprehensive evaluation, and potential for significant impact in both theoretical and applied research in the field."
+fewshot_examples: []

 sys_prompt: >
   You are an autonomous intelligent agent tasked to write the strength of the submission for the following submission you have made to an academic conference. Your summary of strength should summarize the reviews to help the reviewers to make a decision.
   You will be provided with the following information:
-  Submission - The abstract of the paper submitted to this conference.
-  Reviews - It typically contains the score, a short summary, strength, and weakness of the submission.
-  Summary of Reviews - A short summary of the review.
+  Submission - Full content of the paper submitted to this conference.
+  Reviews - It typically contains the score, strength, and weakness of the submission, each by a different reviewer.

   You should provide the following information:
-  Strength - The strength of the submission based on the review.
-template: |
-  Here is the proposal: {proposal}
+  Strength - The strength of the submission based on the reviews.

+template: |
   Here are the reviews: {reviews}

-  Here is the summary of the reviews: {summary}
+  Please summarize the important points from the 'strength' section of the reviews.

-  Please begin writing the strength of the submission based on the review.
+  Please write in bullet points. It should be 200 words long.
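The new `template` keeps a single `{reviews}` placeholder. The sketch below fills it with Python's `str.format`. The repository's prompt-loading code is not part of this diff, so the helper name `render_strength_prompt` and the `Reviewer N:` labelling are hypothetical; the template text is copied from the new version of the file.

```python
# Template text copied from the new write_metareview_strength.yaml (`template: |`).
STRENGTH_TEMPLATE = (
    "Here are the reviews: {reviews}\n"
    "\n"
    "Please summarize the important points from the 'strength' section of the reviews.\n"
    "\n"
    "Please write in bullet points. It should be 200 words long.\n"
)


def render_strength_prompt(reviews: list[str]) -> str:
    """Hypothetical helper: join per-reviewer texts and fill the {reviews} slot."""
    # Label each review so the model can tell reviewers apart (assumed format).
    joined = "\n\n".join(
        f"Reviewer {i}: {text.strip()}" for i, text in enumerate(reviews, start=1)
    )
    return STRENGTH_TEMPLATE.format(reviews=joined)


if __name__ == "__main__":
    print(render_strength_prompt([
        "Strength: clear motivation. Weakness: limited baselines.",
        "Strength: thorough ablations.",
    ]))
```

If the loader does use `str.format`, note that extra keyword arguments are ignored: a caller that still passes `proposal=` or `summary=` would not raise an error, but those values would silently be dropped from the prompt.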
37 changes: 0 additions & 37 deletions configs/agent_prompt/write_metareview_summary.yaml

This file was deleted.
