-
Notifications
You must be signed in to change notification settings - Fork 20
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat(bench): evaluate review writing in ReviewBench (#917)
* init * baselines * upload * debug * ruff isort mypy * checkpoint * prompt update * update prompt * update prompt * checkpoint * checkpoint * code cleanup * upload evaluation code * remove comments * pytest * pytest * delete useless files * remove commented and outdated file/content * make topk as a param * debug * update readme --------- Co-authored-by: Haofei Yu <[email protected]>
- Loading branch information
1 parent
073f56d
commit 4b267ed
Showing
33 changed files
with
1,477 additions
and
637 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file was deleted.
Oops, something went wrong.
This file was deleted.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,44 +1,17 @@ | ||
fewshot_examples: | ||
- "Here is the proposal: We present a novel deep learning architecture, TransformerX, for natural language processing tasks. Our model achieves state-of-the-art performance on multiple benchmarks while requiring significantly less computational resources than existing models. | ||
Here are the reviews: | ||
Reviewer 1 (Score: 8/10): The paper presents an innovative approach to efficient NLP modeling. The results are impressive, showing both performance gains and reduced computational requirements. However, the theoretical analysis could be more rigorous. | ||
Reviewer 2 (Score: 9/10): This is a strong paper with clear contributions. The TransformerX architecture is well-designed and the extensive experiments demonstrate its effectiveness. The paper could benefit from more ablation studies. | ||
Here is the summary of the reviews: Both reviewers acknowledge the novelty and effectiveness of the proposed TransformerX architecture, with minor suggestions for improvement. | ||
Please begin writing the strength of the submission based on the review." | ||
|
||
- "Strength of the submission: The submission presents a strong, innovative approach to NLP modeling with clear empirical advantages and thorough evaluation, making it a valuable contribution to the field." | ||
|
||
- "Here is the proposal: Our paper introduces a novel graph neural network algorithm, GraphFusion, for multi-modal data integration in bioinformatics. We demonstrate its effectiveness in predicting protein-protein interactions and drug-target affinities, outperforming existing methods on several benchmark datasets. | ||
Here are the reviews: | ||
Reviewer 1 (Score: 7/10): The paper presents an interesting approach to multi-modal data integration. The results on protein-protein interaction prediction are promising. However, the comparison with some recent methods is missing, and the scalability of the approach needs more discussion. | ||
Reviewer 2 (Score: 8/10): This is a solid contribution to bioinformatics and graph neural networks. The GraphFusion algorithm is well-designed and the experiments are comprehensive. The paper would benefit from a more in-depth analysis of the model's interpretability. | ||
Here is the summary of the reviews: Both reviewers recognize the value of the GraphFusion algorithm for multi-modal data integration in bioinformatics, with suggestions for additional comparisons and analyses. | ||
Please begin writing the strength of the submission based on the review." | ||
|
||
- "Strength of the submission: The submission presents a novel and effective approach to multi-modal data integration in bioinformatics, with clear empirical advantages, comprehensive evaluation, and potential for significant impact in both theoretical and applied research in the field." | ||
fewshot_examples: [] | ||
|
||
sys_prompt: > | ||
You are an autonomous intelligent agent tasked to write the strength of the submission for the following submission you have made to an academic conference. Your summary of strength should summarize the reviews to help the reviewers to make a decision. | ||
You will be provided with the following information: | ||
Submission - The abstract of the paper submitted to this conference. | ||
Reviews - It typically contains the score, a short summary, strength, and weakness of the submission. | ||
Summary of Reviews - A short summary of the review. | ||
Submission - Full content of the paper submitted to this conference. | ||
Reviews - It typically contains the score, strength, and weakness of the submission, each by a different reviewer. | ||
You should provide the following information: | ||
Strength - The strength of the submission based on the review. | ||
template: | | ||
Here is the proposal: {proposal} | ||
Strength - The strength of the submission based on the reviews. | ||
template: | | ||
Here are the reviews: {reviews} | ||
Here is the summary of the reviews: {summary} | ||
Please summarize the important points from the 'strength' section of the reviews. | ||
Please begin writing the strength of the submission based on the review. | ||
Please write in bullet points. It should be 200 words long. |
This file was deleted.
Oops, something went wrong.
Oops, something went wrong.