This directory contains all model predictions and evaluations. Results for all experiments and runs are also available in a Google Sheet here.
At the top level, this directory is divided into tasks `TaskA` and `TaskB`.
- `TaskA`
  - `TaskA-ValidationSet-SubmissionFormat.csv` contains the shared task validation set in the submission format, for easier evaluation with model outputs. This file can be re-generated with `scripts/convert_to_submission_format.py`.
  - `predictions` contains the model outputs for three runs.
  - `results` contains the metrics after evaluating the outputs of each run.
- `TaskB`
  - `TaskB-ValidationSet-SubmissionFormat.csv` contains the shared task validation set in the submission format, for easier evaluation with model outputs. This file can be re-generated with `scripts/convert_to_submission_format.py`.
  - `predictions` contains the model outputs for three runs.
  - `results` contains the metrics after evaluating the outputs of each run.

In both tasks, `predictions` and `results` are further divided by approach into `fine-tuning` and `in-context-learning`. `in-context-learning` is further divided according to the ablation into `filtered` and `unfiltered`, then `random` and `similar`, and finally `note_only` and `dialogue_note`.
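The nested ablation layout above can be enumerated with a short stdlib-only sketch. The directory names come from the list above; the root path `results` and the function name are placeholders for illustration:

```python
from itertools import product
from pathlib import Path

# Ablation directory names as described in the layout above.
ABLATIONS = [
    ["filtered", "unfiltered"],     # example filtering
    ["random", "similar"],          # example selection
    ["note_only", "dialogue_note"], # prompt contents
]

def icl_result_dirs(root: str = "results") -> list[Path]:
    """Build every in-context-learning leaf directory in the ablation grid."""
    return [
        Path(root, "in-context-learning", *parts)
        for parts in product(*ABLATIONS)
    ]

dirs = icl_result_dirs()
print(len(dirs))           # 8 (2 x 2 x 2 settings)
print(dirs[0].as_posix())  # results/in-context-learning/filtered/random/note_only
```

The same grid applies under `predictions`; only the root directory differs.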
- `human_eval` contains all the resources used in the human evaluation (see `human_eval/README.md` for more details).
- Contains raw token length counts and histograms for the training and validation sets of all tasks. Further divided by the tokenizer used (`"gpt-4"` in `openai` or `"google/flan-t5-large"` in `huggingface`). Can be re-generated with `scripts/count_and_plot_tokens.py`.
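The real counts above depend on the `"gpt-4"` and `"google/flan-t5-large"` tokenizers; as a dependency-free illustration of the counting-and-binning step, here is a sketch where a whitespace split stands in for a real tokenizer and `texts` is placeholder data:

```python
from collections import Counter

# Placeholder data; the actual script reads the task CSVs and tokenizes
# with tiktoken ("gpt-4") or a Hugging Face tokenizer ("google/flan-t5-large").
texts = [
    "patient reports mild headache",
    "no acute distress noted on exam today",
    "follow up in two weeks",
]

# Whitespace split is only a stand-in for a real tokenizer.
lengths = [len(t.split()) for t in texts]
histogram = Counter(lengths)

# Crude text histogram; the script renders proper plots instead.
for n_tokens, count in sorted(histogram.items()):
    print(f"{n_tokens:>3} tokens: {'#' * count}")
```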