From 4216956a10d669e33f16e6e664d8ff990fa79106 Mon Sep 17 00:00:00 2001
From: Christian
Date: Wed, 13 Dec 2023 01:10:08 -0800
Subject: [PATCH] updated readme

---
 README.md                                                    | 4 ++--
 visualization_notebooks/README.md                            | 4 +++-
 visualization_notebooks/create_training_visualizations.ipynb | 2 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 795a54a..5a773b6 100644
--- a/README.md
+++ b/README.md
@@ -155,7 +155,7 @@ The arguments the run this are in the bottom of the script where the argparse ar
 
 To generate the radar plots, we copy code from [this colab notebook by Lmsys](https://colab.research.google.com/drive/15O3Y8Rxq37PuMlArE291P4OC6ia37PQK#scrollTo=5i8R0l-XqkgO).
 
-We provide a customized copy in this repo, run [generate_mt_bench_plots.ipynb](mt_bench/generate_mt_bench_plots.ipynb) to replicate the plots.
+We provide a customized copy in this repo, run [generate_mt_bench_plots.ipynb](visualization_notebooks/generate_mt_bench_plots.ipynb) to replicate the plots.
 
 ## MT-Bench Results
 
@@ -193,4 +193,4 @@ To evaluate our model on MT-Bench do the following setup in you favorite python
 
 ## Training Curves, Hyper-Parameters, and Ablations
 
-To replicate the training curves in the paper, run through [this ipynb](training_curves/create_training_visualizations.ipynb). The batching graphs were pulled from WandB, so the files are directly included in the same directory as the notebook.
+To replicate the training curves in the paper, run through [this ipynb](visualization_notebooks/create_training_visualizations.ipynb). The batching graphs were pulled from WandB, so the files are directly included in the same directory as the notebook.
diff --git a/visualization_notebooks/README.md b/visualization_notebooks/README.md
index 04e7e3c..6abf386 100644
--- a/visualization_notebooks/README.md
+++ b/visualization_notebooks/README.md
@@ -2,4 +2,6 @@
 
 - [profanity.ipynb](profanity.ipynb): An analysis of profanity usage statistics between different models. We find that further finetuning increases profanity usage, likely due to model forgetting of value alignment.
 - [human_feedback.ipynb](human_feedback.ipynb): An analysis of our human feedback surveys. We find that humans vastly prefer our model outputs for rap, and are even for pop.
-- [swift_LM.ipynb](swift_LM.ipynb): A data analysis & visualization of ground-truth n-gram frequency between baseline and Taylor Swift finetuned models. We find that lyre-swift (the model finetuned from lyre) tends to perform less plagarism.
\ No newline at end of file
+- [swift_LM.ipynb](swift_LM.ipynb): A data analysis & visualization of ground-truth n-gram frequency between baseline and Taylor Swift finetuned models. We find that lyre-swift (the model finetuned from lyre) tends to plagiarize less.
+- [create_training_visualizations.ipynb](create_training_visualizations.ipynb): Analysis notebooks for visualizing data from finetuning ablations and monitoring.
+- [generate_mt_bench_plots.ipynb](generate_mt_bench_plots.ipynb): Analysis of task-specific catastrophic forgetting; figure generation from MT-Bench.
\ No newline at end of file
diff --git a/visualization_notebooks/create_training_visualizations.ipynb b/visualization_notebooks/create_training_visualizations.ipynb
index 00d314e..f590451 100644
--- a/visualization_notebooks/create_training_visualizations.ipynb
+++ b/visualization_notebooks/create_training_visualizations.ipynb
@@ -12,7 +12,7 @@
 "import matplotlib.pyplot as plt\n",
 "\n",
 "import sys\n",
-"sys.path.insert(0, \"../training_curves\")"
+"sys.path.insert(0, \"../data/training_curves\")"
 ]
},
{
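The final hunk above updates a `sys.path` shim so the notebook can import helpers from the relocated `data/training_curves` directory. As a minimal standalone sketch of that pattern (the directory path is taken from the patch; everything else is illustrative):

```python
import os
import sys

# Prepend the relative helper directory so its modules are importable
# from a notebook running inside visualization_notebooks/.
sys.path.insert(0, os.path.join("..", "data", "training_curves"))

# Entries in sys.path are searched in order, so index 0 takes
# precedence over installed packages with clashing module names.
print(sys.path[0])
```

Because the path is relative, it resolves against the notebook's working directory, so the shim only works when the notebook is launched from its own folder.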