diff --git a/AutoCap/README.md b/AutoCap/README.md
index 6cdddc4..a08d729 100644
--- a/AutoCap/README.md
+++ b/AutoCap/README.md
@@ -1,106 +1,119 @@
-[![arXiv](ARXIV ICON)](ARXIV LINK)
+
-# AutoCap inference, training and evaluation
+# GenAU inference, training, and evaluation
+- [Introduction](#introduction)
+- [Environment setup](#environment-initalization)
- [Inference](#inference)
- * [Audio to text script](#audio-to-text)
- * [Gradio demo](#gradio-demo)
- * [Caption a list of audio files](#caption-list-of-audio-files)
- * [Caption your custom dataset](#caption-a-dataset)
+ * [Text to audio script](#text-to-audio)
+ * [Inference a list of prompts](#inference-a-list-of-prompts)
- [Training](#training)
+ * [GenAU](#genau)
+ * [Finetuning GenAU](#finetuning-genau)
+ * [1D-VAE (optional)](#1d-vae-optional)
- [Evaluation](#evaluation)
- [Cite this work](#cite-this-work)
- [Acknowledgements](#acknowledgements)
-# Environment initalization
+# Introduction
+We introduce GenAU, a transformer-based audio latent diffusion model leveraging the FIT architecture. Our model compresses mel-spectrogram data into a 1D representation and utilizes layered attention processes to achieve state-of-the-art audio generation results among open-source models.
+
+
+