This project helps programmers document their code by using a fine-tuned T5 model to generate readable documentation from code chunks. Training runs from the command line, and documentation is generated through a Streamlit web interface.
- Documentation generation for code chunks.
- Interactive web interface built with Streamlit.
- Model fine-tuned on code-documentation pairs.
git clone https://github.com/Nebu0528/CodeDocAI
Run the command below to install all the dependencies needed:
pip install -r dependencies.txt
This step teaches the model to interpret code and generate the corresponding documentation. The model is trained on the following dataset:
https://huggingface.co/datasets/jtatman/python-code-dataset-500k
- Train the model using:
python src/model.py
- After training, the model will be saved in the models/codetext_t5/ directory.
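As a rough illustration of what a code-documentation pair can look like, the sketch below separates a Python function from its docstring using the standard `ast` module. This is not taken from `src/model.py`: the function name `extract_pairs` and the `code`/`doc` keys are hypothetical, and the actual dataset may be structured differently. It assumes Python 3.9 or later, since it relies on `ast.unparse`.

```python
import ast


def extract_pairs(source):
    """Return a list of {"code", "doc"} dicts for documented functions.

    Hypothetical helper for illustration only; not part of this project.
    """
    pairs = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        doc = ast.get_docstring(node)
        if not doc:
            # Functions without a docstring cannot form a pair.
            continue
        # Drop the docstring statement so the code side holds only logic.
        # Keep the body non-empty so the function can still be unparsed.
        node.body = node.body[1:] or [ast.Pass()]
        pairs.append({"code": ast.unparse(node), "doc": doc})
    return pairs


if __name__ == "__main__":
    sample = (
        "def multiply(a, b):\n"
        '    """Return the product of a and b."""\n'
        "    return a * b\n"
    )
    for pair in extract_pairs(sample):
        print(pair["doc"])
        print(pair["code"])
```

Note that `ast.unparse` discards comments and original formatting, so a pipeline that depends on inline comments would need to keep the original source text instead.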
Once the model is trained, run the app:
streamlit run interface/website.py
Once the app is running, open the URL provided by Streamlit, typically:
http://localhost:8501
- Paste a code snippet into the text area in the Streamlit interface.
- Click "Generate Documentation" to generate the documentation for your code.
def multiply(a, b):
    # This function returns the product of variables a and b
    return a * b
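Because the tool expects a single function per request, a longer file has to be split before pasting. The sketch below shows one possible way to do that with the standard `ast` module. The helper name `split_functions` is hypothetical and not part of this project. It uses `ast.get_source_segment` so that comments inside each function are preserved.

```python
import ast


def split_functions(source):
    """Return the source text of each top-level function in `source`.

    Hypothetical helper for illustration only; not part of this project.
    """
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Slice the original text so inline comments are kept.
            chunks.append(ast.get_source_segment(source, node))
    return chunks


if __name__ == "__main__":
    sample = (
        "def multiply(a, b):\n"
        "    # This function returns the product of variables a and b\n"
        "    return a * b\n"
        "\n"
        "\n"
        "def add(a, b):\n"
        "    # This function returns the sum of variables a and b\n"
        "    return a + b\n"
    )
    for chunk in split_functions(sample):
        print(chunk)
        print("-" * 20)
```

Each returned chunk can then be pasted into the text area on its own. Decorators and methods nested inside classes are not handled in this sketch.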
- Only works when you provide one function at a time, with detailed comments.
- Integrate the Transformers and BART models to summarize the documentation.
- Handle different training dataset formats (e.g., .json, .csv).
- Support downloading the generated documentation.
- Train the model on this dataset
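For the planned support of multiple dataset formats, one possible approach is to dispatch on the file extension and normalize every format to the same list of (code, documentation) tuples. The sketch below uses only the standard library. The names `parse_pairs` and `load_pairs`, the `.jsonl` case, and the default `code`/`doc` field names are assumptions made for illustration, not the project's actual schema.

```python
import csv
import io
import json
from pathlib import Path


def parse_pairs(text, suffix, code_key="code", doc_key="doc"):
    """Parse dataset text into a list of (code, doc) tuples.

    Hypothetical helper; the field names are assumed, not the project's.
    """
    suffix = suffix.lower()
    if suffix == ".json":
        # Expects a JSON array of objects.
        records = json.loads(text)
    elif suffix == ".jsonl":
        # Expects one JSON object per non-empty line.
        records = [json.loads(line) for line in text.splitlines() if line.strip()]
    elif suffix == ".csv":
        # Expects a header row naming the columns.
        records = list(csv.DictReader(io.StringIO(text)))
    else:
        raise ValueError(f"Unsupported dataset format: {suffix!r}")
    return [(record[code_key], record[doc_key]) for record in records]


def load_pairs(path, code_key="code", doc_key="doc"):
    """Read a dataset file and parse it according to its extension."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    return parse_pairs(text, path.suffix, code_key, doc_key)
```

Keeping the parsing separate from the file reading makes each format easy to check in isolation, and a new format only needs one more branch in `parse_pairs`.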