This research project explores the interpretability of large language models, specifically Llama-2-7B, through the implementation of two probing techniques: Logit-Lens and Tuned-Lens. By dissecting the internal workings of the model, the aim is to shed light on how LLMs understand and generate language, offering insights into their decision-making processes.
The project delves into the Llama-2-7B model to understand the mechanics behind its language-understanding capabilities. Using these probing techniques, the model is analyzed at different layers and stages to extract interpretable features.
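As a rough illustration of the Logit-Lens idea, the sketch below projects each layer's hidden state through the model's final norm and unembedding head to see which token that layer would currently predict. It assumes the Hugging Face `transformers` implementation of `meta-llama/Llama-2-7b-hf`; the prompt, loading options, and variable names are illustrative and are not the project's actual code.

```python
# Minimal Logit-Lens sketch (illustrative, not this project's exact implementation).
# Assumes access to meta-llama/Llama-2-7b-hf via a Hugging Face token; loading
# options may need to be adapted to the available hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple: the embedding output plus one tensor per transformer layer.
for layer_idx, hidden in enumerate(outputs.hidden_states):
    # Apply the final RMSNorm, then the unembedding head, to the last token position.
    normed = model.model.norm(hidden[:, -1, :])
    logits = model.lm_head(normed)
    top_token = tokenizer.decode(logits.argmax(dim=-1))
    print(f"layer {layer_idx:2d}: {top_token!r}")
```

The Tuned-Lens follows the same idea, but instead of reusing the frozen unembedding head directly, it first passes each layer's hidden state through a small learned affine translator trained for that layer.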
- Hardware: M1 MacBook Pro (16 GB RAM)
- OS: macOS 14.2.1 (Linux and Windows should work as well, but have not been tested yet)
- Python version: 3.10
- Framework: PyTorch
Follow these steps to set up a local development environment:
- Clone the repository:

  ```bash
  git clone https://github.com/DanYuDE/Research-Project.git
  ```

- Navigate to the project directory:

  ```bash
  cd Research-Project
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate
  ```

  or, using Anaconda:

  ```bash
  conda create -n your_env_name python=3.10
  conda activate your_env_name
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
- Add a new `config.py` file; `config_example.py` is provided as a template:

  ```bash
  nano config.py
  ```

  Update `token` (your Hugging Face token) and `sampleText` (more LLM models can be added in `model`). Note: `sampleText` follows a specific format with a clear instruction. An illustrative `config.py` sketch is shown after these steps.
- Run the project:

  ```bash
  make
  ```
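For orientation, a `config.py` might look like the sketch below. The authoritative field names and structure come from `config_example.py` in the repository; the values and types here are placeholders and assumptions, not the template's actual content.

```python
# Illustrative config.py sketch -- check config_example.py in the repository
# for the real field names, types, and expected sampleText format.
token = "hf_..."  # your Hugging Face access token (placeholder)

# Models to probe; more LLMs can be added here (exact structure is an assumption).
model = ["meta-llama/Llama-2-7b-hf"]

# Prompt used for probing; the project expects a specific format with a clear instruction.
sampleText = "Instruction: Answer concisely.\nQuestion: What is the capital of France?"
```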
This project incorporates code and techniques inspired by the work of nrimsky as detailed in the Intermediate Decoding Notebook. I extend my gratitude for the foundational methodologies and code examples provided, which have been pivotal in the development of probing techniques for LLM interpretability.
- I express my sincere appreciation to nrimsky for her groundbreaking work on language model interpretability, which has significantly influenced this project.
- Acknowledgment goes to the transformers library by Hugging Face, which has been instrumental in facilitating the research.