This research project explores the interpretability of large language models, specifically Llama-2-7B, through the implementation of two probing techniques: Logit-Lens and Tuned-Lens. By dissecting the internal workings of the model, the aim is to shed light on how LLMs understand and generate language, offering insights into their decision-making processes.
The project delves into the Llama-2-7B model to understand the mechanics behind its language-understanding capabilities. Using these probing techniques, the model is analyzed at different layers and stages to extract interpretable features.
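As a rough illustration of the Logit-Lens idea, the sketch below projects each layer's hidden state through the model's final norm and unembedding head to see which token that layer would currently predict. It assumes the Hugging Face `transformers` implementation of `meta-llama/Llama-2-7b-hf`; the prompt, loading options, and variable names are illustrative and are not the project's actual code.

```python
# Minimal Logit-Lens sketch (illustrative, not this project's exact implementation).
# Assumes access to meta-llama/Llama-2-7b-hf via a Hugging Face token; loading
# options may need to be adapted to the available hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple: the embedding output plus one tensor per transformer layer.
for layer_idx, hidden in enumerate(outputs.hidden_states):
    # Apply the final RMSNorm, then the unembedding head, to the last token position.
    normed = model.model.norm(hidden[:, -1, :])
    logits = model.lm_head(normed)
    top_token = tokenizer.decode(logits.argmax(dim=-1))
    print(f"layer {layer_idx:2d}: {top_token!r}")
```

The Tuned-Lens follows the same idea, but instead of reusing the frozen unembedding head directly, it first passes each layer's hidden state through a small learned affine translator trained for that layer.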
- Hardware: M1 MacBook Pro (16 GB RAM)
- OS: macOS 14.2.1 (Linux and Windows should work as well, but have not been tested yet)
- Python version: 3.10
- Framework: PyTorch
Follow these steps to set up a local development environment:
- Clone the repository:

  ```bash
  git clone https://github.com/DanYuDE/Research-Project.git
  ```

- Navigate to the project directory:

  ```bash
  cd Research-Project
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate
  ```

  or, using Anaconda:

  ```bash
  conda create -n your_env_name python=3.10
  conda activate your_env_name
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
- Add a new `config.py` file; `config_example.py` is provided as a template:

  ```bash
  nano config.py
  ```

  Update `token` (your Hugging Face token) and `sampleText` (more LLM models can be added in `model`). Note: `sampleText` follows a specific format with a clear instruction. An illustrative `config.py` sketch is shown after these steps.
- Run the project:

  ```bash
  make
  ```
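For orientation, a `config.py` might look like the sketch below. The authoritative field names and structure come from `config_example.py` in the repository; the values and types here are placeholders and assumptions, not the template's actual content.

```python
# Illustrative config.py sketch -- check config_example.py in the repository
# for the real field names, types, and expected sampleText format.
token = "hf_..."  # your Hugging Face access token (placeholder)

# Models to probe; more LLMs can be added here (exact structure is an assumption).
model = ["meta-llama/Llama-2-7b-hf"]

# Prompt used for probing; the project expects a specific format with a clear instruction.
sampleText = "Instruction: Answer concisely.\nQuestion: What is the capital of France?"
```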
This project incorporates code and techniques inspired by the work of nrimsky as detailed in the Intermediate Decoding Notebook. I extend my gratitude for the foundational methodologies and code examples provided, which have been pivotal in the development of probing techniques for LLM interpretability.
- I express my sincere appreciation to nrimsky for her groundbreaking work on language model interpretability, which has significantly influenced this project.
- Acknowledgment goes to the transformers library by Hugging Face, which has been instrumental in facilitating the research.