Interpretability Study of Large Language Models with Probing Techniques

This research project explores the interpretability of a large language model (Llama-2-7B) through the implementation of two probing techniques -- Logit-Lens and Tuned-Lens. By dissecting the model's internal workings, it aims to shed light on how LLMs understand and generate language, offering insights into their decision-making processes.

Description

The project delves into the Llama-2-7B model to understand the mechanics behind its language understanding capabilities. Using probing techniques, it analyzes the model at different layers to extract interpretable features.
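
To illustrate the logit-lens idea, here is a minimal sketch (not the project's own code; the model id and prompt are placeholders): each layer's hidden state is passed through the model's final norm and unembedding head to see which token the model favors at that depth. Tuned-Lens works similarly but first maps each hidden state through a small learned per-layer translator.

    # Minimal logit-lens sketch (assumed model id; requires a Hugging Face token).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "meta-llama/Llama-2-7b-hf"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
    model.eval()

    inputs = tokenizer("The capital of France is", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)

    # Decode the last position of every layer through the final norm + lm_head.
    for layer, hidden in enumerate(out.hidden_states):
        logits = model.lm_head(model.model.norm(hidden[0, -1]))
        print(f"layer {layer:2d}: {tokenizer.decode(logits.argmax().item())!r}")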

Getting Started

Dependencies

  • Hardware: M1 MacBook Pro (16 GB RAM)
  • OS: macOS 14.2.1 (Linux and Windows should also work, but have not been tested)
  • Python Version: 3.10
  • Framework: PyTorch

Installing

Follow these steps to set up a local development environment:

  1. Clone the repository:

    git clone https://github.com/DanYuDE/Research-Project.git
  2. Navigate to the project directory:

    cd Research-Project
  3. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate

    Or, using Anaconda:

     conda create -n your_env_name python=3.10
     conda activate your_env_name
  4. Install the required packages:

    pip install -r requirements.txt
  5. Add a new config.py file; config_example.py is provided as a template:

    nano config.py

    Update token (your Hugging Face access token) and sampleText; more LLM models can be added to model.

    Note: sampleText follows a specific format with a clear instruction; see the sketch after this list.

  6. Run the project:

    make
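
For reference, a hypothetical config.py might look like the following. The field names token, model, and sampleText are taken from the step above, and the values are placeholders; mirror config_example.py for the exact fields expected by the project.

    # config.py -- hypothetical sketch; copy config_example.py for the actual template.
    token = "hf_xxxxxxxxxxxxxxxxxxxx"  # your Hugging Face access token

    # LLM models to probe; more entries can be added here.
    model = ["meta-llama/Llama-2-7b-chat-hf"]

    # Prompt with a clear instruction, as expected by the probing scripts.
    sampleText = (
        "Instruction: Answer the question concisely.\n"
        "Question: What is the capital of France?"
    )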

Credits and Acknowledgments

Code Reference

This project incorporates code and techniques inspired by the work of nrimsky as detailed in the Intermediate Decoding Notebook. I extend my gratitude for the foundational methodologies and code examples provided, which have been pivotal in the development of probing techniques for LLM interpretability.

Special Thanks

  • I express my sincere appreciation to nrimsky for her groundbreaking work on language model interpretability, which has significantly influenced this project.
  • Acknowledgment goes to the transformers library by Hugging Face, which has been instrumental in facilitating the research.
