apartresearch

Artificial intelligence will change the world. Our mission is to ensure this happens safely and to the benefit of everyone.

Apart facilitates new research in AI safety, towards reducing societal-scale risks from the technology.

We combine a community focus with a drive for high-quality security research.


Read more about our work:

  • Our Research — Foundational research for safe and beneficial advanced AI
  • Apart Lab — Our research fellowship program for aspiring researchers in AI safety
  • Apart Sprints — Weekend-long research sprints and hackathons for AI security and governance

Pinned

  1. interpretability-starter Public

    🧠 Starter templates for doing interpretability research

    63 1

  2. Neuron2Graph Public

    Tools for exploring Transformer neuron behaviour, including input pruning and diversification.

    Jupyter Notebook 19 5

  3. deepdecipher Public

    🦠 DeepDecipher: An open source API to MLP neurons

    Rust 9

  4. specificityplus Public

    👩‍💻 Code for the ACL paper "Detecting Edit Failures in LLMs: An Improved Specificity Benchmark"

    Python 20 3

  5. Integer_Addition Public

    ✱ Understanding the underlying learning dynamics of simple tasks in Transformer networks

    Jupyter Notebook 14 1

  6. readingwhatwecan Public

    📚📚📚📚📚📚📚📚📚 Reading everything

    CSS 13 3

Repositories

Showing 10 of 37 repositories
  • team-sync-lab Public
    TypeScript 0 0 0 0 Updated Nov 24, 2024
  • Interpreting-Learned-Feedback-Patterns Public

    ✱ Interpreting learned feedback patterns in large language models

    Jupyter Notebook 2 MIT 1 7 0 Updated Nov 21, 2024
  • seqcont_circuits Public

    ✱ Interpreting how similar sequence continuation tasks share internal representations ✱

    Jupyter Notebook 1 MIT 0 1 0 Updated Nov 9, 2024
  • 3cb Public

    3cb: Catastrophic Cyber Capabilities Benchmarking of Large Language Models

    Python 4 0 0 0 Updated Oct 30, 2024
  • hackathon-utils Public

    😎 Code to run hackathons efficiently

    HTML 1 MIT 0 0 0 Updated Oct 24, 2024
  • ICML2024MI Public

    🌍 Website for NeurIPS2023MI

    CSS 1 2 0 0 Updated Aug 19, 2024
  • Integer_Addition Public

    ✱ Understanding the underlying learning dynamics of simple tasks in Transformer networks

    Jupyter Notebook 14 MIT 1 0 0 Updated Aug 16, 2024
  • Research-Augmentation-Hackbook Public
    Python 5 0 0 0 Updated Jul 19, 2024
  • evaluations-starter Public

    How to get started in evaluations and demonstrations research for dangerous capabilities

    5 MIT 1 1 0 Updated May 24, 2024
  • deepdecipher Public

    🦠 DeepDecipher: An open source API to MLP neurons

    Rust 9 MIT 0 46 0 Updated May 2, 2024
