New Attention variant - SSM (Adopted from Mamba and Hymba) #401

Mars-Cat2023 · 2025-02-20T23:31:35Z

New Attention Variant - SSM

We now support an additional value for the attention option: --attention_variant="ssm".

TODO

Add a hyperparameter list for selecting a different attention variant (i.e. "causal", "linear" or "ssm") for each layer.
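One possible shape for this TODO is sketched below. It is not part of this PR: the `--attention_variant_list` flag, the `--n_layer` flag as used here, and the `resolve_layer_variants` helper are hypothetical names chosen for illustration, and the real train.py may wire its arguments differently.

```python
import argparse

# Variant names taken from the TODO above.
VALID_VARIANTS = ("causal", "linear", "ssm")


def build_parser():
    parser = argparse.ArgumentParser()
    # Existing-style option: one variant applied to every layer.
    parser.add_argument("--attention_variant", default="causal",
                        choices=VALID_VARIANTS)
    # Hypothetical option (assumption): one variant per layer.
    parser.add_argument("--attention_variant_list", nargs="+",
                        choices=VALID_VARIANTS, default=None)
    # Assumed layer-count option, used only to size the result.
    parser.add_argument("--n_layer", type=int, default=6)
    return parser


def resolve_layer_variants(args):
    """Return a list with one attention variant name per layer."""
    if args.attention_variant_list is None:
        # No per-layer list given: fall back to the single variant.
        return [args.attention_variant] * args.n_layer
    variants = args.attention_variant_list
    # Cycle through the list if it is shorter than the layer count.
    return [variants[i % len(variants)] for i in range(args.n_layer)]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(resolve_layer_variants(args))
```

Cycling a short list is a design choice in this sketch: it lets a pattern such as "ssm causal" alternate across all layers without spelling out every entry.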

Training Results

You can replicate the following training results by running these commands:

python3 train.py --compile
python3 train.py --compile --attention_variant="ssm"

Comparison between `--attention_variant="causal"` and `--attention_variant="ssm"`

--attention_variant="causal" step 0: train loss 4.2484, train_stdev 0.0031, val loss 4.2482, val_stdev 0.0031, lr 0.0010
--attention_variant="ssm" step 0: train loss 4.3725, train_stdev 0.0045, val loss 4.3620, val_stdev 0.0048, lr 0.0010
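To compare the two runs numerically, the log lines above can be parsed with a small script. This is a minimal sketch that assumes the log format is exactly as shown in the two lines above; `LOG_PATTERN`, `parse_log_line`, and `loss_gap` are illustrative names, not part of the repository.

```python
import re

# Matches the log line format shown in the two results above.
LOG_PATTERN = re.compile(
    r"step (\d+): train loss ([\d.]+), train_stdev ([\d.]+), "
    r"val loss ([\d.]+), val_stdev ([\d.]+), lr ([\d.]+)"
)


def parse_log_line(line):
    """Extract the step and metrics from one training log line."""
    match = LOG_PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    step, train, train_sd, val, val_sd, lr = match.groups()
    return {
        "step": int(step),
        "train_loss": float(train),
        "train_stdev": float(train_sd),
        "val_loss": float(val),
        "val_stdev": float(val_sd),
        "lr": float(lr),
    }


def loss_gap(baseline, candidate, key="val_loss"):
    """Return candidate minus baseline for the given metric."""
    return round(candidate[key] - baseline[key], 4)


causal = parse_log_line(
    "step 0: train loss 4.2484, train_stdev 0.0031, "
    "val loss 4.2482, val_stdev 0.0031, lr 0.0010"
)
ssm = parse_log_line(
    "step 0: train loss 4.3725, train_stdev 0.0045, "
    "val loss 4.3620, val_stdev 0.0048, lr 0.0010"
)

print("train gap:", loss_gap(causal, ssm, "train_loss"))
print("val gap:", loss_gap(causal, ssm, "val_loss"))
```

Both lines are step 0 values, so the gap most likely reflects initialization rather than how well either variant trains; later steps would be needed for a meaningful comparison.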

msaligane · 2025-02-20T23:33:35Z

Our default Transformer attention is used when you run python3 train.py.

You can now use this new command to run with "SSM" blocks in every layer:
python3 train.py --attention_variant="ssm"

TODO: How to specify the "SSM" for Layer 1 and "Attention" for Layer

Great work Qilong!

gkielian · 2025-02-21T18:04:27Z

Awesome! I looked through the dependency checks, and it seems we'll need to add a few more pip installs for this to pass our CI.

Could you add these to the requirements_cpu.txt and README?

Mars-Cat2023 · 2025-02-21T21:17:27Z

Updated; now just waiting for the CI and LICENSE problems to be resolved.

Mars-Cat2023 added 2 commits February 20, 2025 18:00

Add SSM attention varient from Mamba & Hymba

ad805e2

Finish add Attention varient - SSM (from Mamba & Hymba)

99d1e11

SSM - Dependency Issue Fixed

d21783b

Mars-Cat2023 force-pushed the Mamba branch from 584d1aa to d21783b February 21, 2025 19:17

Mars-Cat2023 added 2 commits February 21, 2025 14:22

SSM - Dependency Issue Fixed

365a4b1

SSM - Dependency Issue Fixed

5c3b50e

Mars-Cat2023 force-pushed the Mamba branch from 3966ad2 to 5c3b50e February 21, 2025 20:07

Mars-Cat2023 added 2 commits February 21, 2025 15:27

SSM - Fixed Bugs

6eb0d95

Merge branch 'Mamba' of github.com:Mars-Cat2023/WQL_nanoGPT into Mamba

f174102
