New Attention variant - SSM (Adopted from Mamba and Hymba) #401

Open

Mars-Cat2023 wants to merge 7 commits into master
Conversation

@Mars-Cat2023 (Collaborator) commented on Feb 20, 2025

New Attention Variant - SSM

We support one more command-line option: --attention_variant="ssm".
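For readers unfamiliar with state-space models, below is a minimal, self-contained sketch of an SSM token mixer in PyTorch. It is not the PR's code: the class name `SimpleSSM`, the diagonal per-channel recurrence, and the naive sequential scan are all illustrative assumptions; the actual Mamba/Hymba-style block adds input-dependent (selective) parameters, gating, and a fused scan kernel.

```python
# Minimal sketch of an SSM token mixer (illustrative only, not this PR's block):
# a diagonal, per-channel linear state-space recurrence with a naive causal scan.
import torch
import torch.nn as nn

class SimpleSSM(nn.Module):
    def __init__(self, n_embd: int, d_state: int = 16):
        super().__init__()
        # One independent d_state-dimensional state per embedding channel.
        self.log_neg_A = nn.Parameter(torch.zeros(n_embd, d_state))  # A = -exp(.) < 0
        self.B = nn.Parameter(torch.randn(n_embd, d_state) * 0.02)   # input matrix
        self.C = nn.Parameter(torch.randn(n_embd, d_state) * 0.02)   # output matrix
        self.D = nn.Parameter(torch.ones(n_embd))                    # skip connection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_embd); scanning left to right keeps it causal.
        bsz, seq_len, _ = x.shape
        A_bar = torch.exp(-torch.exp(self.log_neg_A))  # discretized decay in (0, 1)
        h = x.new_zeros(bsz, *self.B.shape)            # state: (bsz, n_embd, d_state)
        outputs = []
        for t in range(seq_len):
            u = x[:, t, :]                                 # (bsz, n_embd)
            h = A_bar * h + self.B * u.unsqueeze(-1)       # state update
            y = (h * self.C).sum(dim=-1) + self.D * u      # readout + skip
            outputs.append(y)
        return torch.stack(outputs, dim=1)                 # (bsz, seq_len, n_embd)


if __name__ == "__main__":
    mixer = SimpleSSM(n_embd=64)
    out = mixer(torch.randn(2, 8, 64))
    print(out.shape)  # torch.Size([2, 8, 64])
```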

TODO

  • Add a hyperparameter list to select different attention variants (i.e. "causal", "linear", or "ssm") for different layers; a sketch of the idea follows below.
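As a hypothetical sketch of the per-layer selection described in this TODO: the hyperparameter name, the stand-in modules, and the use of `SimpleSSM` from the sketch above are all assumptions, not part of this PR.

```python
# Hypothetical per-layer variant selection (names are assumptions, not this PR's code).
import torch.nn as nn

def make_mixer(variant: str, n_embd: int) -> nn.Module:
    # Stand-in modules only; the real model would return its own attention/SSM blocks.
    if variant in ("causal", "linear"):
        return nn.MultiheadAttention(n_embd, num_heads=4, batch_first=True)
    if variant == "ssm":
        return SimpleSSM(n_embd)  # from the sketch above
    raise ValueError(f"unknown attention_variant: {variant}")

# e.g. a hyperparameter such as --attention_variant_list="ssm,causal,causal,ssm"
# parsed into one entry per layer:
variant_per_layer = ["ssm", "causal", "causal", "ssm"]
mixers = nn.ModuleList(make_mixer(v, n_embd=64) for v in variant_per_layer)
```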

Training Results

You can replicate the following training results by running these commands:

python3 train.py --compile
python3 train.py --compile --attention_variant="ssm"

Comparison between --attention_variant="causal" and --attention_variant="ssm"


--attention_variant="causal" step 0: train loss 4.2484, train_stdev 0.0031, val loss 4.2482, val_stdev 0.0031, lr 0.0010
--attention_variant="ssm" step 0: train loss 4.3725, train_stdev 0.0045, val loss 4.3620, val_stdev 0.0048, lr 0.0010

@msaligane (Member)

The default Transformer attention is used when running python3 train.py.

Now you can try this new command to use "SSM" blocks in all layers:
python3 train.py --attention_variant="ssm"

  • TODO: How to specify "SSM" for some layers (e.g. Layer 1) and standard "Attention" for the others.

Great work Qilong!

@gkielian (Collaborator)

Awesome! I looked through the dependency checks, and it seems we'll need to add a few more pip installs for this to pass our CI.

Could you add these to requirements_cpu.txt and the README?
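As one hedged example only, additions to requirements_cpu.txt might look like the lines below; the exact package names depend on which SSM implementation the PR imports, so treat these as placeholders rather than the PR's actual dependency list.

```text
# hypothetical additions; replace with the packages the SSM block actually imports
einops
mamba-ssm        # Mamba reference kernels, if used (may require a GPU build)
causal-conv1d    # companion package to mamba-ssm, if used
```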

@Mars-Cat2023 (Collaborator, Author) commented on Feb 21, 2025

Updated; now just waiting on the CI and LICENSE issues to be resolved.
