Pull requests: huggingface/nanotron

Pull requests list

Add weight decay per layer
#311 opened Mar 26, 2025 by eliebak
6 tasks
Embed norm log
#305 opened Mar 25, 2025 by eliebak Draft
6 tasks
Ademamix
#300 opened Mar 23, 2025 by eliebak Draft
6 tasks
flex-attention
#299 opened Mar 23, 2025 by NouamaneTazi Draft
6 tasks
Muon
#298 opened Mar 23, 2025 by eliebak Draft
6 tasks
[WIP] Distillation
#290 opened Mar 6, 2025 by Stillerman
2 of 14 tasks
Add MLA
#278 opened Feb 5, 2025 by zzhhjjj
Add nanotron performance
#274 opened Jan 23, 2025 by xrsrke
fp8
#266 opened Dec 18, 2024 by xrsrke
Fix wrong initialization of lr scheduler
#256 opened Nov 29, 2024 by kylematoba
[NEW] Llama3.2 weight converters 🦙
#255 opened Nov 28, 2024 by TJ-Solergibert
6 tasks
Fix initial_lr when resuming training
#243 opened Nov 17, 2024 by Lauler
Load random states from checkpoint
#238 opened Nov 2, 2024 by gritukan
lighteval support after checkpoint, UX refactor
#222 opened Aug 24, 2024 by eliebak
Refactor pre tokenization tool
#219 opened Aug 21, 2024 by eliebak
Created interconnect benchmark before the training
#200 opened Jun 22, 2024 by RamenBuddha