Commit

Blog/cohort of models (#17)
* feat: rebase task

* feat: draft outline

* feat: blog archetype - explicit > implicit

* wip: add data tables

* wip: analysis

* wip: analysis

* feat: publish llama3 cohort
ahgraber authored May 11, 2024
1 parent 8609f92 commit 982a554
Showing 12 changed files with 436 additions and 2 deletions.
2 changes: 1 addition & 1 deletion .taskfiles/hugo/taskfile.yaml
@@ -33,7 +33,7 @@ tasks:
sed -i '/draft: true/c\draft: false' {{ .path }}
- |
sed -i '/date: .*/c\date: {{ now.Format "2006-01-02" }}' {{ .path }}
-  - echo "Branch is ready for PR"
+  - echo "Commit this change, then branch is ready for PR!"
# - task: _pull_request
requires:
vars: ["path"]
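
The two `sed` commands in this task rewrite any line in the file that contains `draft: true` or `date: `, including lines in the post body. A minimal Python sketch of the same publish step, restricted to the leading front matter block, is given here for illustration. The function name `publish_front_matter` and its `today` parameter are hypothetical and are not part of the taskfile.

```python
import re
from datetime import date


def publish_front_matter(text, today=None):
    """Set `draft: false` and refresh `date:` inside the leading front matter only."""
    today = today or date.today().isoformat()  # YYYY-MM-DD, like Go's "2006-01-02" layout
    lines = text.split("\n")
    if not lines or lines[0].strip() != "---":
        return text  # no front matter block, leave the file untouched
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            break  # closing delimiter: stop before the post body
        if re.match(r"draft:\s*true\s*$", lines[i]):
            lines[i] = "draft: false"
        elif re.match(r"date:\s", lines[i]):
            lines[i] = f"date: {today}"
    return "\n".join(lines)


doc = "---\ntitle: Hello\ndate: 2024-01-01\ndraft: true\n---\nbody with draft: true\n"
print(publish_front_matter(doc, today="2024-05-11"))
```

Unlike the `sed` version, this sketch leaves a `draft: true` string in the body alone, which may or may not matter for the posts in this repository.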
7 changes: 7 additions & 0 deletions archetypes/blog.md
@@ -1,6 +1,10 @@
---
title: {{ replace .Name "-" " " | title }}
date: {{ .Date }}
authors:
  - name: ahgraber
    link: https://github.com/ahgraber
    image: https://github.com/ahgraber.png
tags:
# meta
- 'meta'
@@ -19,5 +23,8 @@ tags:
- 'copyright'
- 'privacy'
series: []
layout: single
toc: true
math: false
draft: true
---
2 changes: 1 addition & 1 deletion config/_default/params.yaml
@@ -41,7 +41,7 @@ navbar:
width: 50
height: 50
footer:
-  width: full # *width
+  width: *width
displayCopyright: true
displayPoweredBy: true

7 changes: 7 additions & 0 deletions content/blog/hello-world/index.md
@@ -1,10 +1,17 @@
---
title: Hello World!
date: 2024-04-22
authors:
  - name: ahgraber
    link: https://github.com/ahgraber
    image: https://github.com/ahgraber.png
tags:
- 'meta'
- 'blogumentation'
- 'homelab'
layout: single
toc: true
math: false
draft: false
---

21 changes: 21 additions & 0 deletions content/blog/llama3-cohort.md/architectures.csv
@@ -0,0 +1,21 @@
,Meta,,,Google,Cohere,Databricks,Mistral,Meta,,Microsoft,,,Snowflake,DeepSeek
Release Date,18-Jul-23,,,21-Feb-24,11-Mar-24,27-Mar-24,17-Apr-24,18-Apr-24,,22-Apr-24,,,24-Apr-24,7-May-24
Name,llama-2-7B,llama-2-13B,llama-2-70B,Gemma 7B,Command-R,DBRX,8x22B,llama-3-8B,llama-3-70B,Phi 3 mini,Phi 3 small,Phi 3 medium,Arctic,v2
Training Tokens,2T,2T,2T,6T,_?_,12T,_?_,15T,15T,3.3T,4.8T,4.8T,3.5T,8.1T
Tokenizer Vocabulary,32k,32k,32k,256k,256k,100k,32k,128k,128k,32k,100k,32k (?),32k,100k
Context Length (training),4k,4k,4k,8k,8k,32k,4k,8k,8k,4k,4k,,4k,4k
Hidden dimension,4096,5120,8192,3072,8192,6144,6144,4096,8192,3072,4096,5120,7168,5120
FF dimension,11008,13824,28672,24576,,10752,16384,14336,28672,8192,_?_,_?_,4864,1536
Positional Encoding,RoPE,RoPE,RoPE,RoPE,RoPE?,RoPE,RoPE,RoPE,RoPE,RoPE / LongRoPE,RoPE?,RoPE?,RoPE,RoPE
Normalization,RMSNorm,RMSNorm,RMSNorm,RMSNorm,_?_,Layer,RMSNorm,RMSNorm,RMSNorm,RMSNorm,_?_,_?_,RMSNorm,RMSNorm
Activation Function,SwiGLU,SwiGLU,SwiGLU,GeGLU,SiLU,GLU,SiLU,SwiGLU,SwiGLU,SiLU,_?_,_?_,SwiGLU,SwiGLU
Attention,_?_,_?_,GQA,MQA,_?_,GQA,"SWA, GQA",GQA,GQA,SWA,GQA; BlockSparse,_?_,Attention-sinks SWA (TBD),MLA
Heads,32,40,64,16,64,48,48,32,64,32,32,40,56,128
Layers,32,40,80,28,40,40,56,32,80,32,32,40,35,60
Alignment,"SFT, PPO","SFT, PPO","SFT, Rejection Sampling, PPO","SFT, RLHF",_?_,"SFT, _RLHF (implied)_","? SFT, DPO","SFT, Rejection Sampling, PPO, DPO","SFT, Rejection Sampling, PPO, DPO","SFT, DPO",_?_,_?_,SFT,"SFT, GRPO"
MoE,no,no,no,no,no,yes,yes,no,no,no,no,no,hybrid,yes
Experts,,,,,,16,8,,,,,,128,160+2
Top-k,,,,,,4,2,,,,,,2,6
Total Params,,,,,,132B,141B,,,,,,480B,236B
**Parameters (active)**,**7B**,**13B**,**70B**,**7B**,**35B**,**36B**,**39B**,**8B**,**70B**,**3.8B**,**7B**,**14B**,**17B**,**21B**
Context Length (inference),4k,4k,4k,8k,128k,32k,64k,8k,8k,4k; 128k,8k,_?_,4k; 32k with SWA,128k
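
`architectures.csv` is laid out with one attribute per row and one model per column, and the vendor row leaves a cell blank when a vendor spans several models. A minimal sketch of how the file could be pivoted into one record per model follows. It assumes that a blank vendor or release-date cell means "same as the column to the left"; the helper names `forward_fill` and `load_models` are hypothetical, and the sample is an excerpt of the first few columns only.

```python
import csv
import io

# Excerpt of architectures.csv (first four model columns, a few attribute rows).
SAMPLE = """\
,Meta,,,Google
Release Date,18-Jul-23,,,21-Feb-24
Name,llama-2-7B,llama-2-13B,llama-2-70B,Gemma 7B
Hidden dimension,4096,5120,8192,3072
Layers,32,40,80,28
"""


def forward_fill(cells):
    """Replace blank cells with the most recent non-blank value to the left."""
    filled, last = [], ""
    for cell in cells:
        last = cell or last
        filled.append(last)
    return filled


def load_models(text):
    """Pivot the attribute-per-row layout into one dict per model column."""
    rows = list(csv.reader(io.StringIO(text)))
    models = [{"Vendor": vendor} for vendor in forward_fill(rows[0][1:])]
    for row in rows[1:]:
        label, values = row[0], row[1:]
        if label == "Release Date":
            values = forward_fill(values)  # assumption: blank date = same release
        for model, value in zip(models, values):
            model[label] = value
    return models


models = load_models(SAMPLE)
for m in models:
    print(m["Vendor"], m["Release Date"], m["Name"], m["Hidden dimension"], m["Layers"])
```

Only the vendor and release-date rows are forward-filled. Blank cells in other rows (for example Experts or Top-k on dense models) stay empty, since there a blank means "not applicable" rather than "same as the neighbor".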
21 changes: 21 additions & 0 deletions content/blog/llama3-cohort.md/benchmarks.csv
@@ -0,0 +1,21 @@
,Release Date,Name,MMLU (language),modifier,GSM8K (math),modifier,HumanEval (code),modifier
Meta,18-Jul-23,llama-2-7B,34.1,5-shot,25.7,8-shot CoT,7.9,0-shot
,,llama-2-13B,47.8,5-shot,77.4,8-shot CoT,14,0-shot
,,llama-2-70B,52.9,5-shot,57.5,8-shot CoT,25.6,0-shot
Google,21-Feb-24,Gemma 7B,64.3,5-shot,46.4,maj@1,32.3,0-shot
Cohere,11-Mar-24,Command-R,59.3,5-shot,,,,
Databricks,27-Mar-24,DBRX,73.7,5-shot,72.8,8-shot CoT,70.1,0-shot
Mistral,17-Apr-24,8x22B,77.7,5-shot,90.8,8-shot CoT,45.1,0-shot
Meta,18-Apr-24,llama-3-8B,68.4,5-shot,79.6,8-shot CoT,62.2,0-shot
,,llama-3-70B,82,5-shot,93,8-shot CoT,81.7,0-shot
Microsoft,22-Apr-24,Phi 3 mini,68.8,5-shot,82.5,0-shot CoT,59.1,0-shot
,,Phi 3 small,75.3,5-shot,88.9,0-shot CoT,59.1,0-shot
,,Phi 3 medium,78.2,5-shot,90.3,0-shot CoT,55.5,0-shot
Snowflake,24-Apr-24,Arctic,67.3,5-shot,74.2,?,64.3,?
DeepSeek,7-May-24,v2,78.5,5-shot,79.2,0-shot CoT,48.8,0-shot
,,,,,,,,
Anthropic,4-Mar-24,Claude 3 Haiku,75.2,5-shot,88.9,0-shot,75.9,0-shot
,,Claude 3 Sonnet,79,5-shot,92.3,0-shot,73,0-shot
,,Claude 3 Opus,86.8,5-shot,95,0-shot,84.9,0-shot
OpenAI,14-Mar-23,GPT 3.5-turbo,70,5-shot,57.1,5-shot CoT,48.1,0-shot
,14-Mar-23,GPT 4,86.4,5-shot,92,5-shot CoT,67,0-shot
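
The header of `benchmarks.csv` repeats the column name `modifier` three times, so a name-keyed reader such as `csv.DictReader` would keep only the last one. A minimal sketch that reads the columns by position instead, carries the vendor and release date down over blank cells, and skips the all-empty separator row is shown here. The function names and the choice to rank by MMLU are illustrative assumptions, and the sample is an excerpt of the table.

```python
import csv
import io

# Excerpt of benchmarks.csv, including the empty separator row.
SAMPLE = """\
,Release Date,Name,MMLU (language),modifier,GSM8K (math),modifier,HumanEval (code),modifier
Meta,18-Jul-23,llama-2-7B,34.1,5-shot,25.7,8-shot CoT,7.9,0-shot
Google,21-Feb-24,Gemma 7B,64.3,5-shot,46.4,maj@1,32.3,0-shot
Cohere,11-Mar-24,Command-R,59.3,5-shot,,,,
,,,,,,,,
Microsoft,22-Apr-24,Phi 3 mini,68.8,5-shot,82.5,0-shot CoT,59.1,0-shot
,,Phi 3 small,75.3,5-shot,88.9,0-shot CoT,59.1,0-shot
"""


def to_score(cell):
    """Convert a score cell to float, or None when the score is not reported."""
    return float(cell) if cell.strip() else None


def load_benchmarks(text):
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header; columns are addressed by position below
    records, vendor, released = [], "", ""
    for row in reader:
        if not any(row):
            continue  # separator row made of empty cells
        vendor = row[0] or vendor      # blank vendor: same as the row above
        released = row[1] or released  # blank date: same as the row above
        records.append({
            "vendor": vendor,
            "released": released,
            "name": row[2],
            "mmlu": to_score(row[3]),
            "gsm8k": to_score(row[5]),
            "humaneval": to_score(row[7]),
        })
    return records


records = load_benchmarks(SAMPLE)
ranked = sorted((r for r in records if r["mmlu"] is not None),
                key=lambda r: r["mmlu"], reverse=True)
for r in ranked:
    print(f'{r["name"]:<12} {r["vendor"]:<10} MMLU={r["mmlu"]}')
```

The `modifier` columns are ignored in this sketch, so scores measured under different prompting setups (for example 8-shot CoT versus 0-shot CoT on GSM8K) are not strictly comparable when ranked this way.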
15 changes: 15 additions & 0 deletions content/blog/llama3-cohort.md/environmental_impact.csv
@@ -0,0 +1,15 @@
,Release Date,Name,GPUs,GPU Hours,Power Consumption (W),tCO2eq,FLOPs*,assumed utilization*
Meta,18-Jul-23,llama-2-7B,A100-80GB,"184,320",400,31.22,1.60E+22,30%
,,llama-2-13B,A100-80GB,"368,640",400,62.44,3.10E+22,30%
,,llama-2-70B,A100-80GB,"1,720,320",400,291.42,1.40E+23,30%
Google,21-Feb-24,Gemma 7B,4096 TPUv5e,,,"~131, incl. 2B models",,
Cohere,11-Mar-24,Command-R,,,,,,
Databricks,27-Mar-24,DBRX,3072 H100s,,,,,
Mistral,17-Apr-24,8x22B,,,,,,
Meta,18-Apr-24,llama-3-8B,16k H100s,1.3M,700,390,2.40E+23,40%
,,llama-3-70B,16k H100s,6.4M,700,1900,1.20E+24,40%
Microsoft,22-Apr-24,Phi 3 mini,,,,,,
,,Phi 3 small,,,,,,
,,Phi 3 medium,,,,,,
Snowflake,24-Apr-24,Arctic,H100s,"~504,000",700,,7.10E+22,30%
DeepSeek,7-May-24,v2,H800s,"~172,800",700,,,
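
The GPU Hours and Power Consumption columns can be combined into a rough energy figure, which in turn gives an implied carbon intensity for the rows that report tCO2eq. The sketch below assumes that Power Consumption (W) is a per-GPU figure and ignores datacenter overhead; it does not attempt to reproduce the FLOPs column, whose derivation is not stated in this diff. The helper names are hypothetical.

```python
def parse_gpu_hours(cell):
    """Parse cells such as '184,320', '~504,000', or '1.3M' into a float."""
    text = cell.strip().lstrip("~").replace(",", "")
    if text.endswith("M"):
        return float(text[:-1]) * 1e6
    return float(text)


def energy_mwh(gpu_hours, watts_per_gpu):
    """Energy in MWh, assuming every GPU draws `watts_per_gpu` for the whole run."""
    return gpu_hours * watts_per_gpu / 1e6


# (name, GPU Hours cell, Power Consumption (W), tCO2eq) copied from the table.
ROWS = [
    ("llama-2-7B", "184,320", 400, 31.22),
    ("llama-2-70B", "1,720,320", 400, 291.42),
    ("llama-3-8B", "1.3M", 700, 390.0),
]

for name, hours_cell, watts, tco2eq in ROWS:
    mwh = energy_mwh(parse_gpu_hours(hours_cell), watts)
    print(f"{name:<12} {mwh:8.1f} MWh  {tco2eq / mwh:.3f} tCO2eq/MWh")
```

For llama-2-7B this gives about 73.7 MWh and roughly 0.42 tCO2eq per MWh, and the other two rows land close to the same intensity, which suggests the reported emissions were derived from a similar grid factor.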