Commit
* feat: rebase task
* feat: draft outline
* feat: blog archetype - explicit > implicit
* wip: add data tables
* wip: analysis
* wip: analysis
* feat: publish llama3 cohort
Showing 12 changed files with 436 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
,Meta,,,Google,Cohere,Databricks,Mistral,Meta,,Microsoft,,,Snowflake,DeepSeek | ||
Release Date,18-Jul-23,,,21-Feb-24,11-Mar-24,27-Mar-24,17-Apr-24,18-Apr-24,,22-Apr-24,,,24-Apr-24,7-May-24 | ||
Name,llama-2-7B,llama-2-13B,llama-2-70B,Gemma 7B,Command-R,DBRX,8x22B,llama-3-8B,llama-3-70B,Phi 3 mini,Phi 3 small,Phi 3 medium,Arctic,v2 | ||
Training Tokens,2T,2T,2T,6T,_?_,12T,_?_,15T,15T,3.3T,4.8T,4.8T,3.5T,8.1T | ||
Tokenizer Vocabulary,32k,32k,32k,256k,256k,100k,32k,128k,128k,32k,100k,32k (?),32k,100k | ||
Context Length (training),4k,4k,4k,8k,8k,32k,4k,8k,8k,4k,4k,,4k,4k | ||
Hidden dimension,4096,5120,8192,3072,8192,6144,6144,4096,8192,3072,4096,5120,7168,5120 | ||
FF dimension,11008,13824,28672,24576,,10752,16384,14336,28672,8192,_?_,_?_,4864,1536 | ||
Positional Encoding,RoPE,RoPE,RoPE,RoPE,RoPE?,RoPE,RoPE,RoPE,RoPE,RoPE / LongRoPE,RoPE?,RoPE?,RoPE,RoPE | ||
Normalization,RMSNorm,RMSNorm,RMSNorm,RMSNorm,_?_,Layer,RMSNorm,RMSNorm,RMSNorm,RMSNorm,_?_,_?_,RMSNorm,RMSNorm | ||
Activation Function,SwiGLU,SwiGLU,SwiGLU,GeGLU,SiLU,GLU,SiLU,SwiGLU,SwiGLU,SiLU,_?_,_?_,SwiGLU,SwiGLU | ||
Attention,_?_,_?_,GQA,MQA,_?_,GQA,"SWA, GQA",GQA,GQA,SWA,GQA; BlockSparse,_?_,Attention-sinks SWA (TBD),MLA | ||
Heads,32,40,64,16,64,48,48,32,64,32,32,40,56,128 | ||
Layers,32,40,80,28,40,40,56,32,80,32,32,40,35,60 | ||
Alignment,"SFT, PPO","SFT, PPO","SFT, Rejection Sampling, PPO","SFT, RLHF",_?_,"SFT, _RLHF (implied)_","? SFT, DPO","SFT, Rejection Sampling, PPO, DPO","SFT, Rejection Sampling, PPO, DPO","SFT, DPO",_?_,_?_,SFT,"SFT, GRPO" | ||
MoE,no,no,no,no,no,yes,yes,no,no,no,no,no,hybrid,yes | ||
Experts,,,,,,16,8,,,,,,128,160+2 | ||
Top-k,,,,,,4,2,,,,,,2,6 | ||
Total Params,,,,,,132B,141B,,,,,,480B,236B | ||
**Parameters (active)**,**7B**,**13B**,**70B**,**7B**,**35B**,**36B**,**39B**,**8B**,**70B**,**3.8B**,**7B**,**14B**,**17B**,**21B** | ||
Context Length (inference),4k,4k,4k,8k,128k,32k,64k,8k,8k,4k; 128k,8k,_?_,4k; 32k with SWA,128k |
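
The MoE rows above separate total parameters from parameters active per token. As a rough illustration of what that split implies, the sketch below computes the active share for the four models that list a total. The figures are copied from the "Total Params" and "Parameters (active)" rows; the dictionary layout and the `active_fraction` helper are hypothetical names introduced here, not part of the commit.

```python
# Values copied from the "Total Params" and "Parameters (active)" rows,
# in billions of parameters. Arctic is marked "hybrid" in the MoE row.
moe_models = {
    "DBRX": {"total_b": 132, "active_b": 36},
    "8x22B": {"total_b": 141, "active_b": 39},
    "Arctic": {"total_b": 480, "active_b": 17},
    "v2": {"total_b": 236, "active_b": 21},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of all parameters that are used for a single token."""
    return active_b / total_b


fractions = {name: active_fraction(**p) for name, p in moe_models.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of parameters active per token")
```

Read this way, the table suggests that DBRX and 8x22B activate a little over a quarter of their weights per token, while Arctic and v2 activate well under a tenth.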
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
,Release Date,Name,MMLU (language),modifier,GSM8K (math),modifier,HumanEval (code),modifier | ||
Meta,18-Jul-23,llama-2-7B,34.1,5-shot,25.7,8-shot CoT,7.9,0-shot | ||
,,llama-2-13B,47.8,5-shot,77.4,8-shot CoT,14,0-shot | ||
,,llama-2-70B,52.9,5-shot,57.5,8-shot CoT,25.6,0-shot | ||
Google,21-Feb-24,Gemma 7B,64.3,5-shot,46.4,maj@1,32.3,0-shot | ||
Cohere,11-Mar-24,Command-R,59.3,5-shot,,,, | ||
Databricks,27-Mar-24,DBRX,73.7,5-shot,72.8,8-shot CoT,70.1,0-shot | ||
Mistral,17-Apr-24,8x22B,77.7,5-shot,90.8,8-shot CoT,45.1,0-shot | ||
Meta,18-Apr-24,llama-3-8B,68.4,5-shot,79.6,8-shot CoT,62.2,0-shot | ||
,,llama-3-70B,82,5-shot,93,8-shot CoT,81.7,0-shot | ||
Microsoft,22-Apr-24,Phi 3 mini,68.8,5-shot,82.5,0-shot CoT,59.1,0-shot | ||
,,Phi 3 small,75.3,5-shot,88.9,0-shot CoT,59.1,0-shot | ||
,,Phi 3 medium,78.2,5-shot,90.3,0-shot CoT,55.5,0-shot | ||
Snowflake,24-Apr-24,Arctic,67.3,5-shot,74.2,?,64.3,? | ||
DeepSeek,7-May-24,v2,78.5,5-shot,79.2,0-shot CoT,48.8,0-shot | ||
,,,,,,,, | ||
Anthropic,4-Mar-24,Claude 3 Haiku,75.2,5-shot,88.9,0-shot,75.9,0-shot | ||
,,Claude 3 Sonnet,79,5-shot,92.3,0-shot,73,0-shot | ||
,,Claude 3 Opus,86.8,5-shot,95,0-shot,84.9,0-shot | ||
OpenAI,14-Mar-23,GPT 3.5-turbo,70,5-shot,57.1,5-shot CoT,48.1,0-shot | ||
,14-Mar-23,GPT 4,86.4,5-shot,92,5-shot CoT,67,0-shot |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
,Release Date,Name,GPUs,GPU Hours,Power Consumption (W),tCO2eq,FLOPs*,assumed utilization* | ||
Meta,18-Jul-23,llama-2-7B,A100-80GB,"184,320",400,31.22,1.60E+22,30% | ||
,,llama-2-13B,A100-80GB,"368,640",400,62.44,3.10E+22,30% | ||
,,llama-2-70B,A100-80GB,"1,720,320",400,291.42,1.40E+23,30% | ||
Google,21-Feb-24,Gemma 7B,4096 TPUv5e,,,"~131, incl. 2B models",, | ||
Cohere,11-Mar-24,Command-R,,,,,, | ||
Databricks,27-Mar-24,DBRX,3072 H100s,,,,, | ||
Mistral,17-Apr-24,8x22B,,,,,, | ||
Meta,18-Apr-24,llama-3-8B,16k H100s,1.3M,700,390,2.40E+23,40% | ||
,,llama-3-70B,16k H100s,6.4M,700,1900,1.20E+24,40% | ||
Microsoft,22-Apr-24,Phi 3 mini,,,,,, | ||
,,Phi 3 small,,,,,, | ||
,,Phi 3 medium,,,,,, | ||
Snowflake,24-Apr-24,Arctic,H100s,"~504,000",700,,7.10E+22,30% | ||
DeepSeek,7-May-24,v2,H800s,"~172,800",700,,, |
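
For the rows that report GPU hours, power, and tCO2eq together, the three columns can be cross-checked with simple arithmetic: energy is GPU hours times power, and dividing the reported emissions by that energy gives an implied carbon intensity. The sketch below does this for the Meta rows, assuming the power column is per-GPU draw and that no datacenter overhead (PUE) is applied. Both are assumptions, and `energy_mwh` and `implied_intensity` are hypothetical helper names. The llama-3 GPU hours are the rounded figures from the table (1.3M, 6.4M).

```python
# (GPU hours, power per GPU in W, reported tCO2eq), copied from the table.
rows = {
    "llama-2-7B": (184_320, 400, 31.22),
    "llama-2-13B": (368_640, 400, 62.44),
    "llama-2-70B": (1_720_320, 400, 291.42),
    "llama-3-8B": (1_300_000, 700, 390),
    "llama-3-70B": (6_400_000, 700, 1900),
}


def energy_mwh(gpu_hours: float, watts: float) -> float:
    """Energy in MWh, assuming every GPU draws `watts` for the full run."""
    return gpu_hours * watts / 1e6  # Wh -> MWh


def implied_intensity(gpu_hours: float, watts: float, tco2eq: float) -> float:
    """tCO2eq per MWh, numerically equal to kgCO2eq per kWh."""
    return tco2eq / energy_mwh(gpu_hours, watts)


for name, (hours, watts, tco2) in rows.items():
    print(
        f"{name}: {energy_mwh(hours, watts):,.1f} MWh, "
        f"{implied_intensity(hours, watts, tco2):.3f} kgCO2eq/kWh implied"
    )
```

Under these assumptions, all five rows imply an intensity between 0.42 and 0.43 kgCO2eq/kWh, which suggests the emissions column was derived from GPU hours and power with a single conversion factor.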