This repository contains implementations and illustrative code related to LLM publications.

deepbiolab/llm-paper-research

LLM Paper Research

This repository contains implementations and illustrative code related to LLM publications. Notes for each paper are in the notes folder and are linked below through the hyperlinks labeled "research". Each research entry is my prototype for exploring a specific paper's key idea.

Outline

  1. Attention Is All You Need: Query, Key, and Value are all you need (Also position embeddings, multiple heads, feed-forward layers, skip-connections, etc.)

  2. GPT: Improving Language Understanding by Generative Pre-Training: Decoder is all you need (Also, pre-training + finetuning)

  3. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding: Encoder is all you need*. Left-to-right language modeling is NOT all you need. (*Also, pre-training + finetuning)

  4. T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer: Encoder-only or decoder-only is NOT all you need, though text-to-text is all you need* (Also, pre-training + finetuning)

  5. GPT2: Language Models are Unsupervised Multitask Learners: Unsupervised pre-training is all you need?!

  6. GPT3: Language Models are Few-Shot Learners: Unsupervised pre-training + a few* examples is all you need. (*From 5 examples in Conversational QA to 50 examples in Winogrande, PhysicalQA, and TriviaQA)

  7. Scaling Laws for Neural Language Models: Larger models trained on less data* are what you need. (*10x more compute should be spent on a 5.5x larger model and 1.8x more tokens)

  8. Chinchilla: Training Compute-Optimal Large Language Models: Smaller models trained on more data are what you need. (10x more compute should be spent on a 3.2x larger model and 3.2x more tokens)

  9. LLaMA: Open and Efficient Foundation Language Models: Smoler models trained longer—on public data—is all you need

  10. InstructGPT: Training language models to follow instructions with human feedback: 40 labelers are all you need* (Plus supervised fine-tuning, reward modeling, and PPO)

  11. LoRA: Low-Rank Adaptation of Large Language Models: One rank is all you need

  12. QLoRA: Efficient Finetuning of Quantized LLMs: 4-bit is all you need* (Plus double quantization and paged optimizers)

  13. DPR: Dense Passage Retrieval for Open-Domain Question Answering: Dense embeddings are all you need (Also, high precision retrieval)

  14. RAG: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks: Semi-parametric models* are all you need (*Dense vector retrieval as non-parametric component; pre-trained LLM as parametric component)

  15. RETRO: Improving language models by retrieving from trillions of tokens: Retrieving based on input chunks and chunked cross attention are all you need

  16. Internet-augmented language models through few-shot prompting for open-domain question answering: Google Search as retrieval is all you need

  17. HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels: LLM-generated, hypothetical documents are all you need

  18. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness: For-loops in SRAM are all you need

  19. ALiBi; Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation: A static (non-learned), distance-proportional bias on the query-key dot-product is all you need* (*Also, the per-head slope hyperparameter m and cached Q, K, V representations)

  20. Codex: Evaluating Large Language Models Trained on Code: Finetuning on code is all you need

  21. Layer Normalization: Consistent mean and variance at each layer is all you need

  22. On Layer Normalization in the Transformer Architecture: Pre-layer norm, instead of post-layer norm, is all you need

  23. PPO: Proximal Policy Optimization Algorithms: Clipping your surrogate function is all you need

  24. WizardCoder: Empowering Code Large Language Models with Evol-Instruct: Asking the model to make the question harder is all you need* (Where do they get the responses to these harder questions though?!)

  25. Llama 2: Open Foundation and Fine-Tuned Chat Models: Iterative finetuning, PPO, rejection sampling, and ghost attention is all you need* (Also, 27,540 SFT annotations and more than 1 million binary comparison preference data)

  26. RWKV: Reinventing RNNs for the Transformer Era: Linear attention during inference, via RNNs, is what you need

  27. RLAIF - Constitutional AI: Harmlessness from AI Feedback: A natural language constitution* and model feedback on harmlessness is all you need (16 different variants of harmlessness principles)

  28. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer: Noise in your softmax and expert regularization are all you need

  29. CLIP: Learning Transferable Visual Models From Natural Language Supervision: A projection layer between text and image embeddings is all you need* (*Also, 400 million image-text pairs)

  30. ViT; An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: Flattened 2D patches are all you need

  31. Generative Agents: Interactive Simulacra of Human Behavior: Reflection, memory, and retrieval are all you need

  32. Out-of-Domain Finetuning to Bootstrap Hallucination Detection: Open-source, permissive-use data is what you need

  33. DPO; Direct Preference Optimization: Your Language Model is Secretly a Reward Model: A separate reward model is NOT what you need

  34. Consistency Models: Mapping to how diffusion adds Gaussian noise to images is all you need

  35. LCM; Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference: Consistency modeling in latent space is all you need* (*Also, a diffusion model to distill from)

  36. LCM-LoRA: A Universal Stable-Diffusion Acceleration Module: Combining LoRAs is all you need

  37. Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models: Asking the LLM to reflect on retrieved documents is all you need

  38. Emergent Abilities of Large Language Models: The Bitter Lesson is all you need

  39. Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions: The Bellman equation and replay buffers are all you need

  40. Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations: Classification guidelines and the multiple-choice response are all you need

  41. RESTEM; Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models: Synthetic data and a reward function are all you need

  42. Mixture of Experts Explained: Conditional computation and sparsity are all you need

  43. SPIN: Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models: Generator and discriminator are all you need.

  44. Self-Instruct: Aligning Language Models with Self-Generated Instructions: 54% valid instruction-input-output tuples is all you need.

  45. Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling: Well-documented, publicly available model checkpoints are all you need.

  46. Self-Rewarding Language Models: Asking the model to evaluate itself is all you need.

  47. Building Your Own Product Copilot - Challenges, Opportunities, and Needs: Prompt engineering LLMs is NOT all you need.

  48. Matryoshka Representation Learning: Aggregated losses across 2^n-dim embeddings is all you need.

  49. Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems: Bigger GPUs are not all you need.

  50. How to Generate and Use Synthetic Data for Finetuning: Synthetic data is almost all you need.

  51. Whisper: Robust Speech Recognition via Large-Scale Weak Supervision: 680k hrs of audio and multitask formulated as a sequence is all you need.
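
Item 1 summarizes the Transformer as "Query, Key, and Value are all you need". Below is a minimal single-head sketch of the paper's scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. It uses plain Python lists so it runs without dependencies. The helper names (matmul, transpose, scaled_dot_product_attention) are hypothetical and are not code from this repository. The sketch also omits masking, multiple heads, and the learned Q/K/V projections.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    # Subtracting the max avoids overflow and leaves the result unchanged.
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for one head, with no masking."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # (n_queries x n_keys) similarities
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]  # each row sums to 1
    return matmul(weights, v)  # weighted average of value rows


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
out = scaled_dot_product_attention(q, k, v)
```

Each query puts the most weight on the key it matches, but it still mixes in some of the other value because the softmax never assigns a weight of exactly zero.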
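
Items 7 and 8 disagree on how to spend 10x more compute. Both splits can be checked with simple arithmetic. The check assumes that training compute grows in proportion to parameters times tokens, as in the common C ≈ 6ND estimate. Under that assumption, the model-size and token factors should multiply back to roughly 10x. The function names and the exponent parameter below are hypothetical. The 0.73 exponent for the Kaplan et al. recipe is an approximate figure assumed here for illustration.

```python
"""Check how two scaling recipes split a 10x compute increase.

Assumption: training compute C is proportional to N * D (parameters times
tokens), e.g. C ~ 6 * N * D; the constant cancels when comparing ratios.
"""


def compute_multiplier(model_scale: float, token_scale: float) -> float:
    """Compute factor implied by scaling N by model_scale and D by token_scale."""
    return model_scale * token_scale


def split_compute(compute_scale: float, model_exponent: float) -> tuple[float, float]:
    """Split a compute factor into (model, token) factors.

    Hypothetical parametrization: N grows as C**a and D as C**(1 - a),
    so their product always equals the compute factor.
    """
    model_scale = compute_scale ** model_exponent
    token_scale = compute_scale ** (1.0 - model_exponent)
    return model_scale, token_scale


recipes = {
    "Scaling Laws (item 7)": (5.5, 1.8),
    "Chinchilla (item 8)": (3.2, 3.2),
}

for name, (n_scale, d_scale) in recipes.items():
    total = compute_multiplier(n_scale, d_scale)
    print(f"{name}: {n_scale}x params * {d_scale}x tokens = {total:.2f}x compute")

# An equal split (a = 0.5) gives sqrt(10) ~ 3.16x each, rounded to 3.2x in item 8.
print(split_compute(10.0, 0.5))
# An assumed a ~ 0.73 gives roughly (5.4, 1.9), close to item 7's figures.
print(split_compute(10.0, 0.73))
```

The 5.5x/1.8x split and the 3.2x/3.2x split each multiply to about 10x. The two papers disagree on how to divide the extra compute between parameters and tokens, not on how much compute there is.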
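
Item 11 ("One rank is all you need") refers to LoRA's low-rank update. The pretrained weight W stays frozen. Only two small matrices are trained: A (r x k) and B (d x r). The adapted layer computes h = Wx + (alpha / r) * B(Ax). Because B starts at zero, training begins exactly from the pretrained model. The sketch below assumes a single linear layer and uses plain Python lists. The names lora_forward and lora_param_count are hypothetical and are not a library API.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float) -> Vector:
    """h = W x + (alpha / r) * B (A x), with W frozen and A, B trainable."""
    r = len(a)  # rank = number of rows in A
    base = matvec(w, x)  # frozen pretrained path
    update = matvec(b, matvec(a, x))  # low-rank path: k -> r -> d
    scale = alpha / r
    return [h + scale * u for h, u in zip(base, update)]


def lora_param_count(d: int, k: int, r: int) -> int:
    """Trainable parameters in A and B, versus d * k for full finetuning."""
    return r * (d + k)


w = [[1.0, 0.0], [0.0, 1.0]]
x = [2.0, 3.0]
a = [[1.0, 0.0]]  # rank r = 1
b_init = [[0.0], [0.0]]  # B starts at zero, so output equals W x
b_trained = [[1.0], [0.0]]
print(lora_forward(w, a, b_init, x, alpha=1.0))
print(lora_forward(w, a, b_trained, x, alpha=1.0))
print(lora_param_count(4096, 4096, 1), 4096 * 4096)
```

At rank 1, a 4096 x 4096 layer trains 8,192 parameters instead of 16,777,216.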
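
Item 23 ("Clipping your surrogate function is all you need") refers to PPO's clipped objective. For each sample, the objective is min(r * A, clip(r, 1 - eps, 1 + eps) * A). Here r = pi_new(a|s) / pi_old(a|s) is the probability ratio and A is the advantage estimate. The sketch below works on plain floats. Its function names are hypothetical, and eps = 0.2 is assumed as a common default rather than taken from this repository.

```python
import math
from typing import Sequence


def ratio_from_logprobs(logp_new: float, logp_old: float) -> float:
    """Probability ratio pi_new / pi_old, computed from log-probabilities."""
    return math.exp(logp_new - logp_old)


def ppo_clipped_objective(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """Per-sample surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    # Taking the min makes the objective a pessimistic (lower) bound.
    return min(ratio * advantage, clipped_ratio * advantage)


def ppo_loss(ratios: Sequence[float], advantages: Sequence[float], epsilon: float = 0.2) -> float:
    """Negative mean clipped objective, so it can be minimized."""
    terms = [ppo_clipped_objective(r, a, epsilon) for r, a in zip(ratios, advantages)]
    return -sum(terms) / len(terms)


# Positive advantage: gains are capped once the ratio exceeds 1 + eps.
print(ppo_clipped_objective(1.5, 1.0))   # uses the clipped ratio 1.2
# Negative advantage: the min keeps the more pessimistic (clipped) term.
print(ppo_clipped_objective(0.5, -1.0))  # uses the clipped ratio 0.8
print(ppo_loss([1.5, 0.5], [1.0, -1.0]))
```

Clipping removes any incentive to push the ratio outside [1 - eps, 1 + eps] in a single update. This keeps the new policy close to the one that collected the data.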

Acknowledgments

Special thanks to Eugene Yan for the curated reading list in his Language Modeling Reading List article, which served as an invaluable foundation for organizing these papers. This well-structured collection of fundamental language modeling papers has been instrumental in guiding my learning journey.

Citations

@article{yan2024default,
  title   = {Language Modeling Reading List (to Start Your Paper Club)},
  author  = {Yan, Ziyou},
  journal = {eugeneyan.com},
  year    = {2024},
  month   = {Jan},
  url     = {https://eugeneyan.com/writing/llm-reading-list/}
}
