Section 6 : Large Language Model: Challenges and Solutions

AGI Discussion

AGI: Artificial General Intelligence
Artificial General Intelligence Society: a central hub for AGI research, publications, and conference details. ref
Machine Intelligence Research Institute (MIRI): a leading organization in AGI safety and alignment, focusing on theoretical work to ensure safe AI development. ref
LessWrong & Alignment Forum: Extensive discussions on AGI alignment, with contributions from experts in AGI safety. ref:LessWrong | ref:Alignment Forum
Artificial General Intelligence: Concept, State of the Art, and Future Prospects [Jan 2014]
There is no Artificial General Intelligence: A critical perspective arguing that human-like conversational intelligence cannot be mathematically modeled or replicated by current AGI theories. [9 Jun 2019]
The General Theory of General Intelligence: A Pragmatic Patternist Perspective: a patternist philosophy of mind, arguing for a formal theory of general intelligence based on patterns and complexity. [28 Mar 2021]
Sparks of Artificial General Intelligence: Early experiments with GPT-4: [22 Mar 2023]
One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era [4 Apr 2023]
Levels of AGI for Operationalizing Progress on the Path to AGI: Provides a comprehensive discussion on AGI's progress and proposes metrics and benchmarks for assessing AGI systems. [4 Nov 2023]
How Far Are We From AGI: A survey discussing AGI's goals, developmental trajectory, and alignment technologies, providing a roadmap for AGI realization. [16 May 2024]
OpenAI: Planning for AGI and beyond [24 Feb 2023]
OpenAI's CEO, Sam Altman, predicts AGI could emerge by 2025. ref [9 Nov 2024]
Anthropic's CEO, Dario Amodei, predicts AGI between 2026 and 2027. ref [13 Nov 2024]
Key figures and their predicted AGI timelines:💡AGI might be emerging between 2025 to 2030. [19 Nov 2024]

OpenAI's Roadmap and Products

OpenAI's roadmap

The Timeline of the OpenaAI's Founder Journeys [15 Oct 2024]
Humanloop Interview 2023 : doc [29 May 2023]
OpenAI’s CEO Says the Age of Giant AI Models Is Already Over ref [17 Apr 2023]
Q* (pronounced as Q-Star): The model, called Q* was able to solve basic maths problems it had not seen before, according to the tech news site the Information. ref [23 Nov 2023]
Sam Altman reveals in an interview with Bill Gates (2 days ago) what's coming up in GPT-4.5 (or GPT-5): Potential integration with other modes of information beyond text, better logic and analysis capabilities, and consistency in performance over the next two years. ref [12 Jan 2024]

Model Spec: Desired behavior for the models in the OpenAI API and ChatGPT ref [8 May 2024] ref: takeaway
AMA (ask me anything) with OpenAI on Reddit [1 Nov 2024]

OpenAI o1

A new series of reasoning models: The complex reasoning-specialized model, OpenAI o1 series, excels in math, coding, and science, outperforming GPT-4o on key benchmarks. [12 Sep 2024] / ref: Awesome LLM Strawberry (OpenAI o1)
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model: 6 types of o1 reasoning patterns (i.e., Systematic Analysis (SA), Method Reuse (MR), Divide and Conquer (DC), Self-Refinement (SR), Context Identification (CI), and Emphasizing Constraints (EC)). the most commonly used reasoning patterns in o1 are DC and SR [17 Oct 2024]

GPT-4 details leaked `unverified`

GPT-4V(ision) system card: ref [25 Sep 2023] / ref
The Dawn of LMMs: [cnt]: Preliminary Explorations with GPT-4V(ision) [29 Sep 2023]
GPT-4 details leaked
- GPT-4 is a language model with approximately 1.8 trillion parameters across 120 layers, 10x larger than GPT-3. It uses a Mixture of Experts (MoE) model with 16 experts, each having about 111 billion parameters. Utilizing MoE allows for more efficient use of resources during inference, needing only about 280 billion parameters and 560 TFLOPs, compared to the 1.8 trillion parameters and 3,700 TFLOPs required for a purely dense model.
- The model is trained on approximately 13 trillion tokens from various sources, including internet data, books, and research papers. To reduce training costs, OpenAI employs tensor and pipeline parallelism, and a large batch size of 60 million. The estimated training cost for GPT-4 is around $63 million. ref [Jul 2023]

OpenAI Products

ChatGPT can now see, hear, and speak: It has recently been updated to support multimodal capabilities, including voice and image. [25 Sep 2023] Whisper / CLIP
ChatGPT Plugin [23 Mar 2023]
ChatGPT Function calling [Jun 2023] > Azure OpenAI supports function calling. ref
Custom instructions: In a nutshell, the Custom Instructions feature is a cross-session memory that allows ChatGPT to retain key instructions across chat sessions. [20 Jul 2023]
GPT-3.5 Turbo Fine-tuning Fine-tuning for GPT-3.5 Turbo is now available, with fine-tuning for GPT-4 coming this fall. [22 Aug 2023]
Open AI Enterprise: Removes GPT-4 usage caps, and performs up to two times faster ref [28 Aug 2023]
DALL·E 3 : In September 2023, OpenAI announced their latest image model, DALL-E 3 git [Sep 2023]
OpenAI DevDay 2023: GPT-4 Turbo with 128K context, Assistants API (Code interpreter, Retrieval, and function calling), GPTs (Custom versions of ChatGPT: ref), Copyright Shield, Parallel Function Calling, JSON Mode, Reproducible outputs [6 Nov 2023]
Introducing the GPT Store: Roll out the GPT Store to ChatGPT Plus, Team and Enterprise users GPTs [10 Jan 2024]
New embedding models text-embedding-3-small: Embedding size: 512, 1536 text-embedding-3-large: Embedding size: 256,1024,3072 [25 Jan 2024]
Sora Text-to-video model. Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt. [15 Feb 2024]
ChatGPT Memory: Remembering things you discuss across all chats saves you from having to repeat information and makes future conversations more helpful. [Apr 2024]
CriticGPT: a version of GPT-4 fine-tuned to critique code generated by ChatGPT [27 Jun 2024]
SearchGPT: AI search [25 Jul 2024] > ChatGPT Search [31 Oct 2024]
Structured Outputs in the API: a new feature designed to ensure model-generated outputs will exactly match JSON Schemas provided by developers. [6 Aug 2024]
OpenAI DevDay 2024: Real-time API (speech-to-speech), Vision Fine-Tuning, Prompt Caching, and Distillation (fine-tuning a small language model using a large language model). ref [1 Oct 2024]

GPT series release date

GPT 1: Decoder-only model. 117 million parameters. [Jun 2018] git
GPT 2: Increased model size and parameters. 1.5 billion. [14 Feb 2019] git
GPT 3: Introduced few-shot learning. 175B. [11 Jun 2020] git
GPT 3.5: 3 variants each with 1.3B, 6B, and 175B parameters. [15 Mar 2022] Estimate the embedding size of OpenAI's gpt-3.5-turbo to be about 4,096
ChatGPT: GPT-3 fine-tuned with RLHF. 20B or 175B. unverified ref [30 Nov 2022]
GPT 4: Mixture of Experts (MoE). 8 models with 220 billion parameters each, for a total of about 1.76 trillion parameters. unverified ref [14 Mar 2023]
GPT-4o: o stands for Omni. 50% cheaper. 2x faster. Multimodal input and output capabilities (text, audio, vision). supports 50 languages. [13 May 2024] / GPT-4o mini: 15 cents per million input tokens, 60 cents per million output tokens, MMLU of 82%, and fast. [18 Jul 2024]
OpenAI o1 [12 Sep 2024]

Context constraints

Sparse Attention: Generating Long Sequences with Sparse Transformer:💡Sparse attention computes scores for a subset of pairs, selected via a fixed or learned sparsity pattern, reducing calculation costs. Strided attention: image, audio / Fixed attention:text ref / git [23 Apr 2019]
Rotary Positional Embedding (RoPE):💡[cnt] / ref / doc [20 Apr 2021]
- How is this different from the sinusoidal embeddings used in "Attention is All You Need"?
  1. Sinusoidal embeddings apply to each coordinate individually, while rotary embeddings mix pairs of coordinates
  2. Sinusoidal embeddings add a cos or sin term, while rotary embeddings use a multiplicative factor.
  3. Rotary embeddings are applied to positional encoding to K and V, not to the input embeddings.
Structured Prompting: Scaling In-Context Learning to 1,000 Examples: [cnt] [13 Dec 2022]
1. Microsoft's Structured Prompting allows thousands of examples, by first concatenating examples into groups, then inputting each group into the LM. The hidden key and value vectors of the LM's attention modules are cached. Finally, when the user's unaltered input prompt is passed to the LM, the cached attention vectors are injected into the hidden layers of the LM.
2. This approach wouldn't work with OpenAI's closed models. because this needs to access [keys] and [values] in the transformer internals, which they do not expose. You could implement yourself on OSS ones. cite [07 Feb 2023]
Introducing 100K Context Windows: hundreds of pages, Around 75,000 words; [11 May 2023] demo Anthropic Claude
Lost in the Middle: How Language Models Use Long Contexts:💡[cnt] [6 Jul 2023]
1. Best Performace when relevant information is at beginning
2. Too many retrieved documents will harm performance
3. Performacnce decreases with an increase in context
Ring Attention: [cnt]: 1. Ring Attention, which leverages blockwise computation of self-attention to distribute long sequences across multiple devices while overlapping the communication of key-value blocks with the computation of blockwise attention. 2. Ring Attention can reduce the memory requirements of Transformers, enabling us to train more than 500 times longer sequence than prior memory efficient state-of-the-arts and enables the training of sequences that exceed 100 million in length without making approximations to attention. 3. we propose an enhancement to the blockwise parallel transformers (BPT) framework. git [3 Oct 2023]
“Needle in a Haystack” Analysis [21 Nov 2023]: Context Window Benchmarks; Claude 2.1 (200K Context Window) vs GPT-4; Long context prompting for Claude 2.1 adding just one sentence, “Here is the most relevant sentence in the context:”, to the prompt resulted in near complete fidelity throughout Claude 2.1’s 200K context window. [6 Dec 2023]
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning. With only four lines of code modification, the proposed method can effortlessly extend existing LLMs' context window without any fine-tuning. [2 Jan 2024]
Giraffe: Adventures in Expanding Context Lengths in LLMs. A new truncation strategy for modifying the basis for the position encoding. ref [2 Jan 2024]
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention. The Infini-attention incorporates a compressive memory into the vanilla attention mechanism. Integrate attention from both local and global attention. [10 Apr 2024]

Numbers LLM

Open AI Tokenizer: GPT-3, Codex Token counting
tiktoken: BPE tokeniser for use with OpenAI's models. Token counting. [Dec 2022]
What are tokens and how to count them?: OpenAI Articles
5 Approaches To Solve LLM Token Limits : doc [2023]
Byte-Pair Encoding (BPE): P.2015. The most widely used tokenization algorithm for text today. BPE adds an end token to words, splits them into characters, and merges frequent byte pairs iteratively until a stop criterion. The final tokens form the vocabulary for new data encoding and decoding. [31 Aug 2015] / ref [13 Aug 2021]
Tokencost: Token price estimates for 400+ LLMs [Dec 2023]
Numbers every LLM Developer should know [18 May 2023]

Trustworthy, Safe and Secure LLM

NIST AI Risk Management Framework: NIST released the first complete version of the NIST AI RMF Playbook on March 30, 2023
Guardrails Hub: Guardrails for common LLM validation use cases
NeMo Guardrails: Building Trustworthy, Safe and Secure LLM Conversational Systems [Apr 2023]
Political biases of LLMs: [cnt]: From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models. [15 May 2023]
Trustworthy LLMs: [cnt]: Comprehensive overview for assessing LLM trustworthiness; Reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. [10 Aug 2023]
Red Teaming: The term red teaming has historically described systematic adversarial attacks for testing security vulnerabilities. LLM red teamers should be a mix of people with diverse social and professional backgrounds, demographic groups, and interdisciplinary expertise that fits the deployment context of your AI system. ref
The Foundation Model Transparency Index: [cnt]: A comprehensive assessment of the transparency of foundation model developers ref [19 Oct 2023]
Hallucinations: [cnt]: A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions [9 Nov 2023]
Hallucination Leaderboard: Evaluate how often an LLM introduces hallucinations when summarizing a document. [Nov 2023]
FactTune: A procedure that enhances the factuality of LLMs without the need for human feedback. The process involves the fine-tuning of a separated LLM using methods such as DPO and RLAIF, guided by preferences generated by FActScore. [14 Nov 2023] FActScore works by breaking down a generation into a series of atomic facts and then computing the percentage of these atomic facts by a reliable knowledge source.
OpenAI Weak-to-strong generalization:💡In the superalignment problem, humans must supervise models that are much smarter than them. The paper discusses supervising a GPT-4 or 3.5-level model using a GPT-2-level model. It finds that while strong models supervised by weak models can outperform the weak models, they still don’t perform as well as when supervised by ground truth. git [14 Dec 2023]
A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models: A compre hensive survey of over thirty-two techniques developed to mitigate hallucination in LLMs [2 Jan 2024]
Anthropic Many-shot jailbreaking: simple long-context attack, Bypassing safety guardrails by bombarding them with unsafe or harmful questions and answers. [3 Apr 2024]
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions. The OpenAI highlights the need for instruction privileges in LLMs to prevent attacks and proposes training models to conditionally follow lower-level instructions based on their alignment with higher-level instructions. [19 Apr 2024]
Frontier Safety Framework: Google DeepMind, Frontier Safety Framework, a set of protocols designed to identify and mitigate potential harms from future AI systems. [17 May 2024]
Mapping the Mind of a Large Language Model: Anthrophic, A technique called "dictionary learning" can help understand model behavior by identifying which features respond to a particular input, thus providing insight into the model's "reasoning." ref [21 May 2024]
Extracting Concepts from GPT-4: Sparse Autoencoders identify key features, enhancing the interpretability of language models like GPT-4. They extract 16 million interpretable features using GPT-4's outputs as input for training. [6 Jun 2024]
AI models collapse when trained on recursively generated data: Model Collapse. We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear. [24 Jul 2024]
LLMs Will Always Hallucinate, and We Need to Live With This: LLMs cannot completely eliminate hallucinations through architectural improvements, dataset enhancements, or fact-checking mechanisms due to fundamental mathematical and logical limitations. [9 Sep 2024]
Large Language Models Reflect the Ideology of their Creators: When prompted in Chinese, all LLMs favor pro-Chinese figures; Western LLMs similarly align more with Western values, even in English prompts. [24 Oct 2024]

Large Language Model Is: Abilities

Multitask Prompted Training Enables Zero-Shot Task Generalization: [cnt]: A language model trained on various tasks using prompts can learn and generalize to new tasks in a zero-shot manner. [15 Oct 2021]
Emergent Abilities of Large Language Models: [cnt]: Large language models can develop emergent abilities, which are not explicitly trained but appear at scale and are not present in smaller models. . These abilities can be enhanced using few-shot and augmented prompting techniques. ref [15 Jun 2022]
Improving mathematical reasoning with process supervision [31 May 2023]
Math soving optimized LLM WizardMath: [cnt]: Developed by adapting Evol-Instruct and Reinforcement Learning techniques, these models excel in math-related instructions like GSM8k and MATH. git [18 Aug 2023] / Math solving Plugin: Wolfram alpha
Language Modeling Is Compression: [cnt]: Lossless data compression, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%). [19 Sep 2023]
LLMs Represent Space and Time: [cnt]: Large language models learn world models of space and time from text-only training. [3 Oct 2023]
Large Language Models for Software Engineering: [cnt]: Survey and Open Problems, Large Language Models (LLMs) for Software Engineering (SE) applications, such as code generation, testing, repair, and documentation. [5 Oct 2023]
LLMs for Chip Design: Domain-Adapted LLMs for Chip Design [31 Oct 2023]
Design2Code: How Far Are We From Automating Front-End Engineering? 64% of cases GPT-4V generated webpages are considered better than the original reference webpages [5 Mar 2024]
Testing theory of mind in large language models and humans: Some large language models (LLMs) perform as well as, and in some cases better than, humans when presented with tasks designed to test the ability to track people’s mental states, known as “theory of mind.” cite [20 May 2024]
A Survey on Employing Large Language Models for Text-to-SQL Tasks: a comprehensive overview of LLMs in text-to-SQL tasks [21 Jul 2024]
Can LLMs Generate Novel Research Ideas?: A Large-Scale Human Study with 100+ NLP Researchers. We find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas. However, the study revealed a lack of diversity in AI-generated ideas. [6 Sep 2024]

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chab.md

chab.md

Section 6 : Large Language Model: Challenges and Solutions

AGI Discussion

OpenAI's Roadmap and Products

OpenAI's roadmap

OpenAI o1

GPT-4 details leaked `unverified`

OpenAI Products

GPT series release date

Context constraints

Numbers LLM

Trustworthy, Safe and Secure LLM

Large Language Model Is: Abilities

Files

chab.md

Latest commit

History

chab.md

File metadata and controls

Section 6 : Large Language Model: Challenges and Solutions

AGI Discussion

OpenAI's Roadmap and Products

OpenAI's roadmap

OpenAI o1

GPT-4 details leaked unverified

OpenAI Products

GPT series release date

Context constraints

Numbers LLM

Trustworthy, Safe and Secure LLM

Large Language Model Is: Abilities

GPT-4 details leaked `unverified`