forked from jackyzha0/quartz
Commit db40b95
Courtland Leer committed Dec 18, 2024
1 parent 2459f43

Showing 10 changed files with 225 additions and 0 deletions.
14 changes: 14 additions & 0 deletions
content/notes/Context window size doesn't solve personalization.md
@@ -0,0 +1,14 @@
---
title: Context window size doesn't solve personalization
date: 05.11.24
tags:
- notes
- ml
---
There are two reasons that ever-increasing and even functionally infinite context windows won't by default solve personalization for AI apps/agents:

1. **Personal context has to come from somewhere.** Namely, from your head--off your wetware. So we need mechanisms to transfer that data from the human to the model. And there's *[[The model-able space of user identity is enormous|a lot of it]]*. At [Plastic](https://plasticlabs.ai) we think the path here is mimicking human social cognition, which is why we built [Honcho](https://honcho.dev)--to ambiently model users, then generate personal context for agents on demand.

2. **If everything is important, nothing is important.** Even if the right context is stuffed into a crammed context window somewhere, the model still needs mechanisms to discern what's valuable and important for generation. What should it pay attention to? What weight should it give different pieces of context in any given moment? Again, humans do this almost automatically, so mimicking what we know about those processes can give the model critical powers of on-demand discernment--even what might start to look to us like intuition, taste, or vibes.

All that said, better and bigger context windows are incredibly useful. We just need to build the appropriate supporting systems to leverage their full potential.
@@ -0,0 +1,25 @@
---
title: Honcho name lore
date: 01.26.24
---

Earlier this year [Courtland](https://x.com/courtlandleer) was reading _Rainbows End_, [Vernor Vinge's](https://en.wikipedia.org/wiki/Vernor_Vinge) [seminal augmented reality novel](<https://en.wikipedia.org/wiki/Rainbows_End_(novel)>), when he came across the term "Local Honcho[^1]":

> We simply put our own agent nearby, in a well-planned position with essentially zero latencies. What the Americans call a Local Honcho.

The near future Vinge constructs is one of outrageous data abundance, where every experience is riddled with information and overlaid realities, and each person must maintain multiple identities against this data and relative to those contexts.

It's such an intense landscape that the entire educational system has undergone wholesale renovation to address the new normal, and older people must routinely return to school to learn the latest skills. It also complicates economic life, resulting in intricate networks of nested agents that can be hard for any one individual to tease apart.

Highlighting this, a major narrative arc in the novel involves intelligence agencies running operations of pretty unfathomable global sophistication. Since (in the world of the novel) artificial intelligence has more or less failed as a research direction, this requires ultra-competent human operators able to parse and leverage high velocity information. For field operations, it requires a "Local Honcho" on the ground to act as an adaptable central nervous system for the mission and its agents:

> Altogether it was not as secure as Vaz’s milnet, but it would suffice for most regions of the contingency tree. Alfred tweaked the box, and now he was getting Parker’s video direct. At last, he was truly a Local Honcho.

For months before, Plastic had been deep in the weeds of harvesting, retrieving, & leveraging user context with LLMs. First to enhance the UX of our AI tutor (Bloom), then in thinking about how to solve this horizontally for all vertical-specific AI applications. It struck us that we faced similar challenges to the characters in _Rainbows End_ and were converging on a similar solution.

As you interface with the entire constellation of AI applications, you shouldn't have to redundantly provide context and oversight for every interaction. You need a single source of truth that can do this for you. You need a Local Honcho.

But as we've discovered, LLMs are remarkable at theory of mind tasks, and thus at reasoning about user needs. So unlike in the book, this administration can be offloaded to an AI. And your [[Honcho; User Context Management for LLM Apps|Honcho]] can orchestrate the relevant context and identities on your behalf, whatever the operation.

[^1]: "American English, from [Japanese](https://en.wikipedia.org/wiki/Japanese_language)_[班長](https://en.wiktionary.org/wiki/%E7%8F%AD%E9%95%B7#Japanese)_ (hanchō, “squad leader”)...probably entered English during World War II: many apocryphal stories describe American soldiers hearing Japanese prisoners-of-war refer to their lieutenants as _[hanchō](https://en.wiktionary.org/wiki/hanch%C5%8D#Japanese)_" ([Wiktionary](https://en.wiktionary.org/wiki/honcho))
23 changes: 23 additions & 0 deletions
content/notes/Human-AI chat paradigm hamstrings the space of possibility.md
@@ -0,0 +1,23 @@
---
title: Human-AI chat paradigm hamstrings the space of possibility
date: 02.21.24
---

The human-AI chat paradigm assumes only two participants in a given interaction. While this is sufficient for conversations directly with un-augmented foundation models, it creates many obstacles when designing more sophisticated cognitive architectures. When you train/fine-tune a language model, you reinforce the token distributions that are appropriate to appear between the special tokens denoting human vs AI messages.

Here's a limited list of things _besides_ a direct response we routinely want to generate:

- A 'thought' about how to respond to the user
- A [[Loose theory of mind imputations are superior to verbatim response predictions|theory of mind prediction]] about the user's internal mental state
- A list of ways to improve prediction
- A list of items to search over storage
- A 'plan' for how to approach a problem
- A mock user response
- A [[LLM Metacognition is inference about inference|metacognitive step]] to consider the product of prior inference

In contrast, the current state of inference is akin to immediately blurting out the first thing that comes to mind--something that humans with practiced aptitude in social cognition rarely do. But generating anything else is very hard, because those types of responses never come after the special AI message token. Not very flexible.

We're already anecdotally seeing well-trained completion models follow instructions impressively, likely because instruction data has been incorporated into pretraining. Is chat the next thing to be subsumed by general completion models? If so, flexibility in the types of inferences you can make would be very beneficial.

Metacognition then becomes something you can do at any step in a conversation. Same with instruction following & chat. Maybe this helps push LLMs in a much more general direction.
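For concreteness, here's a minimal, hypothetical sketch of what treating these intermediate generations as first-class inference steps could look like. The `complete` helper is a stand-in for any completion endpoint (not a specific vendor API), and the prompts are purely illustrative:

```python
# Hypothetical sketch: intermediate generations as first-class inference steps,
# rather than forcing everything to appear as the next "AI" chat turn.
# `complete` is a stand-in for any completion endpoint; prompts are illustrative only.

def complete(prompt: str) -> str:
    """Placeholder: wire this to whatever completion model you actually use."""
    return f"[model output for: {prompt[:48]}...]"

def respond(conversation: str, user_message: str) -> str:
    # A theory-of-mind imputation about the user's internal state,
    # generated as its own inference rather than as a chat turn.
    tom = complete(
        f"{conversation}\nUSER: {user_message}\n"
        "What is the user likely thinking, feeling, and wanting here?"
    )
    # A private 'thought' about how to respond, conditioned on that imputation.
    plan = complete(
        f"Estimate of user state:\n{tom}\n"
        "Briefly plan how to respond to the user's last message."
    )
    # Only now produce the reply the user actually sees.
    return complete(
        f"{conversation}\nUSER: {user_message}\n(Internal plan: {plan})\nAI:"
    )

print(respond("AI: Hi! How can I help?", "I'm stuck on my thesis intro."))
```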
@@ -0,0 +1,40 @@
---
title: Humans like personalization
date: 03.26.24
---

To us, it's obvious. But we get asked this a lot:

> Why do I need to personalize my AI application?

Fair question; not everyone has gone down this conceptual rabbit hole to the extent we have at [Plastic](https://plasticlabs.ai) and with [Honcho](https://honcho.dev).

Short answer: people like it.

In the tech bubble, it can be easy to forget what _most_ humans like. Isn't building stuff people love our job, though?

In web2, it's taken for granted. Recommender algorithms make UX really sticky, which retains users long enough to monetize them. To make products people love and scale them, those companies had to consider whether _billions_ of people--in aggregate--tend to prefer personalized products/experiences or not.

In physical reality too, most of us prefer white glove professional services, bespoke products, and friends and family who know us _deeply_. We place a premium in terms of time and economic value on those goods and experiences.

The more we're missing that, the more we're typically in a principal-agent problem, which creates overhead, interest misalignment, dissatisfaction, mistrust, and information asymmetry:

---

<iframe src="https://player.vimeo.com/video/868985592?h=deff771ffe&color=F6F5F2&title=0&byline=0&portrait=0" width="640" height="360" frameborder="0" allow="autoplay; fullscreen; picture-in-picture" allowfullscreen></iframe>

---

But, right now, most AI applications are just toys and demos:

![[Honcho; User Context Management for LLM Apps#^18066b]]

It's also why everyone is obsessed with evals and benchmarks that have scant practical utility in terms of improving UX for the end user. If we had more examples of good products, ones people loved, killer apps, no one would care about leaderboards anymore.

> OK, but what about services that are purely transactional? Why would a user want that to be personalized? Why complicate it? Just give me the answer, complete the task, etc...

Two answers:

1. Every interaction has context. Like it or not, people have preferences, and the more an app/agent can align with those, the more it can improve time to value for the user. It can be stickier, more delightful, "just work," and entail less overhead. (We're building more than calculators here, though this applies even to those!)
2. If an app doesn't do this, it'll get out-competed by one that does...or by the ever-improving set of generally capable foundation models.
13 changes: 13 additions & 0 deletions
content/notes/LLM Metacognition is inference about inference.md
@@ -0,0 +1,13 @@
---
title: LLM Metacognition is inference about inference
date: 03.26.24
---

For wetware, metacognition is typically defined as ‘thinking about thinking’ or often used as a catch-all for any ‘higher-level’ cognition.

(In some more specific domains, it's an introspective process, focused on thinking exclusively about _your own_ thinking, or a suite of personal learning strategies...all valid within their purview, but too constrained for our purposes.)

In large language models, the synthetic corollary of cognition is inference. So we can reasonably define a metacognitive process in an LLM architecture as any that runs inference on the output of prior inference. That is, inference itself is used as context--_inference about inference_.

That output might be instantly injected into the next prompt, stored for later use, or leveraged by another model. This kind of architecture is critical when dealing with user context, since LLMs can run inference about user behavior, then use that synthetic context in the future. Experiments here will be critical to overcoming [[Machine learning is fixated on task performance|the machine learning community's fixation on task completion]]. For us at Plastic, one of the most interesting species of metacognition is [[Loose theory of mind imputations are superior to verbatim response predictions|theory of mind and mimicking that in LLMs]] to form high-fidelity representations of users.
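A minimal, hypothetical sketch of that definition, assuming a generic `complete` stand-in for any model endpoint (not a particular API): the second call runs inference on the output of the first.

```python
# Hypothetical sketch of inference about inference: the output of one model call
# becomes the context for a second call. `complete` is a generic stand-in.

def complete(prompt: str) -> str:
    """Placeholder: wire this to whatever completion model you actually use."""
    return f"[model output for: {prompt[:48]}...]"

def metacognitive_step(chat_history: str) -> str:
    # First-order inference: reason about the user's latest behavior.
    first_pass = complete(
        f"{chat_history}\nWhat does the user's last message suggest about their goals?"
    )
    # Second-order (metacognitive) inference: inference run on the prior inference.
    return complete(
        f"Prior inference about the user:\n{first_pass}\n"
        "What is this inference likely missing, and what should we verify next?"
    )
```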
21 changes: 21 additions & 0 deletions
content/notes/LLMs excel at theory of mind because they read.md
@@ -0,0 +1,21 @@
---
title: LLMs excel at theory of mind because they read
date: 02.20.24
---

Large language models are [simulators](https://generative.ink/posts/simulators/). In predicting the next likely token, they are simulating how an abstracted _"any person"_ might continue the generation. The basis for this simulation is the aggregate compression of a massive corpus of human-generated natural language from the internet. So, predicting humans is _literally_ their core function.

In that corpus is our literature, our philosophy, our social media, our hard and social science--the knowledge graph of humanity, both in terms of discrete facts and messy human interaction. That last bit is important. The latent space of an LLM's pretraining is in large part a _narrative_ space. Narration chock full of humans reasoning about other humans--predicting what they will do next, what they might be thinking, how they might be feeling.

That's no surprise; we're a social species with robust social cognition. It's also no surprise[^1] that grokking that interpersonal narrative space in its entirety would make LLMs adept at [[Loose theory of mind imputations are superior to verbatim response predictions|generation resembling social cognition too]].[^2]

We know that in humans, we can strongly [correlate reading with improved theory of mind abilities](https://journal.psych.ac.cn/xlkxjz/EN/10.3724/SP.J.1042.2022.00065). When your neural network is consistently exposed to content about how other people think, feel, desire, believe, and prefer, those mental tasks are reinforced. The more experience you have with a set of ideas or states, the more adept you become.

The experience of such natural language narration _is itself a simulation_ where you practice and hone your theory of mind abilities. Even if, say, your English or Psychology teacher was foisting the text on you with other training intentions. Or even if you ran the simulation without any coercion, just to escape at the beach.

It's not such a stretch to imagine that in optimizing for other tasks, LLMs acquire emergent abilities they were never intentionally trained for.[^3] It may even be that in order to learn natural language prediction, these systems need theory of mind abilities, or that learning language specifically involves them--that's certainly the case with human wetware, and theory of mind skills do seem to improve with model size and language generation efficacy.

[^1]: Kosinski includes a compelling treatment of much of this in ["Evaluating Large Language Models in Theory of Mind Tasks"](https://arxiv.org/abs/2302.02083)
[^2]: It also leads to other wacky phenomena like the [Waluigi effect](https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post#The_Waluigi_Effect)
[^3]: Here's Chalmers [making a very similar point](https://youtube.com/clip/UgkxliSZFnnZHvYf2WHM4o1DN_v4kW6LsiOU?feature=shared)
40 changes: 40 additions & 0 deletions
...ose theory of mind imputations are superior to verbatim response predictions.md
@@ -0,0 +1,40 @@
---
title: Loose theory of mind imputations are superior to verbatim response predictions
date: 02.20.24
---

When we [[Theory of Mind Is All You Need|first started experimenting]] with user context, we naturally wanted to test whether our LLM apps were learning useful things about users. And also naturally, we did so by making predictions about them.

Since we were operating in a conversational chat paradigm, our first instinct was to try to predict what the user would say next. Two things were immediately apparent: (1) this was really hard, & (2) response predictions weren't very useful.

We saw some remarkable exceptions, but _reliable_ verbatim prediction requires a level of context about the user that simply isn't available right now. We're not sure if it will require context-gathering wearables, BMIs, or the network of context-sharing apps we're building with [[Honcho; User Context Management for LLM Apps|Honcho]], but we're not there yet.

Being good at what any person in general might plausibly say is literally what LLMs do. But being perfect at what one individual will say in a singular specific setting is a whole different story. Even lifelong human partners might only experience this a few times a week.

Plus, even when you get it right, what exactly are you supposed to do with it? The fact that it's such a narrow reasoning product limits the utility you're able to get out of a single inference.

So what are models good at predicting that's useful with limited context and local to a single turn of conversation? Well, it turns out they're really good at [imputing internal mental states](https://arxiv.org/abs/2302.02083). That is, they're good at theory of mind predictions--thinking about what you're thinking. A distinctly _[[LLM Metacognition is inference about inference|metacognitive]]_ task.

(Why are they good at this? [[LLMs excel at theory of mind because they read|We're glad you asked]].)

Besides just being better at it, letting the model leverage what it knows to make open-ended theory of mind imputations has several distinct advantages over verbatim response prediction (see the sketch after this list):

1. **Fault tolerance**

   - Theory of mind predictions are often replete with assessments of emotion, desire, belief, value, aesthetic, preference, knowledge, etc. That means they seek to capture a range within a distribution--a slice of user identity.
   - This is much richer than trying (& likely failing) to generate a single point estimate (like in verbatim prediction) and includes more variance. Therefore there's a higher probability you identify something useful by trusting the model to flex its emergent strengths.

2. **Learning** ^555815

   - That high variance means there's more to be wrong (& right) about. More content = more claims, which means more opportunity to learn.
   - Being wrong here is a feature, not a bug; comparing those prediction errors with reality is how you know what you need to understand about the user in the future to get to ground truth.

3. **Interpretability**

   - Knowing what you're right and wrong about exposes more surface area against which to test and understand the efficacy of the model--i.e. how well it knows the user.
   - Because we're grounded in the user and theory of mind, we're better able to assess this than if we're simply asking for likely human responses in the massive space of language encountered in training.

4. **Actionability**

   - The richness of theory of mind predictions gives us more to work with _right now_. We can funnel these insights into further inference steps to create UX in better alignment and coherence with user state.
   - Humans make thousands of tiny, subconscious interventions responsive to as many sensory cues & theory of mind predictions, all to optimize single social interactions. It pays to know about the internal state of others.
   - Though our lifelong partners from above can't perfectly predict each other's sentences, they can impute each other's state with extremely high fidelity. The rich context they have on one another translates to a desire to spend most of their time together (good UX).
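To make the contrast concrete, here's a hypothetical sketch of the two prediction targets. The `complete` helper is a placeholder for any completion endpoint, and the prompts are illustrative, not Honcho's actual implementation:

```python
# Hypothetical contrast between the two prediction targets discussed above.
# `complete` is a placeholder for any completion endpoint; prompts are illustrative only.

def complete(prompt: str) -> str:
    """Placeholder: wire this to whatever completion model you actually use."""
    return f"[model output for: {prompt[:48]}...]"

def verbatim_prediction(chat_history: str) -> str:
    # A single point estimate: brittle, and hard to act on even when correct.
    return complete(f"{chat_history}\nPredict, word for word, the user's next message:")

def theory_of_mind_imputation(chat_history: str) -> str:
    # An open-ended imputation of internal state: a slice of user identity that
    # later inference steps can act on and compare against reality.
    return complete(
        f"{chat_history}\n"
        "Describe what the user is likely thinking and feeling right now: "
        "their beliefs, desires, preferences, and what they already seem to know."
    )
```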
12 changes: 12 additions & 0 deletions
content/notes/Machine learning is fixated on task performance.md
@@ -0,0 +1,12 @@
---
title: Machine learning is fixated on task performance
date: 12.12.23
---

The machine learning industry has traditionally adopted an academic approach, focusing primarily on performance across a range of tasks. LLMs like GPT-4 are a testament to this, having been scaled up to demonstrate impressive & diverse task capability. This scaling has also led to [[Theory of Mind Is All You Need|emergent abilities]], debates about the true nature of which rage on.

However, general capability doesn't necessarily translate to completing tasks as an individual user would prefer. This is a failure mode that anyone building agents will inevitably encounter. The focus, therefore, needs to shift from how language models perform tasks in a general sense to how they perform tasks on a user-specific basis.

Take summarization. It’s a popular machine learning task at which models have become quite proficient...at least from a benchmark perspective. However, when models summarize for users with a pulse, they fall short. The reason is simple: the models don’t know this individual. The key takeaways for a specific user differ dramatically from the takeaways _any possible_ internet user _would probably_ note. ^0005ac

So a shift in focus toward user-specific task performance would provide a much more dynamic & realistic approach, catering to individual needs & paving the way for more personalized & effective ML applications.
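As an illustration of that shift (a hedged sketch, not a claim about any particular product or benchmark): the same summarization call, generic versus conditioned on user context. Both `complete` and the reader profile are placeholders.

```python
# Hypothetical sketch: the same summarization task, generic vs. user-specific.
# `complete` is a placeholder for any completion endpoint; the profile is illustrative.

def complete(prompt: str) -> str:
    """Placeholder: wire this to whatever completion model you actually use."""
    return f"[model output for: {prompt[:48]}...]"

def generic_summary(document: str) -> str:
    # What benchmarks typically measure: takeaways for "any possible" reader.
    return complete(f"Summarize the key points of the following:\n{document}")

def personalized_summary(document: str, user_context: str) -> str:
    # Same task, conditioned on what we know about this individual: their role,
    # goals, and prior knowledge change which takeaways actually matter.
    return complete(
        f"Reader profile:\n{user_context}\n\n"
        "Summarize the following for this specific reader, emphasizing only what "
        f"they would find new and relevant:\n{document}"
    )
```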