From db40b95c48ad935a9c118bcc5b0c5422144a2696 Mon Sep 17 00:00:00 2001 From: Courtland Leer Date: Wed, 18 Dec 2024 12:19:29 -0500 Subject: [PATCH] prob final draft --- ...ndow size doesn't solve personalization.md | 14 +++++++ content/notes/Honcho name lore.md | 25 ++++++++++++ ...igm hamstrings the space of possibility.md | 23 +++++++++++ content/notes/Humans like personalization.md | 40 +++++++++++++++++++ ...acognition is inference about inference.md | 13 ++++++ ...cel at theory of mind because they read.md | 21 ++++++++++ ...perior to verbatim response predictions.md | 40 +++++++++++++++++++ ...learning is fixated on task performance.md | 12 ++++++ ...able space of user identity is enormous.md | 17 ++++++++ content/notes/YouSim Disclaimers.md | 20 ++++++++++ 10 files changed, 225 insertions(+) create mode 100644 content/notes/Context window size doesn't solve personalization.md create mode 100644 content/notes/Honcho name lore.md create mode 100644 content/notes/Human-AI chat paradigm hamstrings the space of possibility.md create mode 100644 content/notes/Humans like personalization.md create mode 100644 content/notes/LLM Metacognition is inference about inference.md create mode 100644 content/notes/LLMs excel at theory of mind because they read.md create mode 100644 content/notes/Loose theory of mind imputations are superior to verbatim response predictions.md create mode 100644 content/notes/Machine learning is fixated on task performance.md create mode 100644 content/notes/The model-able space of user identity is enormous.md create mode 100644 content/notes/YouSim Disclaimers.md diff --git a/content/notes/Context window size doesn't solve personalization.md b/content/notes/Context window size doesn't solve personalization.md new file mode 100644 index 0000000000000..850d9167764fd --- /dev/null +++ b/content/notes/Context window size doesn't solve personalization.md @@ -0,0 +1,14 @@ +--- +title: Context window size doesn't solve personalization +date: 
05.11.24 +tags: + - notes + - ml +--- +There are two reasons that ever-increasing and even functionally infinite context windows won't by default solve personalization for AI apps/agents: + +1. **Personal context has to come from somewhere.** Namely, from your head--off your wetware. So we need mechanisms to transfer that data from the human to the model. And there's *[[The model-able space of user identity is enormous|a lot of it]]*. At [Plastic](https://plasticlabs.ai) we think the path here is mimicking human social cognition, which is why we built [Honcho](https://honcho.dev)--to ambiently model users, then generate personal context for agents on demand. + +2. **If everything is important, nothing is important**. Even if the right context is stuffed in a crammed context window somewhere, the model still needs mechanisms to discern what's valuable and important for generation. What should it pay attention to? What weight should it give different pieces of context in any given moment? Again, humans do this almost automatically, so mimicking what we know about those processes can give the model critical powers of on-demand discernment. Even what might start to look to us like intuition, taste, or vibes. + +All that said, better and bigger context windows are incredibly useful. We just need to build the appropriate supporting systems to leverage their full potential. 
\ No newline at end of file diff --git a/content/notes/Honcho name lore.md b/content/notes/Honcho name lore.md new file mode 100644 index 0000000000000..0d8154531b3ea --- /dev/null +++ b/content/notes/Honcho name lore.md @@ -0,0 +1,25 @@ +--- +title: Honcho name lore +date: 01.26.24 +--- + +Earlier this year [Courtland](https://x.com/courtlandleer) was reading _Rainbows End_, [Vernor Vinge's](https://en.wikipedia.org/wiki/Vernor_Vinge) [seminal augmented reality novel](), when he came across the term "Local Honcho[^1]": + +> We simply put our own agent nearby, in a well-planned position with essentially zero latencies. What the Americans call a Local Honcho. + +The near future Vinge constructs is one of outrageous data abundance, where every experience is riddled with information and overlaid realities, and each person must maintain multiple identities against this data and relative to those contexts. + +It's such an intense landscape that the entire educational system has undergone wholesale renovation to address the new normal, and older people must routinely return to school to learn the latest skills. It also complicates economic life, resulting in intricate networks of nested agents that can be hard for any one individual to tease apart. + +Highlighting this, a major narrative arc in the novel involves intelligence agencies running operations of pretty unfathomable global sophistication. Since (in the world of the novel) artificial intelligence has more or less failed as a research direction, this requires ultra-competent human operators able to parse and leverage high velocity information. For field operations, it requires a "Local Honcho" on the ground to act as an adaptable central nervous system for the mission and its agents: + +> Altogether it was not as secure as Vaz’s milnet, but it would suffice for most regions of the contingency tree. Alfred tweaked the box, and now he was getting Parker’s video direct. At last, he was truly a Local Honcho. 
+ +For months before, Plastic had been deep into the weeds around harvesting, retrieving, & leveraging user context with LLMs. First to enhance the UX of our AI tutor (Bloom), then in thinking about how to solve this horizontally for all vertical-specific AI applications. It struck us that we faced similar challenges to the characters in _Rainbows End_ and were converging on a similar solution. + +As you interface with the entire constellation of AI applications, you shouldn't have to redundantly provide context and oversight for every interaction. You need a single source of truth that can do this for you. You need a Local Honcho. + +But as we've discovered, LLMs are remarkable at theory of mind tasks, and thus at reasoning about user need. So unlike in the book, this administration can be offloaded to an AI. And your [[Honcho; User Context Management for LLM Apps|Honcho]] can orchestrate the relevant context and identities on your behalf, whatever the operation. + +[^1]: "American English, from [Japanese](https://en.wikipedia.org/wiki/Japanese_language)_[班長](https://en.wiktionary.org/wiki/%E7%8F%AD%E9%95%B7#Japanese)_ (hanchō, “squad leader”)...probably entered English during World War II: many apocryphal stories describe American soldiers hearing Japanese prisoners-of-war refer to their lieutenants as _[hanchō](https://en.wiktionary.org/wiki/hanch%C5%8D#Japanese)_" ([Wiktionary](https://en.wiktionary.org/wiki/honcho)) + diff --git a/content/notes/Human-AI chat paradigm hamstrings the space of possibility.md b/content/notes/Human-AI chat paradigm hamstrings the space of possibility.md new file mode 100644 index 0000000000000..28a8a1158221c --- /dev/null +++ b/content/notes/Human-AI chat paradigm hamstrings the space of possibility.md @@ -0,0 +1,23 @@ +--- +title: Human-AI chat paradigm hamstrings the space of possibility +date: 02.21.24 +--- + +The human-AI chat paradigm assumes only two participants in a given interaction. 
While this is sufficient for conversations directly with un-augmented foundation models, it creates many obstacles when designing more sophisticated cognitive architectures. When you train/fine-tune a language model, you begin to reinforce token distributions that are appropriate to come in between the special tokens denoting human vs AI messages. + +Here's a limited list of things _besides_ a direct response we routinely want to generate: + +- A 'thought' about how to respond to the user +- A [[Loose theory of mind imputations are superior to verbatim response predictions|theory of mind prediction]] about the user's internal mental state +- A list of ways to improve prediction +- A list of items to search over storage +- A 'plan' for how to approach a problem +- A mock user response +- A [[LLM Metacognition is inference about inference|metacognitive step]] to consider the product of prior inference + +In contrast, the current state of inference is akin to immediately blurting out the first thing that comes into your mind--something that humans with practiced aptitude in social cognition rarely do. But generating anything else is very hard, given that those types of responses never come after the special AI message token. Not very flexible. + +We're already anecdotally seeing well-trained completion models follow instructions impressively, likely because of incorporation into pretraining. Is chat the next thing to be subsumed by general completion models? Because if so, flexibility in the types of inferences you can make would be very beneficial. + +Metacognition then becomes something you can do at any step in a conversation. Same with instruction following & chat. Maybe this helps push LLMs in a much more general direction. 
+ diff --git a/content/notes/Humans like personalization.md b/content/notes/Humans like personalization.md new file mode 100644 index 0000000000000..56e4683233d5c --- /dev/null +++ b/content/notes/Humans like personalization.md @@ -0,0 +1,40 @@ +--- +title: Humans like personalization +date: 03.26.24 +--- + +To us: it's obvious. But we get asked this a lot: + +> Why do I need to personalize my AI application? + +Fair question; not everyone has gone down this conceptual rabbithole to the extent we have at [Plastic](https://plasticlabs.ai) and with [Honcho](https://honcho.dev). + +Short answer: people like it. + +In the tech bubble, it can be easy to forget about what _most_ humans like. Isn't building stuff people love our job though? + +In web2, it's taken for granted. Recommender algorithms make UX really sticky, which retains users sufficiently long to monetize them. To make products people love and scale them, they had to consider whether _billions_--in aggregate--tend to prefer personalized products/experiences or not. + +In physical reality too, most of us prefer white glove professional services, bespoke products, and friends and family who know us _deeply_. We place a premium in terms of time and economic value on those goods and experiences. + +The more we're missing that, the more we're typically in a principal-agent problem, which creates overhead, interest misalignment, dissatisfaction, mistrust, and information asymmetry: + +--- + + + +--- + +But, right now, most AI applications are just toys and demos: + +![[Honcho; User Context Management for LLM Apps#^18066b]] + +It's also why everyone is obsessed with evals and benchmarks that have scant practical utility in terms of improving UX for the end user. If we had more examples of good products, ones people loved, killer apps, no one would care about leaderboards anymore. + +> OK, but what about services that are purely transactional? Why would a user want that to be personalized? Why complicate it? 
Just give me the answer, complete the task, etc... + +Two answers: + +1. Every interaction has context. Like it or not, people have preferences, and the more an app/agent can align with those, the more it can improve time to value for the user. It can be stickier, more delightful, "just work," and entail less overhead. (We're building more than calculators here, though this applies even to those!) +2. If an app doesn't do this, it'll get out-competed by one that does...or by the ever improving set of generally capable foundation models. + diff --git a/content/notes/LLM Metacognition is inference about inference.md b/content/notes/LLM Metacognition is inference about inference.md new file mode 100644 index 0000000000000..ab561a407dfe0 --- /dev/null +++ b/content/notes/LLM Metacognition is inference about inference.md @@ -0,0 +1,13 @@ +--- +title: LLM Metacognition is inference about inference +date: 03.26.24 +--- + +For wetware, metacognition is typically defined as ‘thinking about thinking’ or often used as a catch-all for any ‘higher-level’ cognition. + +(In some more specific domains, it's an introspective process, focused on thinking about exclusively _your own_ thinking or a suite of personal learning strategies...all valid within their purview, but too constrained for our purposes.) + +In large language models, the synthetic analog of cognition is inference. So we can reasonably define a metacognitive process in an LLM architecture as any that runs inference on the output of prior inference. That is, inference itself is used as context--_inference about inference_. + +It might be instantly injected into the next prompt, stored for later use, or leveraged by another model. This kind of architecture is critical when dealing with user context, since LLMs can run inference about user behavior, then use that synthetic context in the future. 
Experiments here will be critical to overcome [[Machine learning is fixated on task performance|the machine learning community's fixation on task completion]]. For us at Plastic, one of the most interesting species of metacognition is [[Loose theory of mind imputations are superior to verbatim response predictions|theory of mind and mimicking that in LLMs]] to form high-fidelity representations of users. + diff --git a/content/notes/LLMs excel at theory of mind because they read.md b/content/notes/LLMs excel at theory of mind because they read.md new file mode 100644 index 0000000000000..afd079d58b362 --- /dev/null +++ b/content/notes/LLMs excel at theory of mind because they read.md @@ -0,0 +1,21 @@ +--- +title: LLMs excel at theory of mind because they read +date: 02.20.24 +--- + +Large language models are [simulators](https://generative.ink/posts/simulators/). In predicting the next likely token, they are simulating how an abstracted “_any person_” might continue the generation. The basis for this simulation is the aggregate compression of a massive corpus of human-generated natural language from the internet. So, predicting humans is _literally_ their core function. + +In that corpus is our literature, our philosophy, our social media, our hard and social science--the knowledge graph of humanity, both in terms of discrete facts and messy human interaction. That last bit is important. The latent space of an LLM's pretraining is in large part a _narrative_ space. Narration chock full of humans reasoning about other humans--predicting what they will do next, what they might be thinking, how they might be feeling. + +That's no surprise; we're a social species with robust social cognition. 
It's also no surprise[^1] that grokking that interpersonal narrative space in its entirety would make LLMs adept at [[Loose theory of mind imputations are superior to verbatim response predictions|generation resembling social cognition too]].[^2] + +We know that in humans, we can strongly [correlate reading with improved theory of mind abilities](https://journal.psych.ac.cn/xlkxjz/EN/10.3724/SP.J.1042.2022.00065). When your neural network is consistently exposed to content about how other people think, feel, desire, believe, prefer, those mental tasks are reinforced. The more experience you have with a set of ideas or states, the more adept you become. + +The experience of such natural language narration _is itself a simulation_ where you practice and hone your theory of mind abilities. Even if, say, your English or Psychology teacher was foisting the text on you with other training intentions. Or even if you ran the simulation without coercion to escape at the beach. + +It's not such a stretch to imagine that in optimizing for other tasks LLMs acquire emergent abilities not intentionally trained.[^3] It may even be that in order to learn natural language prediction, these systems need theory of mind abilities or that learning language specifically involves them--that's certainly the case with human wetware systems and theory of mind skills do seem to improve with model size and language generation efficacy. 
+ +[^1]: Kosinski includes a compelling treatment of much of this in ["Evaluating Large Language Models in Theory of Mind Tasks"](https://arxiv.org/abs/2302.02083) +[^2]: It also leads to other wacky phenomena like the [Waluigi effect](https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post#The_Waluigi_Effect) +[^3]: Here's Chalmers [making a very similar point](https://youtube.com/clip/UgkxliSZFnnZHvYf2WHM4o1DN_v4kW6LsiOU?feature=shared) + diff --git a/content/notes/Loose theory of mind imputations are superior to verbatim response predictions.md b/content/notes/Loose theory of mind imputations are superior to verbatim response predictions.md new file mode 100644 index 0000000000000..63aa81319fb7a --- /dev/null +++ b/content/notes/Loose theory of mind imputations are superior to verbatim response predictions.md @@ -0,0 +1,40 @@ +--- +title: Loose theory of mind imputations are superior to verbatim response predictions +date: 02.20.24 +--- + +When we [[Theory of Mind Is All You Need|first started experimenting]] with user context, we naturally wanted to test whether our LLM apps were learning useful things about users. And also naturally, we did so by making predictions about them. + +Since we were operating in a conversational chat paradigm, our first instinct was to try and predict what the user would say next. Two things were immediately apparent: (1) this was really hard, & (2) response predictions weren't very useful. + +We saw some remarkable exceptions, but _reliable_ verbatim prediction requires a level of context about the user that simply isn't available right now. We're not sure if it will require context gathering wearables, BMIs, or the network of context sharing apps we're building with [[Honcho; User Context Management for LLM Apps|Honcho]], but we're not there yet. + +Being good at what any person in general might plausibly say is literally what LLMs do. 
But being perfect at what one individual will say in a singular specific setting is a whole different story. Even lifelong human partners might only experience this a few times a week. + +Plus, even when you get it right, what exactly are you supposed to do with it? The fact that it's such a narrow reasoning product limits the utility you're able to get out of a single inference. + +So what are models good at predicting that's useful with limited context and local to a single turn of conversation? Well, it turns out they're really good at [imputing internal mental states](https://arxiv.org/abs/2302.02083). That is, they're good at theory of mind predictions--thinking about what you're thinking. A distinctly _[[LLM Metacognition is inference about inference|metacognitive]]_ task. + +(Why are they good at this? [[LLMs excel at theory of mind because they read|We're glad you asked]].) + +Besides just being better at it, letting the model leverage what it knows to make open-ended theory of mind imputations has several distinct advantages over verbatim response prediction: + +1. **Fault tolerance** + + - Theory of mind predictions are often replete with assessments of emotion, desire, belief, value, aesthetic, preference, knowledge, etc. That means they seek to capture a range within a distribution. A slice of user identity. + - This is much richer than trying (& likely failing) to generate a single point estimate (like in verbatim prediction) and includes more variance. Therefore there's a higher probability you identify something useful by trusting the model to flex its emergent strengths. + +2. **Learning** ^555815 + + - That high variance means there's more to be wrong (& right) about. More content = more claims, which means more opportunity to learn. + - Being wrong here is a feature, not a bug; comparing those predictions with reality is how you know what you need to understand about the user in the future to get to ground truth. + +3. 
**Interpretability** + + - Knowing what you're right and wrong about exposes more surface area against which to test and understand the efficacy of the model--i.e. how well it knows the user. + - As we're grounded in the user and theory of mind, we're better able to assess this than if we're simply asking for likely human responses in the massive space of language encountered in training. + +4. **Actionability** + - The richness of theory of mind predictions gives us more to work with _right now_. We can funnel these insights into further inference steps to create UX in better alignment and coherence with user state. + - Humans make thousands of tiny, subconscious interventions responsive to as many sensory cues & theory of mind predictions, all to optimize single social interactions. It pays to know about the internal state of others. + - Though our lifelong partners from above can't perfectly predict each other's sentences, they can impute each other's state with extremely high fidelity. The rich context they have on one another translates to a desire to spend most of their time together (good UX). diff --git a/content/notes/Machine learning is fixated on task performance.md b/content/notes/Machine learning is fixated on task performance.md new file mode 100644 index 0000000000000..d2d58169af7a1 --- /dev/null +++ b/content/notes/Machine learning is fixated on task performance.md @@ -0,0 +1,12 @@ +--- +title: Machine learning is fixated on task performance +date: 12.12.23 +--- + +The machine learning industry has traditionally adopted an academic approach, focusing primarily on performance across a range of tasks. LLMs like GPT-4 are a testament to this, having been scaled up to demonstrate impressive & diverse task capability. This scaling has also led to [[Theory of Mind Is All You Need|emergent abilities]], debates about the true nature of which rage on. 
+ +However, general capability doesn't necessarily translate to completing tasks as an individual user would prefer. This is a failure mode that anyone building agents will inevitably encounter. The focus, therefore, needs to shift from how language models perform tasks in a general sense to how they perform tasks on a user-specific basis. + +Take summarization. It’s a popular machine learning task at which models have become quite proficient...at least from a benchmark perspective. However, when models summarize for users with a pulse, they fall short. The reason is simple: the models don’t know this individual. The key takeaways for a specific user differ dramatically from the takeaways _any possible_ internet user _would probably_ note. ^0005ac + +So a shift in focus toward user-specific task performance would provide a much more dynamic & realistic approach. Catering to individual needs & paving the way for more personalized & effective ML applications. diff --git a/content/notes/The model-able space of user identity is enormous.md b/content/notes/The model-able space of user identity is enormous.md new file mode 100644 index 0000000000000..964fb064cda11 --- /dev/null +++ b/content/notes/The model-able space of user identity is enormous.md @@ -0,0 +1,17 @@ +--- +title: There's an enormous space of user identity to model +date: 05.11.24 +tags: + - notes + - ml + - cogsci +--- +While large language models are exceptional at [imputing a startling](https://arxiv.org/pdf/2310.07298v1) amount from very little user data--an efficiency putting AdTech to shame--the limit here is [[User State is State of the Art|vaster than most imagine]]. + +Contrast recommender algorithms (which are impressive!) needing mountains of activity data to back into a single preference with [the human connectome](https://www.science.org/doi/10.1126/science.adk4858) containing 1400 TB of compressed representation in one cubic millimeter. 
+ +LLMs give us access to a new class of this data going beyond tracking the behavioral, [[LLMs excel at theory of mind because they read|toward the semantic]]. They can distill and grok much 'softer' psychological elements, allowing insight into complex mental states like value, belief, intention, aesthetic, desire, history, knowledge, etc. + +There's so much to do here, though, that plug-in-your-docs/email/activity schemes and user surveys are laughably limited in scope. We need ambient methods running social cognition, like [Honcho](https://honcho.dev). + +As we asymptotically approach a fuller accounting of individual identity, we can unlock more positive-sum application/agent experiences, richer than the exploitation of base desire we're used to. \ No newline at end of file diff --git a/content/notes/YouSim Disclaimers.md b/content/notes/YouSim Disclaimers.md new file mode 100644 index 0000000000000..f1f72a82cc1c2 --- /dev/null +++ b/content/notes/YouSim Disclaimers.md @@ -0,0 +1,20 @@ +--- +title: YouSim Disclaimers +tags: + - yousim + - legal +date: 11.11.24 +--- + +Plastic Labs is the creator of [YouSim.ai](https://yousim.ai), an AI product demo that has inspired the anonymous creation of the \$YOUSIM token using Pump.fun on the Solana blockchain, among many other tokens. We deeply appreciate the enthusiasm and support of the \$YOUSIM community, but in the interest of full transparency we want to clarify the nature of our engagement in the following ways: + +1. Plastic Labs did not issue, nor does it control or provide financial advice related to, the \$YOUSIM memecoin. The memecoin project is led by an independent community and has undergone a community takeover (CTO). +2. Plastic Labs' acceptance of \$YOUSIM tokens for research grants does not constitute an endorsement of the memecoin as an investment. These grants support our broader mission of advancing AI research and innovation, especially within the open source community. +3. 
YouSim.ai and any other Plastic Labs products remain separate from the \$YOUSIM memecoin. Any future integration of token utility into our products would be carefully considered and subject to regulatory compliance. +4. The \$YOUSIM memecoin carries inherent risks, including price volatility, potential ecosystem scams, and regulatory uncertainties. Plastic Labs is not responsible for any financial losses or damages incurred through engagement with the memecoin. +5. Plastic Labs will never direct message any member of the \$YOUSIM community soliciting tokens, private keys, seed phrases, or any other private information, collector's items, or financial instruments. +6. YouSim.ai and the products it powers are simulated environments and their imaginary outputs do not reflect the viewpoints, positions, voice, or agenda of Plastic Labs. +7. Communications from Plastic Labs regarding the \$YOUSIM memecoin are for informational purposes only and do not constitute financial, legal, or tax advice. Users should conduct their own research and consult with professional advisors before making any decisions. +8. Plastic Labs reserves the right to adapt our engagement with the \$YOUSIM community as regulatory landscapes evolve and to prioritize the integrity of our products and compliance with applicable laws. + +We appreciate the \$YOUSIM community's support and passion for YouSim.ai and the broader potential of AI technologies. However, it's crucial for us to maintain transparency about the boundaries of our engagement. We encourage responsible participation and ongoing open dialogue as we collectively navigate this exciting and rapidly evolving space.