From d0f04f4981f0b29c738dec6251cbe03ca52ae49e Mon Sep 17 00:00:00 2001 From: Courtland Leer <93223786+courtlandleer@users.noreply.github.com> Date: Fri, 26 Jan 2024 13:11:11 -0500 Subject: [PATCH] whole blog proofread --- content/_index.md | 13 ++++--- ...o; User Context Management for LLM Apps.md | 35 +++++++++---------- content/blog/Open-Sourcing Tutor-GPT.md | 25 +++++++------ .../blog/Theory-of-Mind Is All You Need.md | 17 ++++----- content/extrusions/Extrusion 01.24.md | 12 +++---- content/notes/Honcho name lore.md | 8 ++--- ...on in LLMs is inference about inference.md | 4 ++- ...too focused on general task performance.md | 4 +-- 8 files changed, 59 insertions(+), 59 deletions(-) diff --git a/content/_index.md b/content/_index.md index c9a97943a63ed..0e469b51b109c 100644 --- a/content/_index.md +++ b/content/_index.md @@ -2,18 +2,21 @@ title: Home enableToc: false --- +Welcome to our collaborative second brain. -Welcome to our collaborative second brain. Here you'll find our blog posts, "Extrusions" newsletter, evergreen notes, and public research. And if you like, [you can engage with the ideas directly](https://github.com/plastic-labs/blog) on GitHub. +Here you'll find our blog posts, "Extrusions" newsletter, evergreen notes, and public research. And if you like, [you can engage with the ideas directly](https://github.com/plastic-labs/blog) on GitHub. -Plastic Labs is a research-driven company working at the intersection of human and machine learning. Our current project is [Honcho](https://github.com/plastic-labs/honcho), a user context management solution for AI-powered applications. We believe that by re-centering LLM app development around the user we can unlock a rich landscape of deeply personalized, autonomous agents. +Plastic Labs is a research-driven company working at the intersection of human and machine learning. 
Our current project is [Honcho](https://github.com/plastic-labs/honcho), a user context management solution for AI-powered applications. + +We believe that by re-centering LLM app development around the user we can unlock a rich landscape of deeply personalized, autonomous agents. It’s our mission to realize this future. ## Blog [[Honcho; User Context Management for LLM Apps|Honcho: User Context Management for LLM Apps]] -[[blog/Theory-of-Mind Is All You Need]] -[[blog/Open-Sourcing Tutor-GPT]] +[[Theory-of-Mind Is All You Need]] +[[Open-Sourcing Tutor-GPT]] ## Extrusions @@ -26,4 +29,4 @@ It’s our mission to realize this future. ## Research -[Violation of Expectation Reduces Theory-of-Mind Prediction Error in Large Language Models](https://arxiv.org/pdf/2310.06983.pdf) +[Violation of Expectation Reduces Theory-of-Mind Prediction Error in Large Language Models](https://arxiv.org/abs/2310.06983) diff --git a/content/blog/Honcho; User Context Management for LLM Apps.md b/content/blog/Honcho; User Context Management for LLM Apps.md index 5b926c2732440..40dc1e317a636 100644 --- a/content/blog/Honcho; User Context Management for LLM Apps.md +++ b/content/blog/Honcho; User Context Management for LLM Apps.md @@ -11,14 +11,14 @@ Today we drop the first release of a project called [*Honcho*](https://github.co ## Plastic Lore -[Plastic Labs](https://plasticlabs.ai) was conceived as a research group exploring the intersection of education and emerging technology. Our first cycle focused on how the incentive mechanisms and data availability made possible by distributed ledgers might be harnessed to improve learning outcomes. But with the advent of ChatGPT and a chorus of armchair educators proclaiming tutoring solved by the first nascent consumer generative AI, we shifted our focus to large language models. +[Plastic Labs](https://plasticlabs.ai) was conceived as a research group exploring the intersection of education and emerging technology. 
Our first cycle focused on how the incentive mechanisms and data availability made possible by distributed ledgers might be harnessed to improve learning outcomes. But with the advent of ChatGPT and a chorus of armchair educators proclaiming tutoring solved by the first nascent consumer generative AI, we shifted our focus to large language models. ^09f185 -As a team with with backgrounds spanning machine learning and education, we found the prevailing narratives overestimating short-term capabilities and under-imagining longterm potential. Fundamentally, LLMs were and are still 1-to-many instructors. Yes, they herald the beginning of a revolution of personal access not to be discounted, but every student is still ultimately getting the same experience. And homogenized educational paradigms are by definition under-performant on an individual level. If we stop here, we're selling ourselves short. +As a team with backgrounds in both machine learning and education, we found the prevailing narratives overestimating short-term capabilities and under-imagining long-term potential. Fundamentally, LLMs were and still are 1-to-many instructors. Yes, they herald the beginning of a revolution in personal access not to be discounted, but every student is still ultimately getting the same experience. And homogenized educational paradigms are by definition under-performant on an individual level. If we stop here, we're selling ourselves short. ![[zombie_tutor_prompt.jpg]] *A well intentioned but monstrously deterministic [tutor prompt](https://www.oneusefulthing.org/p/assigning-ai-seven-ways-of-using).* -Most edtech projects we saw emerging actually made foundation models worse by adding gratuitous lobotomization and coercing deterministic behavior. The former stemmed from the typical misalignments plaguing edtech, like the separation of user and payer. The latter seemed to originate with deep misunderstandings around what LLMs are and translates to a huge missed opportunity. 
+Most edtech projects we saw emerging actually made foundation models worse by adding gratuitous lobotomization and coercing deterministic behavior. The former stemmed from the typical misalignments plaguing edtech, like the separation of user and payer. The latter seemed to originate with deep misunderstandings around what LLMs are and continues to translate to huge missed opportunities. So we set out to build a non-skeuomorphic, AI-native tutor that put users first. The same indeterminism so often viewed as LLMs' greatest liability is in fact their greatest strength. Really, it's what they _are_. When great teachers deliver effective personalized instruction, they don't consult some M.Ed flowchart, they leverage the internal personal context they have on the student and reason (consciously or basally) about the best pedagogical intervention. LLMs are the beginning of this kind of high-touch learning companion being _synthetically_ possible. @@ -29,15 +29,15 @@ Our [[Open-Sourcing Tutor-GPT|experimental tutor]], Bloom, [[Theory-of-Mind Is A ## Context Failure Mode -But we quickly ran up against a hard limitation. The failure mode we believe all vertical specific AI applications will eventually hit if they want to be sticky, paradigmatically different than their deterministic counterparts, and realize the full potential here. That's context, specifically user context--Bloom didn't know enough about each student. +But we quickly ran up against a hard limitation. The failure mode we believe all vertical specific AI applications will eventually hit if they want to be sticky, paradigmatically different than their deterministic counterparts, and realize the latent potential. That's context, specifically user context--Bloom didn't know enough about each student. -We're always blown away by how many people don't realize that large language models themselves are stateless. They don't remember shit about you. 
They're just translating the context they're given into a probable sequence of tokens. LLMs are like horoscope writers, good at writing general statements that feel very personal. You would be too if you'd ingested and compressed that much of the written human corpus. +We're consistently blown away by how many people don't realize large language models themselves are stateless. They don't remember shit about you. They're just translating context they're given into probable sequences of tokens. LLMs are like horoscope writers, good at crafting general statements that *feel* very personal. You would be too, if you'd ingested and compressed that much of the written human corpus. ![[geeked_dory.png]] There are lots of developer tricks to give the illusion of state about the user, mostly injecting conversation history or some personal digital artifact into the context window. Another is running inference on that limited recent user context to derive new insights. This was the game changer for our tutor, and we still can't believe by how under-explored that solution space is (more on this soon 👀). -To date, machine learning has been [[The machine learning industry is too focused on general task performance|far more focused on]] optimizing for general task competition than personalization. This is natural, although many of these tasks are still probably better suited to deterministic code. It's also historically prestiged papers over products--it takes a bit for research to morph into tangible utility. Put these together and you end up with a big blindspot over individual users and what they want. +To date, machine learning has been [[The machine learning industry is too focused on general task performance|far more focused on]] optimizing for general task competition than personalization. This is natural, although many of these tasks are still probably better suited to deterministic code. 
It's also historically prestiged papers over products--research takes a bit to morph into tangible utility. Put these together and you end up with a big blindspot over individual users and what they want. The real magic of 1:1 instruction isn't subject matter expertise. Bloom and the foundation models it leveraged had plenty of that (despite what clickbait media would have you believe about hallucination in LLMs). Instead, it's personal context. Good teachers and tutors get to know their charges--their history, beliefs, values, aesthetics, knowledge, preferences, hopes, fears, interests, etc. They compress all that and generate customized instruction, emergent effects of which are the relationships and culture necessary for positive feedback loops. @@ -58,44 +58,43 @@ Prediction algorithms have become phenomenal at hacking attention using tabular Every day human brains do incredibly sophisticated things with sorta-pejoratively labelled 'soft' insights about others. But social cognition is part of the same evolutionarily optimized framework we use to model the rest of the world. -We run continuous active inference on wetware to refine our internal world models. This helps us make better predictions about the world by minimizing the difference between our expectation and reality. That's more or less what learning is. And we use the same set of mechanisms to model other humans, i.e. get to know them. +We run continuous active inference on wetware to refine our internal world models. This helps us make better predictions about our experience by minimizing the difference between our expectation and reality. That's more or less what learning is. And we use the same set of mechanisms to model other humans, i.e. get to know them. + In LLMs we have remarkable predictive reasoning engines with which we can begin to build the foundations of social cognition and therefore model users with much more nuance and granularity. 
Not just their logged behavior, but reasoning between the lines about its motivation and grounding in the full account of their identity. -Late last year we published a [pre-print of research on this topic](https://arxiv.org/abs/2310.06983), and we've shown that these kinds of biologically-inspired frameworks can construct models of users that improve an LLM's ability to reason and make predictions about that individual user: +Late last year we published a [research pre-print on this topic](https://arxiv.org/abs/2310.06983), and we've shown that these kinds of biologically-inspired frameworks can construct models of users that improve an LLM's ability to reason and make predictions about that individual user: ![[honcho_powered_bloom_paper_fig.png]] -*A [predictive coding inspired metacognitive architecture](https://youtu.be/PbuzqCdY0hg?feature=shared) from our research.* +*A [predictive coding inspired metacognitive architecture](https://youtu.be/PbuzqCdY0hg?feature=shared), from our research.* We added it to Bloom and found the missing piece to overcoming the failure mode of user context. Our tutor could now learn about the student and use that knowledge effectively to produce better learning outcomes. ## Blast Horizon -Building and maintaining a production-grade AI app for learning catapulted us to this missing part of the stack. Lots of users all growing in unique ways and all needing personalized attention that evolved over multiple longform sessions forced us to confront the user context management problem with all it's thorny intricacy and potential. +Building and maintaining a production-grade AI app for learning catapulted us to this missing part of the stack. Lots of users, all growing in unique ways, all needing personalized attention that evolved over multiple longform sessions, forced us to confront the user context management problem with all its thorny intricacy and potential. 
And we're hearing constantly from builders of other vertical specific AI apps that personalization is the key blocker. In order for projects to graduate form toys to tools, they need to create new kinds of magic for their users. Mountains of mostly static software exists to help accomplish an unfathomable range of tasks and lots of it can be personalized using traditional (albeit laborious for the user) methods. But LLMs can observe, reason, then generate the software _and the user context_, all abstracted away behind the scenes. -Imagine online stores generated just in time for the home improvement project you're working on; generative games with rich multimodality unfolding to fit your mood on the fly; travel agents that know itinerary needs specific to your family without being explicitly told; copilots that think and write and code not just like you, _but as you_; disposable, atomic agents with full personal context that replace your professional services--_you_ with a law, medical, accounting degree. +Imagine online stores generated just in time for the home improvement project you're working on; generative games with rich multimodality unfolding to fit your mood on the fly; travel agents that know itinerary needs specific to your family, without being explicitly told; copilots that think and write and code not just like you, _but as you_; disposable, atomic agents with full personal context that replace your professional services--_you_ with a law, medical, accounting degree. This is the kind of future we can build when we put users at the center of our agent and LLM app production. ## Introducing Honcho -So today we're releasing the first iteration of [[Honcho name lore|Honcho]], our project to re-define LLM application development through user context management. At this nascent stage, you can think of it as an open-source version of the OpenAI Assistants API. 
+So today we're releasing the first iteration of [[Honcho name lore|Honcho]], our project to re-define LLM application development through user context management. At this nascent stage, you can think of it as an open-source version of the OpenAI Assistants API. ^8c982b Honcho is a REST API that defines a storage schema to seamlessly manage your application's data on a per-user basis. It ships with a Python SDK which [you can read more about how to use here](https://github.com/plastic-labs/honcho/blob/main/README.md). ![[honcho basic user context management blog post diagram.png]] -We spent lots of time building the infrastructure to support multiple concurrent users with Bloom, and too often we see developers running into the same problem: building a fantastic demo, sharing it with the world, then inevitably having to take it down due to infrastructure/scaling issues. +We spent lots of time building the infrastructure to support multiple concurrent users with Bloom, and too often we see developers running into the same problem: building a fantastic demo, sharing it with the world, then inevitably taking it down because of infrastructure/scaling issues. -Honcho allows you to deploy an application with a single command that can automatically handle concurrent users. Going from prototype to production is now only limited by the amount of spend you can handle, not tedious infrastructure setup. +Honcho allows you to deploy an application with a single command that can automatically handle concurrent users. Speedrunning to production is now only limited by the amount of spend you can handle, not tedious infrastructure setup. -Managing app data on a per-user basis is the first small step in improving how devs build LLM apps. Once you define a data management schema on a per-user basis, a lots of new possibilities emerge around what you can do intra-user message, intra-user sessions, and even intra-user sessions across agents. 
+Managing app data on a per-user basis is the first small step in improving how devs build LLM apps. Once you define a data management schema on a per-user basis, lots of new possibilities emerge around what you can do intra-user message, intra-user sessions, and even intra-user sessions across an ecosystem of agents. ## Get Involved -We're excited to see builders experiment with what we're releasing today and with Honcho as it continues to evolve. +We're excited to see builders experiment with what we're releasing today, and with Honcho as it continues to evolve. Check out the [GitHub repo](https://github.com/plastic-labs/honcho) to get started and join our [Discord](https://discord.gg/plasticlabs) to stay up to date 🫡. - -![[tron_bike.gif]] diff --git a/content/blog/Open-Sourcing Tutor-GPT.md b/content/blog/Open-Sourcing Tutor-GPT.md index 3a31522248fe8..ba18c55a87035 100644 --- a/content/blog/Open-Sourcing Tutor-GPT.md +++ b/content/blog/Open-Sourcing Tutor-GPT.md @@ -2,14 +2,13 @@ title: "Open-Sourcing Tutor-GPT" date: "Jun 2, 2023" --- - ![[assets/human_machine_learning.jpeg]] ## TL;DR Today we’re [open-sourcing](https://github.com/plastic-labs/tutor-gpt) Bloom, our digital [Aristotelian](https://erikhoel.substack.com/p/why-we-stopped-making-einsteins) learning companion. -What makes [Bloom](https://bloombot.ai/) compelling is its ability to _reason pedagogically_ about the learner. That is, it uses dialogue to posit the most educationally-optimal tutoring behavior. Eliciting this from the [capability overhang](https://jack-clark.net/2023/03/21/import-ai-321-open-source-gpt3-giving-away-democracy-to-agi-companies-gpt-4-is-a-political-artifact/) involves multiple chains of [metaprompting](https://arxiv.org/pdf/2102.07350.pdf) enabling Bloom to construct a nascent academic [theory of mind](https://arxiv.org/pdf/2304.11490.pdf) for each student. +What makes [Bloom](https://bloombot.ai/) compelling is its ability to _reason pedagogically_ about the learner. 
That is, it uses dialogue to posit the most educationally-optimal tutoring behavior. Eliciting this from the [capability overhang](https://jack-clark.net/2023/03/21/import-ai-321-open-source-gpt3-giving-away-democracy-to-agi-companies-gpt-4-is-a-political-artifact/) involves multiple chains of [metaprompting](https://arxiv.org/pdf/2102.07350.pdf), enabling Bloom to construct a nascent, academic [theory of mind](https://arxiv.org/pdf/2304.11490.pdf) for each student. We’re not seeing this in the explosion of ‘chat-over-content’ tools, most of which fail to capitalize on the enormous latent abilities of LLMs. Even the impressive out-of-the-box capabilities of contemporary models don’t achieve the necessary user intimacy. Infrastructure for that doesn’t exist yet 👀. @@ -27,23 +26,23 @@ Current compute suggests we can do high-grade 1:1 for two orders of magnitude ch ![[assets/2 orders magnitude reduced cost.png]] -It's clear generative AI stands a good chance of democratizing this kind of access and attention, but what's less clear are the specifics. It's tough to be an effective teacher that students actually want to learn from. Harder still to let the student guide the experience yet maintain an elevated discourse. +It's clear generative AI stands a good chance of democratizing this kind of access and attention, but what's less clear are the specifics. It's tough to be an effective teacher that students actually want to learn from. Harder still to let the student guide the experience, yet maintain an elevated discourse. -So how do we create successful learning agents that students will eagerly use without coercion? We think this ability lies latent in base models, but the key is eliciting it. +So how do we create successful learning agents that students will eagerly use without coercion? We think this ability lies latent in foundation models, but the key is eliciting it. 
## Eliciting Pedagogical Reasoning The machine learning community has long sought to uncover the full range of tasks that large language models can be prompted to accomplish on general pre-training alone (the capability overhang). We believe we have discovered one such task: pedagogical reasoning. -Bloom was built and prompted to elicit this specific type of teaching behavior. (The kind laborious for new teachers, but that adept ones learn to do unconsciously.) After each input it revises a user’s real-time academic needs, considers all the information at its disposal, and suggests to itself a framework for constructing the ideal response. +Bloom was built and prompted to elicit this specific type of teaching behavior. (The kind laborious for new teachers, but that adept ones learn to do unconsciously.) After each input it revises a user’s real-time academic needs, considers all the information at its disposal, and suggests to itself a framework for constructing the ideal response. ^285105 ![[assets/bloombot langchain diagram.png]] -It consists of two “chain” objects from [LangChain](https://python.langchain.com/en/latest/index.html) —a _thought_ and _response_ chain. The _thought_ chain exists to prompt the model to generate a pedagogical thought about the student’s input—e.g. a student’s mental state, learning goals, preferences for the conversation, quality of reasoning, knowledge of the text, etc. The *response*chain takes that _thought_ and generates a response. +It consists of two “chain” objects from [LangChain](https://python.langchain.com/en/latest/index.html) —a _thought_ and _response_ chain. The _thought_ chain exists to prompt the model to generate a pedagogical thought about the student’s input—e.g. a student’s mental state, learning goals, preferences for the conversation, quality of reasoning, knowledge of the text, etc. The *response* chain takes that _thought_ and generates a response. 
^1e01f2 -Each chain has a `ConversationSummaryBufferMemory` object summarizing the respective “conversations.” The _thought_ chain summarizes the thoughts into a rank-ordered academic needs list that gains specificity and gets reprioritized with each student input. The _response_ chain summarizes the dialogue in an attempt to avoid circular conversations and record learning progress. +Each chain has a `ConversationSummaryBufferMemory` object summarizing the respective “conversations.” The _thought_ chain summarizes the thoughts into a rank-ordered academic needs list that gains specificity and gets reprioritized with each student input. The _response_ chain summarizes the dialogue in an attempt to avoid circular conversations and record learning progress. ^b1794d -We’re eliciting this behavior from [prompting alone](https://arxiv.org/pdf/2102.07350.pdf). Two of Plastic’s co-founders have extensive experience in education, both in private tutoring and the classroom. They employed this to craft strong example dialogues that sufficiently [demonstrated](https://github.com/plastic-labs/tutor-gpt/tree/main/data) how to respond across a range of situations. +We’re eliciting this behavior from [prompting alone](https://arxiv.org/pdf/2102.07350.pdf). Two of Plastic’s co-founders have extensive experience in education, both in private tutoring and the classroom. They crafted strong example dialogues that sufficiently [demonstrated](https://github.com/plastic-labs/tutor-gpt/tree/main/data) how to respond across a range of situations. Take for example a situation where the student asks directly for an answer. 
Here is Bloom’s response compared to [Khanmigo’s](https://www.khanacademy.org/khan-labs): @@ -58,19 +57,19 @@ And Bloom is dynamic, even when given no excerpted context and asked about non-t ![[assets/bloom_courtland.png]] -and its accompanying thoughts: +And its accompanying thoughts: ![[assets/bloom_courtland_thoughts.png]] -Notice how Bloom reasons it should indulge the topic, validate the student, and point toward (but not supply) possible answers. Then the resultant responses are ones that do this and more, gently guiding toward a fuller comprehension and higher-fidelity understanding of the music. +Notice how Bloom reasons it should indulge the topic, validate the student, and point toward (but not supply) possible answers. Then the resultant responses do this and more, gently guiding toward a fuller comprehension and higher-fidelity understanding of the music. -Aside from these edgier cases, Bloom shines helping students understand difficult passages (from syntactic to conceptual levels) and giving writing feedback (especially competent at thesis construction). [Take it for a spin.](https://discord.gg/udtxycbh) +Aside from these edgier cases, Bloom shines helping students understand difficult passages (from syntactic to conceptual levels) and giving writing feedback (especially competent at thesis construction). [Take it for a spin](https://discord.gg/udtxycbh). -Ultimately, we hope [open-sourcing Bloom](https://github.com/plastic-labs/tutor-gpt#readme) will allow anyone to run with these elicitations and prompt to expand its utility to support other domains. We’ll be doing work here too. +Ultimately, we hope [open-sourcing Bloom](https://github.com/plastic-labs/tutor-gpt#readme) will allow anyone to run with these elicitations and prompt to expand utility and support multiple domains. We’ll be doing work here too. 
## Bloom & Agentic AI -This constitutes the beginning of an approach far superior to just slapping a chatbot UI over a content library that's probably already in a base model's pre-training. +This constitutes the beginning of an approach far superior to just slapping a chatbot UI over a content library that's probably already in the foundation model's pre-training. After all, if it were just about content delivery, MOOCs would've solved education. We need more than that to reliably grow rare minds. And we're already seeing Bloom excel at promoting synthesis and creative interpretation within its narrow utility. diff --git a/content/blog/Theory-of-Mind Is All You Need.md b/content/blog/Theory-of-Mind Is All You Need.md index 1995906929460..697febddf087a 100644 --- a/content/blog/Theory-of-Mind Is All You Need.md +++ b/content/blog/Theory-of-Mind Is All You Need.md @@ -2,10 +2,9 @@ title: "Theory-of-Mind Is All You Need" date: "Jun 12, 2023" --- - ## TL;DR -Today we’re releasing a major upgrade to [Bloom](https://discord.gg/bloombot.ai) (& the open-source codebase, [Tutor-GPT](https://github.com/plastic-labs/tutor-gpt)). +Today we’re releasing a major upgrade to [Bloom](https://discord.gg/bloombot.ai) (& the open-source codebase, [tutor-gpt](https://github.com/plastic-labs/tutor-gpt)). We gave our tutor even more autonomy to reason about the psychology of the user, and—using GPT-4 to dynamically _rewrite its own_ system prompts—we’re able to dramatically expand the scope of what Bloom can do _and_ massively reduce our prompting architecture. @@ -23,7 +22,7 @@ Explaining all this to a tutor (synthetic or biological) upfront, is laborious a ![[assets/ToM meme.jpeg]] -What a expert educators will do is gather more information throughout the completion of the task, resolving on a more precise objective along the way; keeping the flow natural, and leaving the door open to compelling tangents and pivots. 
+What expert educators will do is gather more information throughout the completion of the task, resolving on a more precise objective along the way; keeping the flow natural, and leaving the door open to compelling tangents and pivots. The key here is they don’t have all the information—they _don’t know_ what the objective is precisely—but being good at tutoring means turning that into an advantage, figuring it out along the way is _optimal_. The effective human tutor dynamically iterates on a set of internal models about student psychology and session objectives. So how do we recreate this in Bloom? @@ -35,13 +34,11 @@ What if we treated Bloom with some intellectual respect? The solution here is scary simple. The results are scary good. -[Here’s a description](https://plasticlabs.ai/blog/Open-Sourcing-Tutor-GPT/) of the previous version’s architecture: +[[Open-Sourcing Tutor-GPT#^285105|Here’s a description]] of the previous version’s architecture: -> Bloom was built and prompted to elicit \[pedagogical reasoning\]…After each input it revises a user’s real-time academic needs, considers all the information at its disposal, and suggests to itself a framework for constructing the ideal response. -> -> It consists of two “chain” objects from [LangChain](https://python.langchain.com/en/latest/index.html) —a _thought_ and _response_ chain. The _thought_ chain exists to prompt the model to generate a pedagogical thought about the student’s input—e.g. a student’s mental state, learning goals, preferences for the conversation, quality of reasoning, knowledge of the text, etc. The _response_ chain takes that _thought_ and generates a response. -> -> Each chain has a `ConversationSummaryBufferMemory` object summarizing the respective “conversations.” The _thought_ chain summarizes the thoughts into a rank-ordered academic needs list that gains specificity and gets reprioritized with each student input. 
The _response_ chain summarizes the dialogue in an attempt to avoid circular conversations and record learning progress. +![[Open-Sourcing Tutor-GPT#^285105]] +![[Open-Sourcing Tutor-GPT#^1e01f2]] +![[Open-Sourcing Tutor-GPT#^b1794d]] Instead, we’ve now repurposed the ***thought*** chain to do two things: @@ -76,4 +73,4 @@ All this begs the question: what could Bloom do with even better theory of mind? What could other AI applications do with a framework like this? -Stay tuned. +Stay tuned. \ No newline at end of file diff --git a/content/extrusions/Extrusion 01.24.md b/content/extrusions/Extrusion 01.24.md index e3cb0c87d3923..a777d7d4bb6a8 100644 --- a/content/extrusions/Extrusion 01.24.md +++ b/content/extrusions/Extrusion 01.24.md @@ -6,17 +6,17 @@ No one needs another newsletter, so we'll work to make these worthwhile. Expect ## 2023 Recap -Last year was wild. We started as an edtech company and ended as anything but. There's a deep dive on some of the conceptual lore in last week's "[[Honcho; User Context Management for LLM Apps|Honcho: User Context Management for LLM Apps]]:" +Last year was wild. We started as an edtech company and ended as anything but. There's a deep dive on some of the conceptual lore in last week's "[[Honcho; User Context Management for LLM Apps#^09f185|Honcho: User Context Management for LLM Apps]]:" >[Plastic Labs](https://plasticlabs.ai) was conceived as a research group exploring the intersection of education and emerging technology...with the advent of ChatGPT...we shifted our focus to large language models...we set out to build a non-skeuomorphic, AI-native tutor that put users first...our [[Open-Sourcing Tutor-GPT|experimental tutor]], Bloom, [[Theory-of-Mind Is All You Need|was remarkably effective]]--for thousands of users during the 9 months we hosted it for free... 
Building a production-grade, user-centric AI application, then giving it nascent [theory of mind](https://arxiv.org/pdf/2304.11490.pdf) and [[Metacognition in LLMs is inference about inference|metacognition]], made it glaringly obvious to us that social cognition in LLMs was both under-explored and under-leveraged. -We pivoted to address this hole in the stack and build the user context management solution agent developers need to truly give their users superpowers. Plastic applied and was accepted to [Betaworks](https://www.betaworks.com/)' [AI Camp: Augment](https://techcrunch.com/2023/08/30/betaworks-goes-all-in-on-augmentative-ai-in-latest-camp-cohort-were-rabidly-interested/?guccounter=1): +We pivoted to address this hole in the stack and build the user context management solution agent developers need to truly give their users superpowers. Plastic applied and was accepted to [Betaworks'](https://www.betaworks.com/) [*AI Camp: Augment*](https://techcrunch.com/2023/08/30/betaworks-goes-all-in-on-augmentative-ai-in-latest-camp-cohort-were-rabidly-interested/?guccounter=1): -We spent camp in a research cycle, then [published pre-print](https://arxiv.org/abs/2310.06983) showing it's possible to enhance LLM theory of mind ability with [predictive coding-inspired](https://js.langchain.com/docs/use_cases/agent_simulations/violation_of_expectations_chain) [metaprompting](https://arxiv.org/abs/2102.07350). +We spent camp in a research cycle, then [published a pre-print](https://arxiv.org/abs/2310.06983) showing it's possible to enhance LLM theory of mind ability with [predictive coding-inspired](https://js.langchain.com/docs/use_cases/agent_simulations/violation_of_expectations_chain) [metaprompting](https://arxiv.org/abs/2102.07350). @@ -28,7 +28,7 @@ This is the year of Honcho. ![[honcho logo and text.png]] -Last week [[Honcho; User Context Management for LLM Apps|we released]] the... 
+Last week [[Honcho; User Context Management for LLM Apps#^8c982b|we released]] the... >...first iteration of [[Honcho name lore|Honcho]], our project to re-define LLM application development through user context management. At this nascent stage, you can think of it as an open-source version of the OpenAI Assistants API. Honcho is a REST API that defines a storage schema to seamlessly manage your application's data on a per-user basis. It ships with a Python SDK which [you can read more about how to use here](https://github.com/plastic-labs/honcho/blob/main/README.md). @@ -36,7 +36,7 @@ And coming up, you can expect a lot more: - Next we'll drop a fresh paradigm for constructing agent cognitive architectures with users at the center, replete with cookbooks, integrations, and examples -- After that, we've got some dev viz tooling in the works to allow quick grokking of all the inferences and context at play in a conversation, and visualization and manipulation of entire agent architectures--as well as swap and compare the performance of custom cognition across the landscape of models +- After that, we've got some dev viz tooling in the works to allow quick grokking of all the inferences and context at play in a conversation, visualization and manipulation of entire agent architectures, and swapping and comparing the performance of custom cognition across the landscape of models - Finally, we'll bundle the most useful of all this into an opinionated offering of managed, hosted services @@ -44,4 +44,4 @@ And coming up, you can expect a lot more: Thanks for reading. -You can find is on [X/Twitter](https://twitter.com/plastic_labs), but we'd really like to see you in our [Discord](https://discord.gg/plasticlabs) 🫡. \ No newline at end of file +You can find us on [X/Twitter](https://twitter.com/plastic_labs), but we'd really like to see you in our [Discord](https://discord.gg/plasticlabs) 🫡. 
\ No newline at end of file diff --git a/content/notes/Honcho name lore.md b/content/notes/Honcho name lore.md index fa19d4c6e708a..a2b1ad4c8eb46 100644 --- a/content/notes/Honcho name lore.md +++ b/content/notes/Honcho name lore.md @@ -6,14 +6,14 @@ The near future Vinge constructs is one of outrageous data abundance, where ever It's such an intense landscape, that the entire educational system has undergone wholesale renovation to address the new normal, and older people must routinely return to school to learn the latest skills. It also complicates economic life, resulting in intricate networks of nested agents than can be hard for any one individual to tease apart. -Highlighting this, a major narrative arc in the novel involves intelligence agencies running operations of pretty unfathomable global sophistication. Since in this world artificial intelligence has more or less failed as a research direction, this requires ultra-competent human operators able to parse and leverage high velocity information. For field operations, it requires a "Local Honcho" on the ground to act as an adaptable central nervous system for the mission and its agents: +Highlighting this, a major narrative arc in the novel involves intelligence agencies running operations of pretty unfathomable global sophistication. Since (in the world of the novel) artificial intelligence has more or less failed as a research direction, this requires ultra-competent human operators able to parse and leverage high velocity information. For field operations, it requires a "Local Honcho" on the ground to act as an adaptable central nervous system for the mission and its agents: >Altogether it was not as secure as Vaz’s milnet, but it would suffice for most regions of the contingency tree. Alfred tweaked the box, and now he was getting Parker’s video direct. At last, he was truly a Local Honcho. 
-For months, Plastic had been deep into the weeds around harvesting, retrieving, & leveraging user context with LLMs. First to enhance the UX of our AI tutor (Bloom), then in thinking about how to solve this horizontally for all vertical-specific AI applications. It struck us that we faced similar challenges to the characters in _Rainbows End_ and were converging on a similar solution. +For months before, Plastic had been deep into the weeds around harvesting, retrieving, & leveraging user context with LLMs. First to enhance the UX of our AI tutor (Bloom), then in thinking about how to solve this horizontally for all vertical-specific AI applications. It struck us that we faced similar challenges to the characters in _Rainbows End_ and were converging on a similar solution. As you interface with the entire constellation of AI applications, you shouldn't have to redundantly provide context and oversight for every interaction. You need a single source of truth that can do this for you. You need a Local Honcho. -But as we've discovered, LLMs are remarkable at theory of mind tasks, and thus at reasoning about user need. So unlike in the book, this administration can be offloaded to an AI. And your Honcho can orchestrate the relevant context and identities on your behalf, whatever the operation. +But as we've discovered, LLMs are remarkable at theory of mind tasks, and thus at reasoning about user need. So unlike in the book, this administration can be offloaded to an AI. And your [[Honcho; User Context Management for LLM Apps|Honcho]] can orchestrate the relevant context and identities on your behalf, whatever the operation. 
-[^1]: American English, from [Japanese](https://en.wikipedia.org/wiki/Japanese_language)_[班長](https://en.wiktionary.org/wiki/%E7%8F%AD%E9%95%B7#Japanese)_ (hanchō, “squad leader”)...probably entered English during World War II: many apocryphal stories describe American soldiers hearing Japanese prisoners-of-war refer to their lieutenants as _[hanchō](https://en.wiktionary.org/wiki/hanch%C5%8D#Japanese)_. ([Wiktionary](https://en.wiktionary.org/wiki/honcho)) \ No newline at end of file +[^1]: "American English, from [Japanese](https://en.wikipedia.org/wiki/Japanese_language)_[班長](https://en.wiktionary.org/wiki/%E7%8F%AD%E9%95%B7#Japanese)_ (hanchō, “squad leader”)...probably entered English during World War II: many apocryphal stories describe American soldiers hearing Japanese prisoners-of-war refer to their lieutenants as _[hanchō](https://en.wiktionary.org/wiki/hanch%C5%8D#Japanese)_" ([Wiktionary](https://en.wiktionary.org/wiki/honcho)) \ No newline at end of file diff --git a/content/notes/Metacognition in LLMs is inference about inference.md b/content/notes/Metacognition in LLMs is inference about inference.md index 5f1f68962c396..dca00c0c95340 100644 --- a/content/notes/Metacognition in LLMs is inference about inference.md +++ b/content/notes/Metacognition in LLMs is inference about inference.md @@ -2,4 +2,6 @@ For wetware, metacognition is typically defined as ‘thinking about thinking’ (In some more specific domains, it's an introspective process, focused on thinking about exclusively _your own_ thinking or a suite of personal learning strategies...all valid within their purview, but too constrained for our purposes.) -In large language models, the synthetic corollary of cognition is inference. So we can reasonably define a metacognitive process in an LLM as any that runs inference on the output of prior inference. That is, inference itself is used as context--_inference about inference_. 
It might be instantly injected into the next prompt, stored for later use, or leveraged by another model. Experiments here will be critical to overcome [[The machine learning industry is too focused on general task performance|the machine learning community's fixation on task completion]]. \ No newline at end of file +In large language models, the synthetic corollary of cognition is inference. So we can reasonably define a metacognitive process in an LLM architecture as any that runs inference on the output of prior inference. That is, inference itself is used as context--_inference about inference_. + +It might be instantly injected into the next prompt, stored for later use, or leveraged by another model. This kind of architecture is critical when dealing with user context, since LLMs can run inference about user behavior, then use that synthetic context in the future. Experiments here will be critical to overcome [[The machine learning industry is too focused on general task performance|the machine learning community's fixation on task completion]]. \ No newline at end of file diff --git a/content/notes/The machine learning industry is too focused on general task performance.md b/content/notes/The machine learning industry is too focused on general task performance.md index 8965fa30889be..8c6f2cf01ded3 100644 --- a/content/notes/The machine learning industry is too focused on general task performance.md +++ b/content/notes/The machine learning industry is too focused on general task performance.md @@ -1,7 +1,7 @@ -The machine learning industry has traditionally adopted an academic approach, focusing primarily on performance across a range of tasks. LLMs like GPT-4 are a testament to this, having been scaled up to demonstrate impressive & diverse task capability. This scaling has also led to emergent abilities, debates about the true nature of which rage on. 
+The machine learning industry has traditionally adopted an academic approach, focusing primarily on performance across a range of tasks. LLMs like GPT-4 are a testament to this, having been scaled up to demonstrate impressive & diverse task capability. This scaling has also led to [[Theory-of-Mind Is All You Need|emergent abilities]], debates about the true nature of which rage on. However, general capability doesn't necessarily translate to completing tasks as an individual user would prefer. This is a failure mode that anyone building agents will inevitably encounter. The focus, therefore, needs to shift from how language models perform tasks in a general sense to how they perform tasks on a user-specific basis. -Take summarization. It’s a popular machine learning task at which models have become quite proficient, at least from a benchmark perspective. However, when models summarize for users with a pulse, they fall short. The reason is simple: the models don’t know this individual. The key takeaways for a specific user differ dramatically from the takeaways _any possible_ internet user _would probably_ note. +Take summarization. It’s a popular machine learning task at which models have become quite proficient...at least from a benchmark perspective. However, when models summarize for users with a pulse, they fall short. The reason is simple: the models don’t know this individual. The key takeaways for a specific user differ dramatically from the takeaways _any possible_ internet user _would probably_ note. So a shift in focus toward user-specific task performance would provide a much more dynamic & realistic approach. Catering to individual needs & paving the way for more personalized & effective ML applications.