Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

whole blog proofread #44

Merged
merged 1 commit into from
Jan 26, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 8 additions & 5 deletions content/_index.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,18 +2,21 @@
title: Home
enableToc: false
---
Welcome to our collaborative second brain.

Welcome to our collaborative second brain. Here you'll find our blog posts, "Extrusions" newsletter, evergreen notes, and public research. And if you like, [you can engage with the ideas directly](https://github.com/plastic-labs/blog) on GitHub.
Here you'll find our blog posts, "Extrusions" newsletter, evergreen notes, and public research. And if you like, [you can engage with the ideas directly](https://github.com/plastic-labs/blog) on GitHub.

Plastic Labs is a research-driven company working at the intersection of human and machine learning. Our current project is [Honcho](https://github.com/plastic-labs/honcho), a user context management solution for AI-powered applications. We believe that by re-centering LLM app development around the user we can unlock a rich landscape of deeply personalized, autonomous agents.
Plastic Labs is a research-driven company working at the intersection of human and machine learning. Our current project is [Honcho](https://github.com/plastic-labs/honcho), a user context management solution for AI-powered applications.

We believe that by re-centering LLM app development around the user we can unlock a rich landscape of deeply personalized, autonomous agents.

It’s our mission to realize this future.

## Blog

[[Honcho; User Context Management for LLM Apps|Honcho: User Context Management for LLM Apps]]
[[blog/Theory-of-Mind Is All You Need]]
[[blog/Open-Sourcing Tutor-GPT]]
[[Theory-of-Mind Is All You Need]]
[[Open-Sourcing Tutor-GPT]]

## Extrusions

Expand All @@ -26,4 +29,4 @@ It’s our mission to realize this future.

## Research

[Violation of Expectation Reduces Theory-of-Mind Prediction Error in Large Language Models](https://arxiv.org/pdf/2310.06983.pdf)
[Violation of Expectation Reduces Theory-of-Mind Prediction Error in Large Language Models](https://arxiv.org/abs/2310.06983)
35 changes: 17 additions & 18 deletions content/blog/Honcho; User Context Management for LLM Apps.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,14 +11,14 @@ Today we drop the first release of a project called [*Honcho*](https://github.co

## Plastic Lore

[Plastic Labs](https://plasticlabs.ai) was conceived as a research group exploring the intersection of education and emerging technology. Our first cycle focused on how the incentive mechanisms and data availability made possible by distributed ledgers might be harnessed to improve learning outcomes. But with the advent of ChatGPT and a chorus of armchair educators proclaiming tutoring solved by the first nascent consumer generative AI, we shifted our focus to large language models.
[Plastic Labs](https://plasticlabs.ai) was conceived as a research group exploring the intersection of education and emerging technology. Our first cycle focused on how the incentive mechanisms and data availability made possible by distributed ledgers might be harnessed to improve learning outcomes. But with the advent of ChatGPT and a chorus of armchair educators proclaiming tutoring solved by the first nascent consumer generative AI, we shifted our focus to large language models. ^09f185

As a team with with backgrounds spanning machine learning and education, we found the prevailing narratives overestimating short-term capabilities and under-imagining longterm potential. Fundamentally, LLMs were and are still 1-to-many instructors. Yes, they herald the beginning of a revolution of personal access not to be discounted, but every student is still ultimately getting the same experience. And homogenized educational paradigms are by definition under-performant on an individual level. If we stop here, we're selling ourselves short.
As a team with with backgrounds in both machine learning and education, we found the prevailing narratives overestimating short-term capabilities and under-imagining longterm potential. Fundamentally, LLMs were and still are 1-to-many instructors. Yes, they herald the beginning of a revolution in personal access not to be discounted, but every student is still ultimately getting the same experience. And homogenized educational paradigms are by definition under-performant on an individual level. If we stop here, we're selling ourselves short.

![[zombie_tutor_prompt.jpg]]
*A well intentioned but monstrously deterministic [tutor prompt](https://www.oneusefulthing.org/p/assigning-ai-seven-ways-of-using).*

Most edtech projects we saw emerging actually made foundation models worse by adding gratuitous lobotomization and coercing deterministic behavior. The former stemmed from the typical misalignments plaguing edtech, like the separation of user and payer. The latter seemed to originate with deep misunderstandings around what LLMs are and translates to a huge missed opportunity.
Most edtech projects we saw emerging actually made foundation models worse by adding gratuitous lobotomization and coercing deterministic behavior. The former stemmed from the typical misalignments plaguing edtech, like the separation of user and payer. The latter seemed to originate with deep misunderstandings around what LLMs are and continues to translate to a huge missed opportunities.

So we set out to build a non-skeuomorphic, AI-native tutor that put users first. The same indeterminism so often viewed as LLMs' greatest liability is in fact their greatest strength. Really, it's what they _are_. When great teachers deliver effective personalized instruction, they don't consult some M.Ed flowchart, they leverage the internal personal context they have on the student and reason (consciously or basally) about the best pedagogical intervention. LLMs are the beginning of this kind of high-touch learning companion being _synthetically_ possible.

Expand All @@ -29,15 +29,15 @@ Our [[Open-Sourcing Tutor-GPT|experimental tutor]], Bloom, [[Theory-of-Mind Is A

## Context Failure Mode

But we quickly ran up against a hard limitation. The failure mode we believe all vertical specific AI applications will eventually hit if they want to be sticky, paradigmatically different than their deterministic counterparts, and realize the full potential here. That's context, specifically user context--Bloom didn't know enough about each student.
But we quickly ran up against a hard limitation. The failure mode we believe all vertical specific AI applications will eventually hit if they want to be sticky, paradigmatically different than their deterministic counterparts, and realize the latent potential. That's context, specifically user context--Bloom didn't know enough about each student.

We're always blown away by how many people don't realize that large language models themselves are stateless. They don't remember shit about you. They're just translating the context they're given into a probable sequence of tokens. LLMs are like horoscope writers, good at writing general statements that feel very personal. You would be too if you'd ingested and compressed that much of the written human corpus.
We're consistently blown away by how many people don't realize large language models themselves are stateless. They don't remember shit about you. They're just translating context they're given into probable sequences of tokens. LLMs are like horoscope writers, good at crafting general statements that *feel* very personal. You would be too, if you'd ingested and compressed that much of the written human corpus.

![[geeked_dory.png]]

There are lots of developer tricks to give the illusion of state about the user, mostly injecting conversation history or some personal digital artifact into the context window. Another is running inference on that limited recent user context to derive new insights. This was the game changer for our tutor, and we still can't believe by how under-explored that solution space is (more on this soon 👀).

To date, machine learning has been [[The machine learning industry is too focused on general task performance|far more focused on]] optimizing for general task competition than personalization. This is natural, although many of these tasks are still probably better suited to deterministic code. It's also historically prestiged papers over products--it takes a bit for research to morph into tangible utility. Put these together and you end up with a big blindspot over individual users and what they want.
To date, machine learning has been [[The machine learning industry is too focused on general task performance|far more focused on]] optimizing for general task competition than personalization. This is natural, although many of these tasks are still probably better suited to deterministic code. It's also historically prestiged papers over products--research takes bit to morph into tangible utility. Put these together and you end up with a big blindspot over individual users and what they want.

The real magic of 1:1 instruction isn't subject matter expertise. Bloom and the foundation models it leveraged had plenty of that (despite what clickbait media would have you believe about hallucination in LLMs). Instead, it's personal context. Good teachers and tutors get to know their charges--their history, beliefs, values, aesthetics, knowledge, preferences, hopes, fears, interests, etc. They compress all that and generate customized instruction, emergent effects of which are the relationships and culture necessary for positive feedback loops.

Expand All @@ -58,44 +58,43 @@ Prediction algorithms have become phenomenal at hacking attention using tabular

Every day human brains do incredibly sophisticated things with sorta-pejoratively labelled 'soft' insights about others. But social cognition is part of the same evolutionarily optimized framework we use to model the rest of the world.

We run continuous active inference on wetware to refine our internal world models. This helps us make better predictions about the world by minimizing the difference between our expectation and reality. That's more or less what learning is. And we use the same set of mechanisms to model other humans, i.e. get to know them.
We run continuous active inference on wetware to refine our internal world models. This helps us make better predictions about our experience by minimizing the difference between our expectation and reality. That's more or less what learning is. And we use the same set of mechanisms to model other humans, i.e. get to know them.

In LLMs we have remarkable predictive reasoning engines with which we can begin to build the foundations of social cognition and therefore model users with much more nuance and granularity. Not just their logged behavior, but reasoning between the lines about its motivation and grounding in the full account of their identity.

Late last year we published a [pre-print of research on this topic](https://arxiv.org/abs/2310.06983), and we've shown that these kinds of biologically-inspired frameworks can construct models of users that improve an LLM's ability to reason and make predictions about that individual user:
Late last year we published a [research pre-print on this topic](https://arxiv.org/abs/2310.06983), and we've shown that these kinds of biologically-inspired frameworks can construct models of users that improve an LLM's ability to reason and make predictions about that individual user:

![[honcho_powered_bloom_paper_fig.png]]
*A [predictive coding inspired metacognitive architecture](https://youtu.be/PbuzqCdY0hg?feature=shared) from our research.*
*A [predictive coding inspired metacognitive architecture](https://youtu.be/PbuzqCdY0hg?feature=shared), from our research.*

We added it to Bloom and found the missing piece to overcoming the failure mode of user context. Our tutor could now learn about the student and use that knowledge effectively to produce better learning outcomes.

## Blast Horizon

Building and maintaining a production-grade AI app for learning catapulted us to this missing part of the stack. Lots of users all growing in unique ways and all needing personalized attention that evolved over multiple longform sessions forced us to confront the user context management problem with all it's thorny intricacy and potential.
Building and maintaining a production-grade AI app for learning catapulted us to this missing part of the stack. Lots of users, all growing in unique ways, all needing personalized attention that evolved over multiple longform sessions, forced us to confront the user context management problem with all it's thorny intricacy and potential.

And we're hearing constantly from builders of other vertical specific AI apps that personalization is the key blocker. In order for projects to graduate form toys to tools, they need to create new kinds of magic for their users. Mountains of mostly static software exists to help accomplish an unfathomable range of tasks and lots of it can be personalized using traditional (albeit laborious for the user) methods. But LLMs can observe, reason, then generate the software _and the user context_, all abstracted away behind the scenes.

Imagine online stores generated just in time for the home improvement project you're working on; generative games with rich multimodality unfolding to fit your mood on the fly; travel agents that know itinerary needs specific to your family without being explicitly told; copilots that think and write and code not just like you, _but as you_; disposable, atomic agents with full personal context that replace your professional services--_you_ with a law, medical, accounting degree.
Imagine online stores generated just in time for the home improvement project you're working on; generative games with rich multimodality unfolding to fit your mood on the fly; travel agents that know itinerary needs specific to your family, without being explicitly told; copilots that think and write and code not just like you, _but as you_; disposable, atomic agents with full personal context that replace your professional services--_you_ with a law, medical, accounting degree.

This is the kind of future we can build when we put users at the center of our agent and LLM app production.

## Introducing Honcho

So today we're releasing the first iteration of [[Honcho name lore|Honcho]], our project to re-define LLM application development through user context management. At this nascent stage, you can think of it as an open-source version of the OpenAI Assistants API.
So today we're releasing the first iteration of [[Honcho name lore|Honcho]], our project to re-define LLM application development through user context management. At this nascent stage, you can think of it as an open-source version of the OpenAI Assistants API. ^8c982b

Honcho is a REST API that defines a storage schema to seamlessly manage your application's data on a per-user basis. It ships with a Python SDK which [you can read more about how to use here](https://github.com/plastic-labs/honcho/blob/main/README.md).

![[honcho basic user context management blog post diagram.png]]

We spent lots of time building the infrastructure to support multiple concurrent users with Bloom, and too often we see developers running into the same problem: building a fantastic demo, sharing it with the world, then inevitably having to take it down due to infrastructure/scaling issues.
We spent lots of time building the infrastructure to support multiple concurrent users with Bloom, and too often we see developers running into the same problem: building a fantastic demo, sharing it with the world, then inevitably taking it down because of infrastructure/scaling issues.

Honcho allows you to deploy an application with a single command that can automatically handle concurrent users. Going from prototype to production is now only limited by the amount of spend you can handle, not tedious infrastructure setup.
Honcho allows you to deploy an application with a single command that can automatically handle concurrent users. Speedrunning to production is now only limited by the amount of spend you can handle, not tedious infrastructure setup.

Managing app data on a per-user basis is the first small step in improving how devs build LLM apps. Once you define a data management schema on a per-user basis, a lots of new possibilities emerge around what you can do intra-user message, intra-user sessions, and even intra-user sessions across agents.
Managing app data on a per-user basis is the first small step in improving how devs build LLM apps. Once you define a data management schema on a per-user basis, a lots of new possibilities emerge around what you can do intra-user message, intra-user sessions, and even intra-user sessions across an ecosystem of agents.

## Get Involved

We're excited to see builders experiment with what we're releasing today and with Honcho as it continues to evolve.
We're excited to see builders experiment with what we're releasing today, and with Honcho as it continues to evolve.

Check out the [GitHub repo](https://github.com/plastic-labs/honcho) to get started and join our [Discord](https://discord.gg/plasticlabs) to stay up to date 🫡.

![[tron_bike.gif]]
Loading