Civilization progresses by the number of operations it can perform without conscious effort.
— Whitehead

Turn your screen into actions (using LLMs). Inspired by adept.ai, rewind.ai, Apple Shortcut. Rust + WASM.

screenpipe is a library that lets you gather all your life context and easily connect it to LLMs for:

  • search (e.g. go beyond your limited human memory)
  • automation (such as making actions on the web while you work)
  • etc.

Example: vercel/ai-chatbot querying screenpipe autonomously.

Check this example of screenpipe:

070424.mp4

Check the documentation to try it.

Status

Alpha: runs on my computer (MacBook Pro M3, 32 GB RAM).

  • screenshots
  • mp4 encoding to disk (30 GB / month)
  • SQLite local DB
  • OCR
  • TS SDK
  • audio + STT
  • API
  • cloud storage options (S3, PostgreSQL, etc.)
  • cloud computing options
  • fast, optimised
  • bug-free

Usage

Keep in mind that it's still experimental.

To try the current version, which captures your screen and extracts the text, do:

  1. Install dependencies:
# On Mac
brew install ffmpeg

# On Linux
sudo apt update
sudo apt install -y ffmpeg libasound2-dev libavcodec-dev libavformat-dev libavutil-dev
  2. Clone the repo:
git clone https://github.com/louis030195/screen-pipe
cd screen-pipe
  3. Run the API (make sure to install Rust):
# This runs a local SQLite DB + an API + screenshot, ocr, mic, stt, mp4 encoding
cargo build --release
./target/release/pipe

# if you are on mac, use "cargo build --release --features metal" to leverage M series acceleration

# or only stream audio + speech to text to stdout
./target/release/pipe-audio

# or only stream screenshots + ocr to stdout
./target/release/pipe-vision

# or only record mp4 videos + json containing ocr
./target/release/pipe-video

PS: a dev build (cargo run --bin screenpipe-server) is roughly 1000x slower, so use --release builds for anything but development.

Examples of querying the API
# 1. Basic search query
curl "http://localhost:3030/search?q=test&limit=5&offset=0"

# 2. Search with content type filter (OCR)
curl "http://localhost:3030/search?q=test&limit=5&offset=0&content_type=ocr"

# 3. Search with content type filter (Audio)
curl "http://localhost:3030/search?q=test&limit=5&offset=0&content_type=audio"

# 4. Search with pagination
curl "http://localhost:3030/search?q=test&limit=10&offset=20"

# 5. Get recent results without date range
curl "http://localhost:3030/recent?limit=5&offset=0"

# 6. Get recent results with date range
curl "http://localhost:3030/recent?limit=5&offset=0&start_date=2024-07-02T14:00:00&end_date=2024-07-02T23:59:59"

# 7. Get recent results from the last 5 seconds (BSD/macOS date syntax)
start_date=$(date -u -v-5S +'%Y-%m-%dT%H:%M:%S')
end_date=$(date -u +'%Y-%m-%dT%H:%M:%S')
curl "http://localhost:3030/recent?limit=5&offset=0&start_date=$start_date&end_date=$end_date"

# 8. Search with no query (should return all results)
curl "http://localhost:3030/search?limit=5&offset=0"

# 9. Get recent results with pagination
curl "http://localhost:3030/recent?limit=20&offset=40"
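As a sketch of what a client for these endpoints could look like, here is a minimal Python example that builds the same query URLs as the curl commands above and pulls text out of a response. The response field names ("data", "content") are assumptions for illustration; check the actual API for the real schema.

```python
# Minimal sketch of a client for the local screenpipe API.
# The response shape ("data" / "content") is an assumption, not the documented schema.
from urllib.parse import urlencode

BASE_URL = "http://localhost:3030"

def search_url(q=None, limit=5, offset=0, content_type=None):
    """Build a /search URL like the curl examples above."""
    params = {"limit": limit, "offset": offset}
    if q is not None:
        params["q"] = q
    if content_type is not None:
        params["content_type"] = content_type
    return f"{BASE_URL}/search?{urlencode(params)}"

def extract_text(response_json):
    """Collect the OCR / transcript text out of a (hypothetical) response payload."""
    return "\n".join(item.get("content", "") for item in response_json.get("data", []))

# Same request as example 2 above; fetch it with any HTTP client,
# then feed extract_text(...) into whatever LLM you use.
url = search_url(q="test", limit=5, content_type="ocr")
```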

Now pipe this into an LLM to build:

  • memory extension apps
  • automatic summaries
  • automatic action triggers (say, every time you see a dog, send a tweet)
  • automatic CRM (fill Salesforce while you message people on LinkedIn)

We are working toward making it easier to try, feel free to help!

Why open source?

Recent breakthroughs in AI have shown that context is the final frontier. AI will soon be able to incorporate the context of an entire human life into its 'prompt', and the technologies that enable this kind of personalisation should be available to all developers to accelerate access to the next stage of our evolution.

Principles

This is a library intended to stick to a simple use case:

  • record the screen & associated metadata (generated locally or in the cloud) and pipe it somewhere (local, cloud)

Think of this as an API that lets you do this:

screenpipe | ocr | llm "turn what i see into my CRM" | api "send data to salesforce api"
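That pipeline can be sketched as a chain of plain functions. Every stage below is a stub (the real capture, OCR, LLM, and API calls are out of scope here); the point is the composition, not the implementations.

```python
# A toy rendition of: screenpipe | ocr | llm "..." | api "..."
# All four stages are hypothetical stubs.

def screenpipe():
    """Stand-in for the screen-capture stage: yields raw frames."""
    yield "frame-with-text"

def ocr(frames):
    """Stand-in OCR stage: turns frames into text."""
    for frame in frames:
        yield f"text extracted from {frame}"

def llm(texts, prompt):
    """Stand-in LLM stage: transforms text according to a prompt."""
    for text in texts:
        yield f"[{prompt}] {text}"

def api(records, endpoint):
    """Stand-in sink: would POST each record to e.g. the Salesforce API."""
    return [f"sent to {endpoint}: {r}" for r in records]

result = api(llm(ocr(screenpipe()), "turn what i see into my CRM"),
             "salesforce")
```

Because each stage only consumes an iterator, stages can be swapped or rearranged freely, which is the property the shell-pipe analogy is meant to convey.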

Any interfaces are out of scope and should be built outside this repo, for example:

  • UI to search on these files (like rewind)
  • UI to spy on your employees
  • etc.

Contributing

Contributions are welcome! Please fork the repository and work in a feature branch; pull requests are warmly welcome.

Say 👋 in our public Discord channel. We discuss how to bring this lib to production, help each other with contributions and personal projects, or just hang out ☕.

A bit more detail on the architecture is available here.

Licensing

The code in this project is licensed under the MIT license. See the LICENSE file for more information.

Related projects

This is a very quick-and-dirty example of the end goal that works in a few lines of Python: https://github.com/louis030195/screen-to-crm

Very thankful for https://github.com/jasonjmcghee/xrem, which was helpful, although screenpipe is going in a different direction.
