[bounty] [pipe] [developer program] search #1188

m13v · 2025-01-22T05:02:41Z

[please read the developer program description https://github.com//issues/1184]

improve keyword-based search functionality.
add semantic search capabilities for real-time and retrospective data

suggest other ideas, send us your plan, write comments, give feedback

m13v · 2025-01-22T05:02:52Z

/bounty 1000

algora-pbc · 2025-01-22T05:02:57Z

💎 $1,000 bounty • Screenpi.pe

Steps to solve:

Start working: Comment /attempt #1188 with your implementation plan
Submit work: Create a pull request including /claim #1188 in the PR body to claim the bounty
Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to mediar-ai/screenpipe!


Attempt	Started (GMT+0)	Solution
🟢 @kumarvivek1752	Jan 22, 2025, 2:24:45 PM	WIP
🟢 @b4s36t4	Jan 22, 2025, 8:50:23 PM	WIP

kumarvivek1752 · 2025-01-22T14:24:42Z

/attempt #1188

Algora profile	Completed bounties	Tech	Active attempts	Options
@kumarvivek1752	1 mediar-ai bounty	HTML, TypeScript, CSS & more		Cancel attempt

kumarvivek1752 · 2025-01-22T17:52:44Z

Which embedding model and vector database do you want to use for semantic search? In my opinion, OpenAI embeddings with Pinecone or Chroma would be great.

m13v · 2025-01-22T19:38:38Z

please first give us your plan, please read #1184

m13v · 2025-01-22T19:53:21Z

for semantic search we use nomic-embed-text through Ollama atm. Please consider implementing it in Rust if you can. As for the database, we need to minimize extra dependencies for users and complexity for the project overall. I'd go with something like sqlite-vss and build incremental indexing on top of it, but it's up for discussion.

m13v · 2025-01-22T19:53:57Z

you can also propose other improvements or features for the search. Have you used the pipe?

b4s36t4 · 2025-01-22T20:50:19Z

Would like to give this a try.
/attempt #1188

Process:

Semantic Search:
- Tokenizer - The embedding model requires its own tokenizer, which can be fetched from hf_hub and loaded with the tokenizer lib.
- Inference - Use the ort runtime to load the ONNX model and generate embeddings.
Embeddings are generated per token, so we need mean pooling to produce a single combined embedding for the whole text.
(ndarray can handle the token tensors and the mean pooling; transformers.js can serve as inspiration.)
- sqlite_vec extension to add vector DB support to SQLite (personally tested, and it's excellent).
Need to verify whether it works with sqlx.
- Add a new endpoint (maybe called /semantic) so the current search doesn't break and so we can support more features with the vector DB.
- The /semantic endpoint will simply embed the query passed in the request and search the DB.
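The mean pooling step in the plan above could look roughly like this. It is a minimal sketch in plain Rust with no ndarray or ort involved; the function names `mean_pool` and `l2_normalize` and the mask convention (1 for real tokens, 0 for padding) are assumptions made for illustration, not code from the screenpipe repository.

```rust
/// Mean-pool per-token embeddings into one embedding for the whole text.
/// `token_embeddings` has shape [num_tokens][dim].
/// `attention_mask` is assumed to hold 1 for real tokens and 0 for padding.
fn mean_pool(token_embeddings: &[Vec<f32>], attention_mask: &[u32]) -> Vec<f32> {
    let dim = token_embeddings.first().map_or(0, |row| row.len());
    let mut sum = vec![0.0f32; dim];
    let mut count = 0.0f32;

    for (row, &mask) in token_embeddings.iter().zip(attention_mask) {
        if mask == 0 {
            continue; // padding tokens must not influence the average
        }
        for (acc, value) in sum.iter_mut().zip(row) {
            *acc += *value;
        }
        count += 1.0;
    }

    if count > 0.0 {
        for acc in sum.iter_mut() {
            *acc /= count;
        }
    }
    sum
}

/// Scale a vector to unit length so that a dot product equals cosine similarity.
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}
```

In a real pipeline the per-token rows would come from the model output tensor and the mask from the tokenizer; here they are plain vectors so the pooling logic can be checked on its own.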

We already have the sqlite_vec extension loaded for sqlx, which we can simply use.

Sample implementation of embedding on rust side can be found here

Algora profile	Completed bounties	Tech	Active attempts	Options
@b4s36t4	1 mediar-ai bounty + 3 bounties from 2 projects	TypeScript, Rust, JavaScript & more	﹟1106, ﹟1190	Cancel attempt

cc: @m13v for the review.

kumarvivek1752 · 2025-01-23T06:08:24Z

@m13v I have a similar plan. I use the search pipe quite often and have noticed some errors: the offset does not work properly, and the pipe freezes after increasing the page size. I'm also thinking about some UI changes, like adding more filters if necessary.

As you mentioned, we want very few dependencies, so I'll write it in native Rust
and use:

tokenizer - hf_hub
embeddings - nomic-embed-text
vector database - sqlite-vss

louis030195 · 2025-01-23T20:17:31Z

guys:

suggest mockups of the end user experience
think about the technical details later (PS: can you read the code before suggesting things? we already have what you suggest mostly)

m13v · 2025-01-24T19:11:31Z

guys @b4s36t4 @kumarvivek1752, great suggestions overall, let's discuss a bit more. Btw, we already have the endpoint for semantic search, but here is the current situation and the projects (think of each project as a separate bounty):

Project 1: live indexing of the data - we don't do any indexing atm. Even though SQLite extensions support indexing, they don't support incremental indexing, which is crucial since we add new embeddings every few seconds and users are primarily interested in searching their latest data. So first we need to implement embedding in Rust, then we need to build incremental indexing.

Project 2: embed and index the data you already have. It's mostly a UX thing, because we don't want to freeze the user's computer for an unknown period. What if the user quits screenpipe in the middle of the process? We also don't know how much data they already have, etc. I think this project is more for the future.

Project 3: fix current issues, like the ones you mentioned @kumarvivek1752: page size issues, filters, etc. Happy to hear a more detailed plan here, but again, this one will require you to test it as a user, back and forth on some use case, to make sure it works smoothly.

let me know what you want to work on or if you have other comments

b4s36t4 · 2025-01-24T19:34:00Z

Hi, @m13v. Thanks for the feedback. As I stated in my proposal, I'm leaning towards implementing the embedding in Rust, which I have already done in one of my personal projects. It performed very well compared to running the embedding model on a WASM runtime with the DOM.

Please take a look at the implementation I attached in the proposal comment. Let me know your thoughts on the base implementation.

Regarding the incremental indexing, I'm not sure I understand it yet. If you could share more details, that would be very helpful.

m13v · 2025-01-25T02:39:47Z

@b4s36t4 yes, we need to implement the embedding in Rust. The example implementation looks good, but we need to use nomic-embed-text. SQLite vector extensions do support indexing, but when the embedding table is updated, the indexing is redone for the entire table, whereas we need incremental indexing. Ask Claude: "does sqlite_vec extension support incremental indexing", then "can you explain challenges of incremental indexing?", and then "how to implement incremental indexing for embeddings in sqlite".
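To make the idea of incremental indexing concrete, here is a minimal in-memory sketch. It assumes each embedding row carries a monotonically increasing id, so the index can keep a watermark and ingest only rows above it instead of rebuilding from the full table. `EmbeddingRow`, `IncrementalIndex`, and the brute-force cosine search are hypothetical stand-ins for illustration; they do not describe how sqlite_vec or screenpipe work internally.

```rust
/// Hypothetical row of the embeddings table: an increasing id plus its vector.
#[derive(Clone, Debug)]
struct EmbeddingRow {
    id: i64,
    vector: Vec<f32>,
}

/// In-memory stand-in for a vector index that remembers the last id it ingested.
struct IncrementalIndex {
    last_indexed_id: i64,
    entries: Vec<EmbeddingRow>,
}

impl IncrementalIndex {
    fn new() -> Self {
        Self { last_indexed_id: 0, entries: Vec::new() }
    }

    /// Ingest only rows newer than the watermark; returns how many were added.
    /// Against a database this would correspond to a `WHERE id > ?` query.
    fn sync(&mut self, table: &[EmbeddingRow]) -> usize {
        let watermark = self.last_indexed_id;
        let mut added = 0;
        for row in table.iter().filter(|row| row.id > watermark) {
            self.entries.push(row.clone());
            self.last_indexed_id = self.last_indexed_id.max(row.id);
            added += 1;
        }
        added
    }

    /// Brute-force search: ids of the `k` entries most similar to `query`.
    fn search(&self, query: &[f32], k: usize) -> Vec<i64> {
        let mut scored: Vec<(f32, i64)> = self
            .entries
            .iter()
            .map(|entry| (cosine(query, &entry.vector), entry.id))
            .collect();
        scored.sort_by(|a, b| b.0.total_cmp(&a.0)); // highest similarity first
        scored.into_iter().take(k).map(|(_, id)| id).collect()
    }
}

/// Cosine similarity; returns 0.0 if either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```

Calling `sync` a second time on an unchanged table adds nothing, which is the property the thread is asking for: the cost of an update depends on the number of new rows, not on the size of the table.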

b4s36t4 · 2025-01-30T23:23:32Z

Hi, @m13v, sorry for getting back to you so late. Yeah, I think I'm good. I want to tackle the embedding and the incremental update.

I'm planning to use a scheduled job that indexes 100 records at a time when CPU usage is low (we will need to monitor usage during indexing so that we don't put pressure on the CPU/GPU).

Let me know what you think. If it's OK, I can get started.
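A rough sketch of that batching idea, under stated assumptions: `run_indexing_pass`, the `is_idle` hook, and the `index_batch` callback are hypothetical names, and how system load is actually measured is left to the caller. The pass stops as soon as the machine is reported busy and leaves the remaining ids queued for the next scheduled run.

```rust
/// Index pending record ids in batches while `is_idle` reports low load.
/// Returns the number of records processed; unprocessed ids stay in `pending`.
/// `is_idle` and `index_batch` are caller-supplied hooks (assumptions here).
fn run_indexing_pass<F, G>(
    pending: &mut Vec<i64>,
    batch_size: usize,
    max_batches: usize,
    mut is_idle: F,
    mut index_batch: G,
) -> usize
where
    F: FnMut() -> bool,
    G: FnMut(&[i64]),
{
    let mut processed = 0;
    if batch_size == 0 {
        return processed; // nothing sensible to do with empty batches
    }
    for _ in 0..max_batches {
        // Re-check load before every batch so a busy machine is released quickly.
        if pending.is_empty() || !is_idle() {
            break;
        }
        let take = batch_size.min(pending.len());
        let batch: Vec<i64> = pending.drain(..take).collect();
        index_batch(&batch);
        processed += batch.len();
    }
    processed
}
```

With 250 pending ids, a batch size of 100, and a load check that reports busy on its third call, one pass would index two batches and leave 50 ids queued for the next run.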

m13v · 2025-02-01T23:08:35Z

That’s great! A follow-up question: how can we test the usefulness of the semantic search? Would you be able to test it with a user?


b4s36t4 · 2025-02-02T03:00:58Z

I can run screenpipe for some time on my system to populate some data into my SQLite database, right? If that doesn't work, maybe I can ask someone from the community.

m13v added the enhancement New feature or request label Jan 22, 2025

algora-pbc bot added the 💎 Bounty label Jan 22, 2025

m13v mentioned this issue Jan 26, 2025

[bounty] [pipe] $1000 - developer program #1184

Open
