Scaling with message queues #16

gedw99 · 2025-03-03T21:22:48Z

Hey,

Glad to see SemaDB is still alive and kicking!

This is very much a due diligence / discussion...

I am doing some research on how best to put a search system in place in a system that uses many storage mechanisms. Typically you have structured (SQL) data and unstructured (document) data.

Then you have graph DBs too, which try to do this and other things all in one.

I am looking at Sema and Zinc.
They both seem to be designed for document indexing and search.

First difference is the bleve deps...

https://github.com/zincsearch/zincsearch uses github.com/blugelabs/bluge

https://github.com/Semafind/semadb uses github.com/blevesearch/bleve/v2

Next one is scale-out. It seems you include a cluster of sorts.

A lot of teams end up with an event stream of mutations that go into a message queue or similar, and then have "command agents" that pick up each mutation and transform it into each store to keep them "up to date".
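To make the pattern concrete, here is a minimal sketch in Go. It is an illustration only: an in-memory slice stands in for the message queue, and `Mutation`, `Store`, `MapStore` and `FanOut` are hypothetical names, not part of SemaDB, Zinc or NATS. Each agent replays the same ordered log into its own store and skips sequence numbers it has already applied, so a redelivered message does no harm.

```go
package main

import (
	"fmt"
	"sync"
)

// Mutation is a hypothetical change event; the field names are illustrative.
type Mutation struct {
	Seq  uint64 // position in the ordered stream
	Op   string // "upsert" or "delete"
	Key  string
	Body string
}

// Store is a hypothetical sink, e.g. a SQL table or a search index.
type Store interface {
	Apply(m Mutation)
}

// MapStore is an in-memory stand-in for a real store.
type MapStore struct {
	mu      sync.Mutex
	data    map[string]string
	lastSeq uint64
}

func NewMapStore() *MapStore {
	return &MapStore{data: map[string]string{}}
}

// Apply is idempotent: mutations at or below lastSeq are skipped,
// so at-least-once delivery does not corrupt the store.
func (s *MapStore) Apply(m Mutation) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if m.Seq <= s.lastSeq {
		return
	}
	switch m.Op {
	case "upsert":
		s.data[m.Key] = m.Body
	case "delete":
		delete(s.data, m.Key)
	}
	s.lastSeq = m.Seq
}

// Get returns the stored body for key, if any.
func (s *MapStore) Get(key string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[key]
	return v, ok
}

// FanOut runs one "command agent" goroutine per store; each agent
// replays the ordered log into its own store.
func FanOut(log []Mutation, stores ...Store) {
	var wg sync.WaitGroup
	for _, st := range stores {
		wg.Add(1)
		go func(st Store) {
			defer wg.Done()
			for _, m := range log {
				st.Apply(m)
			}
		}(st)
	}
	wg.Wait()
}

func main() {
	log := []Mutation{
		{Seq: 1, Op: "upsert", Key: "doc1", Body: "hello"},
		{Seq: 2, Op: "upsert", Key: "doc2", Body: "world"},
		{Seq: 2, Op: "upsert", Key: "doc2", Body: "world"}, // duplicate delivery
		{Seq: 3, Op: "delete", Key: "doc1"},
	}
	sqlLike, searchLike := NewMapStore(), NewMapStore()
	FanOut(log, sqlLike, searchLike)
	_, inSQL := sqlLike.Get("doc1")
	body, _ := searchLike.Get("doc2")
	fmt.Println(inSQL, body)
}
```

In a real deployment the slice would be a durable stream and each agent would track its own acknowledged position, but the convergence argument is the same: same ordered log plus idempotent apply gives the same end state in every store.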

I use NATS JetStream for example...

So I was wondering if you could comment on how that relates to your cluster, because with a message queue you can "ensure" that many store instances get updates eventually.

One other point is change streams. Does Sema tell me via HTTP, SSE or some other channel that a record has changed, and the nature of the change? This is sometimes called CDC (change data capture).

This can be vital when you want other things to happen when things happen inside Sema.

Hope you don't mind me raising an issue like this, but as I said, it's important when picking stores to know how they are designed etc. I tend to go for Go ones and not Rust ones since I code in Go.

Thanks in advance.

nuric · 2025-03-04T10:35:22Z

Hello again and it's good to see you haven't given up on NATS and using message queues to scale out since #10. To comment on the discussion points:

SemaDB uses bleve for text analysis only, not for indexing, and doesn't depend on its search capabilities. The analysis involves splitting the text into terms, and then we implement TF-IDF ourselves. This is done to keep SemaDB as self-contained as possible, although bleve is a great project.

We discussed horizontal scaling with NATS in #10, and at this stage we're not thinking of incorporating a separate moving component like NATS JetStream. The main limitation of SemaDB's horizontal scaling is on the write path: the shard where the data is allocated must be online for the write operation to succeed.

At the moment SemaDB doesn't have a change stream mechanism because full replication isn't implemented. Each collection is assigned a primary server and all write requests are linearised through that server.

I'll leave this issue open again in case others want to comment.

gedw99 · 2025-03-05T01:21:03Z

Wonderfully concise answers. Thanks @nuric

I’m going to kick the tires on this.

I use NATS for a ton of projects so will see where the ground lies with Sema.

My use case is documents and videos / images that need to be everywhere and edited everywhere, so I have been using CRDTs for multi-version merging in general.

The schema of the document can change, so I have been using WASM: the main host sniffs the version and loads the WASM module that matches the doc version.

It’s for a myriad of uses. But think Google Workspace (or whatever they call it these days) or Apple cloud for the masses…

I figure it’s worth explaining the use case so we can compare notes, as they say.

nuric added the question and clusternode labels Mar 4, 2025

nuric changed the title ~~comparision~~ Scaling with message queues Mar 4, 2025
