Hello again and it's good to see you haven't given up on NATS and using message queues to scale out since #10. To comment on the discussion points:
SemaDB uses bleve only for text analysis, not for indexing, and doesn't depend on its search capabilities. The analysis splits the text into terms, and we then implement TF-IDF ourselves. This keeps SemaDB as self-contained as possible, although bleve search is a great project.
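To make the "split into terms, then TF-IDF" idea concrete, here is a minimal Go sketch. It is not SemaDB's actual implementation: the `tokenize` helper is a hypothetical stand-in for a real analyser such as bleve's, and the weighting (term frequency divided by document length, times `ln(N / df)`) is just one common TF-IDF variant chosen for illustration.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// tokenize is a hypothetical stand-in for a real text analyser:
// it lowercases the text and splits it on whitespace.
func tokenize(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// tfidf returns, for each document, a map from term to TF-IDF weight.
// TF is count / document length; IDF is ln(N / document frequency).
func tfidf(docs []string) []map[string]float64 {
	tokenized := make([][]string, len(docs))
	df := make(map[string]int) // number of documents containing each term
	for i, d := range docs {
		tokenized[i] = tokenize(d)
		seen := make(map[string]bool)
		for _, t := range tokenized[i] {
			if !seen[t] {
				seen[t] = true
				df[t]++
			}
		}
	}

	n := float64(len(docs))
	out := make([]map[string]float64, len(docs))
	for i, terms := range tokenized {
		out[i] = make(map[string]float64)
		if len(terms) == 0 {
			continue
		}
		counts := make(map[string]int)
		for _, t := range terms {
			counts[t]++
		}
		for t, c := range counts {
			tf := float64(c) / float64(len(terms))
			idf := math.Log(n / float64(df[t]))
			out[i][t] = tf * idf
		}
	}
	return out
}

func main() {
	docs := []string{"vector search in go", "text search with go", "vector databases"}
	for i, weights := range tfidf(docs) {
		fmt.Println(i, weights)
	}
}
```

With this variant, a term that appears in every document gets a weight of zero, which is why rarer terms dominate the ranking.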
We discussed horizontal scaling using NATS in #10, and at this stage we're not thinking of incorporating a separate moving component like NATS JetStream. The main limitation of SemaDB's horizontal scaling is on the write path: the shard where the data is allocated must be online for the write operation to succeed.
At the moment SemaDB doesn't have a change stream mechanism because full replication isn't implemented. Each collection is assigned a primary server and all write requests are linearised through that server.
I'll leave this issue open again in case others want to comment.
nuric changed the title from "comparision" to "Scaling with message queues" on Mar 4, 2025
I use NATS for a ton of projects, so I will see where the ground lies with Sema.
My use case is documents and videos / images that need to be everywhere and edited everywhere, so I have been using CRDTs for multi-version merging in general.
The schema of the document can change, so I have been using WASM. The main host sniffs the version and loads the WASM that matches the doc version.
It's for a myriad of uses. But think Google Workspace (or whatever they call it these days) or Apple cloud for the masses...
I figure it's worth explaining the use case so we can compare notes, as they say.
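For readers comparing notes, the CRDT merging mentioned above can be illustrated with one of the simplest CRDTs, a map of last-writer-wins registers. This Go sketch is only an assumption about the general shape of such a merge, not the setup described in this comment: the `LWWRegister`, `Doc`, `Merge` and `MergeDocs` names are hypothetical, and it assumes every write carries a unique (timestamp, node ID) pair.

```go
package main

import "fmt"

// LWWRegister is a last-writer-wins register: the write with the highest
// logical timestamp wins, and the node ID breaks ties deterministically.
type LWWRegister struct {
	Value     string
	Timestamp uint64 // logical clock, assumed to increase per write
	NodeID    string // tie-breaker when timestamps are equal
}

// Merge picks the winning write. Given unique (Timestamp, NodeID) pairs
// it is commutative, associative and idempotent.
func Merge(a, b LWWRegister) LWWRegister {
	if a.Timestamp != b.Timestamp {
		if a.Timestamp > b.Timestamp {
			return a
		}
		return b
	}
	if a.NodeID >= b.NodeID {
		return a
	}
	return b
}

// Doc maps field names to registers, so each field merges independently.
type Doc map[string]LWWRegister

// MergeDocs returns a new document holding the winning write per field.
func MergeDocs(a, b Doc) Doc {
	out := make(Doc, len(a)+len(b))
	for k, v := range a {
		out[k] = v
	}
	for k, v := range b {
		if cur, ok := out[k]; ok {
			out[k] = Merge(cur, v)
		} else {
			out[k] = v
		}
	}
	return out
}

func main() {
	laptop := Doc{
		"title": {Value: "Draft", Timestamp: 1, NodeID: "laptop"},
		"body":  {Value: "hello", Timestamp: 3, NodeID: "laptop"},
	}
	phone := Doc{
		"title": {Value: "Final", Timestamp: 2, NodeID: "phone"},
	}
	// Both replicas converge regardless of merge order.
	fmt.Println(MergeDocs(laptop, phone))
	fmt.Println(MergeDocs(phone, laptop))
}
```

Richer document types (text, lists) need more elaborate CRDTs, but the convergence property being relied on is the same.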
Hey,
Glad to see SemaDB is still alive and kicking!
This is very much a due diligence / discussion...
I am doing some research on how best to put a search system in place in a system that uses many storage mechanisms. Typically you have structured (SQL) data and unstructured (document) data.
Then you have graph DBs too, which try to do both, and other things, all in one.
I am looking at Sema and Zinc.
They both seem to be designed for document indexing and search.
First difference is the bleve deps...
https://github.com/zincsearch/zincsearch uses github.com/blugelabs/bluge
https://github.com/Semafind/semadb uses github.com/blevesearch/bleve/v2
The next one is scale-out. You include a cluster of sorts, it seems.
A lot of teams end up with an event stream of mutations that go into a message queue or similar, and then have "command agents" that pick up each mutation and transform it into each store to keep them "up to date".
I use NATS JetStream, for example...
So I was wondering if you could comment on how that relates to your cluster, because with a message queue you can "ensure" that many store instances get updates eventually.
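For anyone unfamiliar with the pattern, the event-stream and command-agent setup described above can be sketched in Go roughly as follows. This is an in-memory illustration only: a plain slice stands in for the message queue (no NATS client calls are used), and the `Mutation`, `Store`, `MapStore`, `Agent` and `Replay` names are all hypothetical. The point it shows is that each agent tracks the last sequence number it applied, so redelivered messages are harmless and every store converges eventually.

```go
package main

import "fmt"

// Mutation is one entry in the ordered event stream.
type Mutation struct {
	Seq    uint64 // position in the stream, assigned by the queue
	Key    string
	Value  string
	Delete bool
}

// Store is whatever a command agent writes into
// (SQL database, document store, search index, ...).
type Store interface {
	Apply(m Mutation)
}

// MapStore is an in-memory stand-in for a real store.
type MapStore struct {
	Data map[string]string
}

// Apply upserts or deletes the key named by the mutation.
func (s *MapStore) Apply(m Mutation) {
	if m.Delete {
		delete(s.Data, m.Key)
		return
	}
	s.Data[m.Key] = m.Value
}

// Agent consumes the stream for a single store and remembers how far
// it has got, which makes at-least-once delivery safe.
type Agent struct {
	Store   Store
	LastSeq uint64
}

// Handle applies a mutation unless it has already been applied.
func (a *Agent) Handle(m Mutation) {
	if m.Seq <= a.LastSeq {
		return // duplicate or redelivered message
	}
	a.Store.Apply(m)
	a.LastSeq = m.Seq
}

// Replay feeds stream entries to an agent in order.
func Replay(stream []Mutation, a *Agent) {
	for _, m := range stream {
		a.Handle(m)
	}
}

func main() {
	stream := []Mutation{
		{Seq: 1, Key: "doc:1", Value: "draft"},
		{Seq: 2, Key: "doc:2", Value: "notes"},
		{Seq: 3, Key: "doc:1", Delete: true},
	}
	search := &MapStore{Data: map[string]string{}}
	agent := &Agent{Store: search}

	Replay(stream[:2], agent) // first delivery, interrupted after two messages
	Replay(stream, agent)     // redelivery from the start; seq 1 and 2 are skipped
	fmt.Println(search.Data, agent.LastSeq)
}
```

In a real deployment the queue's durable consumer position would play the role of `LastSeq`, and the store write and position update would need to be made atomic or idempotent.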
One other point is change streams. Does Sema tell me via HTTP, SSE or some other mechanism that a record has changed, and the nature of the change? This is sometimes called CDC.
This can be vital when you want other things to happen when things happen inside Sema.
Hope you don't mind me raising an issue like this, but as I said, it's important when picking stores to understand how they are designed, etc. I tend to go for golang ones and not Rust ones since I code in golang.
Thanks in advance.