Shared vision? #1

Open
rufuspollock opened this issue Mar 8, 2024 · 9 comments

Comments

@rufuspollock

Hi, cool project 👍. Got pointed here and looked at the vision and thought there may be some connections and synergies with a project we've been working on:

https://markdowndb.com

https://github.com/datopian/markdowndb

It is designed precisely as a "content layer".

A rich API to your markdown files in seconds.
An open JS library to turn markdown files into structured, queryable data (SQL and JSON). Build rich markdown-powered sites fast and reliably.

Rich metadata extracted including frontmatter, links and more.

Lightweight and fast: indexes 1000s of files in seconds.

Open source and extensible via plugin system.
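To make the "rich metadata" point concrete, here is a minimal TypeScript sketch of frontmatter and link extraction. It is not markdowndb's implementation: `extractMetadata` and `FileMetadata` are hypothetical names, and it assumes frontmatter is plain `key: value` lines rather than full YAML.

```typescript
// Hypothetical metadata extractor, NOT markdowndb's implementation.
// Assumes frontmatter is a block of simple `key: value` lines between `---` fences.
interface FileMetadata {
  frontmatter: Record<string, string>;
  links: string[]; // targets from [text](target) and [[wiki]] links
  body: string; // content after the frontmatter
}

function extractMetadata(source: string): FileMetadata {
  const frontmatter: Record<string, string> = {};
  let body = source;

  // Frontmatter only counts if the file starts with a `---` line.
  const fm = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(source);
  if (fm) {
    for (const line of fm[1].split(/\r?\n/)) {
      const idx = line.indexOf(":");
      if (idx > 0) frontmatter[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
    }
    body = source.slice(fm[0].length);
  }

  const links: string[] = [];
  // Inline Markdown links; a leading "!" marks an image, which we skip.
  const mdLink = /(!?)\[[^\]]*\]\(([^)\s]+)\)/g;
  let m: RegExpExecArray | null;
  while ((m = mdLink.exec(body)) !== null) {
    if (m[1] !== "!") links.push(m[2]);
  }
  // Obsidian-style wiki links: [[target]] or [[target|alias]].
  const wikiLink = /\[\[([^\]|]+)(?:\|[^\]]*)?\]\]/g;
  while ((m = wikiLink.exec(body)) !== null) links.push(m[1]);

  return { frontmatter, links, body };
}
```

A production parser would use a real YAML parser and a Markdown AST instead of regexes; the regexes only show what kind of data gets pulled out of each file.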

@stereobooster
Owner

stereobooster commented Mar 8, 2024

Hey 👋. These are definitely two very similar projects. The first name for this project was mdb (short for markdowndb). I renamed it because I found that the name braindb was available on npm.

From a technical point of view they are very similar as well: we both scan a directory, parse Markdown, and store the result in SQLite. The differences are:

  • in my case I chose to watch the directory and re-parse files as soon as they change, instead of doing a one-time scan (if I understood your tool correctly)
  • also, I decided not to expose the internal DB and instead created an abstraction layer on top, so that I can change internals without affecting end users. For example, I can change how I store fields (in separate columns or in one JSON field), or switch from a relational DB (SQLite) to a graph DB (Kuzu), etc.
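The two differences can be sketched together: callers talk only to an index interface, and a watcher feeds it one event per changed file. Everything here is hypothetical (`NoteIndex`, `ColumnIndex`, `JsonBlobIndex`, and `applyEvent` are illustrative names, not BrainDB's API), the event names are modeled on common file watchers such as chokidar (`add`, `change`, `unlink`), and parsing is assumed to have already produced each file's fields.

```typescript
// Hypothetical sketch, NOT BrainDB's actual API: callers only see this interface,
// so the storage layout behind it can change without breaking them.
type Fields = Record<string, string>;

interface NoteIndex {
  upsert(path: string, fields: Fields): void;
  remove(path: string): void;
  find(field: string, value: string): string[]; // matching paths, sorted
}

// Layout 1: one "column" per field name.
class ColumnIndex implements NoteIndex {
  private columns = new Map<string, Map<string, string>>();
  upsert(path: string, fields: Fields): void {
    this.remove(path); // drop values left over from the previous version of the file
    for (const [name, value] of Object.entries(fields)) {
      if (!this.columns.has(name)) this.columns.set(name, new Map());
      this.columns.get(name)!.set(path, value);
    }
  }
  remove(path: string): void {
    for (const column of this.columns.values()) column.delete(path);
  }
  find(field: string, value: string): string[] {
    const column = this.columns.get(field);
    if (!column) return [];
    return Array.from(column).filter(([, v]) => v === value).map(([p]) => p).sort();
  }
}

// Layout 2: all fields serialized into one JSON blob per file.
class JsonBlobIndex implements NoteIndex {
  private rows = new Map<string, string>();
  upsert(path: string, fields: Fields): void {
    this.rows.set(path, JSON.stringify(fields));
  }
  remove(path: string): void {
    this.rows.delete(path);
  }
  find(field: string, value: string): string[] {
    const hits: string[] = [];
    for (const [path, json] of this.rows) {
      if ((JSON.parse(json) as Fields)[field] === value) hits.push(path);
    }
    return hits.sort();
  }
}

// Watch mode: each file-system event re-indexes (or drops) just that one file.
type FileEvent =
  | { kind: "add" | "change"; path: string; fields: Fields }
  | { kind: "unlink"; path: string };

function applyEvent(index: NoteIndex, event: FileEvent): void {
  if (event.kind === "unlink") index.remove(event.path);
  else index.upsert(event.path, event.fields);
}
```

Swapping `ColumnIndex` for `JsonBlobIndex` (or for a graph-backed implementation) changes nothing for code that only uses `NoteIndex`, which is the point of keeping the DB private.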

But if we set aside the technical differences, I think we're conceptually on the same page.

@rufuspollock
Author

👍

in my case I chose to watch the directory and re-parse files as soon as they change, instead of doing a one-time scan (if I understood your tool correctly)

Yes, we support that: see datopian/markdowndb#45

also, I decided not to expose the internal DB and instead created an abstraction layer on top, so that I can change internals without affecting end users. For example, I can change how I store fields (in separate columns or in one JSON field), or switch from a relational DB (SQLite) to a graph DB (Kuzu), etc.

That's a good point. We have two separate parts of the code: one part generates an internal (TypeScript) structure, and then exporters write that structure out, e.g. one to simple JSON and one to SQL(ite).
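A minimal sketch of that split: one internal structure, with exporters that know nothing about each other. The `ParsedFile` shape, the exporter names, and the table layout are assumptions for illustration, not markdowndb's actual types or schema.

```typescript
// Hypothetical internal structure, NOT markdowndb's actual types.
interface ParsedFile {
  path: string;
  title: string;
  tags: string[];
  links: string[];
}

// Exporter 1: plain JSON.
function toJson(files: ParsedFile[]): string {
  return JSON.stringify(files, null, 2);
}

// Exporter 2: SQL statements (SQLite dialect); single quotes are doubled to escape them.
function sqlString(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

function toSql(files: ParsedFile[]): string {
  const out: string[] = [
    "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, title TEXT);",
    "CREATE TABLE IF NOT EXISTS tags (path TEXT, tag TEXT);",
    "CREATE TABLE IF NOT EXISTS links (source TEXT, target TEXT);",
  ];
  for (const f of files) {
    const path = sqlString(f.path);
    out.push(`INSERT INTO files (path, title) VALUES (${path}, ${sqlString(f.title)});`);
    for (const tag of f.tags) {
      out.push(`INSERT INTO tags (path, tag) VALUES (${path}, ${sqlString(tag)});`);
    }
    for (const target of f.links) {
      out.push(`INSERT INTO links (source, target) VALUES (${path}, ${sqlString(target)});`);
    }
  }
  return out.join("\n");
}
```

Emitting SQL text keeps the sketch dependency-free; a real SQLite exporter would more likely use a driver with parameterized queries instead of string escaping.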

@stereobooster
Owner

I found one more similar project https://github.com/MicroWebStacks/content-structure cc @wassfila

@wassfila

wassfila commented Apr 7, 2024

Very cool, I like this project; I'll be taking a closer look.
Content Structure is inspired by Astro's content collections, but I needed to extend it to a very generic content hierarchy: basically no constraints on organization, so any Markdown website should work. Unlike content collections, Content Structure also parses the internal Markdown structure: it extracts images (even text in SVG), tables, code blocks, and links, and creates references to them along with their sections. The main use cases are custom CMS renderers (I'm my own customer, e.g. for my home automation website https://github.com/HomeSmartMesh/website), but also data injection into search engines and even embeddings generation for RAG; see this use case: https://github.com/VectorWisdom/search-llm-server
I'm not focusing on types, for example, and not much on the DB-fetch aspect (SQL or other) either, so I just have JSON files with light wrappers; that's probably where this project is better.
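Here is a rough sketch of the "references with their sections" idea: each extracted item records the nearest preceding heading. It is a hypothetical, line-based approximation (`extractItems` and `ExtractedItem` are illustrative names), not how content-structure parses Markdown; a real implementation would walk an AST and also handle tables, SVG text, and nested structures.

```typescript
// Hypothetical line-based sketch, NOT content-structure's parser.
type ItemKind = "code" | "image" | "link";

interface ExtractedItem {
  kind: ItemKind;
  section: string; // nearest preceding heading text ("" before any heading)
  value: string; // code language, image src, or link target
}

function extractItems(markdown: string): ExtractedItem[] {
  const items: ExtractedItem[] = [];
  let section = "";
  let inFence = false;

  for (const line of markdown.split(/\r?\n/)) {
    const fence = /^`{3}\s*([\w-]*)/.exec(line);
    if (fence) {
      // An opening fence records a code block; the closing fence just ends it.
      if (!inFence) items.push({ kind: "code", section, value: fence[1] || "text" });
      inFence = !inFence;
      continue;
    }
    if (inFence) continue; // ignore headings and links inside code blocks

    const heading = /^#{1,6}\s+(.*)$/.exec(line);
    if (heading) {
      section = heading[1].trim();
      continue;
    }

    // Links and images; a leading "!" marks an image.
    const re = /(!?)\[[^\]]*\]\(([^)\s]+)\)/g;
    let m: RegExpExecArray | null;
    while ((m = re.exec(line)) !== null) {
      items.push({ kind: m[1] === "!" ? "image" : "link", section, value: m[2] });
    }
  }
  return items;
}
```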

@stereobooster
Owner

And this one as well: https://github.com/peterbe/docsql cc @peterbe (sorry for spamming in separate messages).

@stereobooster
Copy link
Owner

I figured out how to implement Obsidian Dataview in Astro (and, more generally, in any SSG that uses remark). See https://astro-digital-garden.stereobooster.com/recipes/obsidian-dataview/
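As a hedged illustration of how a Dataview-style feature can hook into remark, here is a sketch shaped like a unified plugin (a function that returns a tree transformer). It is not the code from the linked recipe: `remarkTinyDataview`, `MdNode`, and `Page` are hypothetical, the tree is assumed to be mdast-shaped, and only the single query form `LIST FROM #tag` is supported.

```typescript
// Hypothetical sketch, NOT the recipe's actual code. Replaces `dataview` code
// blocks containing "LIST FROM #tag" with a list of links to pages with that tag.
interface MdNode {
  type: string;
  lang?: string | null;
  value?: string;
  url?: string;
  children?: MdNode[];
}

interface Page {
  title: string;
  url: string;
  tags: string[];
}

// Plugin shape: options in, tree transformer out.
function remarkTinyDataview(options: { pages: Page[] }) {
  return (tree: MdNode): void => {
    const visit = (node: MdNode): void => {
      if (!node.children) return;
      node.children = node.children.map((child): MdNode => {
        if (child.type !== "code" || child.lang !== "dataview") return child;
        const match = /^LIST\s+FROM\s+#([\w\/-]+)\s*$/i.exec((child.value ?? "").trim());
        if (!match) return child; // leave unsupported queries untouched
        const tag = match[1];
        const hits = options.pages.filter((page) => page.tags.includes(tag));
        return {
          type: "list",
          children: hits.map((page) => ({
            type: "listItem",
            children: [
              {
                type: "paragraph",
                children: [{ type: "link", url: page.url, children: [{ type: "text", value: page.title }] }],
              },
            ],
          })),
        };
      });
      node.children.forEach(visit);
    };
    visit(tree);
  };
}
```

In a real Astro setup the page list would come from the content collection or an index such as BrainDB, and the node types would come from the mdast type definitions rather than a local interface.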

@wassfila

wassfila commented Jun 23, 2024

I looked more closely at https://github.com/stereobooster/braindb/blob/main/packages/docs/src/content/docs/notes/vision.md and we definitely have a shared vision.
How closely our visions align is almost too good to be true; I just need to find the right balance between reuse and custom stitching (custom development).
The examples and repos I linked above are not very explicit. For reference, this is the "core parser", called "Content Structure": https://github.com/MicroWebStacks/content-structure. It is a separate npm package on purpose, free from any framework-specific logic. To prove it, I wrote an SQL exporter here, in real, pure vanilla SQL: https://github.com/MicroWebStacks/markdown-rag-services/blob/main/db/sql-lite-utils.py. I won't fall into the trap of trying to offer "a more fancy API" that ruins openness to any use case; anything can plug in on top of SQL.
"Anything" is a figure of speech: I also plan a Neo4j DB injector, so that an LLM can take advantage of such "strong relationships" through DB agents; I'm against letting LLMs infer graphs from unstructured data. Anyway, long story short, I wish I could see BrainDB in larger, documented examples. Also, do you have embeddings and a vector DB in mind, e.g. for semantic search? The name "braindb" suggests so.
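For the planned Neo4j injector, a minimal sketch of turning extracted links into Cypher might look like this. The `Document` label, the `LINKS_TO` relationship type, and the `toCypher` function are illustrative assumptions, not part of content-structure or BrainDB.

```typescript
// Hypothetical sketch: emit Cypher statements that load link relationships into a graph DB.
interface LinkEdge {
  source: string;
  target: string;
}

// Cypher string literal, escaping backslashes and single quotes.
function cypherString(value: string): string {
  return `'${value.replace(/\\/g, "\\\\").replace(/'/g, "\\'")}'`;
}

function toCypher(edges: LinkEdge[]): string[] {
  // MERGE matches or creates, so re-running the export does not duplicate nodes or edges.
  return edges.map(
    ({ source, target }) =>
      `MERGE (a:Document {path: ${cypherString(source)}}) ` +
      `MERGE (b:Document {path: ${cypherString(target)}}) ` +
      `MERGE (a)-[:LINKS_TO]->(b);`,
  );
}
```

Because `MERGE` is idempotent, the export can be re-run after a content change; sending paths as query parameters instead of inlined literals would avoid the escaping altogether.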

@stereobooster
Owner

stereobooster commented Jun 24, 2024

do you have embeddings and a vector DB in mind, e.g. for semantic search? The name "braindb" suggests so.

Even if that happens, I assume it would be out of scope for the core logic. One can write a plugin to sync from BrainDB to any other storage (like Neo4j or a vector DB). BrainDB exposes events (delete, insert, update), so a developer can attach listeners and pipe the data elsewhere.
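A minimal sketch of that listener pattern, piping changes into a separate store. `TinyEmitter`, `vectorStore`, and `embed` are hypothetical stand-ins: the event names follow the ones mentioned above, but this is not BrainDB's API, and `embed` just counts letters instead of calling a real embedding model.

```typescript
// Hypothetical sketch, NOT BrainDB's API: an emitter with insert/update/delete events.
type DocEvent = "insert" | "update" | "delete";

interface Doc {
  path: string;
  text: string;
}

type Listener = (doc: Doc) => void;

class TinyEmitter {
  private listeners = new Map<DocEvent, Listener[]>();
  on(event: DocEvent, listener: Listener): this {
    this.listeners.set(event, [...(this.listeners.get(event) ?? []), listener]);
    return this;
  }
  emit(event: DocEvent, doc: Doc): void {
    for (const listener of this.listeners.get(event) ?? []) listener(doc);
  }
}

// Stand-in for an external store such as a vector DB.
const vectorStore = new Map<string, number[]>();

// Toy placeholder "embedding": letter frequencies a-z, NOT a real model.
function embed(text: string): number[] {
  const counts = new Array<number>(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97;
    if (i >= 0 && i < 26) counts[i] += 1;
  }
  return counts;
}

// Wire the listeners: inserts and updates re-embed, deletes drop the entry.
const db = new TinyEmitter();
const sync: Listener = (doc) => vectorStore.set(doc.path, embed(doc.text));
db.on("insert", sync).on("update", sync).on("delete", (doc) => vectorStore.delete(doc.path));
```

Keeping the sync in a listener means the core index never needs to know the external store exists, which matches the idea of leaving vector search outside the core.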

I won't fall into the trap of trying to offer "a more fancy API" that ruins openness to any use case; anything can plug in on top of SQL.

The reason I decided to hide SQL (at least for now) is that otherwise the DB structure would become part of the "public API", which would make it harder to change. The second reason is that I'm considering switching from SQLite to CozoDB (a graph database with Datalog as its query language).
