Graph (DBs) as the data model #1537

webmutation · 2022-04-11T09:15:47Z

This has probably been asked before, but I could not find it. I watched the video of the 4.x release, and one of the goals mentioned was that DT should be able to scale to hundreds of thousands of dependencies. Looking just at our Node.js projects, which have about 100,000+ dependencies, this seems reasonable, and it seems like the perfect fit for a graph database. I was wondering whether this was considered or discarded as an option.

stevespringett · 2022-04-11T18:18:46Z

Maintainer

Yes, graph databases were considered. I was especially fond of Neo4J, however the GPLv3 is toxic to many organizations.

On the surface, ArangoDB seems ideal as it can serve as a relational or graph database. It's also Apache-2.0 licensed. I have zero experience with it and wasn't about to introduce something I'm just learning about to thousands of orgs using DT in production.

Upgrading was also a challenge: data had to be migrated from the v3.x data model to the v4.x data model. Keeping it in an RDBMS made the migration a little less painful.

However, graph databases are not off the table, especially if we can get volunteers to contribute time to the design, implementation, performance and security optimization, and migration of existing data.

1 reply

webmutation Apr 27, 2022
Author

Thanks for your comments. I do have some experience with Neo4j, and looking at the data model, a graph DB immediately comes to mind. Indeed, the license may be problematic...

I never got to work with ArangoDB, but it has crossed my radar. I will try to look a bit deeper and run some tests. It seems like an interesting exercise.

ruckc · 2022-04-11T18:38:11Z

I don't think the performance and scalability issues stem from the graph vs. relational question. PostgreSQL should be able to scale easily. The biggest scalability issue is the task that updates the main dashboard's metrics. It doesn't seem to scale well past 1,000 projects, as it's single-threaded and ends up triggering analysis tasks inline for every single project/component.
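A minimal sketch of the usual mitigation for a single-threaded loop like this: fan the per-project work out to a worker pool and aggregate the portfolio totals once at the end. This is Python, not DT's Java code, and `update_project_metrics` and the in-memory `findings` map are hypothetical stand-ins for the real per-project database work. Threads only help if that work is I/O-bound (e.g. waiting on database queries); CPU-bound work would need processes or a different design.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def update_project_metrics(project, findings):
    """Hypothetical per-project task: count one project's findings by severity.

    In a real system this is where the per-project database queries would run.
    """
    return project, Counter(findings[project])


def update_portfolio_metrics(findings, max_workers=8):
    """Run every per-project update on a thread pool, then aggregate once.

    `findings` is a hypothetical map of project name -> list of severities.
    Returns (per-project counts, portfolio-wide counts).
    """
    per_project = {}
    portfolio = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map schedules all projects up front and yields results in
        # submission order; aggregation happens here on the calling thread,
        # so no locking is needed around `per_project` or `portfolio`.
        results = pool.map(lambda p: update_project_metrics(p, findings), findings)
        for project, counts in results:
            per_project[project] = counts
            portfolio.update(counts)
    return per_project, portfolio
```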

2 replies

webmutation Apr 27, 2022
Author

Ah, that is an interesting bottleneck. Is it documented anywhere? It would be very interesting to read more about this issue.

nscuro Apr 29, 2022
Maintainer

@webmutation Some of it is documented in #1210.

We're currently researching how best to resolve it. We've started by improving the current implementation, which is a good first step. Watch this space™️

nscuro · 2023-02-05T11:17:25Z

Maintainer

I've been looking at, and experimenting with, ArangoDB for a personal project of mine. The big benefit I see over competitors like Neo4j is that it's not limited to graphs: it can also be used as a standard document database, and it even has a built-in search engine for full-text search. The Java API is very pleasant to work with and super minimalist.

As a NoSQL database, it can be scaled horizontally. However, my main motivation for adopting a graph-capable database in DT would be that the data model would map more naturally onto the domain we're in.

A problem with many NoSQL databases, though, is that managed cloud offerings are limited or nonexistent. Managed instances of ArangoDB and Neo4j are only available through the respective companies behind them, whereas managed MySQL, PostgreSQL, and SQL Server instances are available from pretty much every cloud provider.

It also turns out that modelling and querying graph structures in SQL is¹ entirely² possible³. The problem is that ORMs like DataNucleus do not support CTEs (common table expressions), which forces users to drop into raw SQL. However, that's the case for most graph databases, too, and writing raw queries is not a bad thing. From the benchmarks⁴⁵⁶ I was able to find so far, Postgres performs better than, or just as well as, native graph databases while maintaining a much smaller resource footprint.
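To make the recursive-CTE point concrete, here is a minimal sketch of a dependency graph stored as an edge table and traversed with `WITH RECURSIVE`. It uses Python's built-in `sqlite3` module only so the example runs without a database server; the table names (`component`, `dependency`) and the npm-style package names are hypothetical and not DT's actual schema. Apart from the `?` placeholder style, the SQL should also run on PostgreSQL.

```python
import sqlite3

# Hypothetical schema: one row per component, one row per "depends on" edge.
SCHEMA = """
CREATE TABLE component (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dependency (
    parent_id INTEGER NOT NULL REFERENCES component(id),
    child_id  INTEGER NOT NULL REFERENCES component(id),
    PRIMARY KEY (parent_id, child_id)
);
"""

# Recursive CTE: seed with the direct dependencies of one component, then keep
# following dependency edges. UNION (not UNION ALL) discards rows that were
# already produced, so shared or cyclic dependencies do not loop forever.
TRANSITIVE_DEPS = """
WITH RECURSIVE reachable(id) AS (
    SELECT child_id FROM dependency WHERE parent_id = ?
    UNION
    SELECT d.child_id
    FROM dependency AS d
    JOIN reachable AS r ON d.parent_id = r.id
)
SELECT c.name FROM reachable JOIN component AS c ON c.id = reachable.id
ORDER BY c.name;
"""


def transitive_dependencies(conn, root_id):
    """Return the names of all direct and transitive dependencies of root_id."""
    return [name for (name,) in conn.execute(TRANSITIVE_DEPS, (root_id,))]


def build_example():
    """Build a small in-memory graph: my-app -> express -> {body-parser, debug}."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO component VALUES (?, ?)", [
        (1, "my-app"), (2, "express"), (3, "body-parser"), (4, "debug"), (5, "ms"),
    ])
    conn.executemany("INSERT INTO dependency VALUES (?, ?)", [
        (1, 2), (2, 3), (2, 4), (3, 4), (4, 5),
    ])
    return conn


print(transitive_dependencies(build_example(), 1))
# prints ['body-parser', 'debug', 'express', 'ms']
```

Here `debug` is reachable through two paths (`express` and `body-parser`), yet it appears only once because the recursive part uses `UNION`.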

I'd argue that DT dropping support for H2, MySQL, and SQL Server, and supporting only Postgres, would provide a higher ROI than switching to a different database technology altogether. To start with, "migration" would not be as much of a problem. Postgres provides lots of features that could boost efficiency but that we cannot use today, because we also have to support other RDBMSes. Using stored procedures, materialized views, etc. would be much more viable if we didn't have to support more than three different SQL dialects.

0 replies
