Graph (DBs) as the data model #1537
-
Yes, graph databases were considered. I was especially fond of Neo4j; however, the GPLv3 is toxic to many organizations. On the surface, ArangoDB seems ideal, as it can serve as a relational or graph database, and it's Apache-2.0 licensed. However, I have zero experience with it and wasn't about to introduce something I'm just learning about to thousands of orgs using DT in production. Upgrading was also a challenge: there was a need to migrate from the v3.x data model to the v4.x data model, and keeping it in an RDBMS made the migration a little less painful. That said, graph databases are not off the table, especially if we can get volunteers to contribute time to the design, a proper implementation, performance and security optimizations, and the migration of existing data.
-
I don't think the performance and scalability issues stem from the graph vs. relational question. PostgreSQL should be able to scale easily. The biggest scalability issue is the task that updates the main dashboard's metrics. It doesn't seem to scale well past 1,000 projects, as it's single-threaded and ends up triggering analysis tasks inline for every single project/component.
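To illustrate the single-threaded bottleneck, here is a minimal Python sketch (Dependency-Track itself is Java and structured differently) that fans per-project metric updates out to a worker pool and computes the portfolio totals once at the end. `update_project_metrics`, `ProjectMetrics`, and its counter fields are hypothetical stand-ins, not DT's actual classes or schema. In CPython, threads mainly help when the per-project work is I/O-bound, e.g. waiting on database queries.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectMetrics:
    # Hypothetical per-project counters; not Dependency-Track's real schema.
    project_id: int
    critical: int
    high: int


def update_project_metrics(project_id: int) -> ProjectMetrics:
    # Placeholder for the real per-project work (querying components,
    # counting findings). Here it derives deterministic fake numbers.
    # Note: it does NOT trigger analysis inline, so it is safe to run
    # concurrently with other projects.
    return ProjectMetrics(project_id, critical=project_id % 3, high=project_id % 5)


def aggregate(results):
    # Portfolio totals are a cheap reduction over per-project results.
    return {
        "projects": len(results),
        "critical": sum(r.critical for r in results),
        "high": sum(r.high for r in results),
    }


def update_portfolio_serial(project_ids):
    # One project after another: wall time grows linearly with project count.
    return aggregate([update_project_metrics(pid) for pid in project_ids])


def update_portfolio_parallel(project_ids, max_workers=8):
    # Fan per-project updates out to a pool, then aggregate once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(update_project_metrics, project_ids))
    return aggregate(results)
```

Both functions produce the same totals; the design point is that decoupling per-project work from the portfolio rollup (and from analysis) lets it be parallelized or queued.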
-
I've been looking at, and experimenting with, ArangoDB for a personal project of mine. The big benefit I see over competitors like Neo4j is that it's not limited to graphs: it can also be used as a standard document database, and it even has a built-in search engine for full-text search. The Java API is very pleasant to work with and super minimalist. As a NoSQL database, it can be scaled horizontally. However, my main motivation for adopting a graph-capable database in DT would be that the data model would come more naturally to the domain we're in.

A problem that many NoSQL databases present, though, is that there are few or no managed cloud service offerings. Managed instances of ArangoDB and Neo4j are only available through the respective companies behind them, whereas managed MySQL, PostgreSQL, and SQL Server instances are available on pretty much every cloud provider.

It also turns out that modelling and querying graph structures in SQL is[1] entirely[2] possible[3]. The problem is that ORMs like DataNucleus do not support CTEs, which forces users to drop into raw SQL. However, that's the case for most graph databases, too, and writing raw queries is not a bad thing. From what I have been able to find so far with regard to benchmarks[4][5][6], Postgres performs as well as or better than native graph databases, while maintaining a much smaller resource footprint.

I'd argue that DT dropping support for H2, MySQL, and SQL Server, and supporting only Postgres, would provide a higher ROI than switching to a different database technology altogether. To start with, "migration" would not be as much of a problem. Postgres provides many features that could boost efficiency but that we cannot use today because we have to support other RDBMSes as well. Stored procedures, materialized views, etc. would be much more viable if we didn't have to support more than three different SQL dialects.

Footnotes
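As a rough illustration of querying a dependency graph in plain SQL with a recursive CTE, here is a minimal sketch. It uses Python's built-in `sqlite3` only so that it runs self-contained; PostgreSQL accepts the same `WITH RECURSIVE` syntax. The `component`/`dependency` tables and the sample packages are hypothetical, not DT's actual schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE component (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Edge list: parent_id depends directly on child_id (hypothetical schema).
CREATE TABLE dependency (
    parent_id INTEGER NOT NULL REFERENCES component(id),
    child_id  INTEGER NOT NULL REFERENCES component(id),
    PRIMARY KEY (parent_id, child_id)
);
"""

# Seed with the root's direct dependencies, then keep joining the edge table
# until no new rows appear. UNION (not UNION ALL) discards rows already seen,
# which deduplicates diamonds and also terminates the recursion on cycles.
TRANSITIVE_DEPS = """
WITH RECURSIVE deps(id) AS (
    SELECT child_id FROM dependency WHERE parent_id = ?
    UNION
    SELECT d.child_id
    FROM dependency AS d
    JOIN deps ON d.parent_id = deps.id
)
SELECT c.name FROM deps JOIN component AS c ON c.id = deps.id
ORDER BY c.name;
"""


def transitive_dependencies(conn, root_id):
    # Returns the names of all direct and indirect dependencies of root_id.
    return [row[0] for row in conn.execute(TRANSITIVE_DEPS, (root_id,))]


def build_example():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO component (id, name) VALUES (?, ?)",
        [(1, "app"), (2, "express"), (3, "body-parser"), (4, "debug"), (5, "ms")],
    )
    # app -> express -> {body-parser, debug}; body-parser -> debug -> ms
    conn.executemany(
        "INSERT INTO dependency (parent_id, child_id) VALUES (?, ?)",
        [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5)],
    )
    return conn
```

For example, `transitive_dependencies(build_example(), 1)` returns `['body-parser', 'debug', 'express', 'ms']`. Swapping the join direction (`d.child_id = deps.id`, selecting `parent_id`) would answer the reverse question of which components are affected by a given dependency.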
-
This has probably been asked before, but I could not find it. I watched the video of the 4.x release, and one of the goals mentioned was that DT should be able to scale to hundreds of thousands of dependencies. Looking just at our Node.js projects, which have about 100,000+ dependencies, this seems reasonable, and it seems like the perfect fit for a graph database. I was wondering whether this was considered or discarded as an option.