
Loading vocabulary terms is extremely slow #442

Open
bindeali opened this issue Jun 2, 2022 · 1 comment

Comments

@bindeali
Collaborator

bindeali commented Jun 2, 2022

Despite the loading query not changing, load times have degraded over the past weeks/months to, in my opinion, unacceptable levels. For example, loading a workspace with a single vocabulary, 111/2009, takes 2.6 minutes. This holds for both the deployed version and my local version (4th-gen Core i7); the deployed version is faster, but not by much. Memory is not the problem: running this query takes significant CPU time, even though neither the query nor the GraphDB version has changed.

It would seem, then, that the culprit is the greatly increased amount of data. However, the data in the workspace vocabulary contexts is not unreasonably large, nor is there any obvious bloat from OG or elsewhere. Therefore, the first line of inquiry is whether the query can be optimized.

Also, it would be good to show more (and more detailed) loading progress information, just so the user knows the application isn't stuck.

@karelklima
Contributor

My two cents regarding this issue, based on years of struggling with slow queries:

  • It is faster to send multiple simple queries than to combine everything into one query.
  • The SPARQL DISTINCT and OPTIONAL constructs may slow queries down significantly.
  • Queries should be sent in parallel whenever possible (see the sketch after this list).
  • Data should be fetched only when it is actually needed, not sooner.
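
To make that concrete, here is a minimal sketch of the "several simple queries, sent in parallel" approach. The endpoint URL, class/property IRIs and function names are placeholders for illustration, not OntoGrapher's actual code or schema:

```typescript
// Hedged sketch only: endpoint URL and IRIs below are hypothetical placeholders.
const ENDPOINT = "https://example.org/repositories/workspace";

async function runQuery(query: string): Promise<any[]> {
  // POST the query as application/sparql-query and read SPARQL JSON results.
  const response = await fetch(ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "application/sparql-query",
      Accept: "application/sparql-results+json",
    },
    body: query,
  });
  if (!response.ok) {
    throw new Error(`SPARQL query failed with status ${response.status}`);
  }
  const json = await response.json();
  return json.results.bindings;
}

// Three independent, simple queries instead of one combined query full of
// OPTIONALs; they are dispatched concurrently and awaited together.
async function loadWorkspaceOverview() {
  const [vocabularies, terms, diagrams] = await Promise.all([
    runQuery(`SELECT ?v WHERE { ?v a <https://example.org/Vocabulary> }`),
    runQuery(`SELECT ?t ?v WHERE { ?t <https://example.org/inVocabulary> ?v }`),
    runQuery(`SELECT ?d ?v WHERE { ?d <https://example.org/inVocabulary> ?v }`),
  ]);
  return { vocabularies, terms, diagrams };
}
```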

The reality is that SPARQL and the related technologies are really slow and not very usable for large databases, especially when complex queries are involved.

Regarding the OntoGrapher case specifically, there is a lot we can do. One issue is apparent right away: the initial load of the SSP cache. The loading SELECT query returns all vocabularies, their terms, and their diagrams, and I believe it contains a lot of redundant data. As of right now the query returns 78K rows. If you omit diagrams, it returns 11K rows; if you omit terms, it returns 300 rows. It seems there is a Cartesian product of terms and diagrams going on, and the total payload is about 80 MB, which is a lot. I believe that partitioning the query will help significantly.

It's pretty clear that the current approach does not scale. I think it's OK to retrieve the list of vocabularies from the SSP cache, but I'd omit both diagrams and terms from that query, since that is the kind of information that may not be needed initially.
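
As a hedged sketch of what that partitioning could look like (all IRIs and property names below are placeholders, not the actual SSP cache schema): joining terms and diagrams in one SELECT yields roughly |terms| × |diagrams| rows per vocabulary, whereas separate queries yield |terms| + |diagrams| rows, and the per-vocabulary queries can be deferred until a vocabulary is actually opened.

```typescript
// 1) Initial load: vocabularies only (roughly the "300 rows" case).
const vocabulariesOnlyQuery = `
  SELECT ?vocabulary ?label WHERE {
    ?vocabulary a <https://example.org/Vocabulary> ;
                <http://www.w3.org/2000/01/rdf-schema#label> ?label .
  }`;

// 2) Deferred, per-vocabulary loads of terms and diagrams, requested only
//    once the user actually opens the vocabulary in question.
const termsQuery = (vocabularyIri: string) => `
  SELECT ?term WHERE {
    ?term <https://example.org/inVocabulary> <${vocabularyIri}> .
  }`;

const diagramsQuery = (vocabularyIri: string) => `
  SELECT ?diagram WHERE {
    ?diagram <https://example.org/diagramOf> <${vocabularyIri}> .
  }`;
```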

With that said, I was not able to reproduce such extreme loading times: I tried OG on the dev instance and was able to open a workspace with the 111/2009 vocabulary in a matter of seconds.

bindeali added a commit that referenced this issue Sep 8, 2022
@bindeali bindeali added this to the Stabilní a výkonný OG milestone Nov 28, 2023
@bindeali bindeali removed this from the Stabilní a výkonný OG milestone Feb 2, 2024