
Loading vocabulary terms is extremely slow #442

Open
bindeali opened this issue Jun 2, 2022 · 1 comment

Comments

@bindeali
Collaborator

bindeali commented Jun 2, 2022

Despite the loading query not changing, load times have degraded over the past weeks/months to, in my opinion, unacceptable levels. For example, loading a workspace with a single vocabulary, 111/2009, takes 2.6 minutes. This holds for both the deployed version and my local version (4th-gen Core i7); the deployed version is faster, but not by much. Memory is not the problem: running this query takes significant CPU time, even though neither the query nor the GraphDB version has changed.

It would seem, then, that the culprit is the greatly increased amount of data. However, the data in the workspace vocabulary contexts is not unreasonably large, nor is there any obvious bloat from OG or elsewhere. Therefore, the first line of inquiry is whether the query can be optimized.

Also, it would be good to show more (and more detailed) loading progress information, just so the user knows the application isn't stuck.

@karelklima
Contributor

My two cents regarding this issue, based on years of struggling with slow queries:

  • It is faster to send multiple simple queries than to combine everything into one query.
  • The SPARQL DISTINCT and OPTIONAL constructs may slow queries down significantly.
  • Queries should be sent in parallel whenever possible (see the sketch after this list).
  • Data should be fetched only when it is actually needed, not sooner.
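
To make that concrete, here is a minimal sketch of the "several simple queries, sent in parallel" approach. The endpoint URL, class/property IRIs and function names are placeholders for illustration, not OntoGrapher's actual code or schema:

```typescript
// Hedged sketch only: endpoint URL and IRIs below are hypothetical placeholders.
const ENDPOINT = "https://example.org/repositories/workspace";

async function runQuery(query: string): Promise<any[]> {
  // POST the query as application/sparql-query and read SPARQL JSON results.
  const response = await fetch(ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "application/sparql-query",
      Accept: "application/sparql-results+json",
    },
    body: query,
  });
  if (!response.ok) {
    throw new Error(`SPARQL query failed with status ${response.status}`);
  }
  const json = await response.json();
  return json.results.bindings;
}

// Three independent, simple queries instead of one combined query full of
// OPTIONALs; they are dispatched concurrently and awaited together.
async function loadWorkspaceOverview() {
  const [vocabularies, terms, diagrams] = await Promise.all([
    runQuery(`SELECT ?v WHERE { ?v a <https://example.org/Vocabulary> }`),
    runQuery(`SELECT ?t ?v WHERE { ?t <https://example.org/inVocabulary> ?v }`),
    runQuery(`SELECT ?d ?v WHERE { ?d <https://example.org/inVocabulary> ?v }`),
  ]);
  return { vocabularies, terms, diagrams };
}
```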

The reality is that SPARQL and the related technologies are really slow and not very usable for large databases, especially when complex queries are involved.

Regarding the OntoGrapher case specifically, there is a lot we can do. One issue is apparent right away: the initial load of the SSP cache. The loading SELECT query returns all vocabularies, their terms, and their diagrams, and I believe it contains a lot of redundant data. As of right now the query returns 78K rows. If you omit diagrams, it returns 11K rows; if you omit terms, it returns 300 rows. It seems there is a Cartesian product of terms and diagrams going on, and the total payload is about 80 MB, which is a lot. I believe that partitioning the query will help significantly.

It's pretty clear that the current approach does not scale. I think it's OK to retrieve the list of vocabularies from the SSP cache, but I'd omit both diagrams and terms from that query, since that is the kind of information that may not be needed initially.
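
As a hedged sketch of what that partitioning could look like (all IRIs and property names below are placeholders, not the actual SSP cache schema): joining terms and diagrams in one SELECT yields roughly |terms| × |diagrams| rows per vocabulary, whereas separate queries yield |terms| + |diagrams| rows, and the per-vocabulary queries can be deferred until a vocabulary is actually opened.

```typescript
// 1) Initial load: vocabularies only (roughly the "300 rows" case).
const vocabulariesOnlyQuery = `
  SELECT ?vocabulary ?label WHERE {
    ?vocabulary a <https://example.org/Vocabulary> ;
                <http://www.w3.org/2000/01/rdf-schema#label> ?label .
  }`;

// 2) Deferred, per-vocabulary loads of terms and diagrams, requested only
//    once the user actually opens the vocabulary in question.
const termsQuery = (vocabularyIri: string) => `
  SELECT ?term WHERE {
    ?term <https://example.org/inVocabulary> <${vocabularyIri}> .
  }`;

const diagramsQuery = (vocabularyIri: string) => `
  SELECT ?diagram WHERE {
    ?diagram <https://example.org/diagramOf> <${vocabularyIri}> .
  }`;
```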

With that said, I was not able to reproduce such extreme loading times: I tried OG on the dev instance and was able to open a workspace with the 111/2009 vocabulary in a matter of seconds.

bindeali added a commit that referenced this issue Sep 8, 2022
@bindeali bindeali added this to the Stabilní a výkonný OG milestone Nov 28, 2023
@bindeali bindeali removed this from the Stabilní a výkonný OG milestone Feb 2, 2024