GO-CAM API / blazegraph query unable to handle large imported models #5

Closed
kltm opened this issue May 23, 2022 · 4 comments

kltm commented May 23, 2022

It seems that the GO-CAM API query (as explained in geneontology/api-gorest#3 (comment)) is not able to complete in the allotted 60s time on the current machine. Moreover, whatever it is doing with resources prevents other queries from running and often brings down the service, sometimes the blazegraph instance itself.

We are currently mitigating this with a decreased 30s timeout (down from 60s), which seems to prevent the MGI queries from overheating blazegraph (and at least the interface shows a "0", even if that is only because of the error). As well, we still have the hourly "production" blazegraph restart in place, just in case.
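
(For reference, this is roughly how a per-request cap can be passed to Blazegraph directly; a minimal sketch only, with a placeholder endpoint URL, and assuming the deployed instance honors the standard `timeout` parameter / `X-BIGDATA-MAX-QUERY-MILLIS` header, which should be double-checked against our build.)

```python
import requests

# Placeholder endpoint; substitute the real GO-CAM Blazegraph SPARQL URL.
SPARQL_ENDPOINT = "http://localhost:9999/blazegraph/sparql"

query = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"

resp = requests.post(
    SPARQL_ENDPOINT,
    data={"query": query, "timeout": "30"},  # Blazegraph timeout parameter, in seconds
    headers={
        "Accept": "application/sparql-results+json",
        # Millisecond cap recognized by Blazegraph's NanoSparqlServer
        # (assumption: supported/enabled on this instance).
        "X-BIGDATA-MAX-QUERY-MILLIS": "30000",
    },
    timeout=35,  # client-side safety net, slightly above the server-side cap
)
resp.raise_for_status()
print(resp.json()["results"]["bindings"])
```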

Moving forward, we'll need to find a way to either

  1. Speed up the query, or prevent it from running on overly large models (i.e. only run it against models with fewer than X nodes); a rough sketch of the latter follows this list
  2. Implement "new metadata tags in Noctua / GO-CAM universe to support distinguishing standard and causal models" (noctua#746) and get that into the GO-CAM API
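
For option 1, a minimal sketch of what a size gate could look like (the cutoff value, the endpoint URL, and the assumption that each model is a named graph of OWL individuals are all mine, not the API's actual logic):

```python
import requests

SPARQL_ENDPOINT = "http://localhost:9999/blazegraph/sparql"  # placeholder URL
MAX_NODES = 500  # hypothetical "X nodes" cutoff

# Assumes each GO-CAM model is stored as a named graph whose nodes are typed
# owl:NamedIndividual; verify against the actual triplestore layout.
SIZE_QUERY = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?model (COUNT(?ind) AS ?n)
WHERE { GRAPH ?model { ?ind a owl:NamedIndividual } }
GROUP BY ?model
"""

resp = requests.post(
    SPARQL_ENDPOINT,
    data={"query": SIZE_QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()

small_enough = [
    row["model"]["value"]
    for row in resp.json()["results"]["bindings"]
    if int(row["n"]["value"]) < MAX_NODES
]
# The expensive causal query would then only be run against these models.
print(f"{len(small_enough)} models under the {MAX_NODES}-node cutoff")
```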

Tagging @balhoff @dustine32 @sierra-moxon @vanaukenk @tmushayahama

kltm added the bug label May 23, 2022

kltm commented May 24, 2022

Due to continued fluttering (about a dozen times since end-of-work yesterday), I have increased the restart frequency, dropping the interval from 60m to 30m.

kltm referenced this issue May 24, 2022
Improved causal-by-GP query by Jim for geneontology/api-gorest#5
kltm transferred this issue from geneontology/api-gorest May 24, 2022

kltm commented May 24, 2022

With a little testing, we can see that we now have a new problem: many identifiers are causing 413 "payload too large" errors. For example:

wb/WBGene00004488
rgd/1564080
mgi/MGI:3781580
hgnc/16171
sgd/S000005537
flybase/FBgn0025334

@dustine32 reverting to older code for now, until we can work out with @balhoff what is going wrong.
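
For anyone spot-checking, this is roughly the kind of loop being used against the identifiers above (the base URL/path here is a placeholder, not the GO-CAM API's actual route):

```python
import requests

API_BASE = "https://example.org/api/gp"  # placeholder; use the actual GO-CAM API route

identifiers = [
    "wb/WBGene00004488",
    "rgd/1564080",
    "mgi/MGI:3781580",
    "hgnc/16171",
    "sgd/S000005537",
    "flybase/FBgn0025334",
]

for gp in identifiers:
    r = requests.get(f"{API_BASE}/{gp}/models", timeout=60)
    label = "413 payload too large" if r.status_code == 413 else str(r.status_code)
    print(f"{gp}: {label}")
```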

kltm added a commit that referenced this issue May 25, 2022
Remove possibly excessive whitespace in an attempt to help with #5

kltm commented May 25, 2022

Okay, I've done a bunch more testing here and would like to make the following notes:

  • while certain identifiers return the 413 error a lot, it is not consistent
    • once an identifier starts getting a 413, it tends to keep getting one
    • sometimes it will start working (i.e. 200 and correct results); when this happens it tends to keep working (until it changes state again)
  • I'm no longer convinced that the issue is with apache2 alone, but possibly in its interaction with blazegraph; looking at debugging messages for the proxy, apache2 is apparently successfully connecting to blazegraph, with the 413 occurring later on; if the incoming request to apache2 were the only problem, I would expect the 413 to occur before the proxy makes the connection

That said, debugging has not given much in the way of real understanding of what is going on. As a "fingers crossed" approach, and assuming that the issue is being caused by the literal query length, I've made an attempt to compress the query a little with #7. Locally, I've now been unable to produce a 413 for some time, which was not previously the case. I could just be lucky right now, but I think this may be a "fix" for our current issue.
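
For context, the change in #7 essentially just shrinks the literal query text before it goes over the wire; something along these lines (a sketch of the idea, not the actual code in #7, and only safe if no string literals in the query depend on internal whitespace):

```python
import re

def compress_sparql(query: str) -> str:
    """Collapse runs of whitespace so the encoded query takes up fewer bytes."""
    return re.sub(r"\s+", " ", query).strip()

original = """
SELECT ?model ?cause ?effect
WHERE {
    GRAPH ?model {
        ?cause ?relation ?effect .
    }
}
"""
compressed = compress_sparql(original)
print(len(original), "->", len(compressed), "characters before URL-encoding")
```

If the 413 is a size limit somewhere along the apache2/blazegraph path (apache2 has several, e.g. LimitRequestLine and LimitRequestBody), trimming the query text is the least invasive way to get back under it.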


kltm commented May 25, 2022

Okay, we've had about 20m, 200 queries, and no errors--which is a pretty huge success compared to where we've been operating since last Friday.
I think, unless something else comes up, we can close this one up.

Thank you for your help on this @balhoff and @dustine32 !
