Accessing all submissions with specified fields times out. #44

Open
jonquet opened this issue Apr 16, 2018 · 6 comments

@jonquet

jonquet commented Apr 16, 2018

In implementing the OMTD-share wrapper at:
ontoportal-lirmm/ncboproxy#3

We use the following REST call to access all the latest submissions and compile a zip file of their converted metadata description in XML.

data.bioontology.org/submissions?display=description,URI,identifier,version,homepage,documentation,contact,hasLicense,copyrightHolder,download,hasFormalityLevel,naturalLanguage,hasOntologyLanguage,reference,hasCreator,numberOfClasses,numberOfIndividuals,hasDomain,ontologiesRelatedTo,keywords,reference,publication,wasGeneratedBy,isBackwardsCompatibleWith,similarTo,hasPart,acronym,name,viewingRestriction,download,ontology,email

But the call times out most of the time. It sometimes succeeds when we repeat it right after a failure, probably because the result is cached by then.

@graybeal

A quick-and-dirty answer that I haven't had a chance to make sure is accurate:

It's a really heavyweight request across 800 files. Would it be possible to request a much smaller subset at a time?

Also, using display_links=false is a more performant version of the REST call if you’re not interested in seeing the hypermedia links, which I think you're not.
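As a rough illustration of the two suggestions above, here is a minimal Ruby sketch that builds the /submissions URL with display_links=false and, optionally, splits the field list into smaller batches. The `https` scheme, the helper names `submissions_url` and `batched_submission_urls`, the batch size, and the idea of repeating `ontology` in every batch so responses can be matched up again are all assumptions made for this example; whether batching by field actually reduces server load has not been verified. Any API key parameter is left out.

```ruby
require "uri"

# Endpoint from the call quoted in this issue; the https scheme is an assumption.
SUBMISSIONS_ENDPOINT = "https://data.bioontology.org/submissions"

# Hypothetical helper: builds a /submissions URL for the given display fields.
# Duplicate field names are dropped and hypermedia links are switched off.
def submissions_url(fields, endpoint: SUBMISSIONS_ENDPOINT)
  query = URI.encode_www_form(
    "display" => fields.uniq.join(","),
    "display_links" => "false"
  )
  "#{endpoint}?#{query}"
end

# Hypothetical helper: one URL per batch of fields.
# "ontology" is added to every batch (assumption) so that the separate
# responses can be joined back together by ontology.
def batched_submission_urls(fields, batch_size: 8)
  fields.uniq.each_slice(batch_size).map do |batch|
    submissions_url(["ontology"] + batch)
  end
end
```

For example, `submissions_url(%w[acronym name description])` yields a single URL, while `batched_submission_urls(all_fields, batch_size: 8)` yields one URL per group of eight fields.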

@jvendetti
Member

How often are you making this call? Calling the /submissions endpoint is expensive - last I checked we had over 14K submission objects in the triplestore. We don't utilize this endpoint anywhere from our UI.

@jonquet
Author

jonquet commented Apr 17, 2018

Indeed, the call should be optimized with &display_links=false&display_context=false. Maybe @twktheainur has already added those parameters.

As for how often: we will not make this call often, only once in a while.
However, note that the /submissions call only returns the latest submission of each ontology, not all of them.
At least that is how it behaves on our appliances ;)

Which call do you use to populate the Browse page?

@graybeal

I talked to Clement, and he's pretty sure this call returns only the most recent submissions, at least as they are using it.

@twktheainur

twktheainur commented Apr 23, 2018

@graybeal @jonquet @jvendetti To summarise.

The call always retrieves the latest submissions, not only as we are using it. The code in bioportal_web_ui, in the ontologies_controller, confirms that this call is used to populate the Browse page:

    # The attributes used when retrieving the submission. We are not retrieving all attributes to be faster
    browse_attributes = "ontology,acronym,submissionStatus,description,pullLocation,creationDate,released,name,naturalLanguage,hasOntologyLanguage,hasFormalityLevel,isOfType,contact"
    submissions = LinkedData::Client::Models::OntologySubmission.all(include_views: true, display_links: false, display_context: false, include: browse_attributes)

There is no particular reason why the call should create significantly more load than when someone visits the browse page.

Concerning the timeout: after investigation, it turns out that the problem is not with BioPortal but with our reverse proxy configuration, which had a low timeout threshold. The /submissions query itself executes quite fast (in much less than 30 seconds). However, we do additional processing and send a few HEAD requests to check which ontologies are restricted for download (separated by a wait period of 1 s to avoid any excessive load on the system), so the whole operation takes significantly longer and results in a reverse-proxy timeout on our end.
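To make the throttled check concrete, here is a minimal Ruby sketch of HEAD requests separated by a fixed wait. It is not the wrapper's actual code: the helper names (`head_status`, `restricted_downloads`, `minimum_wait_seconds`), the acronym-to-URL hash passed in, and the rule that any non-200 status means "restricted" are assumptions made for this example.

```ruby
require "net/http"
require "uri"

# Sends a HEAD request and returns the HTTP status code as an Integer.
def head_status(url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri).code.to_i
  end
end

# Returns the acronyms whose download URL does not answer 200 to a HEAD request.
# Assumption: any non-200 status is treated as "restricted for download".
# `head` and `pause` are injectable so the loop can be exercised offline.
def restricted_downloads(download_urls, wait: 1,
                         head: method(:head_status),
                         pause: Kernel.method(:sleep))
  restricted = []
  download_urls.each_with_index do |(acronym, url), index|
    pause.call(wait) if index.positive? # wait between requests, not before the first
    restricted << acronym unless head.call(url) == 200
  end
  restricted
end

# Lower bound, in seconds, on the time spent only waiting between requests.
def minimum_wait_seconds(count, wait: 1)
  [count - 1, 0].max * wait
end
```

The waits add up quickly: `minimum_wait_seconds(60)` is 59, so 60 checks spaced 1 s apart spend at least 59 seconds waiting before any request time is counted. That alone could exceed a short reverse-proxy timeout (the actual threshold is not stated in this thread).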

I believe that this issue may be closed.

@graybeal

graybeal commented Apr 23, 2018

I agree. Thanks for the details and the extra care not to tire out our little system! ;-)
