-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
📫Implement /tpm/gene-all-cancer/json
and /tpm/gene-all-cancer/plot
API endpoints
#20
Conversation
Add section numbers.
Changed minimum number of samples per `Disease` or `GTEx_tissue_subgroup` from 1 to 3.
- Changed "cohort =" to "Dataset =" in boxplot and summary table x-axis labels. - Changed "cohort" to "Dataset" in boxplot summary table columns.
Added curl tests for EFO updates in v9 release.
Configure markdown-lint rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jharenza asked me to take a look at this implementation of R for handling an API, so I'm only looking at src/plumber.R
.
The main point i want to draw attention to that i touch upon in an individual comment:
I'd take a look at response times and how long it takes for R to formulate responses. Obviously, the test endpoints are quicker than the endpoints that return information about "real" data, but i'd be curious about how long the /plot
endpoints take to build plots, compared to eachother and compared to the /json
endpoints. Plotting - particularly with ggplot - isn't the fastest operation. So, as i mention in one of my comments, I'd consider wrapping the plot endpoint functions in promises::future_promise()
, so that the server doesn't get overloaded with Plot requests. Obviously - take a look at the data first to see if this is actually an issue. Looking at the response times from curl-test-endpoints, it looks like responses take about 2 - 3 seconds for the tpm/gene-disease-gtex/plot endpoint, so i may consider starting there
|
||
gene_tpm_boxplot_tbl <- get_gene_tpm_boxplot_tbl(gene_tpm_tbl) | ||
|
||
res_plot <- get_gene_tpm_boxplot(gene_tpm_boxplot_tbl) | ||
|
||
print(res_plot) | ||
} | ||
|
||
|
||
|
||
# Testing endpoints ------------------------------------------------------------ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably get rid of these testing endpoints soon, depending on what the purpose of having the testing endpoints is.
If the purpose is to test that the endpoint can run r code, then i'd drop the /echo
endpoint and rename the /plot
endpoint to something that makes it clear that a plot of random data is being returned, such as /random_plot
.
If the purpose is to test if the endpoint can be reached by the client, perhaps something like /status
(or /session_info
, that returns information about the environment e.g.
#* Return information about the R environment of the server
#*
#* @tag "R system information"
#* @get /session_info
function() {
sessionInfo()
}
I think this approach has the added benefit of giving the user useful information that
- may otherwise be hard to find
- may change over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. We will discuss with the users of this API to see whether they still need the testing endpoints.
Would exposing R environment sessionInfo()
be a security issue? cc @blackdenc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chris-s-friedman I have discussed with @jonkiky, who is one of the users of this API. @jonkiky suggested to keep the /echo
endpoint for health check with minimal resources, and remove the other two testing endpoints, i.e. /plot
and /sum
.
Implemented in acb0b6f.
@@ -80,7 +96,8 @@ function(ensemblId, efoId) { | |||
#* @get /tpm/gene-disease-gtex/plot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since producing plots can take a while, i'd consider attempting to parallelize the plot endpoints so that plot requests don't eat up all the processes. See the plumber 1.1.0 release notes and the article those notes point to.
Granted, the plot endpoint will take longer to send responses to users because there's more to send in plots, but still somthing to consider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you submit an issue for this suggestion with more implementation details, or submit a PR for implementing this?
I think this feature would be more properly implemented as a standalone PR, because it is not trivial, and we are planning to merge this PR in the next few days. I think implementing this feature would take at least the following steps:
- Read the documentations of
promises
package andplumbr
parallel execution model completely. I am confused byplumbr
documentation claiming that requests are handled sequentially at https://www.rplumber.io/articles/execution-model.html#performance-request-processing. Could you specify theplumbr
parallel execution model in the upcoming issue/PR? - Check source code if documentations are unclear.
- Check whether the R functions can actually be wrapped with
promises::future_promise
and still return the correct results. - Implement this feature.
- Test this feature. Check whether requests are actually executed concurrently.
- Discuss with DevOps team how this would affect previously designed Amazon Fargate/ECS/etc scaling procedure.
- Document this feature:
- How does it work?
- Why does it work correctly? How to tell whether it is appropriate and beneficial to wrap a function with
promises::future_promise
. - How does it interact with the deployment level scaling?
- Other relevant specifications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough - yeah this may be too big for this pr. I'm wondering if @blackdenc may be able to speak to any load balancing or anything on the infra side to handle concurrent requests ...
@chris-s-friedman Thank you for the review.
Could you also review all other files and test run data building process and plumbr server? The
@blackdenc @chinwallaa and I are planning to evaluate the response time with load test on the dev server after #21 is resolved, because implementing #21 may change the response time and handling procedure significantly. However, we could start relevant discussions, as #21 will probably be implemented by the end of next week. Could you discuss with @blackdenc on how to implement the load test specifically to get the benchmarks you need, as @blackdenc is going to implement the load test? I will get back to your specific comments in the threads. |
Suggested by @chris-s-friedman.
Suggested by @chris-s-friedman and @jonkiky.
@chris-s-friedman - Thank you for reviewing. This PR will be merged soon. Ongoing discussions will be followed up in #25, #26 and #27. |
Code and Results ReviewEverything looks good! No errors that I can find
|
Pull Request Template
Description
Implemented the HTTP
GET
methods of/tpm/gene-all-cancer/json
and/tpm/gene-all-cancer/plot
endpoints. These two endpoints handle HTTP requests for OpenPedCan-analysis pan-cancer_group
boxplot and summary table, according to the API specifications in https://nih.box.com/s/5cq2jwi6bhg0mgnowad3e6e4i60hwbnr. The changes will be deployed on the dev server at https://openpedcan-api-dev.d3b.io/__docs__/.cohort
in boxplot x labels and summary table columns toDataset
#14/tpm/gene-all-cancer/plot
and/tpm/gene-all-cancer/json
endpoint GET methods #15Disease
orGTEx_tissue_subgroup
to 3 #17Type of change
Changed
OpenPedCan-analysis
v9 release data./tpm/gene-disease-gtex
boxplot title from "Disease vs. GTEx tissue bulk gene expression" to "Primary tumor vs GTEx tissue bulk gene expression".Disease
orGTEx_tissue_subgroup
from 1 to 3.tests/curl_test_endpoints.sh
variableAPI_PORT
toLOCAL_API_HOST_PORT
.README.md
.Added
/tpm/gene-all-cancer/json
API endpoint./tpm/gene-all-cancer/plot
API endpoint.API_HOST
variable intests/curl_test_endpoints.sh
, in order to test DEV and QA hosts.changelog.md
.How Has This Been Tested?
Integration test
Environment:
For more details on test options and resources, see https://github.com/PediatricOpenTargets/OpenPedCan-api/tree/logstar/gene-all-cancer#3-test-run-openpedcan-api-server-locally.
Terminal returns:
Checklist