Stats for GO-CAM models #2339
GO-CAM: By group:
By curator:
For me, to track pathway curation, I'm primarily interested in coverage, i.e. the number of genes covered by models. By "covered by a model" I mean genes that are causally connected to another gene (not just standard annotations, or a gene connected to an activity and a process). For example, Reactome covers 11,279 human proteins (https://reactome.org/about/statistics). That's really useful to know.
The two suggested statistics tally different things. The number of gene products with annotations of any sort indicates, roughly, how much of the organism's genome is covered. The set of tallies earlier in the thread measures aspects of curator activity.
Hi @pgaudet, I think these different proposals make sense. Statistics are essential for measuring activity, but they should not be misused: the significant over-annotation we have observed over the last 20 years is mainly due to a tendency to chase numbers at the expense of quality. In my opinion, the real added value of GO-CAM is connecting genes together (or connecting genes with small molecules). From that point of view, I would suggest considering only high-quality models: those with connections, complete annotation units/annotons (at least one MF and one BP) and evidence. Other annotations could be counted as classic GO annotations.
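One possible reading of the "high-quality model" criteria above, as a minimal Python sketch. The `Activity` and `Model` classes are a hypothetical, simplified stand-in for a GO-CAM model (not the Noctua data model), the term and evidence IDs are placeholders, and requiring every activity unit to have an MF, a BP and evidence is an assumption about how strictly the criteria would be applied.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Activity:
    """One activity unit (annoton): a gene product with its MF, BP and evidence."""
    gene: str                   # placeholder gene identifier
    mf: Optional[str] = None    # molecular function term ID (placeholder)
    bp: Optional[str] = None    # biological process term ID (placeholder)
    evidence: List[str] = field(default_factory=list)  # evidence codes or references


@dataclass
class Model:
    """Hypothetical, simplified GO-CAM model: activities plus causal links."""
    model_id: str
    activities: List[Activity]
    # Causal links as (upstream, downstream) indices into `activities`.
    causal_edges: List[Tuple[int, int]] = field(default_factory=list)


def is_high_quality(model: Model) -> bool:
    """True if the model has at least one causal connection and every
    activity unit has an MF, a BP and at least one piece of evidence."""
    if not model.causal_edges or not model.activities:
        return False
    return all(a.mf and a.bp and a.evidence for a in model.activities)


# Usage: count only high-quality models; everything else could be
# tallied as standard GO annotations.
models = [
    Model("model-1",
          [Activity("geneA", "GO:mf1", "GO:bp1", ["ECO:ev1"]),
           Activity("geneB", "GO:mf2", "GO:bp1", ["ECO:ev1"])],
          [(0, 1)]),
    Model("model-2", [Activity("geneC", "GO:mf3")]),  # no BP, no evidence, no links
]
high_quality = [m.model_id for m in models if is_high_quality(m)]
```

Keeping the filter as a single predicate makes it easy to loosen later (for example, accepting an activity unit with evidence on only some of its statements) without touching the counting code.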
Pascale and I suggest that we first gather more specific requirements for Noctua statistics from curators, and then come back to the software team. We'll plan this discussion for an annotation call.
Can we have a metric on the website for the number of genes in GO-CAM models, by species?
That is, a non-redundant list of genes that are causally connected (some genes will obviously appear in multiple models). It would be useful to have a way to quickly assess proteome coverage.
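A minimal sketch of the per-species, non-redundant tally requested above. It assumes each model can be flattened to a species label plus a list of (upstream gene, downstream gene) causal pairs. That input format, the function names and the proteome sizes are hypothetical, not an existing Noctua or GO API. A gene counts only if it is linked to a different gene, matching the definition of "covered" given earlier in the thread.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical flattened input: one (species, causal_pairs) entry per model,
# where causal_pairs lists (upstream_gene, downstream_gene) links.
ModelEdges = Tuple[str, List[Tuple[str, str]]]


def connected_genes_by_species(models: Iterable[ModelEdges]) -> Dict[str, Set[str]]:
    """Non-redundant set of causally connected genes per species.

    A set per species means a gene appearing in several models is counted once."""
    genes: Dict[str, Set[str]] = defaultdict(set)
    for species, pairs in models:
        for upstream, downstream in pairs:
            if upstream != downstream:  # require a link to *another* gene
                genes[species].update((upstream, downstream))
    return dict(genes)


def proteome_coverage(genes: Dict[str, Set[str]],
                      proteome_size: Dict[str, int]) -> Dict[str, float]:
    """Fraction of each species' proteome covered; sizes are supplied by the caller."""
    return {sp: len(g) / proteome_size[sp]
            for sp, g in genes.items() if sp in proteome_size}


models = [
    ("human", [("geneA", "geneB"), ("geneB", "geneC")]),
    ("human", [("geneA", "geneC")]),   # same genes again: not double-counted
    ("mouse", [("geneX", "geneY")]),
]
genes = connected_genes_by_species(models)
counts = {sp: len(g) for sp, g in genes.items()}   # {'human': 3, 'mouse': 2}
```

The proteome size per species would have to come from a separate reference (for example, the reference proteome used for annotation); it is passed in rather than hard-coded so the same tally works for any organism.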
cc @pgaudet @vanaukenk