BugSigDB 1.0 release #4

Closed
lgeistlinger opened this issue Jun 14, 2021 · 34 comments

@lgeistlinger
Contributor

Hi @jwokaty:

@lwaldron and I had started to discuss a release scheme for BugSigDB.

One idea was to follow Bioconductor's semi-annual release scheme, and have a stable release of BugSigDB signatures every half a year. We also discussed zenodo as the platform for hosting the stable release (= csv files for studies, experiments, and signatures).

A stable release is supposed to contain all reviewed content from BugSigDB up to a defined freeze date. For the BugSigDB 1.0 release this could encompass reviewed content up to the present date for simplicity, or if we wanted to synchronize with Bioconductor, up to the past 3.13 release date.

Would you like to go ahead and export the content, filter by date and review status, upload to zenodo, and include the stable release link under https://bugsigdb.org/Help:Export ?

Thanks!

@lgeistlinger
Contributor Author

Note, functionality from our bugsigdbr package will likely be helpful here, including bugsigdbr::importBugSigDB for pulling the data frame that can then be filtered along the usual lines, bugsigdbr::getSignatures to extract the signatures of the filtered df, and bugsigdbr::writeGMT to write a GMT file containing the signatures.
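For readers outside R, the import, filter, and write-GMT flow described above can be sketched in a few lines of Python. This is only an illustration, not the bugsigdbr implementation: the record fields (`signature_id`, `state`, `curated_date`, `taxa`) and the freeze date are hypothetical placeholders. The GMT layout itself (name, description, then members, tab-separated, one set per line) is the standard one.

```python
from datetime import date

def filter_reviewed(rows, freeze_date):
    """Keep rows marked complete and curated on or before the freeze date.

    The field names "state" and "curated_date" are assumptions for this sketch.
    """
    return [
        r for r in rows
        if r["state"] == "Complete"
        and date.fromisoformat(r["curated_date"]) <= freeze_date
    ]

def to_gmt_lines(rows):
    """Build one GMT line per signature: name, description, then the members."""
    lines = []
    for r in rows:
        members = [t for t in r["taxa"].split(";") if t]
        lines.append("\t".join([r["signature_id"], r["description"], *members]))
    return lines

# Toy records standing in for the exported data frame.
rows = [
    {"signature_id": "sig1", "description": "increased in cases",
     "state": "Complete", "curated_date": "2021-04-10", "taxa": "816;1578"},
    {"signature_id": "sig2", "description": "decreased in cases",
     "state": "Incomplete", "curated_date": "2021-04-12", "taxa": "1301"},
    {"signature_id": "sig3", "description": "increased in cases",
     "state": "Complete", "curated_date": "2021-06-30", "taxa": "838"},
]

kept = filter_reviewed(rows, date(2021, 5, 20))  # example freeze date
gmt_lines = to_gmt_lines(kept)
```

Only `sig1` survives both filters here: `sig2` is not marked complete and `sig3` was curated after the example freeze date.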

@lwaldron
Member

lwaldron commented Jun 14, 2021

I added a new organization secret ZENODO_DEPOSIT with an access token that should allow depositing through the Zenodo API from GitHub Actions in this repo, bugsigdbr, and bugphyzz. Zenodo also has an option for pulling releases directly from any release of a GitHub repo, although that would require either depositing a software + data repo, or creating separate data-only repos. Would be great if you can look into these options a bit @jwokaty then we can discuss the best way to do periodic data releases for these projects.

@lgeistlinger
Contributor Author

Hi @jwokaty @lwaldron: I thought I'd quickly check in on the status of this and whether any input or help is needed from my side.

@jwokaty
Collaborator

jwokaty commented Jul 1, 2021

@lgeistlinger I'd like to better understand what the release looks like. Will it be 3 csv files: one each for studies, experiments, and signatures? Or will there be different versions in content or file formats?

There's also a place where I can add metadata. I'll try to write it using the website and the repository, but maybe I need some help with the following:

  • Are there any specific keywords we should associate with the data? This isn't required, but may help users find it.
  • Is there a license associated with this data? Zenodo lists the following licenses (but I think we can do something else):
    Creative Commons Attribution 4.0 International
    Creative Commons Attribution 1.0 Generic
    Creative Commons Attribution 2.0 Generic
    Creative Commons Attribution 3.0 Unported
    Creative Commons Attribution 3.0 Austria
    Creative Commons Attribution 3.0 United States

@lgeistlinger
Contributor Author

Thanks for checking in @jwokaty.

Will it be 3 csv files: one each for studies, experiments, and signatures?

That's the core, yes. This is also what I think bugsigdbr will pull from zenodo at some point. My thought would be that

bugsigdbr::importBugSigDB

will get a version argument so once the 1.0 release is out on zenodo, users could pull the stable release via:

bugsigdbr::importBugSigDB(version = "1.0")

Currently it directly pulls the bleeding edge, including unreviewed content, from https://bugsigdb.org/Help:Export, which would then become available via:

bugsigdbr::importBugSigDB(version = "devel")

When preparing the three csv files for the 1.0 release, it's important to restrict to reviewed contents only. Let me know whether you have questions or whether it's easier if I provide these files.

In addition, I think we want to supplement the core release with GMT files containing the actual signatures.
There are many ways to extract signatures based on taxonomic considerations, and bugsigdbr::getSignatures implements a bunch of those options. For non-R users, however, I think we want to provide at least 3 GMT files:

  • signatures containing mixed taxonomic levels,
  • signatures harmonized on genus level
  • (signatures harmonized on species level)

@lwaldron let us know if you have thoughts on this.

Are there any specific keywords we should associate with the data? This isn't required, but may help users find it.

A couple that come to mind:

  • human microbiome
  • differential abundance
  • gene set enrichment analysis

Is there a license associated with this data? Zenodo lists the following licenses (but I think we can do something else):

@lwaldron might have opinions. I'd be good with Creative Commons Attribution 4.0 International.

@lgeistlinger lgeistlinger pinned this issue Jul 2, 2021
@lwaldron
Member

lwaldron commented Jul 2, 2021

@lgeistlinger would you write a simple .R script that dumps all the required files? I guess it'll be a few: gmt, cab, ncbi ID, names, genus, species, mixed. No need to put versioning or dates in file names, but maybe a comment line in the first line with the date, license, and reference to bugsigdb.org?

@lgeistlinger
Contributor Author

Sure.

@lgeistlinger lgeistlinger self-assigned this Jul 2, 2021
@lgeistlinger
Contributor Author

That is basically done and part of bugsigdbr now:

https://github.com/waldronlab/bugsigdbr/blob/main/inst/scripts/dump_release.R

Call the script via: Rscript dump_release.R <version> <output.directory>

which will produce the following output files:

full_dump.tab
bugsigdb_signatures_mixed_metaphlan.gmt
bugsigdb_signatures_mixed_ncbi.gmt
bugsigdb_signatures_mixed_taxname.gmt
bugsigdb_signatures_genus_metaphlan.gmt
bugsigdb_signatures_genus_metaphlan_exact.gmt
bugsigdb_signatures_genus_ncbi.gmt
bugsigdb_signatures_genus_ncbi_exact.gmt
bugsigdb_signatures_genus_taxname.gmt
bugsigdb_signatures_genus_taxname_exact.gmt
bugsigdb_signatures_species_metaphlan.gmt
bugsigdb_signatures_species_metaphlan_exact.gmt
bugsigdb_signatures_species_ncbi.gmt
bugsigdb_signatures_species_ncbi_exact.gmt
bugsigdb_signatures_species_taxname.gmt
bugsigdb_signatures_species_taxname_exact.gmt

Pending waldronlab/BugSigDB#92, a filter by review status will be incorporated.
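The naming pattern of the output files listed above (taxonomic level, ID type, optional `_exact` suffix) can be reproduced with a short Python sketch. This is not how dump_release.R builds the names; it only shows that the list is the full combination of three ID types with the mixed, genus, and species levels, where only genus and species have an `_exact` variant. The function name is made up for this example.

```python
from itertools import product

ID_TYPES = ["metaphlan", "ncbi", "taxname"]

def release_file_names():
    """Return the expected output file names in the order listed in the thread."""
    names = ["full_dump.tab"]
    # Mixed-level signatures have no "_exact" variant.
    names += [f"bugsigdb_signatures_mixed_{id_type}.gmt" for id_type in ID_TYPES]
    # Genus and species come with and without the "_exact" suffix.
    for level, id_type, suffix in product(["genus", "species"], ID_TYPES, ["", "_exact"]):
        names.append(f"bugsigdb_signatures_{level}_{id_type}{suffix}.gmt")
    return names

files = release_file_names()
```

A list like this could serve as a completeness check on an export directory before tagging a release.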

@jwokaty
Collaborator

jwokaty commented Jul 27, 2021

@lgeistlinger I am working on this at https://github.com/jwokaty/BugSigDBExports, which I will transfer to waldronlab when it's in a good state. I've created a GitHub action that we can run manually to generate exports with dump_release.R.

@lwaldron and I had discussed setting up the action to do a daily export that would be committed to BugSigDBExports. We would do a manual release to get into Zenodo.

However, the version number that is passed to dump_release is just a label, right? I thought if possible it might be better to associate the files with a date corresponding to reviewed content rather than a version number. When we do a release to Zenodo, there will still be a version number, but I think we want to be able to reproduce the files whether we get it from BugSigDBExports, Zenodo, or bugsigdbr.

@lgeistlinger
Contributor Author

I think that is great.

However, the version number that is passed to dump_release is just a label, right?

This is correct. The only place where the version argument is used in dump_release.R is the header line of the output files, e.g.:

# BugSigDB 0.0.1, License: Creative Commons Attribution 4.0 International, URL: https://bugsigdb.org

If you slightly abuse the notation and pass a date instead of a version number to dump_release.R, the header line changes accordingly, e.g.:

# BugSigDB 2021-07-27, License: Creative Commons Attribution 4.0 International, URL: https://bugsigdb.org

and the argument can then also be renamed to date in the script.
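As a rough illustration of the point that the label is only interpolated into the header, here is a small Python sketch (dump_release.R itself is an R script; the function names below are invented for this example). The header format is copied from the two examples above.

```python
def header_line(label,
                license_name="Creative Commons Attribution 4.0 International",
                url="https://bugsigdb.org"):
    """Build the comment line that goes at the top of each output file."""
    return f"# BugSigDB {label}, License: {license_name}, URL: {url}"

def with_header(label, body_lines):
    """Prepend the header to the file body and return the full file text."""
    return "\n".join([header_line(label), *body_lines]) + "\n"

def strip_header(text):
    """Drop leading '#' comment lines, as a reader of the files might do."""
    lines = text.splitlines()
    while lines and lines[0].startswith("#"):
        lines.pop(0)
    return lines

versioned = header_line("0.0.1")
dated = header_line("2021-07-27")
```

Because the label never affects the file body, a version string and a date produce files that differ only in their first line.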

@lwaldron
Member

It looks great! Nice use of the .zenodo.json file too.

@lwaldron
Member

lwaldron commented Jul 27, 2021

And BTW, I think the Zenodo releases can just have a date within too: releases will have a version number from the tag, but the files contain dates as usual.

@jwokaty
Collaborator

jwokaty commented Jul 28, 2021

We are still waiting on waldronlab/BugSigDB#92 to close before we publish because this may change the content, correct? I've transferred the BugSigDBExports to waldronlab and have scheduled it to do weekly exports on Sunday using dates. The first will be this Sunday. Does this issue really belong to the new repository? I can transfer it.

@lwaldron
Member

Yes, this issue does belong in https://github.com/waldronlab/BugSigDBExports.

@lwaldron lwaldron transferred this issue from waldronlab/BugSigDB Jul 29, 2021
@lwaldron
Member

Sorry for being impatient and just transferring it, was only so I could refer to it from waldronlab/BugSigDB#92 (although I just learned that those references get automatically updated with the transfer!)

@lwaldron
Member

I set up the Zenodo integration and did a little debugging trying to get it to work (https://zenodo.org/account/settings/github/repository/waldronlab/BugSigDBExports#) but am stuck now with the following error on Zenodo:

{
    "errors": "Something went wrong when we tried to publish your release. If your release has not been published within the next hour, please contact us via our support form to resolve this issue."
}

Let's see if it fixes itself within the next hour, otherwise I'll contact the Zenodo support team. It's very particular about the .zenodo.json file.

@lgeistlinger
Contributor Author

lgeistlinger commented Sep 12, 2021

@jwokaty @lwaldron quick update: I incorporated a filter for complete content in the dump release script, so the only thing remaining for the 1.0 release to zenodo is to clean up the ontology columns (waldronlab/BugSigDB#92 (comment)). Will work on that so that we get that through the door prior to the October Bioc release.

@jwokaty
Collaborator

jwokaty commented Oct 19, 2021

Hi @lgeistlinger, there's an issue with automatically releasing to Zenodo, so we should do a manual release of the files. (The last I heard from them, they were still working on it last week.) What is the date that we will do the release, Oct. 25? And do we want to modify https://github.com/waldronlab/bugsigdbr to get data from Zenodo (along with the bleeding edge)?

@lgeistlinger
Contributor Author

Thanks @jwokaty. It's a great point. @lwaldron any chance you've heard from Ike with regard to progress on this? Looks like the Oct 25 deadline is a bit in danger, although it would still be great to make it :-)

@lgeistlinger
Contributor Author

And do we want to modify https://github.com/waldronlab/bugsigdbr to get data from Zenodo (along with the bleeding edge)?

And yes, that is what I think we are aiming for: being able to pull the zenodo release (stable) as well as the continuously updated version (bleeding edge) from BugSigDBExports as we do currently, if that makes sense.

@jwokaty jwokaty linked a pull request Oct 28, 2021 that will close this issue
@jwokaty jwokaty removed a link to a pull request Oct 28, 2021
@jwokaty
Collaborator

jwokaty commented Oct 28, 2021

Closed by #12. I manually did the release on Zenodo at https://zenodo.org/record/5606166 since the automatic mechanism still isn't working. (I should have removed the README, but you can't change the files after publishing.)

@jwokaty jwokaty closed this as completed Oct 28, 2021
@lgeistlinger
Contributor Author

lgeistlinger commented Oct 28, 2021

Thanks @jwokaty. That is great. I think we might have jumped the gun here a little bit though as the first release is still waiting on the fix of the ontology columns in the export. This needs to be fixed by Ike first before we can go ahead and do our first official release.

@lgeistlinger lgeistlinger reopened this Oct 28, 2021
@jwokaty
Collaborator

jwokaty commented Oct 28, 2021

Thanks for clarifying.

@lgeistlinger
Contributor Author

Hi @jwokaty @lwaldron: this is finally ready for release! We finished the ontology columns in the export, and everything is looking good now for upload of the stable BugSigDB 1.0 release to zenodo. @jwokaty can you go ahead and perform the upload to zenodo? (Not sure whether this will involve overwriting your previous upload under https://zenodo.org/record/5606166, or whether we bump this to 1.0.1 then.) Thanks!

@lgeistlinger
Contributor Author

Accordingly, the release should be based on the latest export: 1137470

@jwokaty
Collaborator

jwokaty commented Dec 27, 2021

@lgeistlinger We have to bump the version to 1.0.1. I just want to check if there should be a specific release title and any description for the release before I create the release. Also, would you like me to 'draft' the upload in Zenodo so that you can take a look before I finalize everything?

@lgeistlinger
Contributor Author

Thanks @jwokaty! I noticed a small inconsistency in the bulk export from bugsigdb.org, with some conditions / body sites being present in both upper case and lower case (e.g. "Feces" and "feces"). I introduced a small fix for that in 121c571. Can you trigger a manual export for that and base the 1.0.1 release on this export?

if there should be a specific release title and any description for the release before I create the release

Nothing specific here from my side.

Also, would you like me to 'draft' the upload in Zenodo so that you can take a look before I finalize everything?

That sounds like a good idea!
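To illustrate the mixed-case problem mentioned above ("Feces" vs "feces"), here is one possible way to harmonize such values, written as a Python sketch. It is not necessarily what the fix in 121c571 does; the helper name and the rule for picking the spelling (most frequent wins, first occurrence breaks ties) are assumptions made for this example.

```python
from collections import Counter

def harmonize_case(values):
    """Map values that differ only by case to a single spelling.

    The most frequent spelling wins; on a tie, the one seen first is kept.
    """
    counts = Counter(values)
    canonical = {}
    for v in values:
        key = v.casefold()
        best = canonical.get(key)
        # Strict ">" keeps the earlier spelling when counts are equal.
        if best is None or counts[v] > counts[best]:
            canonical[key] = v
    return [canonical[v.casefold()] for v in values]

sites = ["feces", "Feces", "feces", "Saliva"]
harmonized = harmonize_case(sites)
```

Values without a case variant, such as "Saliva" here, pass through unchanged.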

@lgeistlinger
Contributor Author

Hi @jwokaty - I just took a look at yesterday's export (3def7c9) and everything looks good for upload / release to zenodo. Just let me know if you have any questions. Many thanks!

@jwokaty
Collaborator

jwokaty commented Jan 5, 2022

@lgeistlinger I've drafted the new version at https://zenodo.org/deposit/5819260. (I am assuming that you can see it.)

@lgeistlinger
Contributor Author

lgeistlinger commented Jan 5, 2022

Thanks @jwokaty. Logging in to zenodo via GitHub (lgeistlinger / [email protected]), I am seeing:

Permission required: You do not have sufficient permissions to view this page.

when trying to access the link you provided.

@jwokaty
Collaborator

jwokaty commented Jan 5, 2022

I apologize. I thought that because we had access to the same thing, we could all see the draft. Maybe there's no way for you to see it? I just updated all the files, except for the README, and then updated the version number to v1.0.1.

@lgeistlinger
Contributor Author

I just updated all the files, except for the README, and then updated the version number to v1.0.1.

Cool! Can you maybe share the files with me via Google Drive or Dropbox so I can quickly review them on my end? Thanks!

@jwokaty
Collaborator

jwokaty commented Jan 6, 2022

The files are the same as in this release: https://github.com/waldronlab/BugSigDBExports/releases/tag/v1.0.1

@lgeistlinger
Contributor Author

Ah, very nice, somehow I didn't notice the releases folder/branch. Cool, I'd say we're good to go for uploading to zenodo and closing this issue.

@jwokaty jwokaty closed this as completed Jan 6, 2022