Update requests

Fabio Cumbo edited this page Jun 7, 2021 · 1 revision

This section summarises the steps required to submit new genomes into the MetaRefSGB database:

  1. Fork the MetaRefSGB-db repository on GitHub (https://github.com/SegataLab/MetaRefSGB-db), then clone your fork by running git clone https://github.com/<YourGitHubID>/MetaRefSGB-db in your terminal;
  2. Put your update definition file inside the updates folder. Look at the section below for additional information about how to define an update request;
  3. Open a pull request. Your updates will be reviewed;
  4. If your pull request is reviewed positively and accepted, your updates are merged into the updates area, where they wait until the build of a new SGB release is triggered.
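
The steps above can be sketched as a small shell helper. The function name stage_update_request, the branch name update-<id>, and the commit message are illustrative assumptions, not project conventions. Forking the repository and opening the pull request still happen on the GitHub website.

```shell
# Hypothetical helper (not part of MetaRefSGB): clone your fork and commit
# an update definition file into its updates folder.
# Usage: stage_update_request <fork-url> <definition-file> <github-id>
stage_update_request() {
    fork_url=$1
    definition=$2
    github_id=$3
    # Directory name git would pick by default, e.g. MetaRefSGB-db
    workdir=$(basename "$fork_url" .git)

    git clone "$fork_url" "$workdir" || return 1
    mkdir -p "$workdir/updates"
    cp "$definition" "$workdir/updates/$github_id.yaml" || return 1
    (
        cd "$workdir" || exit 1
        # Branch name and commit message are assumptions, pick your own
        git checkout -b "update-$github_id" &&
            git add "updates/$github_id.yaml" &&
            git commit -m "Add update request from $github_id"
    )
}
```

After the commit, git push -u origin update-<id> (run inside the clone) publishes the branch so that the pull request can be opened from GitHub.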

Useful Questions

1. How do I start submitting an update?

Have a look at this model.

Make a copy of that file and fill in all the mandatory fields (filling in the optional fields is also highly recommended). Finally, rename your file following the syntax <YourGitHubID>.yaml (without the < and > symbols). Your final update description file should contain no comments and keep the field order of the model.
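
As a rough sketch of the renaming and comment-removal steps, the helper below writes <YourGitHubID>.yaml from a filled-in copy of the model while dropping full-line comments. The function name and its arguments are hypothetical. Comments placed after a value on the same line are not touched, so the result should still be checked by hand.

```shell
# Hypothetical helper: write <github-id>.yaml from a filled-in copy of the
# model, dropping lines that contain only a YAML comment.
# The order of the remaining lines is preserved.
# Usage: finalize_update_file <filled-model> <github-id>
finalize_update_file() {
    filled_model=$1
    github_id=$2
    # grep -v keeps every line that is NOT a full-line comment
    grep -v '^[[:space:]]*#' "$filled_model" > "$github_id.yaml"
}
```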

Remember to have a look at the MetaRefSGB Data Model to understand how the data linked in the YAML file must be formatted and validated before the submission of new update requests.

If there are details you are not sure about, please open an issue. The MetaRefSGB maintainers will be happy to answer your questions.

2. How do I compress my update data?

We accept bz2 compressed data only.

To compress your data with bzip2 on Linux or Mac, run bzip2 -f <your-file>. This replaces <your-file> with <your-file>.bz2, overwriting any existing archive of that name.
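
A minimal sketch of the compression step, assuming you want to keep the original file and verify the archive before uploading it. The function name compress_update is hypothetical; -k (keep the input), -f (overwrite an existing output) and -t (test integrity) are standard bzip2 options.

```shell
# Hypothetical helper: compress a file to <file>.bz2 and check the result.
# Usage: compress_update <your-file>
compress_update() {
    input=$1
    # -k keeps the original, -f overwrites an existing <file>.bz2
    bzip2 -k -f "$input" || return 1
    # -t only tests the archive and reports corruption via the exit status
    bzip2 -t "$input.bz2" || return 1
    # Print the name of the archive that was written
    echo "$input.bz2"
}
```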

If the bzip2 utility is not available in your Linux environment, install it with:

$ sudo apt install bzip2     [On Debian/Ubuntu] 
$ sudo yum install bzip2     [On CentOS/RHEL]
$ sudo dnf install bzip2     [On Fedora 22+]

Under a Mac environment, you can use Homebrew to install the latest version of bzip2 with: brew install bzip2.

Otherwise, if you are working in a Windows environment, please follow the bzip2 installation guidelines at http://gnuwin32.sourceforge.net/packages/bzip2.htm.

3. Where do I have to store my compressed data?

You can store your bz2 compressed data in any publicly accessible location. Three examples of valid URLs to files hosted on different services are listed below:

  1. if you host your bz2 compressed files on Dropbox, provide a link structured like https://www.dropbox.com/s/50rnivzsvi1v3nn/my.MAGs.txt.bz2?dl=1. Do not forget to append ?dl=1 to the end of the link;

  2. if the file is hosted on Google Drive, the link to your data is structured like the following one https://drive.google.com/file/d/1FSqkEVlP92gSnRfyiO0vfNEaCPavqJnw;

  3. more generally, your data can be hosted on any other service that does not require additional steps or interaction with proprietary APIs.

Do not forget to set the host type in the host field, which accepts only one of the following values: dropbox, gdrive, or web, corresponding to scenarios 1, 2, and 3 above, respectively.
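
The mapping from link to host value can be sketched as follows. The function name host_for_url is hypothetical, and the URL patterns are assumptions derived only from the example links above, so links in other forms may need to be classified by hand.

```shell
# Hypothetical helper: suggest a value for the host field from a URL.
# The URL patterns are assumptions based on the example links above.
# Usage: host_for_url <url>
host_for_url() {
    case $1 in
        https://www.dropbox.com/*)
            # Dropbox links are expected to end with ?dl=1
            case $1 in
                *'?dl=1') echo dropbox ;;
                *) echo "error: Dropbox link must end with ?dl=1" >&2
                   return 1 ;;
            esac ;;
        https://drive.google.com/file/d/*) echo gdrive ;;
        http://*|https://*) echo web ;;
        *) echo "error: not an http(s) URL" >&2
           return 1 ;;
    esac
}
```

The function prints the suggested value on standard output and returns a non-zero status when the link does not fit any of the three scenarios.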

Please make sure that the service hosting your data is not behind a VPN. Otherwise, we will not be able to retrieve your data.

4. How do I populate the hash fields?

You can generate a sha256 hash from the command line on Linux (and on Mac, once the tools mentioned below are installed).

To generate a sha256 hash: openssl sha256 <your-file>.bz2
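
The openssl command prints the digest together with a label and the file name. As a small sketch, assuming the hash field expects only the hexadecimal digest (an assumption, the data model is authoritative), the hypothetical helper below extracts it:

```shell
# Hypothetical helper: print only the hexadecimal SHA-256 digest of a file.
# Usage: sha256_of <your-file>.bz2
sha256_of() {
    # openssl prints "<label>(<file>)= <digest>"; the digest is the last field
    openssl sha256 "$1" | awk '{print $NF}'
}
```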

You may need the openssl package which you can easily install with the Conda package manager. After installing Conda, open the command prompt and install the openssl package: conda install openssl -c conda-forge

On Windows, you can generate a sha256 hash in the Command Prompt with certutil -hashfile <your-file>.bz2 SHA256, or in PowerShell with the cmdlet Get-FileHash -Path <your-file>.bz2 -Algorithm SHA256.

5. Why has my pull request been accepted but there is still no MetaRefSGB release with my genomes?

Accepting an update request does not trigger a release immediately: new releases are built periodically. Please be patient until one of the MetaRefSGB maintainers processes your data and builds a new release.