diff --git a/CONTRIBUTORS.yaml b/CONTRIBUTORS.yaml index 0c6a4654c80693..34dccefc7c6b12 100644 --- a/CONTRIBUTORS.yaml +++ b/CONTRIBUTORS.yaml @@ -920,6 +920,11 @@ lamouresparus: twitter: lamouresparus joined: 2021-10 +Laura190: + name: Laura Cooper + email: L.Cooper.5@warwick.ac.uk + joined: 2023-08 + lecorguille: name: Gildas Le Corguillé joined: 2017-09 diff --git a/topics/fair/tutorials/bioimage-REMBI/tutorial.md b/topics/fair/tutorials/bioimage-REMBI/tutorial.md new file mode 100644 index 00000000000000..7fa3d5ca08d139 --- /dev/null +++ b/topics/fair/tutorials/bioimage-REMBI/tutorial.md @@ -0,0 +1,418 @@ +--- +layout: tutorial_hands_on +title: REMBI - Recommended Metadata for Biological Images – metadata guidelines for bioimaging data + +zenodo_link: '' + +questions: +- What is REMBI and why should I use it? +- What information should be included when collecting bioimage data? + +objectives: +- Organise bioimage metadata +- Find out what REMBI is and why it is useful +- Categorise what metadata belongs to each of the submodules of REMBI +- Gather the metadata for an example bioimage dataset + +time_estimation: "15m" + +key_points: +- REMBI describes useful guidelines for bioimaging that can help with the unification and FAIRification of the data. + +tags: +- fair +- data management +- bioimaging + +priority: 5 + +contributions: + authorship: + - wee-snufkin + - Laura190 + - kkamieniecka + - poterlowicz-lab + +subtopic: fair-data + +requirements: + - type: "internal" + topic_name: fair + tutorials: + - fair-intro + - data-management + - bioimage-metadata + +follow_up_training: + - type: "internal" + topic_name: imaging +--- + +# Metadata guidelines for bioimaging data + +REMBI (Recommended Metadata for Biological Images) was proposed as draft metadata guidelines to begin addressing the needs of diverse communities within light and electron microscopy. 
Currently, these guidelines are in draft form to encourage discussion within the community, but they provide a useful guide as to what metadata should be gathered to make your image data FAIR. They divide the metadata requirements into eight modules, which are further split into attributes. That seems like a daunting task, doesn't it? But at the same time it's exciting news for the community! To find out more, have a look at the [REMBI article](https://www.nature.com/articles/s41592-021-01166-8). + + +> +> +> In the [REMBI paper](https://www.nature.com/articles/s41592-021-01166-8), the authors consider three potential user groups who require different metadata. Find out what these three groups are and what metadata each of them requires. +> +> > +> > The three user groups identified are biologists, imaging scientists and computer-vision researchers. +> > - A research biologist may be interested in the biological sample that has been imaged to compare it to similar samples that they are working with. +> > - An imaging scientist may be interested in how the image was acquired so they can improve upon current image acquisition techniques. +> > - A computer-vision researcher may be interested in annotated ground-truth segmentations that can be obtained from the image, so they can develop faster and more accurate algorithms. +> {: .solution} +> +{: .question} + +> Instructor Note +> +> If you're an instructor leading this training, you might ask people to work in small groups for this exercise and encourage discussion. Ask group members to share which of the user groups they identify with and what metadata they would want. +> +{: .tip} + +# Categories of metadata +REMBI divides metadata into the following eight modules: +- study +- study component +- biosample +- specimen +- image acquisition +- image data +- image correlation +- analysed data + +Within each module, there are attributes that should be included to make the published data FAIR. 
We will explore all the modules and attributes suggested by REMBI and we'll show some examples as well. + +## Study +The first module of REMBI metadata describes the Study and should include: +- Study type +- Study description +- General dataset information + +### Study type + +Ideally, the study type will be part of an ontology. You can look up the main subject of your study using a tool like [OLS](https://www.ebi.ac.uk/ols/index) to find a suitable ontology. This will help others to see where your study sits within the wider research area. + +> Example +> +> +> +> +> +> +>
Study type: Regulation of mitotic cell division
+> +{: .comment} + + +### Study description + +A brief description of the project. The Study Description should include the title of the study, a brief description and any related publication details such as authors, title and DOI. If you are gathering metadata prepublication, you can fill in the publication details later or enter a draft title or the journal name you plan to submit to. It’s still a good idea to include the category, so you don’t forget. + +> Example +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +>
Study description
Title: Imaging mitotic cells
Description: Visualising HeLa cells using confocal microscopy
Publication details: TBC
+{: .comment} + + +### General dataset information + +This should include all the information that relates to all the data in the project. This can include the names of contributors and the repository where the data is or will be stored. State the licence under which you intend to make the data available, the repository you intend to submit to and if you are using a schema for structuring your metadata. This helps to keep all collaborators on the same page. Any other general information with respect to the study can be included here, but try to keep this broad as more detailed information should be included in other sections of the metadata. + +> Example +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +>
General Dataset Information
Contributors: Alice and Bob
Repository: BioImage Archive
Licenses: CC-BY
Schemas: DataCite Metadata
+{: .comment} + +## Study component + +A study component can be thought of as an experiment, both the physical experiment and subsequent data analysis, or a series of experiments that have been conducted with the same aim in mind. + +The associated metadata should describe the imaging method used and include a description of the image dataset. The REMBI guidelines store high-level metadata in the study component and then divide the more detailed metadata into other modules. + +Within the Study component we include the Imaging Method which should describe the techniques used to acquire the raw data. This could be one or multiple methods, which should be part of a relevant ontology. For Confocal Microscopy data, we can use the Biological Imaging Methods Ontology, although it is also present in a number of other ontologies. + +The description of the study component should include an overview of what was imaged as well as any processed data that is created during analysis. + +> Example +> +> +> +> +> +> +> +> +> +>
Imaging Method: Confocal Microscopy
Study Component Description: Images of cells and segmented binary masks
+{: .comment} + + +> Storing metadata +> +> You could either choose to store the metadata in the same file as your study data or have a new file for each study component. This could be stored in the same place as your study metadata, or you could create a subdirectory structure. +{: .tip} + +## Biosample + +The first thing you need for the biosample metadata is an Identity. This is a code that you assign to each sample you are describing, which will link this metadata to the physical sample. Then, state what the biological entity is, which should come from a relevant ontology. Use a taxonomy to name the organism. Next, describe the variables in your experiment. The REMBI guidelines split the variables into three types: +- intrinsic - describe an innate trait of the biosample, such as a genetic alteration +- extrinsic - describe something you added to the sample, for example, a reagent +- experimental - things that you intentionally vary, like time + +You can leave out some of the variables if they are not part of your experiment. + +> Example +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +>
Identity: CM001
Biological entity: JURKAT E-6.1 cell
Organism: Homo sapiens
Intrinsic variable: Jurkat E6.1 transfected with emerald-VAMP7
Extrinsic variable: Aspirin
Experimental variables: Dose response of aspirin
+{: .comment} + +## Specimen + + The specimen metadata should include: +- the experimental status (control or test) +- the location within the biosample, such as a coordinate or a particular well in a plate +- how the sample was prepared +- how the signal is being generated +- the content and biological entities of different channels. + + Include enough information so that someone with experience in the field could reproduce a sample by following the information you provided. Assume they would know typical techniques and name them using terms from an ontology if possible. Only include lots of detail if you are describing a novel technique. + +> Example +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +>
Experimental status: Control
Location within biosample: Plasma membrane within 100 nm of coverslip (TIRF)
Preparation method: Cos-7 cells cultured in DMEM medium, and then plated on #1 coverslips and imaged live in L-15 medium
Signal/contrast mechanism: fluorescent proteins
Channel – content: Green: eGFP, Red: mCherry
Channel – biological entity: Green: EGFR, Red: Src
+{: .comment} + +## Image acquisition +Here you should include all the information about the instrument you used and how it was set up. Like with the specimen metadata, describe this information as though you are speaking to someone who already knows how to use a similar instrument. What would they need to know to produce the same image data? + +Check with your facility manager if they have any guidelines for what details need to be recorded for your particular instrument. Make sure that the parameters you record can actually be used by someone else if they don’t have exactly the same instrument or setup. For example, don’t say that you used a certain percentage of laser power, as this doesn’t tell you how much power was used unless you also provide the total power of the laser. If the instrument software has automatically generated a metadata file, remember to save this. Depending on its content, this may be sufficient. + +Start with the details of the equipment for the Instrument Attributes. If this is commercial equipment, include the make and model, a short description of what type of instrument it is and details about its configuration. If the instrument is bespoke, you will need to include more details. Next, you should include image acquisition parameters. These relate to how the instrument was set up for the particular experiment. Some of these may be captured automatically by the instrument’s software, so make things easy for yourself and check if a file is generated and what’s in it. If a file is generated, then you only need to manually record anything that is missing from the file. + +> Example +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +>
Instrument attributes: Olympus FV3000, laser point scanning confocal, 500-550 nm filter, 37-degree chamber.
Image acquisition parameters
Objective: Cos-7 cells cultured in DMEM medium, and then plated on #1 coverslips and imaged live in L-15 medium
Excitation Wavelength: 488 nm
PMT gain: 500 V
Pixel dwell time: 2 µs
Confocal aperture: 200 µm
+{: .comment} + +> Helpful resources +> +> To help you collect the information for your own data, have a look at the local resources from your institution or university. For example, at the University of Warwick, there are [webpages](https://warwick.ac.uk/fac/sci/med/research/biomedical/facilities/camdu/methodsreporting/) describing the metadata that needs to be collected for some of the microscopes. +> +{: .tip} + +## Image data + +In this section, you record the information related to all the images you have: not only the primary or raw images, but also any processed images, such as binary files showing the resulting segmentation. + +You need to say what format the images are in and if they have undergone any compression, the dimensions of the images, and what the physical size of the pixel or voxel is, including the units. Most of this information you should be able to get from the metadata or header of the image files. + +Next, you need to state the physical size of the image or magnification, calculated from the pixel or voxel size and the dimension extents. Give any information related to how the channels are represented. For processed images, you need to provide the methods used for processing. + +Finally, state whether you have used contrast inversion: do the bright features in the image correspond to areas of high signal, or is it the other way around? + +> Example +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +>
Type: Primary Image, Segmentation
Format and compression: Primary: .oir (Olympus), Segmentation: .tiff
Dimension extents: x: 512, y: 512, z: 25
Size description: 153.6 x 153.6 x 25 µm
Pixel/Voxel size description: 0.3 x 0.3 x 0.1 µm
Image processing method: Fiji: Median filter (3 pixel kernel), Otsu threshold
Contrast inversion: No
+{: .comment} + + +## Image correlation + +If you have used different imaging modalities with the same sample, this part of the metadata should describe how the images relate to one another. You could use this section to describe generally the relationship between images. In the example below, images from different modalities have been aligned. + +> Example +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +>
Spatial and temporal alignment: Manual
Fiducials used: Soil grains
Transformation matrix: See file: Transforms.csv
Size description: 153.6 x 153.6 x 25 µm
Related images and relationship: Primary XCT: Data/XCT +> Primary XRF: Data/XRF +> Processed XRF: Data/Transformed_XRF
+{: .comment} + +## Analysed data + +This section should not include metadata for any image data, including processed images, as that should have been covered in the Image Data section. Instead, it should describe the analysis results you have, such as measurements. Have you done some numerical analysis or some phenotyping or something else? There is no need to describe the methods in great detail if they are already described in the relevant publication. + +> Example +> +> +> +> +> +> +> +> +> +> +> +> +> +>
Analysis results type: Speed of cell division
Data used for analysis: Preprocessed images, Cell tracks
Analysis method: Track cell lineage: BayesianTracker (btrack) with configuration track_config.json +> Measure speed: Numerical analysis in Python
+{: .comment} + +# Final notes + +For more examples, check out the REMBI Supplementary Information, available as a [PDF](https://static-content.springer.com/esm/art%3A10.1038%2Fs41592-021-01166-8/MediaObjects/41592_2021_1166_MOESM1_ESM.pdf) or a [spreadsheet](https://docs.google.com/spreadsheets/d/1Ck1NeLp-ZN4eMGdNYo2nV6KLEdSfN6oQBKnnWU6Npeo/edit#gid=1023506919). + + +At first glance, it might seem to be quite a stretch to collect all that metadata! But don’t get discouraged. Following these guidelines will improve communication between scientists and make your research FAIR: Findable, Accessible, Interoperable, Reusable. In the era of big data, when we are surrounded by so many resources, it’s crucial to develop good data management habits, share them with others and so contribute to the development of science together. diff --git a/topics/fair/tutorials/bioimage-metadata/tutorial.md b/topics/fair/tutorials/bioimage-metadata/tutorial.md new file mode 100644 index 00000000000000..dc8f7823bb6dcd --- /dev/null +++ b/topics/fair/tutorials/bioimage-metadata/tutorial.md @@ -0,0 +1,157 @@ +--- +layout: tutorial_hands_on +title: FAIR Bioimage Metadata + +zenodo_link: '' + +questions: +- What are the commonly used repositories for bioimaging data? +- Which repositories are suitable for my data? +- What are the requirements for submitting? + +objectives: +- Locate bioimage data repositories +- Compare repositories to find which are suitable for your data +- Find out what the requirements are for submitting + +time_estimation: "15m" + +key_points: +- Data repositories such as BioImage Archive, Electron Microscopy Public Image Archive (EMPIAR) and Image Data Repository (IDR) are available to help make bioimaging data FAIR. +- Find out what each repository's requirements are to help decide which is suitable for your data. +- All repositories require some metadata, so collecting the metadata alongside data acquisition will make this process easier. 
+ +tags: +- fair +- data management +- bioimaging + +priority: 4 + +contributions: + authorship: + - wee-snufkin + - Laura190 + - kkamieniecka + - poterlowicz-lab + +subtopic: fair-data + +requirements: + - type: "internal" + topic_name: fair + tutorials: + - fair-intro + - data-management + + +follow_up_training: + - + type: "internal" + topic_name: fair + tutorials: + - bioimage-REMBI + - + type: "internal" + topic_name: imaging + +--- + +# FAIR Bioimaging + +Submitting your data to a repository is a good way to make the data FAIR. This will make it: +- **F**indable, as the data will be given specific identifiers +- **A**ccessible, as the data will be available online, open and free where possible +- **I**nteroperable, as the repository will often enforce the use of formalised, consistent language +- **R**eusable, as the data will be released under a license with detailed provenance + +# Examples of bioimage data repositories + +But the question remains: where can I submit my data? Currently the main repositories where you can submit your images are: BioImage Archive, Electron Microscopy Public Image Archive (EMPIAR) and Image Data Repository (IDR). Let's have a look at the questions below to explore those repositories more in depth! + +> +> +> Listed below are three examples of Bioimage Data Repositories: +> - [IDR: Image Data Repository](https://idr.openmicroscopy.org/) +> - [EMPIAR: Electron Microscopy Public Image Archive](https://www.ebi.ac.uk/empiar/) +> - [BioImage Archive](https://www.ebi.ac.uk/bioimage-archive/) +> +> Visit their websites and find out what their scope is or what sorts of datasets they accept. 
+> +> > +> > +> > - **IDR: Image Data Repository**: Curated datasets of cell and tissue microscopy images +> > - **EMPIAR: Electron Microscopy Public Image Archive**: Cryo-EM, Scanning Electron Microscopy, Soft X-ray tomography +> > - **BioImage Archive**: Everything else and some overlap with IDR and EMPIAR +> {: .solution} +> +{: .question} + +> Repositories everywhere +> +> As well as these repositories, your institute may have its own repository. For example, at the University of Warwick, there are also [OMERO](https://warwick.ac.uk/fac/sci/med/research/biomedical/facilities/camdu/training/omero-warwick-guide_2.pdf) and [WRAP](https://wrap.warwick.ac.uk/). +> +{: .tip} + + +> +> The repositories we are looking at in this course are for bioimage data, not medical data. There are other specialist repositories available if you have medical data. +{: .comment} + +# Things to consider when choosing a repository + +Now we know what repositories are available, but how do we decide which one is best for the files we want to submit? Try to work through the Question box below and find the answer! + +> +> +> Choose one repository from above and look through its documentation. Try to find: +> 1. What data formats are accepted? +> 2. What license is recommended to publish the data? +> 3. Are there specific instructions for large datasets? +> +> > +> > +> > - **IDR: Image Data Repository**: +> > 1. The IDR uses the Bio-Formats library for reading imaging data. Bio-Formats supports over 150 proprietary and open file formats (see the [full list](https://bio-formats.readthedocs.io/en/stable/supported-formats.html)). +> > 2. It is strongly recommended that submitters make their datasets available under a [CC-BY](https://creativecommons.org/licenses/by/4.0/) license. +> > 3. As specified on the [IDR website](https://idr.openmicroscopy.org/about/submission.html), dataset size is typically not an issue, but for sizes significantly larger than 1000 GB special planning may be needed. 
+> > - **EMPIAR: Electron Microscopy Public Image Archive**: +> > 1. Image data are provided in the formats in which they are uploaded, but the use of formats common in the field is recommended, including MRC, MRCS, TIFF, DM4, IMAGIC, SPIDER, MRC FEI, RAW FEI and BIG DATA VIEWER HDF5. +> > 2. All data in EMPIAR is freely and publicly available to the global community under the [CC0](https://creativecommons.org/share-your-work/public-domain/cc0/) license. +> > 3. As specified on the [EMPIAR page](https://www.ebi.ac.uk/empiar/deposition/manual/#manIntro), having more than 4000 files in a directory tends to slow down access considerably. In this case, it is recommended to divide the directory into subdirectories with no more than 4000 files each. If you have a single file larger than 1 TB, contact EMPIAR in advance. +> > To find out more, check the [FAQ page](https://www.ebi.ac.uk/empiar/faq). +> > - **BioImage Archive**: +> > 1. The BioImage Archive accepts all image data formats, although formats readable by the [Bio-Formats library](https://bio-formats.readthedocs.io/en/stable/supported-formats.html) are preferable. +> > 2. According to [BioImage Archive Policies](https://www.ebi.ac.uk/bioimage-archive/help-policies/), all new data directly submitted to the BioImage Archive will be made available under a [CC0](https://creativecommons.org/share-your-work/public-domain/cc0/) licence; datasets brokered/imported from other resources may have other licences. +> > 3. There are different submission methods depending on data size: +> > - Less than 50 GB total size, less than 20 GB per file – use submission tool +> > - Up to 1 TB total size – use FTP +> > - Anything larger – use Aspera +> > +> > To find out more, check the [FAQ page](https://www.ebi.ac.uk/bioimage-archive/help-faq/). +> {: .solution} +> +{: .question} + + +# What metadata to collect + +Whichever repository you choose, you will be required to upload some metadata along with your data. 
In an ideal world, you would remember everything about your data when you submit it. In reality, this is unlikely: the data could have been collected over a long time period or by different people. To overcome these challenges, it is best to collect metadata alongside imaging experiments; don’t leave it all to the end! +However, this raises further challenges. At the time of data acquisition, you probably won’t know which repository you will submit it to, what the study results will be, or even who will be the target audience for the data. So what metadata do you need to collect? + +Currently, there is no single metadata standard for bioimages, so here is a general outline of how to proceed: +- If you have chosen a repository, use their template/guidelines. +- Otherwise, use [REMBI](https://www.nature.com/articles/s41592-021-01166-8). These are published guidelines which are explained in detail in the [REMBI tutorial]({% link topics/fair/tutorials/bioimage-REMBI/tutorial.md %}). REMBI is useful as it should cover most of the metadata requirements of the repositories, even if you haven’t decided which one you want to use yet. +- For medical images, see the [DICOM](https://www.dicomstandard.org/) standard. + +# How to store metadata + +Metadata should be stored somewhere that it can be viewed and edited by collaborators. This helps everyone to stay on the same page with regard to what data should be collected and what has already been collected. This is also useful if different people contribute to different aspects of the image acquisition, e.g. maybe one person prepared the sample, another imaged it under the microscope, and a third person did the post-processing. Each person can then update the metadata related to their part of the study. + +Recommendations for storing metadata: +- If a repository has been chosen, use their template (if provided). +- Use a delimited text file format, e.g. .csv, .tsv. You can use spreadsheet software and save to this format. Try to use a plain format, e.g. 
avoid merged/split cells. +- Use data management software suitable for your data, e.g. [OMERO](https://www.openmicroscopy.org/omero/). + +# Further steps +{% icon congratulations %} Congratulations on successfully completing this tutorial! If you want to know more about FAIR data management, we provide training that you can find in the [FAIR Data, Workflows, and Research]({% link topics/fair %}) section on the GTN. For more details on FAIR data management in bioimaging, see the [REMBI tutorial]({% link topics/fair/tutorials/bioimage-REMBI/tutorial.md %}), and if you want to dive into imaging analysis straight away, feel free to choose one of the Galaxy [tutorials]({% link topics/imaging %})! Additionally, [Global BioImaging](https://globalbioimaging.org/) offers [training courses](https://globalbioimaging.org/international-training-courses/repository/image-data) dedicated to image data management, sharing, reuse, and image data repositories that were mentioned in this tutorial. diff --git a/topics/imaging/metadata.yaml b/topics/imaging/metadata.yaml index db155013ec2b17..d256ef7cb6bc69 100644 --- a/topics/imaging/metadata.yaml +++ b/topics/imaging/metadata.yaml @@ -9,6 +9,12 @@ requirements: - type: "internal" topic_name: introduction + - + type: "internal" + topic_name: fair + tutorials: + - bioimage-metadata + - bioimage-REMBI editorial_board: - thomaswollmann references: