diff --git a/pages/governance/data-access.mdx b/pages/governance/data-access.mdx
index 404d1c8..bd0bcb7 100644
--- a/pages/governance/data-access.mdx
+++ b/pages/governance/data-access.mdx
@@ -4,12 +4,12 @@ import Image from 'next/image'
 # Data Access
-Data access for Gray Foundation corresponds to the general [data access types described for the Synapse platform](https://help.synapse.org/docs/Data-Access-Types.2014904611.html).
+The data access for Gray Foundation aligns with the general data access types described for the Synapse platform. You can find more details about these access types [here](https://help.synapse.org/docs/Data-Access-Types.2014904611.html).
-## Data Access Tags in the Portal
+## Data Access Tags in the Portal
- This is a future visual labeling scheme to help represent access in a more user-friendly manner in the portal.
+ This is an upcoming visual labeling system designed to represent access in a more user-friendly manner within the portal.
@@ -17,48 +17,41 @@ Data access for Gray Foundation corresponds to the general [data access types de
 ##### "PRIVATE" private data access
-Phase 1 datasets will likely not appear in the portal at all.
-If they do, the pink "PRIVATE" tag will indicate that this data exists on Synapse, but is likely still under compilation/analysis, and there is no interest in sharing it at the moment. Do not contact the team regarding the data.
+Phase 1 datasets will most likely not be visible in the portal. If they do appear, the pink "PRIVATE" tag will indicate that the data exists on Synapse but is still undergoing compilation/analysis. At the moment, there is no intention to share this data, so please refrain from contacting the team regarding its availability.
 ### Phase 2 Datasets
 ##### "COLLABORATIVE ONLY" collaborative data access
-Phase 2 datasets MAY appear in the portal, but will be tagged with "COLLABORATIVE ACCESS ONLY".
-Researchers interested in collaboration **will have to request access through contacting the Team PI**.
-- Data is private and downloable only through the individuals or teams explicitly granted with those permissions.
-- Admins (PIs and Data Leads) for the team's repository can grant view or download access by adding individuals or teams to the folder or project. This allows data to be shared for collaboration without moving the data to Phase 3.
-- In conversational context, this data might also be referred to as "under embargo".
+Phase 2 datasets may be visible in the portal but will be labeled as "COLLABORATIVE ACCESS ONLY". Researchers interested in collaboration will need to request access by contacting the Team PI.
+- Data is private and can only be downloaded by individuals or teams explicitly granted permission.
+- Admins (PIs and Data Leads) for the team's repository can grant view or download access by adding individuals or teams to the folder or project. This allows data to be shared for collaboration without moving it to Phase 3.
+- In conversation, this data may also be referred to as being "under embargo".
-### Phase 3 Datasets
+### Phase 3 Datasets
-Phase 3 datasets are often data released with or after publication. However, there might still be conditions placed on access.
-Within Phase 3, data has several access classes:
+Phase 3 datasets often consist of data released concurrently with or after publication, although certain access conditions may still apply. Within Phase 3, data is classified into several access categories:
 ##### "CONTROLLED ACCESS" controlled data access
-- Registered Synapse users can download this data as long as they fulfill additional actions such as accepting Conditions of Use, submitting a statement, or getting an approval from governance.
-- Since these controls are implemented through a different data access layer, for _certain_ controls, once they are applied, even individuals on the Access Control List with download permissions will need to complete the required action to download data.
-- This is usually applicable for raw sequencing data.
+- Registered Synapse users can download this data after fulfilling additional requirements such as accepting Conditions of Use, submitting a statement, or obtaining approval from governance.
+- For certain controls, even individuals on the Access Control List with download permissions will need to complete the required action to download data, as these controls are implemented through a different data access layer.
+- This typically applies to raw sequencing data.
 ##### "PUBLIC" public data access
-- Anyone on the web can view the data, but only registered Synapse users can download this data. Data can be downloaded without any restrictions or additional steps required.
-- This is usually applicable for processed data such as summarized gene expression and cell and tissue images.
+- The data can be viewed by anyone on the web, but only registered Synapse users can download it. There are no restrictions or additional steps required for downloading.
+- This usually applies to processed data such as summarized gene expression and cell and tissue images.
 ##### "OPEN" open data access
-- Anyone on the web can view and download this data, no account needed.
-- This is usually applicable for metadata.
+- Anyone on the web can view and download this data without requiring an account.
+- This usually applies to metadata.
-For more information, reach out to our governance analysis team at gray-foundation-service@sagebase.org.
+For more information, please refer to the policies on our [Data Access](/help/data-access) page or contact our governance analysis team at gray-foundation-service@sagebase.org.
 ### See Also
-For specific information regarding synapse governance - see https://help.synapse.org/docs/Synapse-Governance.2004255211.html
-
-
-
-
+For specific information regarding Synapse governance, please visit https://help.synapse.org/docs/Synapse-Governance.2004255211.html.
\ No newline at end of file
diff --git a/pages/onboarding/_meta.json b/pages/onboarding/_meta.json
index 1066f26..5ae76b2 100644
--- a/pages/onboarding/_meta.json
+++ b/pages/onboarding/_meta.json
@@ -3,6 +3,6 @@
 "account-setup": "Account Setup",
 "dsp-template": "Data Sharing Plan",
 "data-processing-organization": "Data Processing & Organization",
- "data-sharing": "Data Sharing Realized",
- "data-analysis": "Data Analysis"
+ "data-sharing": "Data Sharing Implementation",
+ "data-analysis": "Data Analysis Procedures"
 }
\ No newline at end of file
diff --git a/pages/onboarding/account-setup.mdx b/pages/onboarding/account-setup.mdx
index 87cce50..736cf46 100644
--- a/pages/onboarding/account-setup.mdx
+++ b/pages/onboarding/account-setup.mdx
@@ -1,6 +1,6 @@
 ## Account Setup
-1. Please review the [account types here](https://help.synapse.org/docs/Synapse-User-Account-Types.2007072795.html).
-2. In most cases, you will need to be a **Certified User**, so make sure you complete that step.
-3. Once you have your account, let the DCC know so a team-specific invite can be sent.
-4. You must accept the invitation for a Team invite. This requires following an emailed link and accepting the invitation on a Synapse team page.
+1. Please familiarize yourself with the different [account types available](https://help.synapse.org/docs/Synapse-User-Account-Types.2007072795.html).
+2. In most cases, it is necessary to become a **Certified User** before proceeding. Make sure you complete this step.
+3. After creating your account, please inform the DCC so that a team-specific invitation can be sent to you.
+4. To join a team, you must accept the invitation by clicking on the link provided in the email and following the instructions on the Synapse team page.
\ No newline at end of file
diff --git a/pages/onboarding/data-analysis.mdx b/pages/onboarding/data-analysis.mdx
index 4dae709..5d9f72a 100644
--- a/pages/onboarding/data-analysis.mdx
+++ b/pages/onboarding/data-analysis.mdx
@@ -11,20 +11,18 @@ Below, we provide more details on analysis apps and which data types are targete
 ## cBioPortal
-The [cBioPortal](https://docs.cbioportal.org/) for Cancer Genomics was originally developed at Memorial Sloan Kettering Cancer Center (MSK) for sharing of **genomics** data.
-We provide a custom instance for the Gray Foundation. The data which should be made available include:
+The [cBioPortal](https://docs.cbioportal.org/) for Cancer Genomics was originally developed at Memorial Sloan Kettering Cancer Center (MSK) for sharing genomic data. We provide a custom instance for the Gray Foundation. The available data includes:
 - Copy Number Variation (CNV)
 - Summarized gene expression (RNA-seq)
 - Mutations
-## CELLxGENE
+## CELLxGENE
-[CELLxGENE](https://github.com/chanzuckerberg/cellxgene) ("cell-by-gene") is an interactive data explorer for **single-cell datasets**.
+[CELLxGENE](https://github.com/chanzuckerberg/cellxgene) ("cell-by-gene") is an interactive data explorer specifically designed for analyzing single-cell datasets.
 - Single-cell RNA-sequencing data
 ## Minerva Viewer
-- CyCIF imaging data
-
+- CyCIF imaging data
\ No newline at end of file
diff --git a/pages/onboarding/data-processing-organization.mdx b/pages/onboarding/data-processing-organization.mdx
index 9b9238b..918eee4 100644
--- a/pages/onboarding/data-processing-organization.mdx
+++ b/pages/onboarding/data-processing-organization.mdx
@@ -1,26 +1,14 @@
-## Data processing overview
+## Overview of Data Processing
-### Data processing by the project team
+### Data Processing by the Project Team
-Depending on their funded aims, teams might have special data workflows, such as:
-- Generating sequencing data and deriving data with multiple variant calling pipelines
-- Producing high-resolution images and creating summary features from images
-- Combining different types of data
+Depending on their funded aims, project teams may have specialized data workflows, which can include:
-The anticipated workflow should helpfully be discussed as part of the onboarding, especially if it is complicated or outside of what is typical.
-Teams should provide information or other documentation of their workflow.
+- Generating sequencing data and deriving data using multiple variant calling pipelines.
+- Producing high-resolution images and extracting summary features from images.
+- Combining different types of data.
-By having information on the data-generating process, the DCC can better work with each team to answer the questions:
-- What are the different forms of data that will be generated
-- how to optimally intake and manage the data artifacts from this workflow?
-- Are there recommendations that can be suggested for this workflow to avoid potential problems downstream?
-- What other resources can the DCC provide if possible?
- ### Data processing by the DCC
-
-Data uploaded to Synapse may also be processed by the DCC for:
-- Quality control
-- File format conversions
-- Other data transformations to allow data to be loaded and shared in cBioPortal/other analysis application
+During the onboarding process, it is essential to discuss the anticipated workflow, especially if it is complex or deviates from the standard. Project teams should provide information or documentation regarding their workflow.
 ## Data organization corresponds to processing stages
@@ -54,3 +42,16 @@ Subfolders must be of the same data type and level as the root folder they are c
 ```
+By understanding the data generation process, the Data Coordination Center (DCC) can effectively collaborate with each team to address the following questions:
+
+- What are the different types of data that will be generated, and how can the data artifacts from this workflow be optimally handled and managed?
+- Are there any recommendations that can be provided to ensure a smooth workflow and avoid potential issues in the downstream analysis?
+- What additional resources can the DCC offer, if available?
+
+### Data Processing by the DCC
+
+In addition to the project team's data processing, the DCC also performs data processing on the uploaded data in Synapse. This processing includes:
+
+- Quality control assessments.
+- File format conversions.
+- Other necessary data transformations to facilitate data loading and sharing in cBioPortal or other analysis applications.
diff --git a/pages/onboarding/data-sharing.mdx b/pages/onboarding/data-sharing.mdx
index bdff335..9cf7464 100644
--- a/pages/onboarding/data-sharing.mdx
+++ b/pages/onboarding/data-sharing.mdx
@@ -1,13 +1,13 @@
 ## Introduction
-The figure below provides an overview of contributing data.
+The figure below provides an overview of contributing data.
-
@@ -16,17 +16,17 @@ The figure below provides an overview of contributing data.
 ### Using the UI or programmatic clients
-Data should be uploaded to the correct folders; the folder structure is described in the previous section.
-Data can be uploaded via the **User Interface (UI)** -- see [docs here](https://help.synapse.org/docs/Uploading-and-Organizing-Data-Into-Projects,-Files,-and-Folders.2048327716.html#UploadingandOrganizingDataIntoProjects,Files,andFolders-UploadingaFileviatheSynapseUI).
-However, for larger and more numerous files, we recommend doing upload **programmatically** (Command line, Python, R) -- see [docs here](https://help.synapse.org/docs/Uploading-and-Organizing-Data-Into-Projects,-Files,-and-Folders.2048327716.html#UploadingandOrganizingDataIntoProjects,Files,andFolders-UploadingaFileProgrammatically).
+Data can be uploaded using the User Interface (UI) of Synapse. You can find detailed instructions in the documentation [here](https://help.synapse.org/docs/Uploading-and-Organizing-Data-Into-Projects,-Files,-and-Folders.2048327716.html#UploadingandOrganizingDataIntoProjects,Files,andFolders-UploadingaFileviatheSynapseUI).
+
+However, for large or numerous files, we recommend utilizing programmatic methods such as the command line, Python, or R. This allows for more efficient and streamlined uploading. You can refer to the documentation [here](https://help.synapse.org/docs/Uploading-and-Organizing-Data-Into-Projects,-Files,-and-Folders.2048327716.html#UploadingandOrganizingDataIntoProjects,Files,andFolders-UploadingaFileProgrammatically) for instructions on uploading files programmatically.
 ### Help and alternative methods
-In very special circumstances, the DCC can discuss:
+Under unique circumstances, the DCC can explore the following options:
-- Mailing a hard drive
-- Globus transfers
-- Help with programmatic methods
+- Sending data via a physical hard drive
+- Utilizing Globus transfers
+- Providing assistance with programmatic methods
 ## Annotating data