Merge pull request #44 from gf-dcc/patch-governance
Patch governance
anngvu authored Jul 4, 2023
2 parents 3cd0ccd + baef708 commit 4838d4f
Showing 7 changed files with 88 additions and 87 deletions.
45 changes: 19 additions & 26 deletions pages/governance/data-access.mdx
@@ -4,61 +4,54 @@ import Image from 'next/image'

# Data Access

Data access for Gray Foundation corresponds to the general [data access types described for the Synapse platform](https://help.synapse.org/docs/Data-Access-Types.2014904611.html).
The data access for Gray Foundation aligns with the general data access types described for the Synapse platform. You can find more details about these access types [here](https://help.synapse.org/docs/Data-Access-Types.2014904611.html).

## Data Access Tags in the Portal
## Data Access Tags in the Portal

<Callout type="info" emoji="ℹ️">
This is a future visual labeling scheme to help represent access in a more user-friendly manner in the portal.
This is an upcoming visual labeling system designed to represent access in a more user-friendly manner within the portal.
</Callout>

<Steps>
### Phase 1 Datasets

##### "PRIVATE"
<Image src="/badge_phase1_PRIVATE.svg" alt="private data access" width={160} height={30} />
Phase 1 datasets will likely not appear in the portal at all.
If they do, the pink "PRIVATE" tag will indicate that this data exists on Synapse, but is likely still under compilation/analysis, and there is no interest in sharing it at the moment. Do not contact the team regarding the data.
Phase 1 datasets will most likely not be visible in the portal. If they do appear, the pink "PRIVATE" tag will indicate that the data exists on Synapse but is still undergoing compilation/analysis. At the moment, there is no intention to share this data, so please refrain from contacting the team regarding its availability.

### Phase 2 Datasets

##### "COLLABORATIVE ONLY"
<Image src="/badge_phase2_COLLAB.svg" alt="collaborative data access" width={270} height={30} />
Phase 2 datasets MAY appear in the portal, but will be tagged with "COLLABORATIVE ACCESS ONLY".
Researchers interested in collaboration **will have to request access through contacting the Team PI**.
- Data is private and downloable only through the individuals or teams explicitly granted with those permissions.
- Admins (PIs and Data Leads) for the team's repository can grant view or download access by adding individuals or teams to the folder or project. This allows data to be shared for collaboration without moving the data to Phase 3.
- In conversational context, this data might also be referred to as "under embargo".
Phase 2 datasets may be visible in the portal but will be labeled as "COLLABORATIVE ACCESS ONLY". Researchers interested in collaboration will need to request access by contacting the Team PI.
- Data is private and can only be downloaded by individuals or teams explicitly granted permission.
- Admins (PIs and Data Leads) for the team's repository can grant view or download access by adding individuals or teams to the folder or project. This allows data to be shared for collaboration without moving it to Phase 3.
- In conversation, this data may also be referred to as being "under embargo".

### Phase 3 Datasets
### Phase 3 Datasets

Phase 3 datasets are often data released with or after publication. However, there might still be conditions placed on access.
Within Phase 3, data has several access classes:
Phase 3 datasets often consist of data released concurrently with or after publication, although certain access conditions may still apply. Within Phase 3, data is classified into several access categories:

##### "CONTROLLED ACCESS"
<Image src="/badge_phase3_CONTROLLED.svg" alt="controlled data access" width={200} height={30} />
- Registered Synapse users can download this data as long as they fulfill additional actions such as accepting Conditions of Use, submitting a statement, or getting an approval from governance.
- Since these controls are implemented through a different data access layer, for _certain_ controls, once they are applied, even individuals on the Access Control List with download permissions will need to complete the required action to download data.
- This is usually applicable for raw sequencing data.
- Registered Synapse users can download this data after fulfilling additional requirements such as accepting Conditions of Use, submitting a statement, or obtaining approval from governance.
- For certain controls, even individuals on the Access Control List with download permissions will need to complete the required action to download data, as these controls are implemented through a different data access layer.
- This typically applies to raw sequencing data.

##### "PUBLIC"
<Image src="/badge_phase3_PUBLIC.svg" alt="public data access" width={140} height={30} />
- Anyone on the web can view the data, but only registered Synapse users can download this data. Data can be downloaded without any restrictions or additional steps required.
- This is usually applicable for processed data such as summarized gene expression and cell and tissue images.
- The data can be viewed by anyone on the web, but only registered Synapse users can download it. There are no restrictions or additional steps required for downloading.
- This usually applies to processed data such as summarized gene expression and cell and tissue images.

##### "OPEN"
<Image src="/badge_phase3_OPEN.svg" alt="open data access" width={130} height={30} />
- Anyone on the web can view and download this data, no account needed.
- This is usually applicable for metadata.
- Anyone on the web can view and download this data without requiring an account.
- This usually applies to metadata.

</Steps>
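
The access tags above can be read as a small decision table. The following is a minimal Python sketch of that table, not a description of how Synapse enforces access: the `AccessTag`, `User`, and `can_download` names are hypothetical, and the sketch simplifies by treating every "CONTROLLED ACCESS" requirement as mandatory for all users, whereas the text says this holds only for certain controls.

```python
from dataclasses import dataclass
from enum import Enum


class AccessTag(Enum):
    """Portal tags described above (names are illustrative)."""
    PRIVATE = "PRIVATE"
    COLLABORATIVE = "COLLABORATIVE ONLY"
    CONTROLLED = "CONTROLLED ACCESS"
    PUBLIC = "PUBLIC"
    OPEN = "OPEN"


@dataclass(frozen=True)
class User:
    registered: bool = False          # has a Synapse account
    explicitly_granted: bool = False  # added to the folder or project by a PI or Data Lead
    requirements_met: bool = False    # e.g. accepted Conditions of Use


def can_download(tag: AccessTag, user: User) -> bool:
    """Return True if the user could download data carrying this tag."""
    if tag is AccessTag.OPEN:
        # No account needed.
        return True
    if tag is AccessTag.PUBLIC:
        # Anyone can view, but downloading needs a registered account.
        return user.registered
    if tag is AccessTag.CONTROLLED:
        # Simplification: the required action applies to every user.
        return user.registered and user.requirements_met
    # PRIVATE and COLLABORATIVE: only explicitly granted individuals or teams.
    return user.explicitly_granted
```

For example, `can_download(AccessTag.PUBLIC, User(registered=True))` is true, while a registered user who has not completed the required action is refused "CONTROLLED ACCESS" data in this model.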

For more information, reach out to our governance analysis team at [email protected].
For more information, please refer to the policies on our [Data Access](/help/data-access) page or contact our governance analysis team at [email protected].

### See Also

For specific information regarding synapse governance - see https://help.synapse.org/docs/Synapse-Governance.2004255211.html




For specific information regarding Synapse governance, please visit https://help.synapse.org/docs/Synapse-Governance.2004255211.html.
47 changes: 28 additions & 19 deletions pages/governance/overview.mdx
@@ -1,46 +1,55 @@
import { Steps } from 'nextra-theme-docs'

# What is Governance?
# Understanding Governance

Data governance is the management of data access and availability.
Data governance involves managing data access and availability effectively.

Data deposited in the Synapse platform have access control features implemented by the platform.
Data sharing settings are negotiated by the sharing agreements between Sage Bionetworks and research teams.
Data is not made public until the project teams agree to release the data.
When it comes to data deposited in the Synapse platform, access control features are implemented to ensure secure management.
Data sharing settings are determined through sharing agreements established between Sage Bionetworks and research teams.
Data remains private until the project teams agree to release it.

## Data Sharing Phases for Gray Foundation
## Phases of Data Sharing for Gray Foundation

Data will be shared in three phases:
Data sharing occurs in three distinct phases:

<Steps>

### Private Phase

Data is accessible only the by Contributor and Contributor-designated individuals. This is the default configuration unless decided otherwise by the grant (e.g. certain pilot projects start in the Collaborative Sharing Phase).
During this phase, data is exclusively accessible to the Contributor and individuals designated by the Contributor. This configuration serves as the default setting unless specified otherwise by the grant. For instance, certain pilot projects may start directly in the Collaborative Sharing Phase.

### Collaborative Sharing Phase

Curated Data is accessible by the “in network” investigators.
In the Collaborative Sharing Phase, curated data becomes accessible to investigators within the network.

### Public Sharing Phase

Data is accessible by the greater research community (with appropriate controls).
In the Public Sharing Phase, data is made available to the broader research community while maintaining appropriate controls.

</Steps>

## For Contributing Teams -- What the Process Looks Like
## Process for Contributing Teams

Much governance activity takes place before data is even generated or deposited into Synapse.
Via the initial Governance Working Group, negotiations with each institution, and more granular [data sharing plans](/onboarding/dsp-template) requested from data contributors, the DCC collects the agreements and information needed for tailoring governance and that answer some of the below questions:
Significant governance activities occur even before data is generated or deposited into Synapse.
Through the initial Governance Working Group, negotiations with each institution, and detailed data sharing plans requested from data contributors (available at [/onboarding/dsp-template](/onboarding/dsp-template)), the Data Coordination Center (DCC) collects the necessary agreements and information to tailor governance measures. These efforts address several key questions, such as:

- What are the applicable policies given the data's territorial origins? (e.g. data from Europe fall under GDPR jurisdiction)
- What are the applicable policies given the data's institutional origins?
- What are the conditions given the data contributors' preferences?
- What are the relevant policies based on the territorial origins of the data? (For example, data from Europe falls under the jurisdiction of GDPR.)
- What are the relevant policies based on the institutional origins of the data?
- What conditions are specified according to the preferences of data contributors?
- Does the data contain sensitive information?
- What were the patient consents for this data, for human data?
- Is the data provided de-identified according to HIPAA, for human data?
- What were the patient consents obtained for this human data?
- Is the data provided in a de-identified format according to HIPAA guidelines for human data?


### Governance lifecycle chart
<iframe allowfullscreen
frameborder="0"
width="100%"
height="500px"
src="https://lucid.app/documents/embedded/b3092978-9ae9-46cc-812b-5c26c5fb4bf7" id="_-N2BoCYE4j1">
</iframe>


## For Data Re-users

For researchers hoping to understand governance so that they can re-use the data, please see the next [Data Access section](data-access).
For researchers hoping to understand governance so that they can re-use the data, please see the next [Data Access section](data-access).
4 changes: 2 additions & 2 deletions pages/onboarding/_meta.json
@@ -3,6 +3,6 @@
"account-setup": "Account Setup",
"dsp-template": "Data Sharing Plan",
"data-processing-organization": "Data Processing & Organization",
"data-sharing": "Data Sharing Realized",
"data-analysis": "Data Analysis"
"data-sharing": "Data Sharing Implementation",
"data-analysis": "Data Analysis Procedures"
}
8 changes: 4 additions & 4 deletions pages/onboarding/account-setup.mdx
@@ -1,6 +1,6 @@
## Account Setup

1. Please review the [account types here](https://help.synapse.org/docs/Synapse-User-Account-Types.2007072795.html).
2. In most cases, you will need to be a **Certified User**, so make sure you complete that step.
3. Once you have your account, let the DCC know so a team-specific invite can be sent.
4. You must accept the invitation for a Team invite. This requires following an emailed link and accepting the invitation on a Synapse team page.
1. Please familiarize yourself with the different [account types available](https://help.synapse.org/docs/Synapse-User-Account-Types.2007072795.html).
2. In most cases, it is necessary to become a **Certified User** before proceeding. Make sure you complete this step.
3. After creating your account, please inform the DCC so that a team-specific invitation can be sent to you.
4. To join a team, you must accept the invitation by clicking on the link provided in the email and following the instructions on the Synapse team page.
10 changes: 4 additions & 6 deletions pages/onboarding/data-analysis.mdx
@@ -11,20 +11,18 @@ Below, we provide more details on analysis apps and which data types are targete

## cBioPortal

The [cBioPortal](https://docs.cbioportal.org/) for Cancer Genomics was originally developed at Memorial Sloan Kettering Cancer Center (MSK) for sharing of **genomics** data.
We provide a custom instance for the Gray Foundation. The data which should be made available include:
The [cBioPortal](https://docs.cbioportal.org/) for Cancer Genomics was originally developed at Memorial Sloan Kettering Cancer Center (MSK) for sharing genomic data. We provide a custom instance for the Gray Foundation. The available data includes:

- Copy Number Variation (CNV)
- Summarized gene expression (RNA-seq)
- Mutations

## CELLxGENE
## CELLxGENE

[CELLxGENE](https://github.com/chanzuckerberg/cellxgene) ("cell-by-gene") is an interactive data explorer for **single-cell datasets**.
[CELLxGENE](https://github.com/chanzuckerberg/cellxgene) ("cell-by-gene") is an interactive data explorer specifically designed for analyzing single-cell datasets.

- Single-cell RNA-sequencing data

## Minerva Viewer

- CyCIF imaging data

- CyCIF imaging data
39 changes: 20 additions & 19 deletions pages/onboarding/data-processing-organization.mdx
@@ -1,26 +1,14 @@
## Data processing overview
## Overview of Data Processing

### Data processing by the project team
### Data Processing by the Project Team

Depending on their funded aims, teams might have special data workflows, such as:
- Generating sequencing data and deriving data with multiple variant calling pipelines
- Producing high-resolution images and creating summary features from images
- Combining different types of data
Depending on their funded aims, project teams may have specialized data workflows, which can include:

The anticipated workflow should helpfully be discussed as part of the onboarding, especially if it is complicated or outside of what is typical.
Teams should provide information or other documentation of their workflow.
- Generating sequencing data and deriving data using multiple variant calling pipelines.
- Producing high-resolution images and extracting summary features from images.
- Combining different types of data.

By having information on the data-generating process, the DCC can better work with each team to answer the questions:
- What are the different forms of data that will be generated -- how to optimally intake and manage the data artifacts from this workflow?
- Are there recommendations that can be suggested for this workflow to avoid potential problems downstream?
- What other resources can the DCC provide if possible?

### Data processing by the DCC

Data uploaded to Synapse may also be processed by the DCC for:
- Quality control
- File format conversions
- Other data transformations to allow data to be loaded and shared in cBioPortal/other analysis application
During the onboarding process, it is essential to discuss the anticipated workflow, especially if it is complex or deviates from the standard. Project teams should provide information or documentation regarding their workflow.

## Data organization corresponds to processing stages

@@ -54,3 +42,16 @@ Subfolders must be of the same data type and level as the root folder they are c
```

By understanding the data generation process, the Data Coordination Center (DCC) can effectively collaborate with each team to address the following questions:

- What are the different types of data that will be generated, and how can the data artifacts from this workflow be optimally handled and managed?
- Are there any recommendations that can be provided to ensure a smooth workflow and avoid potential issues in the downstream analysis?
- What additional resources can the DCC offer, if available?

### Data Processing by the DCC

In addition to the project team's data processing, the DCC also performs data processing on the uploaded data in Synapse. This processing includes:

- Quality control assessments.
- File format conversions.
- Other necessary data transformations to facilitate data loading and sharing in cBioPortal or other analysis applications.
22 changes: 11 additions & 11 deletions pages/onboarding/data-sharing.mdx
@@ -1,13 +1,13 @@

## Introduction

The figure below provides an overview of contributing data.
The figure below provides an overview of contributing data.


<iframe allowfullscreen frameborder="0"
src="https://lucid.app/documents/embedded/eb1cfc68-9729-4114-a149-9568b8f92ad5"
<iframe allowfullscreen frameborder="0"
src="https://lucid.app/documents/embedded/eb1cfc68-9729-4114-a149-9568b8f92ad5"
id="ZHOJW4_Xec5K"
width="80%"
width="80%"
height="500px"
>
</iframe>
@@ -16,17 +16,17 @@ The figure below provides an overview of contributing data.

### Using the UI or programmatic clients

Data should be uploaded to the correct folders; the folder structure is described in the previous section.
Data can be uploaded via the **User Interface (UI)** -- see [docs here](https://help.synapse.org/docs/Uploading-and-Organizing-Data-Into-Projects,-Files,-and-Folders.2048327716.html#UploadingandOrganizingDataIntoProjects,Files,andFolders-UploadingaFileviatheSynapseUI).
However, for larger and more numerous files, we recommend doing upload **programmatically** (Command line, Python, R) -- see [docs here](https://help.synapse.org/docs/Uploading-and-Organizing-Data-Into-Projects,-Files,-and-Folders.2048327716.html#UploadingandOrganizingDataIntoProjects,Files,andFolders-UploadingaFileProgrammatically).
Data can be uploaded using the User Interface (UI) of Synapse. You can find detailed instructions in the documentation [here](https://help.synapse.org/docs/Uploading-and-Organizing-Data-Into-Projects,-Files,-and-Folders.2048327716.html#UploadingandOrganizingDataIntoProjects,Files,andFolders-UploadingaFileviatheSynapseUI).

However, for larger and numerous files, we recommend utilizing programmatic methods such as the command line, Python, or R. This allows for more efficient and streamlined uploading. You can refer to the documentation [here](https://help.synapse.org/docs/Uploading-and-Organizing-Data-Into-Projects,-Files,-and-Folders.2048327716.html#UploadingandOrganizingDataIntoProjects,Files,andFolders-UploadingaFileProgrammatically) for instructions on uploading files programmatically.
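
As a sketch of the programmatic route for many files: to our understanding, the Synapse Python client's `synapseutils.syncToSynapse` reads a tab-separated manifest with `path` and `parent` columns, so one way to prepare a bulk upload is to generate that manifest from a local directory. The helper below uses only the standard library. The function name, the folder names, and the Synapse IDs are hypothetical, and it assumes each top-level local subfolder maps to one existing Synapse folder (nested subfolders are flattened into that parent). Please check the linked documentation before relying on it.

```python
import csv
from pathlib import Path


def write_upload_manifest(local_root, parent_ids, manifest_path):
    """Write a tab-separated manifest with `path` and `parent` columns.

    parent_ids maps each top-level subfolder name under local_root to the
    Synapse ID of its destination folder. Returns the number of files listed.
    Write the manifest outside local_root so it is not listed itself.
    """
    local_root = Path(local_root)
    rows = []
    for file_path in sorted(local_root.rglob("*")):
        if not file_path.is_file():
            continue
        # The first path component decides the destination folder.
        top_folder = file_path.relative_to(local_root).parts[0]
        if top_folder not in parent_ids:
            raise KeyError(f"No Synapse folder configured for {top_folder!r}")
        rows.append({"path": str(file_path.resolve()),
                     "parent": parent_ids[top_folder]})
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["path", "parent"],
                                delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A call such as `write_upload_manifest("data", {"raw": "syn111", "processed": "syn222"}, "manifest.tsv")` (placeholder IDs) would list every file under `data/raw` and `data/processed`; files sitting directly in `data` raise a `KeyError`, so a misplaced file is caught before anything is uploaded.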

### Help and alternative methods

In very special circumstances, the DCC can discuss:
Under unique circumstances, the DCC can explore the following options:

- Mailing a hard drive
- Globus transfers
- Help with programmatic methods
Sending data via physical hard drive
Utilizing Globus transfers
Providing assistance with programmatic methods

## Annotating data
