
edits
cconrad8 committed Nov 9, 2023
1 parent 7df64ca commit 731f005
Showing 12 changed files with 145 additions and 113 deletions.
12 changes: 12 additions & 0 deletions docs/governance/data-access.md → docs/governance/data-access.mdx
@@ -3,6 +3,18 @@ sidebar_position: 2
---
# Data Access

<center>

```mermaid
flowchart TD
A[Phase 1: Private Only] --> B[Phase 2: Collaborative Only]
B --> C[Phase 3: Data Released]
C --> D[Controlled Access]
C --> E[Public]
C --> F[Open Access]
```
</center>

The data access for Gray Foundation aligns with the general data access types described for the Synapse platform. You can find more details about these access types [here](https://help.synapse.org/docs/Data-Access-Types.2014904611.html).

## Data Access Tags in the Portal
3 changes: 3 additions & 0 deletions docs/governance/overview.md
@@ -16,13 +16,16 @@ Data sharing occurs in three distinct phases:

<details>
<summary> Private Phase </summary>

During this phase, data is exclusively accessible to the Contributor and individuals designated by the Contributor. This configuration serves as the default setting unless specified otherwise by the grant. For instance, certain pilot projects may start directly in the Collaborative Sharing Phase.

</details>

<details>
<summary> Collaborative Sharing Phase </summary>

In the Collaborative Sharing Phase, curated data becomes accessible to investigators within the network.

</details>

<details>
9 changes: 7 additions & 2 deletions docs/onboarding/data-analysis.md
@@ -13,6 +13,8 @@ However, not all data may be available for analysis -- whether data is "elevated

Below, we provide more details on analysis apps and which data types are targeted for them.

<center>

| Data Type | Analysis App |
| ------------------------------------ | ----------------- |
| Copy Number Variation (CNV) | cbioportal |
@@ -21,7 +23,10 @@ Below, we provide more details on analysis apps and which data types are targete
| Single Cell RNA Sequencing Data | CellXGene |
| CyCIF Imaging Data | Minerva Viewer |

</center>

#### External Viewers

- [cBioPortal](https://docs.cbioportal.org/) for Cancer Genomics was originally developed at Memorial Sloan Kettering Cancer Center (MSK) for sharing genomic data. We provide a custom instance for the Gray Foundation.

- [CELLxGENE](https://github.com/chanzuckerberg/cellxgene) ("cell-by-gene") is an interactive data explorer specifically designed for analyzing single-cell datasets.
41 changes: 0 additions & 41 deletions docs/onboarding/onboarding-checklist.md

This file was deleted.

47 changes: 47 additions & 0 deletions docs/onboarding/onboarding-checklist.mdx
@@ -0,0 +1,47 @@
---
sidebar_position: 1
---

# Onboarding Checklist

👋 We welcome new teams and staff with an overview of the Gray Foundation Data Coordination Center and the setup that is needed.


<details>
<summary> New Teams Webinar </summary>

_All personnel (PI, researchers, staff, project managers, etc.) are encouraged to attend a "New Teams Webinar" hosted by Sage Bionetworks/DCC._

The objective of this 1-hour webinar is to:
- Provide a face-to-face introduction to the DCC staff including Sage Bionetworks and MSKCC
- Review the "Onboarding Checklist"
- Introduce teams to open-science data storage and analysis tools (i.e., Synapse and cBioPortal)
- Discuss potential data sharing goals and brainstorm data-reusability and analysis opportunities
- Inform teams of opportunities to join working groups (i.e. the analysis working group) and collaborate with other Gray Foundation researchers

</details>


#### Principal Investigators:

- [ ] Designate at least one data lead contact for the team. _(Note: Multiple data leads are possible if responsible for different data types.)_
- [ ] Create a certified account on the Synapse platform.
- [ ] Complete a data sharing plan.
- [ ] Review and approve your public profile on the documentation site and Synapse.
- [ ] Provide the name, email address, and role of key personnel to the DCC.
- [ ] Finalize governance and data sharing policy plans.
- [ ] (Optional) Join the Gray Foundation Slack workspace.

#### Data Leads and Contributors:

- [ ] Confirm the review of documentation and forward any questions or concerns for early resolution.
- [ ] Determine a preferred touchbase schedule and medium (e.g., quarterly over video chat or monthly over email).
- [ ] Create certified accounts on the Synapse platform.
- [ ] Confirm that appropriate team access permissions are granted on the platform.
- [ ] (Optional) Seek more interactive training on the Synapse platform, especially if not comfortable with usage based solely on documentation.
- [ ] (Optional, upon request) Receive additional interactive training on the clinical data and metadata intake process, recommended for those not comfortable with usage based solely on documentation.
- [ ] (Optional) Join the Gray Foundation Slack workspace.

#### Supplemental Onboarding:

Onboarding is available for new personnel designated as data leads after a project has started. Please contact the DCC as soon as new personnel relevant to the project are identified. The DCC will reach out with onboarding interactions and steps adapted from the above. This process can be more flexible due to potential prior knowledge transfer.
75 changes: 41 additions & 34 deletions docs/onboarding/upload-data.md
@@ -4,56 +4,63 @@ sidebar_position: 4

# Uploading Data

Before you begin, identify the destination for your data. Most data are organized in pre-assigned folders based on assay and data levels. You can create subfolders for additional organization, especially for batch-specific data.

#### Using the Synapse User Interface (UI)

The UI is best suited to a small number of files, each under ~100 MB. In the designated folder, open the Folder Tools menu to see upload options. Refer to the general UI documentation for details.

#### Programmatic Clients

For larger and numerous files, use programmatic clients for efficient uploading. Options include the command-line tool, Python script, or R. Find detailed documentation for each option. Reach out to the DCC for assistance.

#### Typical Workflow with Python Command-Line Client

1. **Install Python Package**: Install the Synapse Python package from [PyPI](https://pypi.org/project/synapseclient/). This will also automatically install the command line utility. Test out that the command line utility is working by typing `synapse help` and feel free to review [docs](https://python-docs.synapse.org/build/html/index.html) for the Python CLI client.

2. **Create Access Token**: For large uploads, it is best to create an access token. Go to your Account profile > Account Settings > Personal Access Tokens > Create new token.

3. **Create Configuration File**: For convenience, copy and paste the token into a `.synapseConfig` text file:

```plaintext
[authentication]
authtoken = sometokenstringxxxxxxxxxxxxxxxxxx
```

4. **Create Manifest File**: Create a list of files to transfer (called a manifest). The parent-id is the Synapse folder you are trying to upload files to:

```bash
synapse manifest --parent-id syn12345678 --manifest-file manifest.txt PATH_TO_DIR_WITH_FILES
```

5. **Certified User Check**: If you are not a Certified User, the tool will output a message. Review and complete the Certified User portion of Account Setup before proceeding.

6. **Execute Sync Command**: Successful execution should create a manifest file `manifest.txt`. Ensure that `.synapseConfig` is present locally to provide authentication:

```bash
synapse sync manifest.txt
```

Options for retries in case of a poor connection:

```bash
synapse sync --retries 3 manifest.txt
```
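For illustration, the manifest produced in step 4 is (per the Synapse client docs) a tab-separated file whose key columns are `path` and `parent`. A minimal standard-library sketch that builds an equivalent two-column manifest by hand (the `write_manifest` helper here is illustrative, not part of `synapseclient`):

```python
import csv
import os

def write_manifest(directory, parent_id, manifest_path="manifest.txt"):
    """Write a minimal Synapse-style upload manifest (TSV).

    Assumes the two-column layout ('path', 'parent') described in the
    Synapse docs; the real `synapse manifest` command can emit more columns.
    """
    with open(manifest_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["path", "parent"])
        for root, _dirs, files in os.walk(directory):
            for name in sorted(files):
                # Every file goes under the same destination Synapse folder.
                writer.writerow([os.path.abspath(os.path.join(root, name)), parent_id])
    return manifest_path
```

The resulting file can then be passed to `synapse sync` exactly as in step 6.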

#### One-off Uploads

For just a few files, a more convenient command might be:

```bash
synapse store my_image.tiff --parentId syn12345678
```

#### Alternative Methods

Under rare unique circumstances, the DCC can explore the following options:

- Receiving data via physical hard drive
- Utilizing Globus transfers (if really needed)
- Transferring from a custom S3 bucket

7 changes: 5 additions & 2 deletions docusaurus.config.js
@@ -2,8 +2,7 @@
// Note: type annotations allow type checking and IDEs autocompletion

const { themes } = require('prism-react-renderer');
const lightCodeTheme = themes.github;

/** @type {import('@docusaurus/types').Config} */
const config = {
@@ -140,6 +139,10 @@ const config = {

},
}),
themes: ['@docusaurus/theme-mermaid'],
markdown: {
mermaid: true,
},
};

module.exports = config;
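For context, the `themes: ['@docusaurus/theme-mermaid']` and `markdown.mermaid: true` additions above enable diagram rendering: any fenced code block tagged `mermaid` in the docs (such as the flowchart added to `data-access.mdx` in this commit) is drawn as a diagram. A minimal example:

```mermaid
flowchart LR
    A[Markdown source] --> B[theme-mermaid] --> C[Rendered diagram]
```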
21 changes: 0 additions & 21 deletions docusaurus.config.ts

This file was deleted.

