Merge pull request #22 from ncihtan/specific_assay_files
- information about entityIDs, prepopulating manifests.
- addition of assay-specific docs for biospecimen, clinical, RPPA.
- changes to CDS/CGC data access.
jen-dfci authored Jul 29, 2024
2 parents f4bc6f0 + e3df17b commit 147685b
Showing 58 changed files with 14,132 additions and 82 deletions.
Binary file modified .DS_Store
Binary file not shown.
310 changes: 310 additions & 0 deletions access_controlled/CDS_access.html
@@ -0,0 +1,310 @@
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head>

<meta charset="utf-8">
<meta name="generator" content="quarto-1.3.450">

<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">


<title>cds_access</title>
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
div.columns{display: flex; gap: min(4vw, 1.5em);}
div.column{flex: auto; overflow-x: auto;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
ul.task-list li input[type="checkbox"] {
width: 0.8em;
margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */
vertical-align: middle;
}
</style>


<script src="CDS_access_files/libs/clipboard/clipboard.min.js"></script>
<script src="CDS_access_files/libs/quarto-html/quarto.js"></script>
<script src="CDS_access_files/libs/quarto-html/popper.min.js"></script>
<script src="CDS_access_files/libs/quarto-html/tippy.umd.min.js"></script>
<script src="CDS_access_files/libs/quarto-html/anchor.min.js"></script>
<link href="CDS_access_files/libs/quarto-html/tippy.css" rel="stylesheet">
<link href="CDS_access_files/libs/quarto-html/quarto-syntax-highlighting.css" rel="stylesheet" id="quarto-text-highlighting-styles">
<script src="CDS_access_files/libs/bootstrap/bootstrap.min.js"></script>
<link href="CDS_access_files/libs/bootstrap/bootstrap-icons.css" rel="stylesheet">
<link href="CDS_access_files/libs/bootstrap/bootstrap.min.css" rel="stylesheet" id="quarto-bootstrap" data-mode="light">


</head>

<body class="fullcontent">

<div id="quarto-content" class="page-columns page-rows-contents page-layout-article">

<main class="content" id="quarto-document-content">



<section id="accessing-sequence-data-via-ncis-cancer-data-service-cds" class="level1">
<h1>Accessing Sequence Data via NCI’s Cancer Data Service (CDS)</h1>
<p><strong>NOTE</strong>: dbGaP approval for HTAN study <a href="https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002371.v3.p1">phs002371</a> is required in order to access HTAN lower-level genomics data, such as RNAseq FASTQ and BAM files.</p>
<p>The <a href="https://dataservice.datacommons.cancer.gov/">CDS Portal</a>, within NCI’s Cancer Research Data Commons (CRDC), provides an interface to filter and select data from a variety of NCI programs, including controlled-access, primary sequence data from the Human Tumor Atlas Network (HTAN). This page provides directions for importing sequencing data from CDS to the <a href="https://cancergenomicscloud.org/">Cancer Genomics Cloud (CGC)</a>.</p>
<p>The directions for accessing sequencing data on CDS are similar to those for <a href="../open_access/cds_imaging.md">Level 2 Imaging Data Access</a>, including Direct Export from CDS to CGC and importing data using a Data Repository Service (DRS) Manifest. Please follow the <a href="../open_access/cds_imaging.md">Level 2 Imaging Data Access</a> directions to access sequencing data, noting the following changes:</p>
<ol type="1">
<li>For Direct Export or Generating a DRS Manifest from CDS, choose <strong>Human Tumor Atlas (HTAN) primary sequence data</strong> in the STUDY section of the left-hand sidebar instead of <strong>Human Tumor Atlas (HTAN) imaging data</strong>.</li>
</ol>
<p align="center">
<img width="891" alt="Figure 3" src="https://github.com/ncihtan/htan_missing_manual/assets/123744798/14e07c72-16d4-463a-b8b2-1ef5f8d72107">
</p>
<p>&nbsp;</p>
<ol start="2" type="1">
<li>To generate a DRS Manifest from the <a href="https://humantumoratlas.org/">HTAN Data Portal</a>, click <strong>CDS/SB-CGC (dbGaP)</strong> under the <strong>Data Access</strong> filter instead of <strong>CDS/SB-CGC (Open Access)</strong>.</li>
</ol>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="../img/cds_genomics1.png" class="img-fluid figure-img"></p>
<figcaption class="figure-caption">HTAN Portal: Accessing Genomic Data in CDS</figcaption>
</figure>
</div>
</section>

</main>
<!-- /main column -->
<script id="quarto-html-after-body" type="application/javascript">
window.document.addEventListener("DOMContentLoaded", function (event) {
const toggleBodyColorMode = (bsSheetEl) => {
const mode = bsSheetEl.getAttribute("data-mode");
const bodyEl = window.document.querySelector("body");
if (mode === "dark") {
bodyEl.classList.add("quarto-dark");
bodyEl.classList.remove("quarto-light");
} else {
bodyEl.classList.add("quarto-light");
bodyEl.classList.remove("quarto-dark");
}
}
const toggleBodyColorPrimary = () => {
const bsSheetEl = window.document.querySelector("link#quarto-bootstrap");
if (bsSheetEl) {
toggleBodyColorMode(bsSheetEl);
}
}
toggleBodyColorPrimary();
const icon = "";
const anchorJS = new window.AnchorJS();
anchorJS.options = {
placement: 'right',
icon: icon
};
anchorJS.add('.anchored');
const isCodeAnnotation = (el) => {
for (const clz of el.classList) {
if (clz.startsWith('code-annotation-')) {
return true;
}
}
return false;
}
const clipboard = new window.ClipboardJS('.code-copy-button', {
text: function(trigger) {
const codeEl = trigger.previousElementSibling.cloneNode(true);
for (const childEl of codeEl.children) {
if (isCodeAnnotation(childEl)) {
childEl.remove();
}
}
return codeEl.innerText;
}
});
clipboard.on('success', function(e) {
// button target
const button = e.trigger;
// don't keep focus
button.blur();
// flash "checked"
button.classList.add('code-copy-button-checked');
var currentTitle = button.getAttribute("title");
button.setAttribute("title", "Copied!");
let tooltip;
if (window.bootstrap) {
button.setAttribute("data-bs-toggle", "tooltip");
button.setAttribute("data-bs-placement", "left");
button.setAttribute("data-bs-title", "Copied!");
tooltip = new bootstrap.Tooltip(button,
{ trigger: "manual",
customClass: "code-copy-button-tooltip",
offset: [0, -8]});
tooltip.show();
}
setTimeout(function() {
if (tooltip) {
tooltip.hide();
button.removeAttribute("data-bs-title");
button.removeAttribute("data-bs-toggle");
button.removeAttribute("data-bs-placement");
}
button.setAttribute("title", currentTitle);
button.classList.remove('code-copy-button-checked');
}, 1000);
// clear code selection
e.clearSelection();
});
function tippyHover(el, contentFn) {
const config = {
allowHTML: true,
content: contentFn,
maxWidth: 500,
delay: 100,
arrow: false,
appendTo: function(el) {
return el.parentElement;
},
interactive: true,
interactiveBorder: 10,
theme: 'quarto',
placement: 'bottom-start'
};
window.tippy(el, config);
}
const noterefs = window.document.querySelectorAll('a[role="doc-noteref"]');
for (var i=0; i<noterefs.length; i++) {
const ref = noterefs[i];
tippyHover(ref, function() {
// use id or data attribute instead here
let href = ref.getAttribute('data-footnote-href') || ref.getAttribute('href');
try { href = new URL(href).hash; } catch {}
const id = href.replace(/^#\/?/, "");
const note = window.document.getElementById(id);
return note.innerHTML;
});
}
let selectedAnnoteEl;
const selectorForAnnotation = ( cell, annotation) => {
let cellAttr = 'data-code-cell="' + cell + '"';
let lineAttr = 'data-code-annotation="' + annotation + '"';
const selector = 'span[' + cellAttr + '][' + lineAttr + ']';
return selector;
}
const selectCodeLines = (annoteEl) => {
const doc = window.document;
const targetCell = annoteEl.getAttribute("data-target-cell");
const targetAnnotation = annoteEl.getAttribute("data-target-annotation");
const annoteSpan = window.document.querySelector(selectorForAnnotation(targetCell, targetAnnotation));
const lines = annoteSpan.getAttribute("data-code-lines").split(",");
const lineIds = lines.map((line) => {
return targetCell + "-" + line;
})
let top = null;
let height = null;
let parent = null;
if (lineIds.length > 0) {
//compute the position of the single el (top and bottom and make a div)
const el = window.document.getElementById(lineIds[0]);
top = el.offsetTop;
height = el.offsetHeight;
parent = el.parentElement.parentElement;
if (lineIds.length > 1) {
const lastEl = window.document.getElementById(lineIds[lineIds.length - 1]);
const bottom = lastEl.offsetTop + lastEl.offsetHeight;
height = bottom - top;
}
if (top !== null && height !== null && parent !== null) {
// cook up a div (if necessary) and position it
let div = window.document.getElementById("code-annotation-line-highlight");
if (div === null) {
div = window.document.createElement("div");
div.setAttribute("id", "code-annotation-line-highlight");
div.style.position = 'absolute';
parent.appendChild(div);
}
div.style.top = top - 2 + "px";
div.style.height = height + 4 + "px";
let gutterDiv = window.document.getElementById("code-annotation-line-highlight-gutter");
if (gutterDiv === null) {
gutterDiv = window.document.createElement("div");
gutterDiv.setAttribute("id", "code-annotation-line-highlight-gutter");
gutterDiv.style.position = 'absolute';
const codeCell = window.document.getElementById(targetCell);
const gutter = codeCell.querySelector('.code-annotation-gutter');
gutter.appendChild(gutterDiv);
}
gutterDiv.style.top = top - 2 + "px";
gutterDiv.style.height = height + 4 + "px";
}
selectedAnnoteEl = annoteEl;
}
};
const unselectCodeLines = () => {
const elementsIds = ["code-annotation-line-highlight", "code-annotation-line-highlight-gutter"];
elementsIds.forEach((elId) => {
const div = window.document.getElementById(elId);
if (div) {
div.remove();
}
});
selectedAnnoteEl = undefined;
};
// Attach click handler to the DT
const annoteDls = window.document.querySelectorAll('dt[data-target-cell]');
for (const annoteDlNode of annoteDls) {
annoteDlNode.addEventListener('click', (event) => {
const clickedEl = event.target;
if (clickedEl !== selectedAnnoteEl) {
unselectCodeLines();
const activeEl = window.document.querySelector('dt[data-target-cell].code-annotation-active');
if (activeEl) {
activeEl.classList.remove('code-annotation-active');
}
selectCodeLines(clickedEl);
clickedEl.classList.add('code-annotation-active');
} else {
// Unselect the line
unselectCodeLines();
clickedEl.classList.remove('code-annotation-active');
}
});
}
const findCites = (el) => {
const parentEl = el.parentElement;
if (parentEl) {
const cites = parentEl.dataset.cites;
if (cites) {
return {
el,
cites: cites.split(' ')
};
} else {
return findCites(el.parentElement)
}
} else {
return undefined;
}
};
var bibliorefs = window.document.querySelectorAll('a[role="doc-biblioref"]');
for (var i=0; i<bibliorefs.length; i++) {
const ref = bibliorefs[i];
const citeInfo = findCites(ref);
if (citeInfo) {
tippyHover(citeInfo.el, function() {
var popup = window.document.createElement('div');
citeInfo.cites.forEach(function(cite) {
var citeDiv = window.document.createElement('div');
citeDiv.classList.add('hanging-indent');
citeDiv.classList.add('csl-entry');
var biblioDiv = window.document.getElementById('ref-' + cite);
if (biblioDiv) {
citeDiv.innerHTML = biblioDiv.innerHTML;
}
popup.appendChild(citeDiv);
});
return popup.innerHTML;
});
}
}
});
</script>
</div> <!-- /content -->



</body></html>
68 changes: 5 additions & 63 deletions access_controlled/CDS_access.md
@@ -8,75 +8,17 @@ order: 997
**NOTE**: dbGaP approval for HTAN study [phs002371](https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002371.v3.p1) is required in order to access HTAN lower-level genomics data, such as RNAseq FASTQ and BAM files.
!!!

The [CDS Portal](https://dataservice.datacommons.cancer.gov/), within NCI's Cancer Research Data Commons (CRDC), provides an interface to filter and select data from a variety of NCI programs, including controlled-access, primary sequence data from the Human Tumor Atlas Network (HTAN).
The [CDS Portal](https://dataservice.datacommons.cancer.gov/), within NCI's Cancer Research Data Commons (CRDC), provides an interface to filter and select data from a variety of NCI programs, including controlled-access, primary sequence data from the Human Tumor Atlas Network (HTAN). This page provides directions for importing sequencing data from CDS to the [Cancer Genomics Cloud (CGC)](https://cancergenomicscloud.org/).

# DRS Manifest Files
The directions for accessing sequencing data on CDS are similar to those for [Level 2 Imaging Data Access](../open_access/cds_imaging.md), including Direct Export from CDS to CGC and importing data using a Data Repository Service (DRS) Manifest. Please follow the [Level 2 Imaging Data Access](../open_access/cds_imaging.md) directions to access sequencing data, noting the following changes:

To access data via CDS, first generate a CDS Data Repository Service (DRS) manifest containing the files you would like to obtain. DRS manifests are CSV files and require at minimum the **name** and **drs_uri** of each file of interest. For HTAN data, DRS manifests can be generated in one of three ways:

1. CDS Portal
2. HTAN Data Portal
3. Google BigQuery
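
Whichever route produces the manifest, it can help to confirm that the CSV carries the two required columns before attempting an import. The following is a minimal sketch, not part of any CDS or SB-CGC tooling: the `check_manifest` helper is hypothetical, and the file names and DRS URIs in the example are made up for illustration. Only the **name** and **drs_uri** column names come from the description above.

```python
import csv
import io

# Column names taken from the text above; everything else here is illustrative.
REQUIRED_COLUMNS = ("name", "drs_uri")


def check_manifest(handle):
    """Return the manifest rows, raising ValueError if a required
    column is absent or a required value is empty."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"manifest is missing columns: {missing}")

    rows = list(reader)
    # Data rows start on line 2 of the file (line 1 is the header).
    for line_no, row in enumerate(rows, start=2):
        for col in REQUIRED_COLUMNS:
            if not (row[col] or "").strip():
                raise ValueError(f"line {line_no}: empty value for '{col}'")
    return rows


# Hypothetical two-file manifest; these names and URIs are not real HTAN entries.
example = io.StringIO(
    "name,drs_uri\n"
    "sample_R1.fastq.gz,drs://example.org/0000-aaaa\n"
    "sample_R2.fastq.gz,drs://example.org/0000-bbbb\n"
)
rows = check_manifest(example)
print(f"{len(rows)} files listed in manifest")
```

To check a downloaded manifest instead, open the file with `open(path, newline="")` and pass the handle to `check_manifest`.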

## 1. Generating a Manifest File from the CDS Portal

In order to access HTAN imaging data within the [CDS Portal](https://dataservice.datacommons.cancer.gov/), navigate to the portal in a web browser and click on the **Explore CDS Data** button on the landing page.

<p align="center"><img width="364" alt="1" src="https://github.com/ncihtan/htan_missing_manual/assets/123744798/40aff1af-a58f-49dc-9253-6ee5e67ef419">
</p>

&nbsp;

On the Data Explorer page, expand the STUDY section on the left sidebar, scroll down, and check the box next to **Human Tumor Atlas (HTAN) primary sequence data**.

<p align="center"><img width="891" alt="Figure 2" src="https://github.com/ncihtan/htan_missing_manual/assets/123744798/4a3b97f6-97d6-4c99-b782-88c9a8a74fba"></p>

&nbsp;

This action will change the summary panel to reflect selecting HTAN data only.
1. For Direct Export or Generating a DRS Manifest from CDS, choose **Human Tumor Atlas (HTAN) primary sequence data** in the STUDY section of the left-hand sidebar instead of **Human Tumor Atlas (HTAN) imaging data**.

<p align="center"><img width="891" alt="Figure 3" src="https://github.com/ncihtan/htan_missing_manual/assets/123744798/14e07c72-16d4-463a-b8b2-1ef5f8d72107"></p>

&nbsp;

Scroll down, or click on the **Collapse View** tab on the upper right just below the query summary line in order to see the tabulated view of all of the participants, samples or files in HTAN.

<p align="center"><img width="891" alt="Figure 4" src="https://github.com/ncihtan/htan_missing_manual/assets/123744798/87e90271-a525-4ff7-baeb-4c75ea073a91"></p>

&nbsp;

Click on the **Add All Files** button, or select the check boxes next to all Participants, Samples or Files for a subselection and then click on the **Add Selected** button. This action will update your cart icon in the upper right corner.

<p align="center"><img width="891" alt="Figure 5" src="https://github.com/ncihtan/htan_missing_manual/assets/123744798/e79dd76d-8eb1-460f-8957-0e07b3898845"></p>

&nbsp;

Clicking on the cart icon will bring up a list of the selected files. Click on the **Download Manifest** button in the upper right to download a CSV-formatted (Excel-compatible) file of this file list.

<p align="center"><img width="891" alt="Figure 6" src="https://github.com/ncihtan/htan_missing_manual/assets/123744798/d3addd28-652d-4320-a109-d0080d1d37df"></p>

&nbsp;


## 2. Generating a Manifest File from the HTAN Data Portal

From the [HTAN Data Portal](https://humantumoratlas.org/), click **CDS/SB-CGC (dbGaP)** under the **Data Access** filter.
2. To generate a DRS Manifest from the
[HTAN Data Portal](https://humantumoratlas.org/), click **CDS/SB-CGC (dbGaP)** under the **Data Access** filter instead of **CDS/SB-CGC (Open Access)**.

![HTAN Portal: Accessing Genomic Data in CDS](../img/cds_genomics1.png)

Navigate to the **Files** tab, check the box next to **Filename** in upper left, and then click **Download selected files**.
![HTAN Portal: Selecting Genomic Files](../img/cds_genomics2.png)

Click **Download Manifest**, which will download a local file called `cds_manifest.csv`.
![HTAN Portal: Download DRS Manifest](../img/cds_genomics3.png)


## 3. Generating a Manifest File from Google BigQuery
HTAN metadata and a mapping of HTAN Data File IDs to CDS DRS URIs are available as Google BigQuery tables via the Institute for Systems Biology Cancer Gateway in the Cloud (ISB-CGC) (see [Google BigQuery](https://docs.humantumoratlas.org/open_access/biq_query/)). These tables can be used to subset data to a cohort of interest, and obtain DRS URIs of files to access.

For a step-by-step guide on how to generate a DRS manifest file using Google BigQuery, please see the Python notebook [Creating_CDS_Data_Import_Manifests_Using_BQ.ipynb](https://github.com/isb-cgc/Community-Notebooks/blob/master/HTAN/Python%20Notebooks/Creating_CDS_Data_Import_Manifests_Using_BQ.ipynb).



# Accessing Data
Once you have your manifest, follow the instructions on SB-CGC's [Import from a DRS server](https://docs.cancergenomicscloud.org/docs/import-from-a-drs-server#import-from-a-manifest-file) documentation page to import data from a manifest file.