Cconrad8 patch 1 (#67)

* Add files via upload
* Delete docusaurus-site-main.zip
* add docusaurus website
* docusaurus site
* change website link
* install next
* color change and checklist fix
* Remove unnecessary workflow
* Remove unnecessary workflows/jekyll-gh-pages.yml
* add diagram and mission
* mermaid update, mission statement, diagram
* update yarn
* nvm update

---------

Co-authored-by: Anh Nguyet Vu <[email protected]>
gf-dcc · Nov 15, 2023 · 0d0c320 · vercel
1 parent c8d47bc
commit 0d0c320
Showing 90 changed files with 54,716 additions and 8,627 deletions.
diff --git a/.github/workflows/add-issues-to-project.yml b/.github/workflows/add-issues-to-project.yml
diff --git a/.gitignore b/.gitignore
@@ -1,38 +1,20 @@
-.DS_STORE
-
-# Dependency directory
-node_modules
-
-scripts/flow/*/.flowconfig
-.flowconfig
-*~
-*.pyc
-.grunt
-_SpecRunner.html
-__benchmarks__
-build/
-remote-repo/
-coverage/
-
-# Caches
-.module-cache
-.npm
-
-# Tests
-fixtures/dom/public/react-dom.js
-fixtures/dom/public/react.js
-test/the-files-to-test.generated.js
-
-*.log*
-chrome-user-data
-
-
-.idea
-*.iml
-.vscode
-*.swp
-*.swo
-
-# REPL history
-.node_repl_history
-.next
+# Dependencies
+/node_modules
+
+# Production
+/build
+
+# Generated files
+.docusaurus
+.cache-loader
+
+# Misc
+.DS_Store
+.env.local
+.env.development.local
+.env.test.local
+.env.production.local
+
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*
diff --git a/README.md b/README.md
@@ -1,26 +1,40 @@
-# Gray Foundation DCC Docs 
-These docs are provided by the Gray Foundation Data Coordinating Center (GF-DCC) to describe the consortium portal, data, metadata, and other topics.
-(This is a work in progress.)
+# Website
 
-## For Documentation Users
+This website is built using [Docusaurus 2](https://docusaurus.io/), a modern static website generator.
 
-### Suggestions and Questions
-If you have suggestions or questions as a researcher (either as a Gray Foundation-supported or an external researcher interacting with our data/portal), we prefer you [file an issue here](https://github.com/gf-dcc/docs/issues). 
-Alternatively, if you do not have a GitHub account or if the issue is sensitive, contact us at [email protected].
+### Installation
 
-## For Documentation Contributors
+```
+$ yarn
+```
 
-Please add or update a page/section in a new branch and submit a pull request (PR).
-It is helpful to read this [guide to MDX and Markdown](https://kabartolo.github.io/chicago-docs-demo/docs/mdx-guide/writing/) (knowledge of Markdown only will be sufficient for most sections).
+### Local Development
 
-We're using the [Nextra](https://github.com/shuding/nextra) docs theme. 
+```
+$ yarn start
+```
 
-### Guide to Getting Started Locally 
+This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.
 
-Docs can be edited in "local development" mode with live previews. This especially helps if the changes are large and complicated (e.g. going beyond adding a new paragraph).
-However, this requires some technical setup. 
-1. Install Node.js on [Windows, MacOs, or Linux](https://kinsta.com/blog/how-to-install-node-js/). 
-2. [Install yarn](https://www.hostinger.com/tutorials/how-to-install-yarn) and use yarn to install dependencies for this docs site. 
-3. Run `yarn next` to preview the site at `http://localhost:3000`. As you make changes, live reload will show the updated site.
+### Build
 
+```
+$ yarn build
+```
+This command generates static content into the `build` directory, which can be served using any static content hosting service.
 
+### Deployment
+
+Using SSH:
+
+```
+$ USE_SSH=true yarn deploy
+```
+
+Not using SSH:
+
+```
+$ GIT_USER=<Your GitHub username> yarn deploy
+```
+
+If you are using GitHub pages for hosting, this command is a convenient way to build the website and push to the `gh-pages` branch.
diff --git a/babel.config.js b/babel.config.js
@@ -0,0 +1,3 @@
+module.exports = {
+  presets: [require.resolve('@docusaurus/core/lib/babel/preset')],
+};
diff --git a/blog/2023-10-15.md b/blog/2023-10-15.md
@@ -0,0 +1,30 @@
+---
+slug: first-blog-post
+title: Unveiling a New Hub for Collaborative Innovation
+authors: [Christina]
+tags: [welcome, events]
+---
+
+In the ever-evolving landscape of scientific research and data-driven advancements, the Gray Foundation is at the forefront of fostering collaborative projects and providing valuable resources to the cancer research community.
+
+To help researchers stay informed and share data, the Gray Foundation Data Coordinating Center (GF-DCC) has recently launched a new website and several support channels that facilitate collaboration and data sharing.
+
+### A Fresh Perspective: The New GF-DCC Information Website
+
+The GF-DCC has introduced its new information website, located at https://gf-dcc.github.io/info-website/. This website serves as an essential hub for understanding the foundation's goals and accessing their data portal.
+
+### The Power of Collaboration: Ongoing Projects Across Multiple Institutions
+
+The GF-DCC is all about fostering collaboration among various institutions and individuals who share a passion for BRCA-related cancer research. Stay tuned for exciting updates and breakthroughs resulting from these collaborative endeavors.
+
+### Portal Design Working Group: Join the Monthly Discussions
+
+If you're interested in shaping the design and usability of the data portal, the GF-DCC's Portal Design Working Group offers the perfect opportunity. Meetings are scheduled for the first Thursday of every month. To participate and get more details, reach out to Christina Conrad. Your insights and expertise can help make the data portal even more user-friendly and efficient.
+
+### New Documentation Website: Your Go-To Resource for Guidance
+
+The GF-DCC has launched a dedicated documentation website to serve as your go-to resource for guidance on utilizing the foundation's tools and services. Whether you need help with data integration, analysis, or any other aspect of your research, this resource is designed to assist you.
+
+### A Direct Line of Support: Introducing the Service Desk
+
+The GF-DCC has introduced a new service desk to streamline communication and support. If you have any inquiries, bug reports, or feedback regarding Synapse, documentation, data portal, or any other aspect of the GF-DCC ecosystem, you can reach out to the service desk at https://sagebionetworks.jira.com/servicedesk/customer/portal/17. Your input is invaluable in enhancing the foundation's services and resources.
diff --git a/blog/authors.yml b/blog/authors.yml
@@ -0,0 +1,5 @@
+Christina:
+  name: Christina Conrad
+  title: Biomedical Data Manager
+  url: https://github.com/cconrad8
+  image_url: https://avatars.githubusercontent.com/u/114612268?s=400&u=e2bf1638114cceb4973b2998476b2bf7e70c4605&v=4
diff --git a/docs/RoadMap.png b/docs/RoadMap.png
diff --git a/docs/data annotation support/_category_.json b/docs/data annotation support/_category_.json
@@ -0,0 +1,9 @@
+{
+  "label": "Data Annotation Support",
+  "position": 3,
+  "collapsible": true,
+  "collapsed": true,
+  "link": {
+    "type": "generated-index"
+  }
+}
diff --git a/docs/data annotation support/clinical-data-standards.mdx b/docs/data annotation support/clinical-data-standards.mdx
@@ -0,0 +1,37 @@
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+# Creating a Clinical Metadata Template for BRCA Pre-Cancer Research
+
+<div className="unique-tabs">
+  <Tabs>
+    <TabItem value="Clinical Data Standards">
+      The development of a clinical metadata template is central to the BRCA pre-cancer research initiative, aiming to harmonize and streamline the collection of crucial information related to human patients and biospecimens, such as blood and tissue samples. This template addresses key inquiries:
+      * How can we establish a standardized approach to represent and model patient and sample data consistently across a diverse array of project teams operating in multiple institutions?
+      * What data elements can be pragmatically collected and standardized across each project team while aligning with the overarching goals of the research consortium?
+      * Which clinical phenotypes are of significant value for conducting more in-depth and insightful analyses specific to BRCA pre-cancer research?
+    </TabItem>
+    <TabItem value="Request for Comments Process">
+      The minimal clinical data model was meticulously crafted through a collaborative Request for Comments (RFC) process. For detailed insights, you can review the RFC engineering process [here](https://coda.io/d/RFC-1-MCDM_dxVlTrpKblb/RFC-1-Minimal-Clinical-Data-Model_suv9y#_lu9Ei). This process provides essential context for defining what constitutes "minimal" data and why certain elements are of interest to various stakeholders. The templates featured in the following section closely align with the outcomes of this collaborative effort. Nevertheless, the potential exists for developing a more extensive and comprehensive clinical data model based on the decisions of the consortium members and the recommendations of Program Directors.
+    </TabItem>
+    <TabItem value="Agreed upon Terms">
+| Demographic Information | Health Status | Genetic Information | Lifestyle and Habits | Study-Related Information |
+| ----------------------- | ------------- | ------------------- | ------------------- | ------------------------- |
+| ParticipantID           | VitalStatus   | ClassBRCA1         | TobaccoUse          | Component                |
+| Age                    | Height        | ClassBRCA2         | PackYearsSmoked     | TimepointLabel           |
+| Sex                    | Weight        |                   | AlcoholUse          | SampleCollectionCenter   |
+| BMI                    | Gravidity     |                   | NumberofDrinksPerWeek | SampleCollectionYear     |
+| Race                   | Parity        |                   | AntibioticUse        |                         |
+| Ethnicity               | AgeAtMenarche |                   |                     |                         |
+|                        | MenstrualCyclePhase |                 |                     |                         |
+|                        | MenopauseStatus    |                 |                     |                         |
+|                        | PrimaryDiagnosis   |                 |                     |                         |
+|                        | AgeatDiagnosis     |                 |                     |                         |
+|                        | DiagnosisStatusType |                |                     |                         |
+|                        | ERStatus           |                 |                     |                         |
+|                        | PRStatus           |                 |                     |                         |
+|                        | HER2Status         |                 |                     |                         |
+
+    </TabItem>
+  </Tabs>
+</div>
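
Reviewer's aside on the "Agreed upon Terms" table added in this file: one way such a list could be used is to check the column headers of a submitted template against it. The sketch below is only an illustration under that assumption. The term names are copied from the table, while the function name `check_columns` and the missing/unknown reporting are hypothetical and not part of the GF-DCC data model or tooling.

```python
# Terms copied from the "Agreed upon Terms" table, keyed by its column headings.
AGREED_TERMS = {
    "Demographic Information": [
        "ParticipantID", "Age", "Sex", "BMI", "Race", "Ethnicity",
    ],
    "Health Status": [
        "VitalStatus", "Height", "Weight", "Gravidity", "Parity",
        "AgeAtMenarche", "MenstrualCyclePhase", "MenopauseStatus",
        "PrimaryDiagnosis", "AgeatDiagnosis", "DiagnosisStatusType",
        "ERStatus", "PRStatus", "HER2Status",
    ],
    "Genetic Information": ["ClassBRCA1", "ClassBRCA2"],
    "Lifestyle and Habits": [
        "TobaccoUse", "PackYearsSmoked", "AlcoholUse",
        "NumberofDrinksPerWeek", "AntibioticUse",
    ],
    "Study-Related Information": [
        "Component", "TimepointLabel", "SampleCollectionCenter",
        "SampleCollectionYear",
    ],
}


def check_columns(columns):
    """Compare template column headers with the agreed terms.

    Returns a (missing, unknown) pair of sorted lists: agreed terms that
    are absent from `columns`, and columns that are not agreed terms.
    This helper is hypothetical and exists only for illustration.
    """
    known = {term for terms in AGREED_TERMS.values() for term in terms}
    present = set(columns)
    return sorted(known - present), sorted(present - known)


if __name__ == "__main__":
    missing, unknown = check_columns(["ParticipantID", "Age", "Sexx"])
    print("missing:", missing)
    print("unknown:", unknown)
```

A check like this is case-sensitive on purpose, since the table mixes spellings such as `AgeAtMenarche` and `AgeatDiagnosis`.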
diff --git a/docs/data annotation support/fair-data.mdx b/docs/data annotation support/fair-data.mdx
@@ -0,0 +1,39 @@
+---
+sidebar_position: 1
+---
+
+# FAIR Science: A Model for Enhanced Data Management
+
+#### The Significance
+
+The [FAIR guiding principles for scientific data management and stewardship](https://www.go-fair.org/fair-principles/) underscore the importance of making data **F**indable, **A**ccessible, **I**nteroperable, and **R**eusable. These principles form the cornerstone of a scientific ecosystem that prioritizes transparency, data reuse, and collaborative efforts, ultimately expediting breakthroughs in science. Furthermore, commencing in 2023, the [NIH Data Management and Sharing Policy](https://sharing.nih.gov/data-management-and-sharing-policy/about-data-management-and-sharing-policies/data-management-and-sharing-policy-overview) mandates data sharing for NIH-supported studies. The Gray Foundation DCC plays a pivotal role in enabling consortium teams to not only comply with but also exceed NIH data sharing policies, thereby advancing the pursuit of superior scientific endeavors.
+
+#### Benefits to Consortium Teams
+
+This commitment to FAIR science yields a multitude of advantages for consortium teams:
+
+1. **Enhanced Collaboration in the Present:** Consortium teams experience increased collaborative productivity as they gain the ability to effortlessly locate and utilize shared data, fostering synergy in current research endeavors.
+
+2. **Empowered Future Grant Proposals:** The practice of data sharing positions consortium teams favorably when applying for grants, particularly those focused on Team Science. A history of effective data sharing demonstrates a commitment to the broader scientific community, aligning with evolving expectations in the field.
+
+   > In 2010, a group of scientists advocated for the consideration of all products stemming from research grants, extending beyond traditional peer-reviewed publications. This encompasses the sharing of raw data and self-published results through digital platforms and social media. Notably, this movement has already influenced policy changes, with organizations like the NSF expanding their requirements to include products such as datasets, software, patents, and copyrights ([Funding and Evaluation of Team Science](https://www.ncbi.nlm.nih.gov/books/NBK310379/)).
+
+3. **Heightened Impact and Citations:** By making data, protocols, and related resources accessible, consortium teams open doors for other researchers to employ their work, leading to increased recognition and citations from the broader scientific community.
+
+The journey toward FAIR science necessitates the collection of comprehensive metadata to describe and enhance the findability and reusability of data and resources. A well-defined data model determines the prioritized collection of this metadata, guiding consortium teams toward excellence in data management and sharing practices.
+
+
+#### Development Process
+
+The DCC develops the model using existing standards while also consulting the contributing teams for specialized data. 
+A broader development process, when the model affects the wider consortium, involves a Request for Comments (RFC) process. 
+For example, please see the public [first RFC document](https://coda.io/@gf-dcc/rfc-1-mcdm) (archived/closed) that has helped determine the current core model.
+Continued development, if necessary, can involve additional RFCs. 
+
+#### Data Model Source
+
+The data model source is versioned [here](https://github.com/gf-dcc/data-model). 
+
+#### Capturing Metadata Through Templates
+
+Data are captured through spreadsheet templates, available as either Google Sheets or Excel (in cases where contributors can't access Google products due to institutional policies). 
diff --git a/docs/data annotation support/file-dataset.mdx b/docs/data annotation support/file-dataset.mdx
@@ -0,0 +1,27 @@
+---
+sidebar_position: 2
+---
+
+# File vs. Dataset Annotations
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## File-level Metadata
+
+<Tabs className="unique-tabs">
+  <TabItem value="file-level" label="File-level Metadata" default>
+    Assay-specific metadata is collected for uploaded data files. 
+    Metadata needed for sequencing `.fastq` files from a sequencing assay is different from metadata for `.tiff` files from an imaging assay -- therefore a different metadata template is provided for each of these data types.
+
+    Moreover, even with the same assay, the output from different levels of processing will require different metadata templates. 
+    The "raw" sequencing `.fastq` data from RNA-seq require somewhat different annotations than the summarized expression data. 
+
+    Our concept of data levels is largely derived from the [Genomic Data Commons data levels](https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/data-levels).
+  </TabItem>
+
+  <TabItem value="dataset-level" label="Dataset-level Metadata">
+    The files are grouped together and shared as a dataset. 
+    A dataset is later published and catalogued in the portal once the team is ready to release the data (i.e. once the paper has been published).
+  </TabItem>
+</Tabs>
diff --git a/docs/data sharing and access policies/Governance Roles - Page 2.png b/docs/data sharing and access policies/Governance Roles - Page 2.png
diff --git a/docs/data sharing and access policies/_category_.json b/docs/data sharing and access policies/_category_.json
@@ -0,0 +1,9 @@
+{
+  "label": "Data Sharing and Access Policies",
+  "position": 3,
+  "collapsible": true,
+  "collapsed": true,
+  "link": {
+    "type": "generated-index"
+  }
+}
diff --git a/pages/governance/data-access.mdx → ...aring and access policies/data-access.mdx b/pages/governance/data-access.mdx → ...aring and access policies/data-access.mdx
@@ -1,57 +1,64 @@
-import { Steps } from 'nextra-theme-docs'
-import { Callout } from 'nextra-theme-docs'
-import Image from 'next/image'
-
+---
+sidebar_position: 2
+---
 # Data Access
 
+<center>
+
+    ```mermaid
+    flowchart TD
+        A[Phase 1: Private Only] -->| | B[Phase 2: Collaborative Only]
+        B --> C[Phase 3: Data Released]
+        C -->D[Controlled Access]
+        C -->E[Public]
+        C -->F[Open Access]
+    ```
+</center>
+
 The data access for Gray Foundation aligns with the general data access types described for the Synapse platform. You can find more details about these access types [here](https://help.synapse.org/docs/Data-Access-Types.2014904611.html).
 
 ## Data Access Tags in the Portal
 
-<Callout type="info" emoji="ℹ️">
-  This is an upcoming visual labeling system designed to represent access in a more user-friendly manner within the portal.
-</Callout>
-
-<Steps>
-### Phase 1 Datasets
+<details>
+<summary> Phase 1 Datasets </summary>
 
 ##### "PRIVATE"
-<Image src="/badge_phase1_PRIVATE.svg" alt="private data access" width={160} height={30} />
+
 Phase 1 datasets will most likely not be visible in the portal. If they do appear, the pink "PRIVATE" tag will indicate that the data exists on Synapse but is still undergoing compilation/analysis. At the moment, there is no intention to share this data, so please refrain from contacting the team regarding its availability.
 
-### Phase 2 Datasets
+</details>
+
+<details>
+<summary> Phase 2 Datasets </summary>
 
 ##### "COLLABORATIVE ONLY"
-<Image src="/badge_phase2_COLLAB.svg" alt="collaborative data access" width={270} height={30} />
+
 Phase 2 datasets may be visible in the portal but will be labeled as "COLLABORATIVE ACCESS ONLY". Researchers interested in collaboration will need to request access by contacting the Team PI.
 - Data is private and can only be downloaded by individuals or teams explicitly granted permission.
 - Admins (PIs and Data Leads) for the team's repository can grant view or download access by adding individuals or teams to the folder or project. This allows data to be shared for collaboration without moving it to Phase 3.
 - In conversation, this data may also be referred to as being "under embargo".
 
-### Phase 3 Datasets
+</details>
+
+<details>
+<summary> Phase 3 Datasets </summary>
 
 Phase 3 datasets often consist of data released concurrently with or after publication, although certain access conditions may still apply. Within Phase 3, data is classified into several access categories:
 
 ##### "CONTROLLED ACCESS"
-<Image src="/badge_phase3_CONTROLLED.svg" alt="controlled data access" width={200} height={30} />
+
 - Registered Synapse users can download this data after fulfilling additional requirements such as accepting Conditions of Use, submitting a statement, or obtaining approval from governance.
 - For certain controls, even individuals on the Access Control List with download permissions will need to complete the required action to download data, as these controls are implemented through a different data access layer.
 - This typically applies to raw sequencing data.
 
 ##### "PUBLIC"
-<Image src="/badge_phase3_PUBLIC.svg" alt="public data access" width={140} height={30} />
 - The data can be viewed by anyone on the web, but only registered Synapse users can download it. There are no restrictions or additional steps required for downloading.
 - This usually applies to processed data such as summarized gene expression and cell and tissue images.
 
 ##### "OPEN"
-<Image src="/badge_phase3_OPEN.svg" alt="open data access" width={130} height={30} />
 - Anyone on the web can view and download this data without requiring an account.
 - This usually applies to metadata.
 
-</Steps>
-
-For more information, please refer to the policies on our [Data Access](/help/data-access) page or contact our governance analysis team at [email protected].
-
-### See Also
+</details>
 
-For specific information regarding Synapse governance, please visit https://help.synapse.org/docs/Synapse-Governance.2004255211.html.
+_For more information regarding Synapse governance, please visit [Synapse Help](https://help.synapse.org/docs/Synapse-Governance.2004255211.html)._
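
Reviewer's aside on the access tags described in this file: the download rules can be summarized as a small decision function. This is a minimal sketch of the wording on the page, not Synapse's actual permission model or API. The `Requester` class, its field names, and `can_download` are hypothetical. Treating "PRIVATE" like "COLLABORATIVE ONLY" (download only with an explicit grant) is an assumption, since the page says only that Phase 1 data is not intended to be shared.

```python
from dataclasses import dataclass


@dataclass
class Requester:
    """Hypothetical description of the person asking for a download."""
    registered: bool = False          # has a Synapse account
    explicitly_granted: bool = False  # added to the folder/project by a team admin
    requirements_met: bool = False    # e.g. accepted Conditions of Use


def can_download(tag: str, who: Requester) -> bool:
    """Return True if `who` may download data carrying the given portal tag."""
    if tag == "OPEN":
        # Anyone on the web, no account needed.
        return True
    if tag == "PUBLIC":
        # Viewable by anyone, downloadable by registered users only.
        return who.registered
    if tag == "CONTROLLED ACCESS":
        # The extra requirements apply even to users with an explicit grant.
        return who.registered and who.requirements_met
    if tag in ("COLLABORATIVE ONLY", "PRIVATE"):
        # Assumption: PRIVATE behaves like COLLABORATIVE ONLY for downloads.
        return who.explicitly_granted
    raise ValueError(f"unknown access tag: {tag!r}")


if __name__ == "__main__":
    visitor = Requester()
    member = Requester(registered=True, requirements_met=True)
    for tag in ("OPEN", "PUBLIC", "CONTROLLED ACCESS", "COLLABORATIVE ONLY"):
        print(tag, can_download(tag, visitor), can_download(tag, member))
```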
diff --git a/docs/data sharing and access policies/img/docsVersionDropdown.png b/docs/data sharing and access policies/img/docsVersionDropdown.png
diff --git a/docs/data sharing and access policies/img/localeDropdown.png b/docs/data sharing and access policies/img/localeDropdown.png