Cheatsheet of details about the Harvard PGP dataset.
These notes still need to be fact-checked!
- Graphs of the number of genomes, etc. https://my.pgp-hms.org/public_genetic_data/statistics
- Participant profiles https://my.pgp-hms.org/users
- Public data https://my.pgp-hms.org/public_genetic_data (takes a while to load)
- Internal
We release whole genomes. By number of individuals:
- 10 Illumina (earliest).
- ~200 Complete Genomics.
- ~20 Complete Genomics Long Fragment Read (CG LFR).
- Find it here: https://my.pgp-hms.org/public_genetic_data
- External
- Participants can upload any type of data they want.
- The data type is self-reported; trust it at your own risk.
- e.g. 23andMe, https://my.pgp-hms.org/public_genetic_data?utf8=%E2%9C%93&data_type=23andMe&commit=Search
- Surveys
- Anyone enrolled (i.e. who has passed the consent quiz) can take surveys.
- Surveys can be taken multiple times.
- Only a subset of enrolled participants will have their genome sequenced.
- Surveys cover everything from cancer to blood type.
- Survey responses: https://my.pgp-hms.org/google_surveys
- Survey questions: https://github.com/PGPHarvard/pgp-surveys/tree/master/Surveys
- GET-Evidence
- Research-use-only variant report returned to participants
- http://evidence.pgp-hms.org/genomes
- Sample Collection Timestamps
- Cell Lines
- Can be ordered from Coriell
- https://catalog.coriell.org/0/Sections/Collections/NIGMS/PGPs.aspx?PgId=772&coll=GM
- Files can be publicly accessed / manipulated through Arvados
- https://workbench.su92l.arvadosapi.com/projects/su92l-j7d0g-1d2se4f08r0q7ta#Data_collections
- Trios
- https://sites.stanford.edu/abms/content/first-data-giab-pgp-trios-ftp
- https://github.com/deflaux/codelabs/blob/pgp-cgi-only/R/PlatinumGenomes-QC/Sample-Level-QC.md
- http://googlegenomics.readthedocs.org/en/latest/use_cases/discover_public_data/pgp_public_data.html
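The 23andMe link in the External section is the public data page with a few search parameters appended. As a minimal sketch, the same URL can be rebuilt in Python so that the data type is a variable. The function name `search_url` is hypothetical, and only the `23andMe` value is taken from these notes; whether other `data_type` values are accepted is an assumption that should be checked against the site's search form.

```python
from urllib.parse import urlencode

# Base URL of the public data listing, as given in the notes above.
BASE_URL = "https://my.pgp-hms.org/public_genetic_data"


def search_url(data_type: str) -> str:
    """Build a search URL filtered by the (self-reported) data type.

    The parameter names (utf8, data_type, commit) are copied from the
    23andMe example link; nothing else about the site is assumed here.
    """
    params = {"utf8": "✓", "data_type": data_type, "commit": "Search"}
    # urlencode percent-encodes the check mark as UTF-8 (%E2%9C%93).
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(search_url("23andMe"))
```

Because the data type is self-reported, a URL built this way only narrows the listing; the files it returns still need to be checked individually.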
How and why are people using PGP data?
- Wellesley PGHCI Lab