Add data documentation #2

Open
1 of 12 tasks
LouisLeNezet opened this issue Jun 7, 2023 · 9 comments
Assignees
Labels
documentation Improvements or additions to documentation help wanted Extra attention is needed

Comments

@LouisLeNezet
Contributor

Each data file needs to be named correctly, and its documentation should be added to data.R.

  • GEX5line.txt
  • InputData_for_Survival_v1.txt
  • L29_vitro_Control_vs_knockdown_diff.txt
  • Limma.csv
  • MS_2.rda
  • TCGA_CHOL_Clinical_PatientID.txt
  • TCGA_CHOL_Expression_PatientID.txt
  • autosomes.beta.txt.sorted.chr16
  • list_snp_tohighlight.tsv
  • pbmc3k.rds
  • phenoData.txt
  • hg19_chr_list.rda (Louis)
@LouisLeNezet LouisLeNezet self-assigned this Jun 7, 2023
@LouisLeNezet LouisLeNezet added documentation Improvements or additions to documentation help wanted Extra attention is needed labels Jun 7, 2023
@LouisLeNezet
Contributor Author

Hi everyone!

I need help with this one.
Each dataset that you would like to include in the package needs its own documentation.
Could you tell me where you got each of them and how you processed them?

Thanks

@aobermayer4
Contributor

I do wonder what the format of each of these is, to see whether any similarly formatted files could be reduced to just one dataset, whether we could use an existing R package to get the data (which is how the lung data in plotKM was gathered), or whether we could just have R generate a randomized dataset in the format that is needed. Also, for what we do keep, depending on the size, converting it to an R data object might help with size and load time.
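To illustrate the randomized-dataset idea, here is a minimal sketch in base R. The column names (`patient_id`, `time`, `status`, `group`) and the distributions are assumptions chosen for illustration; they are not the format plotKM or any other function in the package actually expects.

```r
# Hypothetical randomized survival-style dataset (base R only).
# Column names and distributions are illustrative assumptions.
set.seed(42)  # fixed seed so the example data are reproducible
n <- 50
toy_survival <- data.frame(
  patient_id = sprintf("P%03d", seq_len(n)),           # P001 ... P050
  time       = round(rexp(n, rate = 0.01)),            # follow-up time
  status     = rbinom(n, size = 1, prob = 0.7),        # 1 = event, 0 = censored
  group      = sample(c("low", "high"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
str(toy_survival)
```

A generated dataset like this keeps the package small, at the cost of examples that do not show real biology.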

I contributed the TCGA CHOL (Cholangiocarcinoma) data, which was derived from the Genomic Data Commons and cBioPortal. It has an expression matrix and the clinical/phenotype data, which also includes survival data, so it could be used for plotKM as well. I will have to look into the exact source.

Should each of them have documentation similar to this: https://github.com/stjude-biohackathon/KIDS23-Team13/blob/main/man/hg19_chr_list.Rd ?

@LouisLeNezet
Contributor Author

I think it would be great to use already available datasets.
That's recommended by Bioconductor.
The documentation for the data is generated from the https://github.com/stjude-biohackathon/KIDS23-Team13/blob/main/R/data.R file.
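
For reference, a dataset entry in a roxygen2-style data.R usually looks something like the sketch below. The dataset name and every field are placeholders, not taken from the actual file; `@format` and `@source` are standard roxygen2 tags.

```r
# Hypothetical entry for R/data.R. The name "example_dataset" and all of
# the text below are placeholders to be replaced with real information.

#' Example dataset title (placeholder)
#'
#' One or two sentences describing what the dataset contains and which
#' function or example uses it.
#'
#' @format A data frame; describe its rows and columns here.
#' @source Where the raw file came from and how it was processed.
"example_dataset"
```

The quoted string on the last line is the name of the object stored in the data folder; the matching .Rd file under man/ is regenerated from these comments when the documentation is rebuilt.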

@aobermayer4
Contributor

I see, so would we just append the documentation for each dataset to that file?

@LouisLeNezet
Contributor Author

Yes, that's what I've seen in other packages.

@KMcC73
Contributor

KMcC73 commented Jun 9, 2023

Sorry for the delay, I've had some other things come up. The only problem is that the single-cell mitochondrial data is not commonly available, to my knowledge. In this case, it might be best to provide an example file; I just need to read up on what type of documentation is required.

@aobermayer4
Contributor

aobermayer4 commented Jun 10, 2023

@KMcC73 No worries! It's understandable that not all data might be easily available, but Louis provided this (https://github.com/stjude-biohackathon/KIDS23-Team13/blob/main/R/data.R) as an example to follow for documenting our data.

@KMcC73
Contributor

KMcC73 commented Jun 11, 2023

Great, thank you! I updated the script for the mtCoverage plot so that it is now interactive and has a title input. I wanted to make sure I have the functionality in the correct format and script structure, but when I went to look for the initial R plot files that @LouisLeNezet had already structured, I could not find them. Where should I deposit this updated file, and where are the old ones for comparison? Sorry, I'm not well-versed in GitHub protocol.

@LouisLeNezet
Contributor Author

Hi,

The files, once correctly formatted, need to be directly at the root of the R folder (not in modules); I've put my examples there.
Concerning the example datasets, the original file should be in data-raw, then cleaned up in files-to-rda.R and exported as an .rda file in the data folder.
The best would be to use files that are already publicly accessible.
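
As a rough sketch of that data-raw to data step, using base R only: the file name, the object name, and the cleaning step below are hypothetical, and a temporary directory stands in for the package root so the snippet runs on its own. It is not the content of the real files-to-rda.R.

```r
# Sketch of a data-raw -> data conversion (base R only).
# A temporary directory stands in for the package root; the file and
# object names are hypothetical.
pkg_root <- file.path(tempdir(), "pkg")
dir.create(file.path(pkg_root, "data-raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg_root, "data"), recursive = TRUE, showWarnings = FALSE)

# Toy raw file so the sketch is self-contained.
raw_path <- file.path(pkg_root, "data-raw", "example_dataset.txt")
write.table(
  data.frame(sample_id = c("s1", "s2", "s3"), value = c(1.5, NA, 3.2)),
  raw_path, sep = "\t", row.names = FALSE, quote = FALSE
)

# Read the raw file, clean it up (here: drop incomplete rows),
# and export the cleaned object as an .rda file.
example_dataset <- read.delim(raw_path, stringsAsFactors = FALSE)
example_dataset <- example_dataset[complete.cases(example_dataset), ]
rda_path <- file.path(pkg_root, "data", "example_dataset.rda")
save(example_dataset, file = rda_path, compress = "xz")
```

Keeping the raw file and the conversion script in the repository means anyone can see how each .rda file was produced.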


4 participants