add .ssf for many packages #114
Great! Here are some things I think should be changed:
I think we could also remove the quotes from the ENA table .tsv files if they're not necessary. In LibreOffice, for example, you can specify the writing behaviour for .csv/.tsv files and turn off wrapping everything in quotes.
Some ENA tables have almost no quotes (e.g. 2020_AgranatTamir_LevantBA/ENAtable.tsv), while others have quotes in every field (e.g. 2020_Nagele_Caribbean/ENAtable.tsv). Why is that?
2021_PattersonNature/ENAtable.tsv
"SAMEA10556698" "PRJEB47891" "ERR7195387" "I16611.MT" "I16611" "ERS8208682" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208682" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/007/ERR7195387/ERR7195387.fastq.gz" 841043 "1c32f55bdb42435f948e4306d3deff5a" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/007/ERR7195387/ERR7195387.fastq.gz" 45470 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195387/I16611.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195387/I16611.MT.bam.bai" | ||
"SAMEA10556735" "PRJEB47891" "ERR7195424" "I17145" "I17145" "ERS8208719" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208719" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/004/ERR7195424/ERR7195424.fastq.gz" 295911199 "f64d265c4f93f4ed630209214896a9f8" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/004/ERR7195424/ERR7195424.fastq.gz" 9572979 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195424/I17145.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195424/I17145.bam.bai" | ||
"SAMEA10556448" "PRJEB47891" "ERR7195138" "I15048" "I15048" "ERS8208433" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208433" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/008/ERR7195138/ERR7195138.fastq.gz" 463718032 "e6a707435a6e2115a043a24acc5a8681" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/008/ERR7195138/ERR7195138.fastq.gz" 13918874 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195138/I15048.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195138/I15048.bam.bai" | ||
"SAMEA10556451" "PRJEB47891" "ERR7195141" "I15049.MT" "I15049" "ERS8208436" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208436" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/001/ERR7195141/ERR7195141.fastq.gz" 497601 "7a519df57f99d136149a6f21db59f8ed" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/001/ERR7195141/ERR7195141.fastq.gz" 23871 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195141/I15049.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195141/I15049.MT.bam.bai" | ||
"SAMEA10556452" "PRJEB47891" "ERR7195142" "I15071" "I15071" "ERS8208437" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208437" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/002/ERR7195142/ERR7195142.fastq.gz" 83898699 "51bdda20dda4e552e4fbdff6aa2d2a0a" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/002/ERR7195142/ERR7195142.fastq.gz" 2770837 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195142/I15071.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195142/I15071.bam.bai" | ||
"SAMEA10556453" "PRJEB47891" "ERR7195143" "I15071.MT" "I15071" "ERS8208438" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208438" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/003/ERR7195143/ERR7195143.fastq.gz" 198416 "3fb03c345ba6064aa853f3609e260908" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/003/ERR7195143/ERR7195143.fastq.gz" 10107 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195143/I15071.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195143/I15071.MT.bam.bai" | ||
"SAMEA10556446" "PRJEB47891" "ERR7195136" "I15047" "I15047" "ERS8208431" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208431" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/006/ERR7195136/ERR7195136.fastq.gz" 264000464 "da78777b6aff2fdb6cec6a3be618dcb7" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/006/ERR7195136/ERR7195136.fastq.gz" 8189015 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195136/I15047.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195136/I15047.bam.bai" | ||
"SAMEA10556450" "PRJEB47891" "ERR7195140" "I15049" "I15049" "ERS8208435" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208435" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/000/ERR7195140/ERR7195140.fastq.gz" 129131737 "ba9c46c580eb755adc598c698523dfc0" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/000/ERR7195140/ERR7195140.fastq.gz" 3881065 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195140/I15049.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195140/I15049.bam.bai" | ||
"SAMEA10556447" "PRJEB47891" "ERR7195137" "I15047.MT" "I15047" "ERS8208432" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208432" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/007/ERR7195137/ERR7195137.fastq.gz" 581955 "eae85b702d2a7ab8955f9485ff3a6885" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/007/ERR7195137/ERR7195137.fastq.gz" 31235 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195137/I15047.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195137/I15047.MT.bam.bai" | ||
"SAMEA10556449" "PRJEB47891" "ERR7195139" "I15048.MT" "I15048" "ERS8208434" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208434" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/009/ERR7195139/ERR7195139.fastq.gz" 1270588 "ca8c532ac405b9415b9cfcbd22f3ef9d" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/009/ERR7195139/ERR7195139.fastq.gz" 62681 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195139/I15048.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195139/I15048.MT.bam.bai" | ||
"SAMEA10556516" "PRJEB47891" "ERR7195206" "I16099" "I16099" "ERS8208501" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208501" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/006/ERR7195206/ERR7195206.fastq.gz" 263845245 "a07b76f9d4619b296283e59d1a3b888e" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/006/ERR7195206/ERR7195206.fastq.gz" 8379837 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195206/I16099.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195206/I16099.bam.bai" | ||
"SAMEA10556469" "PRJEB47891" "ERR7195159" "I15819.MT" "I15819" "ERS8208454" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208454" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/009/ERR7195159/ERR7195159.fastq.gz" 233827 "f29f003aec54a4f825c3e2f572e9c4a9" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/009/ERR7195159/ERR7195159.fastq.gz" 12443 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195159/I15819.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195159/I15819.MT.bam.bai" | ||
"SAMEA10556467" "PRJEB47891" "ERR7195157" "I15818.MT" "I15818" "ERS8208452" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208452" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/007/ERR7195157/ERR7195157.fastq.gz" 1448272 "6dbb80b2856a0e60aeb3911d1ab73ab0" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/007/ERR7195157/ERR7195157.fastq.gz" 72915 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195157/I15818.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195157/I15818.MT.bam.bai" | ||
"SAMEA10556471" "PRJEB47891" "ERR7195161" "I15821.MT" "I15821" "ERS8208456" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208456" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/001/ERR7195161/ERR7195161.fastq.gz" 5959 "287610ebd1dbc4847599e3c592637ec3" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/001/ERR7195161/ERR7195161.fastq.gz" 200 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195161/I15821.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195161/I15821.MT.bam.bai" | ||
"SAMEA10556466" "PRJEB47891" "ERR7195156" "I15818" "I15818" "ERS8208451" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208451" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/006/ERR7195156/ERR7195156.fastq.gz" 326507936 "c09007cb4425843742de07d18c458d52" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/006/ERR7195156/ERR7195156.fastq.gz" 10760686 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195156/I15818.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195156/I15818.bam.bai" | ||
"SAMEA10556476" "PRJEB47891" "ERR7195166" "I15825" "I15825" "ERS8208461" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208461" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/006/ERR7195166/ERR7195166.fastq.gz" 158049239 "0fa8c9dca6002be7ef0bc17034d15ae9" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/006/ERR7195166/ERR7195166.fastq.gz" 5578142 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195166/I15825.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195166/I15825.bam.bai" | ||
"SAMEA10556475" "PRJEB47891" "ERR7195165" "I15824.MT" "I15824" "ERS8208460" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208460" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/005/ERR7195165/ERR7195165.fastq.gz" 1350290 "a0d22f5546b83e6c1a0b71605b26f852" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/005/ERR7195165/ERR7195165.fastq.gz" 64314 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195165/I15824.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195165/I15824.MT.bam.bai" | ||
"SAMEA10556478" "PRJEB47891" "ERR7195168" "I15826" "I15826" "ERS8208463" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208463" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/008/ERR7195168/ERR7195168.fastq.gz" 81931038 "d96270dda60f454bb3ff289c05350999" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/008/ERR7195168/ERR7195168.fastq.gz" 3177628 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195168/I15826.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195168/I15826.bam.bai" | ||
"SAMEA10556479" "PRJEB47891" "ERR7195169" "I15826.MT" "I15826" "ERS8208464" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208464" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/009/ERR7195169/ERR7195169.fastq.gz" 303016 "278bed47346cf86e40b58d8683f6978e" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/009/ERR7195169/ERR7195169.fastq.gz" 17065 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195169/I15826.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195169/I15826.MT.bam.bai" | ||
"SAMEA10556456" "PRJEB47891" "ERR7195146" "I15643" "I15643" "ERS8208441" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208441" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/006/ERR7195146/ERR7195146.fastq.gz" 225094788 "0f7b869de6b056f63ab14619c8df7dd3" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/006/ERR7195146/ERR7195146.fastq.gz" 7250726 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195146/I15643.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195146/I15643.bam.bai" | ||
"SAMEA10556454" "PRJEB47891" "ERR7195144" "I15642" "I15642" "ERS8208439" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208439" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/004/ERR7195144/ERR7195144.fastq.gz" 101009164 "cc8b1cdcabc65b124e6ee3c1ea445669" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/004/ERR7195144/ERR7195144.fastq.gz" 3385990 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195144/I15642.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195144/I15642.bam.bai" | ||
"SAMEA10556457" "PRJEB47891" "ERR7195147" "I15643.MT" "I15643" "ERS8208442" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208442" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/007/ERR7195147/ERR7195147.fastq.gz" 1074204 "684fdbdddb54ed566ce964e3be85664b" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/007/ERR7195147/ERR7195147.fastq.gz" 55345 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195147/I15643.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195147/I15643.MT.bam.bai" | ||
"SAMEA10556455" "PRJEB47891" "ERR7195145" "I15642.MT" "I15642" "ERS8208440" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208440" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/005/ERR7195145/ERR7195145.fastq.gz" 377928 "f9ba82a9d43787c7c89891fa0a04cc21" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/005/ERR7195145/ERR7195145.fastq.gz" 20761 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195145/I15642.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195145/I15642.MT.bam.bai" | ||
"SAMEA10556737" "PRJEB47891" "ERR7195426" "I17146" "I17146" "ERS8208721" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208721" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/006/ERR7195426/ERR7195426.fastq.gz" 198413526 "4152129d3db844c2d45683f7cac05527" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/006/ERR7195426/ERR7195426.fastq.gz" 6489897 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195426/I17146.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195426/I17146.bam.bai" | ||
"SAMEA10556570" "PRJEB47891" "ERR7195260" "I16395" "I16395" "ERS8208555" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208555" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/000/ERR7195260/ERR7195260.fastq.gz" 37166054 "1abc5c021409a319d38267f43eec4f5f" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/000/ERR7195260/ERR7195260.fastq.gz" 1339727 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195260/I16395.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195260/I16395.bam.bai" | ||
"SAMEA10556569" "PRJEB47891" "ERR7195259" "I16394.MT" "I16394" "ERS8208554" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208554" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/009/ERR7195259/ERR7195259.fastq.gz" 50180 "4b9d3103ccb839b160949588ffec20f7" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/009/ERR7195259/ERR7195259.fastq.gz" 2235 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195259/I16394.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195259/I16394.MT.bam.bai" | ||
"SAMEA10556464" "PRJEB47891" "ERR7195154" "I15650" "I15650" "ERS8208449" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208449" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/004/ERR7195154/ERR7195154.fastq.gz" 229150373 "b6a8e9db45afc20ece7c0f7947282180" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/004/ERR7195154/ERR7195154.fastq.gz" 7282809 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195154/I15650.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195154/I15650.bam.bai" | ||
"SAMEA10556465" "PRJEB47891" "ERR7195155" "I15650.MT" "I15650" "ERS8208450" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208450" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/005/ERR7195155/ERR7195155.fastq.gz" 1400862 "d489bb81189288dd5176f165fe856b72" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/005/ERR7195155/ERR7195155.fastq.gz" 70228 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195155/I15650.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195155/I15650.MT.bam.bai" | ||
"SAMEA10556461" "PRJEB47891" "ERR7195151" "I15646.MT" "I15646" "ERS8208446" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208446" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/001/ERR7195151/ERR7195151.fastq.gz" 1037465 "233f23426d235168d3fe9426171ae4d7" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/001/ERR7195151/ERR7195151.fastq.gz" 46183 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195151/I15646.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195151/I15646.MT.bam.bai" | ||
"SAMEA10556463" "PRJEB47891" "ERR7195153" "I15648.MT" "I15648" "ERS8208448" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208448" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/003/ERR7195153/ERR7195153.fastq.gz" 1186006 "a2f04c5ee6650ef0bd0943502aa88af1" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/003/ERR7195153/ERR7195153.fastq.gz" 63912 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195153/I15648.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195153/I15648.MT.bam.bai" | ||
"SAMEA10556459" "PRJEB47891" "ERR7195149" "I15644.MT" "I15644" "ERS8208444" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208444" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/009/ERR7195149/ERR7195149.fastq.gz" 1672570 "713575499afa6cc5b9ead3289c2e0dfc" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/009/ERR7195149/ERR7195149.fastq.gz" 83270 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195149/I15644.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195149/I15644.MT.bam.bai" | ||
"SAMEA10556460" "PRJEB47891" "ERR7195150" "I15646" "I15646" "ERS8208445" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208445" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/000/ERR7195150/ERR7195150.fastq.gz" 223956785 "b9e22ee1e1b5a18a11aa536610ca1195" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/000/ERR7195150/ERR7195150.fastq.gz" 6397394 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195150/I15646.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195150/I15646.bam.bai" | ||
"SAMEA10556458" "PRJEB47891" "ERR7195148" "I15644" "I15644" "ERS8208443" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208443" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/008/ERR7195148/ERR7195148.fastq.gz" 261188898 "1303ca29e0feb962cedbe01ece9f979a" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/008/ERR7195148/ERR7195148.fastq.gz" 8051253 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195148/I15644.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195148/I15644.bam.bai" | ||
"SAMEA10556462" "PRJEB47891" "ERR7195152" "I15648" "I15648" "ERS8208447" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208447" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/002/ERR7195152/ERR7195152.fastq.gz" 152343117 "13c199363b66330b78bd1f55cd473d8e" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/002/ERR7195152/ERR7195152.fastq.gz" 4945218 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195152/I15648.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195152/I15648.bam.bai" | ||
"SAMEA10556480" "PRJEB47891" "ERR7195170" "I15950" "I15950" "ERS8208465" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208465" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/000/ERR7195170/ERR7195170.fastq.gz" 271200408 "e16da13da0b536a7df76345f04a7945c" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/000/ERR7195170/ERR7195170.fastq.gz" 8599743 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195170/I15950.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195170/I15950.bam.bai" | ||
"SAMEA10556477" "PRJEB47891" "ERR7195167" "I15825.MT" "I15825" "ERS8208462" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208462" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/007/ERR7195167/ERR7195167.fastq.gz" 1366875 "fdf28bec874b24fa9574b66891fdcca2" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/007/ERR7195167/ERR7195167.fastq.gz" 75489 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195167/I15825.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195167/I15825.MT.bam.bai" | ||
"SAMEA10556470" "PRJEB47891" "ERR7195160" "I15821" "I15821" "ERS8208455" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208455" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/000/ERR7195160/ERR7195160.fastq.gz" 15559676 "127d77ba49b8b7c3b3689f349b7a2c6d" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/000/ERR7195160/ERR7195160.fastq.gz" 765748 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195160/I15821.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195160/I15821.bam.bai" | ||
"SAMEA10556473" "PRJEB47891" "ERR7195163" "I15823.MT" "I15823" "ERS8208458" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208458" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/003/ERR7195163/ERR7195163.fastq.gz" 686534 "7f370e855af72fb6d31deeb7237bb3be" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/003/ERR7195163/ERR7195163.fastq.gz" 36412 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195163/I15823.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195163/I15823.MT.bam.bai" | ||
"SAMEA10556468" "PRJEB47891" "ERR7195158" "I15819" "I15819" "ERS8208453" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208453" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/008/ERR7195158/ERR7195158.fastq.gz" 84310556 "edf465349858ea68d563f28e15bd3379" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/008/ERR7195158/ERR7195158.fastq.gz" 2878062 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195158/I15819.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195158/I15819.bam.bai" | ||
"SAMEA10556688" "PRJEB47891" "ERR7195377" "I16599.MT" "I16599" "ERS8208672" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208672" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/007/ERR7195377/ERR7195377.fastq.gz" 3288870 "5023f1828fedc6188f96ab232455293f" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/007/ERR7195377/ERR7195377.fastq.gz" 165782 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195377/I16599.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195377/I16599.MT.bam.bai" | ||
"SAMEA10556474" "PRJEB47891" "ERR7195164" "I15824" "I15824" "ERS8208459" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208459" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/004/ERR7195164/ERR7195164.fastq.gz" 293783349 "5b9b26a06951caf34eb83988846517e0" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/004/ERR7195164/ERR7195164.fastq.gz" 8978099 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195164/I15824.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195164/I15824.bam.bai" | ||
"SAMEA10556472" "PRJEB47891" "ERR7195162" "I15823" "I15823" "ERS8208457" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208457" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/002/ERR7195162/ERR7195162.fastq.gz" 184289966 "eb93013dd2912f27bb76c77107e94d16" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/002/ERR7195162/ERR7195162.fastq.gz" 6321488 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195162/I15823.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195162/I15823.bam.bai" | ||
"SAMEA10556517" "PRJEB47891" "ERR7195207" "I16099.MT" "I16099" "ERS8208502" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208502" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/007/ERR7195207/ERR7195207.fastq.gz" 1161043 "1306c543782fe2f3be63cb126c5bbd20" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/007/ERR7195207/ERR7195207.fastq.gz" 58358 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195207/I16099.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195207/I16099.MT.bam.bai" | ||
"SAMEA10556547" "PRJEB47891" "ERR7195237" "I16271.MT" "I16271" "ERS8208532" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208532" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/007/ERR7195237/ERR7195237.fastq.gz" 1142943 "cf9e569b43bb1be1cd22dfd1cfcada5a" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/007/ERR7195237/ERR7195237.fastq.gz" 58806 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195237/I16271.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195237/I16271.MT.bam.bai" | ||
"SAMEA10556545" "PRJEB47891" "ERR7195235" "I16270.MT" "I16270" "ERS8208530" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208530" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/005/ERR7195235/ERR7195235.fastq.gz" 295319 "7e5d1332ae293760a8ccb3983527d57b" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/005/ERR7195235/ERR7195235.fastq.gz" 16459 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195235/I16270.MT.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195235/I16270.MT.bam.bai" | ||
"SAMEA10556546" "PRJEB47891" "ERR7195236" "I16271" "I16271" "ERS8208531" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208531" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/006/ERR7195236/ERR7195236.fastq.gz" 217592310 "458543373a612ed8a0b1a2e3f4452a6e" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/006/ERR7195236/ERR7195236.fastq.gz" 6905093 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195236/I16271.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195236/I16271.bam.bai" | ||
"SAMEA10556552" "PRJEB47891" "ERR7195242" "I16326" "I16326" "ERS8208537" 2021-10-29 2021-10-29 "Illumina HiSeq X" "SINGLE" "GENOMIC" "ILLUMINA" "ERS8208537" "OTHER" "fasp.sra.ebi.ac.uk:/vol1/fastq/ERR719/002/ERR7195242/ERR7195242.fastq.gz" 125308855 "b7318a43f3ae98cb0bff76fa1bbc6de4" "ftp.sra.ebi.ac.uk/vol1/fastq/ERR719/002/ERR7195242/ERR7195242.fastq.gz" 4198104 "ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195242/I16326.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR719/ERR7195242/I16326.bam.bai" | ||
"SAMEA105565 |
Lines 1626-1673 look strange. (GitHub seems to preview different ones 😕) sample_alias looks like ena-SAMPLE-TAB-28-10-2021-18:52:59:284-23886, and the Poseidon_ID field is empty.
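A check for rows like these could be sketched as follows. This is only an illustration under assumptions: the column headers `poseidon_IDs` and `sample_alias` are taken from how the columns are named later in this thread, the `ena-SAMPLE-TAB-` prefix is generalized from the single example above, and the two-row input file is made up for the demo.

```shell
# Hypothetical two-row .ssf excerpt (tab-separated, with a header line).
ssf=$(mktemp)
printf 'poseidon_IDs\tsample_alias\n' > "$ssf"
printf 'I15047\tI15047\n' >> "$ssf"
printf '\tena-SAMPLE-TAB-28-10-2021-18:52:59:284-23886\n' >> "$ssf"

# Look up the two columns by header name, then report every data row whose
# poseidon_IDs field is empty or whose sample_alias starts with the
# (assumed) ENA-generated prefix.
flagged=$(awk -F'\t' '
  NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
  $(col["poseidon_IDs"]) == "" || $(col["sample_alias"]) ~ /^ena-SAMPLE-TAB-/ {
    print "line " NR ": " $(col["sample_alias"])
  }
' "$ssf")
echo "$flagged"
```

Looking the columns up by name rather than by position keeps the check usable even if the column order differs between packages.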
I was unable to match the 2020_AgranatTamir_LevantBA package with the Poseidon data, as it uses completely different naming. I have opened draft PR #116 for this.
Re: 2021_PattersonNature
I manually curated the SSF file for this package for my own use, which you could copy over so we avoid doing the work twice. I took the table you created and, using the paper's supplementary table as well as the AADR, filled in the library_built and udg columns. I also fixed the sample_alias and poseidon_IDs for the lines that I complained about above, using the BAM name as the IID.
I have added some markings to the sample_accession (see this PR for explanations). These will be removed after we discuss things in our next meeting, after which the SSF should be valid and can be copied over.
For future reference, you can find the updated file here. 😃
Right, I think the quotes were also mentioned by Clemens in some of the Janno Tables. @93Boy do you know how to take it from here? Or shall we help with some scripting to get this PR in order?
This sed command should remove any double quotes from a given file.
You can edit the files in place with the following command instead (no new file created):
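Both variants described above can be sketched as follows. This is a minimal illustration, assuming GNU sed and a made-up one-line demo table; it deletes every double-quote character, so it is only appropriate when no field needs to keep a literal quote.

```shell
# Hypothetical demo file; in the PR this would be a package's ENAtable.tsv.
workdir=$(mktemp -d)
table="$workdir/ENAtable.tsv"
printf '"SAMEA10556698"\t"PRJEB47891"\t2021-10-29\n' > "$table"

# Variant 1: write the unquoted table to a new file, leaving the input untouched.
sed 's/"//g' "$table" > "$workdir/ENAtable.noquotes.tsv"

# Variant 2: edit the file in place, so no new file is created
# (GNU sed syntax; BSD/macOS sed expects -i '' instead).
sed -i 's/"//g' "$table"
```

Running the first variant before the in-place one makes it easy to compare the result with the original table before overwriting anything.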
After a request for feedback here's what I see:
It's hard to systematically check the .ssf files. Good that we have the automatic validation to test everything in the end. For now I saw the following issues in some random files I inspected:
The first issue I observed in at least one more file. The automatic validation will highlight all files with these two issues. What it will not show are the cases where one or both of these columns are missing. But that is fine. Beyond that I saw the NA value |
@nevrome I have made all the changes except the |
Great! Almost all ToDos I saw above are resolved. What I still see:
@dhananjaya93: As this keeps coming up: Maybe you could try setting Unlike the other packages, 2021_Zegarac_SoutheasternEurope already has a new entry in the CHANGELOG file. Be careful with that, when you apply update to everything in the end. Regarding the |
OK, good job @93Boy @dhananjaya93 (which one is it now?) on resolving most of them. I think we're happy to help resolve these last outstanding points, unless you want to resolve them yourself. Please let us know briefly whether you're working on it; otherwise I suggest that someone else resolves these quickly so we can finally merge this.
…001 - I'm unsure about this one, even after checking the paper supplement
… this commit the overall validation should pass again (with a new version of trident)
I added a warning to trident to identify .ssf files with no
This renders these .ssf files a bit pointless, as already mentioned by @TCLamnidis in our discussion today. Not easy to fix, though, so maybe we should just remove them for this PR. Then @93Boy can work them in with another PR later.
work-in-progress PR to bring #114 to an end
OK, I have looked briefly into the directory structure of this PR and had a coarse look at some SSFs. What's the status regarding the automatic checking? I am happy to approve this PR in principle!
Automatic checking should be done with a trident version from this PR: poseidon-framework/poseidon-hs#245
Adding ENA data tables to current packages