c_utilities

Various input/output and system utilities. (Note: this used to be called "sysaduts", for system administration utilities, but I realised they were becoming more general than that.)

Programs

matread: reads a text file consisting of a raw matrix of arbitrary size. All values are assumed to be floats.
genread: parses a text file into lines and words.
dreadi: a fairly robust reader for data files whose records are set out as paragraphs. It assumes that each record starts with a name (a string) followed by a series of integers. There can be any number of integers, but the count must be the same for every record.
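
As a rough illustration of the kind of reading matread does, here is a minimal sketch. It is not the actual matread code: the function name `read_matrix` and its behaviour on ragged rows are assumptions made for this example. It reads whitespace-separated floats into a growing buffer and takes the column count from the first row.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not the real matread): reads whitespace-separated
 * floats from fp into a row-major buffer that grows as needed.
 * Returns NULL on allocation failure, ragged rows, or a non-numeric token.
 * The caller frees the returned buffer. */
float *read_matrix(FILE *fp, size_t *nrows, size_t *ncols)
{
    size_t cap = 16, n = 0, rows = 0, cols = 0, cur = 0;
    float *m = malloc(cap * sizeof *m);
    float v;
    int c;

    if (m == NULL)
        return NULL;
    while (fscanf(fp, "%f", &v) == 1) {
        if (n == cap) {                 /* double the buffer when full */
            float *tmp;
            cap *= 2;
            tmp = realloc(m, cap * sizeof *m);
            if (tmp == NULL) { free(m); return NULL; }
            m = tmp;
        }
        m[n++] = v;
        cur++;
        /* skip blanks after the value; a newline or EOF closes the row */
        while ((c = fgetc(fp)) == ' ' || c == '\t' || c == '\r')
            ;
        if (c == '\n' || c == EOF) {
            if (rows == 0)
                cols = cur;             /* first row fixes the width */
            else if (cur != cols) { free(m); return NULL; }
            rows++;
            cur = 0;
        } else {
            ungetc(c, fp);              /* start of the next value */
        }
    }
    if (cur != 0) { free(m); return NULL; }  /* stopped mid-row on bad input */
    *nrows = rows;
    *ncols = cols;
    return m;
}
```

Element (i, j) is then `m[i * ncols + j]`. Blank lines between rows are skipped because `fscanf` consumes leading whitespace.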

TODO

I'd like to handle file-listing files. These would have only one word per line.

mprd3.c

A plink-alike stats reader.
Ran into the issue of hashing on chr/pos strings, which is fine for
plink must be

srtfix and vttgo3

These and similar-sounding files are for YouTube (YT) downloaded auto-generated subtitles (not manual subs). srtfix (a delay program) actually interprets the timings, though it is not very sophisticated: the timings all have the same format in YT VTT, so no strtok is necessary. vttgo is also very simplistic. Again, the lines are highly regular, so you can rely on indices. Note that there are no special requirements for Cyrillic; this must be UTF-8 at work behind the scenes.

I find that subtitles are not delayed by a constant amount; it's more complicated than that. This program helps with the beginning. After that, the delays just start to vary wildly.
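
To show what relying on indices rather than strtok can look like, here is a minimal sketch. It is not the srtfix source: the function names are made up for this example, and it assumes a VTT cue line of the form `HH:MM:SS.mmm --> HH:MM:SS.mmm ...`, so the second timestamp starts at character 17.

```c
#include <stdio.h>
#include <string.h>

/* Convert "HH:MM:SS.mmm" to milliseconds using fixed character positions.
 * Assumes the input is well formed; no validation is done. */
long ts_to_ms(const char *t)
{
    long h  = (t[0] - '0') * 10 + (t[1] - '0');
    long m  = (t[3] - '0') * 10 + (t[4] - '0');
    long s  = (t[6] - '0') * 10 + (t[7] - '0');
    long ms = (t[9] - '0') * 100 + (t[10] - '0') * 10 + (t[11] - '0');
    return ((h * 60 + m) * 60 + s) * 1000 + ms;
}

/* Format milliseconds back into "HH:MM:SS.mmm" (12 chars plus NUL).
 * Negative times are clamped to zero. */
void ms_to_ts(long ms, char out[13])
{
    if (ms < 0)
        ms = 0;
    snprintf(out, 13, "%02ld:%02ld:%02ld.%03ld",
             ms / 3600000, (ms / 60000) % 60, (ms / 1000) % 60, ms % 1000);
}

/* Shift both timestamps of a cue line in place by delay_ms.
 * The start stamp is at index 0, the end stamp at index 17
 * (12 characters of timestamp plus the 5 characters of " --> "). */
void shift_cue(char *line, long delay_ms)
{
    char buf[13];

    ms_to_ts(ts_to_ms(line) + delay_ms, buf);
    memcpy(line, buf, 12);
    ms_to_ts(ts_to_ms(line + 17) + delay_ms, buf);
    memcpy(line + 17, buf, 12);
}
```

Anything after the second timestamp (such as alignment settings) is left untouched, since only the two 12-character windows are overwritten. A constant shift like this only covers the simple case; it does nothing for delays that vary along the file.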

csvrdm.c

A program to handle the output of the DNA methylation pipeline. The CSV is similar to the EPIC annotation CSV, though there are important changes, including the renaming of one or two columns, reordering, and other things. The main idea is to take Illumina's multiply annotated CpGs (entries separated by semicolons) and give each annotation its own row for easier manipulation.

For some reason known only to Illumina (and perhaps not even to them), the multiple annotations are often identical. dupcrd2 is able to get rid of these. Sometimes, however, they are not identical, and a CpG is in fact assigned to more than one gene. Very unfortunately, it is hard to find out how to resolve these cases; nobody can be drawn on the topic. So after csvrdm has separated them out into rows, it is dupcrd2.R's job to resolve them. In Dec 23 I decided to follow my own line of reasoning, a "probability of biological impact score" (rfbisc), by which a row gets points for being in a promoter, in an island, and so on.

After running csvrdm on a file and running the result through dupcrd2.R, you should actually get the same number of lines, the difference being that they are "resolved", rightly or wrongly. The output of csvrd is often postfixed with "_gened". This is only a convention, because the output goes to STDOUT.
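
The row-splitting idea can be sketched as follows. This is not the csvrdm source: the function name `explode_row`, the choice of a gene field and a group field, and the three-column output are all assumptions made for the illustration. `strcspn` is used instead of strtok so that empty entries are preserved.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: given one CpG id and two parallel
 * semicolon-separated fields, write one "cpg,gene,group" row per entry.
 * Stops at the end of the shorter field. Returns the rows written.
 * A CpG with empty fields still yields one (gene-less) row. */
int explode_row(FILE *out, const char *cpg,
                const char *genes, const char *groups)
{
    int n = 0;

    for (;;) {
        size_t gl = strcspn(genes, ";");   /* length of current gene entry */
        size_t pl = strcspn(groups, ";");  /* length of current group entry */

        fprintf(out, "%s,%.*s,%.*s\n", cpg, (int)gl, genes, (int)pl, groups);
        n++;
        if (genes[gl] == '\0' || groups[pl] == '\0')
            break;
        genes += gl + 1;                   /* step past the semicolon */
        groups += pl + 1;
    }
    return n;
}
```

For example, `explode_row(stdout, "cg0001", "A;A;B", "TSS200;Body;Body")` writes three rows, two of which name the same gene; deciding between such rows is the part left to the later de-duplication step.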

Note:

The pipeline outputs CSVs with quotes. This should be changed, as they are not useful for csvrdm.
The column CpgGrp was GpgGrp for a while, causing untold frustration.
dupcrd.R is needed to de-duplicate the CpG annotation.

NOTE: I incorporated rfBISC (RF Biological Impact Score) to help decide the best CpG for a given gene.

Big gotcha: Illumina's annotation has a row for each CpG (naturally), but some of them are "gene-less", often because islands don't always occur beside genes and Illumina want to annotate the CpGs anyway. Those rows carry only island annotations.

So the sequence of programs is: csvrdm, then csvfoc (to get one line per gene-annotated CpG), after which the result goes back into the pipeline. NOTE: a single CpG per gene has to be selected.

kmlrd.c

Following on from gpxrd.c, these are rough-and-ready parsers of XML. Note that I do not say "XML parsers", because they are most certainly not that. gpxrd is particularly unrobust, relying on hardcoded Garmin line numbers. kmlrd is an improvement: it looks for the first appearance of certain characters. kmlrd gets stuck on computing distance from latitude and longitude when in a plane. Penzance was 16.5 km away on the surface, but from 11 km up in the air it comes out as 26.8? No, it should be about 20, so the mydist() function is not great at all.

