dotfiles

Bash (or zsh!) functions for data science.
Check dpkg.md for a list of apt packages that are awesome. On a Debian-based system (like Ubuntu!) you can install them all with:

sudo apt-get update && sudo apt-get install $( cat dpkg.md | sed 's:*.*\|#.*::g' | grep -vE "^\s" | tr '\n' ' ' )

function	purpose	example	prerequisite
latest	print the name of the most recently modified file in the current directory	latest
latest-exec	print the path to the most recently modified n executable files under your $PATH. defaults to an n of 1	latest-exec 3
listold	list the oldest files over n MB in current directory	listold 100	parallel,mawk
maybedups	prints TSV of possible file duplicates of largest n files under current directory. NB: files might not be duplicates but it's fast	maybedups 10 \| csvlook -t \| vim -	tawk
clipboard	pipe text to clipboard	cat foo \| clipboard	xclip
pdf_subset	take a page range from a PDF	pdf_subset in.pdf 23-41 out.pdf	pdftk
ngrams	get ngrams of length n from a column, treating records as documents	cat foo.tsv \| cut -f3 \| ngrams 2
plotbars	use ggplot to make PNG bar graph of a TSV. high res an option!	cat foo.tsv \| tawk '{print $2}' \| sortfreq \| sortkh "-k2 -n" \| plotbars year count "title" 20 \| feh -	ggplot2,Rio
dumbplot	use gnuplot to graph one or two numeric fields in the terminal. removes the header if found. plots points by default but can also graph lines. inspired by jeroenjanssens	cat foo.tsv \| cut -f3,4 \| dumbplot OR cat foo.tsv \| cut -f4 \| dumbplot lines	gnuplot
table2tsv	convert any Gnumeric compatible table to TSV	cat foo.csv \| table2tsv	Gnumeric
table2csv	convert any CSVKit compatible table to CSV	cat foo.tsv \| table2csv	csvkit
tsv2githubmd	print a TSV as a GitHub flavored markdown table	cat foo.tsv \| tsv2githubmd >> README.md
tsv2redis	get redis hashes from each record of a TSV	cat foo.tsv \| tsv2redis && echo "hgetall 1"	redis-server,redis-tools,GNU moreutils,tawk,trim,mawk
joinmany_csv	join an arbitrary number of TSVs on a given (identically named) field	joinmany_csv "a b.csv 1.csv d e f" "project id" "full outer join"	txt2pgsql.pl,postgreSQL
joinmany_csv	join an arbitrary number of CSVs on a given (identically named) field. Note that this cannot use OUTER or RIGHT joins b/c it relies on SQLite	joinmany_csv "/tmp/a /tmp/b /tmp/c /tmp/d /tmp/e /tmp/f" project_id inner tabs	csv2sqlite.py,SQLite3
joinmany_psql	join an arbitrary number of PostgreSQL tables on a given (identically named) field	joinmany_psql "a b c d e f" project_id "full outer join" db_name	PostgreSQL
psql_listcols	for a PostgreSQL DB, print a TSV of all table names and their corresponding field names	psql_listcols my_db	parallel,mawk
samplekh	get n random records and keep your header line on top	cat foo.tsv \| samplekh 3000
sortkh	sort a TSV using UNIX sort options, keeping header in place	cat foo.tsv \| sortkh "-k2 -n"
sortfreq	print counts of unique values descending. keeps header in place	cat foo.tsv \| tawk '{ print $4 }' \| sortfreq
col_sort	use UNIX sort flags (eg -n or -d) to reorder TSV fields	col_sort -n foo.tsv \| sponge foo.tsv	mawk,csvkit,table2tsv
col_extra	print records that have content beyond expected number of fields for delimited text	cat foo.tsv \| col_extra 19	mawk
col_swap	switch the position of two columns in delimited text	cat foo.tsv \| col_swap 3 4 \| sponge foo.tsv	mawk
funky_chars	return the count for each unique non-alpha non-digit character in the input	cat foo.tsv \| tawk '{ print $4 }' \| funky_chars
trim	remove leading and trailing whitespace	cat foo \| trim
round	round numeric field to the nearest n digits	cat foo \| round 2
sumawk	sum a single numeric field	cat foo.tsv \| tawk '{ print $2 }' \| sumawk
uniqvals	given a TSV, return a TSV with the frequency of all unique values shown for each field	cat foo.tsv \| uniqvals \| csvlook -t \| vim -	mawk
unique	given a single column, return the first appearance of each unique value	cat foo.tsv \| c 1 \| unique	mawk
mkid	given a TSV, returns input with an integer ID field at the front	cat foo.tsv \| mkid
tawk	make awk take in TSV and output TSV	cat foo.tsv \| tawk '{ print $4,$5 }'	mawk
pawk	make awk take in pipe separated and output pipe separated	cat foo.txt \| pawk '{ print $4,$5 }'	mawk
cawk	make awk take in CSV and output CSV. NB: you usually want to use csvkit's csvcut for CSV. delimiter collision is the norm	cat foo.csv \| cawk '{ print $4,$5 }'	mawk
theader	print numbered TSV header	cat foo.tsv \| theader
pheader	print numbered pipe delimited txt header	cat foo.txt \| pheader
cheader	print numbered CSV header	cat foo.csv \| cheader
awkcols	format a sequence of numbers as awk columns	cols=$(seq 15 1560 \| awkcols ); cat foo.tsv \| tawk "{ print $cols}"
find_ext	find all files under current directory with a given extension	find_ext csv
url_encode	URL encode text	url_encode 'A&P'	URI::Escape
url_decode	decode URL encoded text	url_decode 'A%26P'	URI::Escape
html_encode	HTML encode text	echo '&' \| html_encode	HTML::Entities
html_decode	decode HTML encoded text	echo '&amp;' \| html_decode	HTML::Entities
libretsv	force LibreOffice to open TSV as a table	libretsv foo.tsv	LibreOffice
parallel	make parallel behave like GNU parallel every time	cat foo \| parallel 'echo {}'	parallel
netpiglets	show processes using ports - like nethogs but smaller!	netpiglets \| xargs -I '{}' pkill {}
c	quick cut for TSV fields	cat foo.tsv \| c 8,9
k	quick open KeePass on commandline, hard codes filepath	k	kpcli
s	quick screen lock	s	slock
p	quick git add, commit, push	p	expects push over SSH
sr	quick open text web browser, uses surfraw elvi, otherwise assumes search with duckduckgo.com	sr george washington	surfraw
v	quick open graphical web browser vimb. assumes duckduckgo.com if not given a website	v george washington	vimb, surfraw
lsoctal	list file permissions in octal format	lsoctal foo.txt
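
To make the table more concrete, here is a minimal sketch of how two of the helpers above could work. These are not the functions from this repo: `tawk_sketch` and `sortkh_sketch` are hypothetical names, they call plain `awk` rather than mawk, and they assume bash or sh (zsh does not split unquoted parameters by default).

```shell
# Hypothetical re-implementations for illustration only; the real
# functions in this repo may differ.

# awk with tab as both the input and output field separator.
tawk_sketch() {
  awk -F '\t' -v OFS='\t' "$@"
}

# Sort everything after the first line, leaving the header on top.
# $1 is a string of sort options, e.g. "-k2 -n".
sortkh_sketch() {
  # Read the header line, then let sort consume the remaining input.
  IFS= read -r header || return 0
  printf '%s\n' "$header"
  # $1 is left unquoted on purpose so "-k2 -n" splits into two options.
  sort -t "$(printf '\t')" $1
}
```

Usage mirrors the examples in the table: `cat foo.tsv | tawk_sketch '{ print $2, $1 }'` or `cat foo.tsv | sortkh_sketch "-k2 -n"`.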

~/.zshrc uses oh-my-zsh.
To get a fullscreen i3wm background using feh, add a file called ~/.i3/foo.png and change ~/.i3/config to match.
- the existing PNG under ~/.i3 is from Radical Cartography
To use surfraw's "sr" alias with this ~/.zshrc you will need to run:

sudo rm /usr/bin/sr

TODO

join arbitrary number of tables on same combination of columns, not just a single column
use uniq -d
use literate-programming
consider using LC_ALL=C or LANG=C (eg sort, grep)
use sort -s
prefer uconv to iconv
sgrep can be much faster than grep
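
A small sketch of what the `LC_ALL=C` and `sort -s` items could look like in practice. `c_sort` is a hypothetical wrapper name, not something in this repo, and `-s` assumes a sort that supports it (GNU and BSD sort do).

```shell
# Hypothetical wrapper: force byte-wise collation, which skips locale
# lookups and makes ordering predictable across machines.
c_sort() {
  LC_ALL=C sort "$@"
}

# In the C locale uppercase ASCII sorts before lowercase.
printf 'b\nA\na\nB\n' | c_sort

# -s keeps input order for records whose sort key ties, so the two
# "x" records stay in the order they arrived.
printf 'x 2\ny 1\nx 1\n' | c_sort -s -k1,1
```

The same `LC_ALL=C` prefix can be put in front of grep when the input is plain ASCII.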


albert-decatur/dotfiles
