Structured Data in Oil
andychu edited this page Mar 24, 2022 · 38 revisions
Oil will parse text so you don't have to!
- What is a Data Frame? (In Python, R, and SQL) (blog)
- Git Log in HTML (blog)
- After that I wrote Structured Data Over Pipes
- And Unix Tools
- Oil and the R Language
- Other Oil Use Cases:
- release.sh generates releases.html
- benchmarks/*.{sh,R} generates osh-parser report, etc.
- lobste.rs comment on Oil philosophy for structured data (2019)
- Hacker News comment on the philosophy for structured data (September 2020)
- Oil's Table Type vs. Relations (lobste.rs, July 2021)
- QSN: A Familiar String Interchange Format -- spec done, encoder done, still need decoder
- QTT -- An extension to TSV that builds on top of QSN.
- older: TSV2 Proposal
- each Keyword in Oil -- augments xargs, uses TSV2
- Explicit Framing Protocol Proposal
All of these are obviously supported because Oil is a shell! But there are advantages to a built-in expression language. (It's deferred for 2020.)
dbohdan/structured-text-tools - A list of command line tools for manipulating structured text data
- CSV / TSV / etc.
-
https://github.com/johnkerl/miller -- Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
- GNU flex lexer for its language: https://github.com/johnkerl/miller/blob/master/c/parsing/mlr_dsl_lexer.l
- Lemon parser (like sqlite!): https://github.com/johnkerl/miller/blob/master/c/parsing/mlr_dsl_parse.y
- https://csvkit.readthedocs.io/en/1.0.3/
- https://github.com/BurntSushi/xsv -- A fast CSV command line toolkit written in Rust
- https://github.com/sustrik/uxy -- parses output of ls and ps into a table-like format, written in Python
- JSON
- https://github.com/benbernard/RecordStream -- commandline tools for slicing and dicing JSON records
- https://github.com/aanastasiou/pyjunix
- http://jsonlines.org/ and http://ndjson.org/ -- what's the difference?
- https://github.com/kellyjonbrazil/jc from https://news.ycombinator.com/item?id=22366638
- parses output of netstat etc. and outputs JSON. Similar to uxy above.
- Bringing the Unix Philosophy into the 21st Century (2019, discussed 8/2021)
- https://github.com/tomnomnom/gron -- convert JSON to grep-able and sed-able form, and back.
- query-json thread with many links -- "For everyone pining for a Jq with a different syntax: I have a bunch of links to alternatives collected, you might want to try some of them (some may be for different things than JSON)"
- https://github.com/tomnomnom/gron -- Make JSON Greppable
- HTML
- Other Text
- Binary
- https://relational-pipes.globalcode.info/v_0/index.xhtml -- binary format
- Common Output Format for Unix-like tools: https://github.com/aniou/cof/wiki/Draft (draft rather than code)
- netstat example, etc.
- More links/projects: https://lobste.rs/s/zvallq/pretty_csv_viewing_on_command_line
- VisiData looks like an interesting UI.
- glib GVariants: https://blogs.gnome.org/alexl/2012/08/10/rethinking-the-shell-pipeline/
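The jc and uxy entries above turn column-aligned tool output (ps, ls, netstat) into records, and JSON Lines is one common way to pass such records over a pipe. As a rough illustration only, here is a minimal Python sketch of that idea. parse_columns and to_json_lines are hypothetical names, not jc's or uxy's API, and the sketch assumes whitespace-separated columns where only the last column may contain spaces (as with the CMD column of ps-style output).

```python
import json


def parse_columns(text):
    """Turn whitespace-aligned output (a header row, then data rows) into dicts.

    Hypothetical helper, not jc's or uxy's API. Assumes columns are separated
    by runs of whitespace and that only the LAST column may contain spaces.
    A row with fewer fields than headers yields a dict missing trailing keys.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    headers = lines[0].split()
    records = []
    for line in lines[1:]:
        # maxsplit keeps spaces inside the final column, e.g. "python3 server.py"
        fields = line.strip().split(None, len(headers) - 1)
        records.append(dict(zip(headers, fields)))
    return records


def to_json_lines(records):
    """Serialize records as JSON Lines: one JSON object per line.

    json.dumps escapes embedded newlines, so each record stays on one line.
    """
    return "".join(json.dumps(rec) + "\n" for rec in records)


if __name__ == "__main__":
    # Made-up ps-style output, for illustration only
    sample = (
        "  PID TTY          TIME CMD\n"
        " 1234 pts/0    00:00:00 bash\n"
        " 5678 pts/0    00:00:01 python3 server.py\n"
    )
    print(to_json_lines(parse_columns(sample)), end="")
```

Real output is messier (fixed-width columns, headers containing spaces, locale-dependent dates), which is why jc, for example, uses a separate parser per command rather than one generic splitter.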
Miller:
```
$ mlr --icsv --opprint --barred \
    put '$tiv_delta = $tiv_2012 - $tiv_2011; unset $tiv_2011, $tiv_2012' \
    then sort -nr tiv_delta flins.csv
```
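It has a nice expression language. You compute new columns as a function of existing ones, then drop the inputs: here tiv_delta is computed from tiv_2012 and tiv_2011, both source columns are unset, and "then sort -nr tiv_delta" chains a numeric, descending sort on the new column.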
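For comparison, here is a rough Python equivalent of the Miller pipeline above, using only the standard csv module. It is a sketch, not a port of Miller: it assumes tiv_2011 and tiv_2012 exist and parse as numbers in every row, it returns records instead of reproducing the --opprint --barred table, and tiv_delta_report is a hypothetical name. The sample data is made up, not taken from flins.csv.

```python
import csv
import io


def tiv_delta_report(csv_text):
    """Rough equivalent of:
    mlr --icsv put '$tiv_delta = $tiv_2012 - $tiv_2011; unset $tiv_2011, $tiv_2012'
        then sort -nr tiv_delta

    Assumes both tiv columns are present and numeric in every row.
    """
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        # $tiv_delta = $tiv_2012 - $tiv_2011  (new field is appended last)
        rec["tiv_delta"] = float(rec["tiv_2012"]) - float(rec["tiv_2011"])
        # unset $tiv_2011, $tiv_2012
        del rec["tiv_2011"], rec["tiv_2012"]
        rows.append(rec)
    # then sort -nr tiv_delta  (numeric, descending)
    rows.sort(key=lambda r: r["tiv_delta"], reverse=True)
    return rows


if __name__ == "__main__":
    # Made-up sample data, not the real flins.csv
    sample = "county,tiv_2011,tiv_2012\nAlpha,100,150\nBeta,200,180\nGamma,50,90\n"
    for row in tiv_delta_report(sample):
        print(row["county"], row["tiv_delta"])
```

The Miller version is shorter because field access, numeric conversion, and the then-chain are built into its DSL, which is the kind of advantage a built-in expression language gives over gluing tools together.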