Structured Data in Oil
andychu edited this page Sep 14, 2020 · 38 revisions
Oil will parse text so you don't have to!
- What is a Data Frame? (In Python, R, and SQL) (blog)
- Git Log in HTML (blog)
- After that I wrote Structured Data Over Pipes
- And Unix Tools
- Oil and the R Language
- Other Oil Use Cases:
- release.sh generates releases.html
- benchmarks/*.{sh,R} generates osh-parser report, etc.
- lobste.rs comment on Oil philosophy for structured data
- Done: QSN: A Familiar String Interchange Format
- TSV2 Proposal (renamed QTSV)
- each Keyword in Oil -- augments xargs, uses TSV2
- Explicit Framing Protocol Proposal
All of these are obviously supported because Oil is a shell! But there are advantages to a built-in expression language. (It's deferred for 2020.)
- CSV / TSV / etc.
- https://github.com/johnkerl/miller -- Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
- GNU flex lexer for its language: https://github.com/johnkerl/miller/blob/master/c/parsing/mlr_dsl_lexer.l
- Lemon parser (like sqlite!): https://github.com/johnkerl/miller/blob/master/c/parsing/mlr_dsl_parse.y
- https://csvkit.readthedocs.io/en/1.0.3/
- https://github.com/BurntSushi/xsv -- A fast CSV command line toolkit written in Rust
- https://github.com/sustrik/uxy
- JSON
- https://github.com/benbernard/RecordStream -- commandline tools for slicing and dicing JSON records
- https://github.com/aanastasiou/pyjunix
- http://jsonlines.org/ and http://ndjson.org/ -- what's the difference?
- https://github.com/kellyjonbrazil/jc from https://news.ycombinator.com/item?id=22366638
- https://github.com/tomnomnom/gron -- convert JSON to grep-able and sed-able form, and back.
- query-json thread with many links -- For everyone pining for a Jq with a different syntax: I have a bunch of links to alternatives collected, you might want to try some of them (some may be for different things than JSON)
- HTML
- Other Text
- Binary
- https://relational-pipes.globalcode.info/v_0/index.xhtml -- binary format
- Common Output Format for unix-like tools: https://github.com/aniou/cof/wiki/Draft (draft rather than code)
- netstat example, etc.
Miller:
$ mlr --icsv --opprint --barred \
put '$tiv_delta = $tiv_2012 - $tiv_2011; unset $tiv_2011, $tiv_2012' \
then sort -nr tiv_delta flins.csv
It has a nice expression language. You compute new columns as a function of existing ones, then drop the inputs you no longer need (here with unset).
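For comparison, here is a rough sketch of the same computation with only awk and sort, to show what Miller's expression language saves you. It is not how Miller works internally. The function name tiv_delta is made up for this example, and it assumes a simple CSV with a header row, no quoted fields, and no embedded commas.

```shell
#!/bin/sh
# Hypothetical helper (not part of Miller or Oil): reads CSV on stdin,
# adds tiv_delta = tiv_2012 - tiv_2011, drops both input columns, and
# sorts rows by tiv_delta, largest first.
# Assumes: header row present, no quoted fields, no embedded commas.
tiv_delta() {
  awk -F, -v OFS=, -v CONVFMT=%.15g '
    NR == 1 {
      # Map column names to positions so columns are looked up by name.
      for (i = 1; i <= NF; i++) col[$i] = i
      a = col["tiv_2011"]; b = col["tiv_2012"]
    }
    {
      out = ""
      for (i = 1; i <= NF; i++) {
        if (i == a || i == b) continue   # drop the old columns
        out = out $i OFS
      }
      # Append the new column: its name on the header row, its value elsewhere.
      print out (NR == 1 ? "tiv_delta" : $b - $a)
    }
  ' | {
    # Pass the header through untouched, then sort the remaining rows
    # numerically (descending) on the last column.
    IFS= read -r header
    printf '%s\n' "$header"
    n=$(printf '%s\n' "$header" | awk -F, '{ print NF }')
    sort -t, -k"$n,${n}nr"
  }
}
```

Usage would look like `tiv_delta < flins.csv`. Unlike the Miller version, this prints plain CSV rather than a barred table, and the header handling, column lookup, and sort key all have to be spelled out by hand.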