make dev

johnkerl · Dec 23, 2023 · b984bd0
1 parent 5a4d94a
commit b984bd0

Showing 14 changed files with 32 additions and 20 deletions.
diff --git a/docs/src/kubectl-and-helm.md b/docs/src/kubectl-and-helm.md
@@ -152,7 +152,7 @@ $ helm list | mlr --itsv --ojson head -n 1
 ]
 </pre>
 
-A solution here is Miller's 
+A solution here is Miller's
 [clean-whitespace verb](reference-verbs.md#clean-whitespace):
 
 <pre class="pre-non-highlight-non-pair">

diff --git a/docs/src/manpage.md b/docs/src/manpage.md
@@ -988,6 +988,7 @@ MILLER(1)                                                            MILLER(1)
 
        Options:
        -f {a,b,c}    Field names for distinct count.
+       -x {a,b,c}    Field names to exclude for distinct count: use each record's others instead.
        -n            Show only the number of distinct values. Not compatible with -u.
        -o {name}     Field name for output count. Default "count".
                      Ignored with -u.
@@ -2154,6 +2155,7 @@ MILLER(1)                                                            MILLER(1)
 
        Options:
        -g {d,e,f}    Group-by-field names for uniq counts.
+       -x {a,b,c}    Field names to exclude for uniq: use each record's others instead.
        -c            Show repeat counts in addition to unique values.
        -n            Show only the number of distinct values.
        -o {name}     Field name for output count. Default "count".
@@ -3685,5 +3687,5 @@ MILLER(1)                                                            MILLER(1)
 
 
 
-                                  2023-12-19                         MILLER(1)
+                                  2023-12-23                         MILLER(1)
 </pre>
diff --git a/docs/src/manpage.txt b/docs/src/manpage.txt
@@ -967,6 +967,7 @@ MILLER(1)                                                            MILLER(1)
 
        Options:
        -f {a,b,c}    Field names for distinct count.
+       -x {a,b,c}    Field names to exclude for distinct count: use each record's others instead.
        -n            Show only the number of distinct values. Not compatible with -u.
        -o {name}     Field name for output count. Default "count".
                      Ignored with -u.
@@ -2133,6 +2134,7 @@ MILLER(1)                                                            MILLER(1)
 
        Options:
        -g {d,e,f}    Group-by-field names for uniq counts.
+       -x {a,b,c}    Field names to exclude for uniq: use each record's others instead.
        -c            Show repeat counts in addition to unique values.
        -n            Show only the number of distinct values.
        -o {name}     Field name for output count. Default "count".
@@ -3664,4 +3666,4 @@ MILLER(1)                                                            MILLER(1)
 
 
 
-                                  2023-12-19                         MILLER(1)
+                                  2023-12-23                         MILLER(1)
diff --git a/docs/src/reference-dsl-time.md b/docs/src/reference-dsl-time.md
@@ -89,7 +89,7 @@ the [ISO8601](https://en.wikipedia.org/wiki/ISO_8601) format.  This was the
 first (and initially only) human-readable date/time format supported by Miller
 going all the way back to Miller 1.0.0.
 
-You can get these from epoch-seconds using the 
+You can get these from epoch-seconds using the
 [sec2gmt](reference-dsl-builtin-functions.md#sec2gmt) DSL function.
 (Note that the terms _UTC_ and _GMT_ are used interchangeably in Miller.)
 We also have [sec2gmtdate](reference-dsl-builtin-functions.md#sec2gmtdate) DSL function.
@@ -200,7 +200,7 @@ mlr: TZ environment variable appears malformed: "This/Is/A/Typo"
 
 Note that for local times, Miller omits the `T` and the `Z` you see in GMT times.
 
-We also have the 
+We also have the
 [gmt2localtime](reference-dsl-builtin-functions.md#gmt2localtime) and
 [localtime2gmt](reference-dsl-builtin-functions.md#localtime2gmt) convenience functions:
 

diff --git a/docs/src/reference-main-regular-expressions.md b/docs/src/reference-main-regular-expressions.md
@@ -108,7 +108,7 @@ Regex captures of the form `\0` through `\9` are supported as follows:
 If you use `(...)` in your regular expression, then up to 9 matches are supported for the `=~`
 operator, and an arbitrary number of matches are supported for the `match` DSL function.
 
-* Before any match is done, `"\1"` etc. in a string evaluate to themselves. 
+* Before any match is done, `"\1"` etc. in a string evaluate to themselves.
 * After a successful match is done, `"\1"` etc. in a string evaluate to the matched substring.
 * After an unsuccessful match is done, `"\1"` etc. in a string evaluate to the empty string.
 * You can match against `null` to reset to the original state.

diff --git a/docs/src/reference-main-strings.md b/docs/src/reference-main-strings.md
@@ -197,4 +197,4 @@ See also [https://en.wikipedia.org/wiki/Escape_sequences_in_C](https://en.wikipe
 
 These replacements apply only to strings you key in for the DSL expressions for `filter` and `put`: that is, if you type `\t` in a string literal for a `filter`/`put` expression, it will be turned into a tab character. If you want a backslash followed by a `t`, then please type `\\t`.
 
-However, these replacements are done automatically only for string literals within DSL expressions -- they are not done automatically to fields within your data stream.  If you wish to make these replacements, you can do (for example) `mlr put '$field = gsub($field, "\\t", "\t")'`. If you need to make such a replacement for all fields in your data, you should probably use the system `sed` command instead. 
+However, these replacements are done automatically only for string literals within DSL expressions -- they are not done automatically to fields within your data stream.  If you wish to make these replacements, you can do (for example) `mlr put '$field = gsub($field, "\\t", "\t")'`. If you need to make such a replacement for all fields in your data, you should probably use the system `sed` command instead.
diff --git a/docs/src/reference-verbs.md b/docs/src/reference-verbs.md
@@ -596,6 +596,7 @@ Same as uniq -c.
 
 Options:
 -f {a,b,c}    Field names for distinct count.
+-x {a,b,c}    Field names to exclude for distinct count: use each record's others instead.
 -n            Show only the number of distinct values. Not compatible with -u.
 -o {name}     Field name for output count. Default "count".
               Ignored with -u.
@@ -4066,6 +4067,7 @@ count-distinct. For uniq, -f is a synonym for -g.
 
 Options:
 -g {d,e,f}    Group-by-field names for uniq counts.
+-x {a,b,c}    Field names to exclude for uniq: use each record's others instead.
 -c            Show repeat counts in addition to unique values.
 -n            Show only the number of distinct values.
 -o {name}     Field name for output count. Default "count".

diff --git a/docs/src/release-docs.md b/docs/src/release-docs.md
@@ -16,7 +16,7 @@ Quick links:
 </div>
 # Documents for releases
 
-If your `mlr version` says something like `mlr 6.0.0-dev`, with the `-dev` suffix, you're likely building from source, or you've obtained a recent artifact from GitHub Actions -- 
+If your `mlr version` says something like `mlr 6.0.0-dev`, with the `-dev` suffix, you're likely building from source, or you've obtained a recent artifact from GitHub Actions --
 the page [https://miller.readthedocs.io/en/main](https://miller.readthedocs.io/en/main) contains information for the latest contributions to the [Miller repository](https://github.com/johnkerl/miller).
 
 If your `mlr version` says something like `Miller v5.10.2` or `mlr 6.0.0`, without the `-dev` suffix, you're likely using a Miller executable from a package manager -- please see below for the documentation for Miller as of the release you're using.

diff --git a/docs/src/shapes-of-data.md b/docs/src/shapes-of-data.md
@@ -33,7 +33,7 @@ Also try `od -xcv` and/or `cat -e` on your file to check for non-printable chara
 Use the `file` command to see if there are CR/LF terminators (in this case, there are not):
 
 <pre class="pre-highlight-in-pair">
-<b>file data/colours.csv </b>
+<b>file data/colours.csv</b>
 </pre>
 <pre class="pre-non-highlight-in-pair">
 data/colours.csv: Unicode text, UTF-8 text
@@ -42,7 +42,7 @@ data/colours.csv: Unicode text, UTF-8 text
 Look at the file to find names of fields:
 
 <pre class="pre-highlight-in-pair">
-<b>cat data/colours.csv </b>
+<b>cat data/colours.csv</b>
 </pre>
 <pre class="pre-non-highlight-in-pair">
 KEY;DE;EN;ES;FI;FR;IT;NL;PL;TO;TR
@@ -53,13 +53,13 @@ masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;S
 Extract a few fields:
 
 <pre class="pre-highlight-non-pair">
-<b>mlr --csv cut -f KEY,PL,TO data/colours.csv </b>
+<b>mlr --csv cut -f KEY,PL,TO data/colours.csv</b>
 </pre>
 
 Use XTAB output format to get a sharper picture of where records/fields are being split:
 
 <pre class="pre-highlight-in-pair">
-<b>mlr --icsv --oxtab cat data/colours.csv </b>
+<b>mlr --icsv --oxtab cat data/colours.csv</b>
 </pre>
 <pre class="pre-non-highlight-in-pair">
 KEY;DE;EN;ES;FI;FR;IT;NL;PL;TO;TR masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz
@@ -70,7 +70,7 @@ KEY;DE;EN;ES;FI;FR;IT;NL;PL;TO;TR masterdata_colourcode_2;Schwarz;Black;Negro;Mu
 Using XTAB output format makes it clearer that `KEY;DE;...;TR` is being treated as a single field name in the CSV header, and likewise each subsequent line is being treated as a single field value. This is because the default field separator is a comma but we have semicolons here.  Use XTAB again with different field separator (`--fs semicolon`):
 
 <pre class="pre-highlight-in-pair">
-<b>mlr --icsv --ifs semicolon --oxtab cat data/colours.csv </b>
+<b>mlr --icsv --ifs semicolon --oxtab cat data/colours.csv</b>
 </pre>
 <pre class="pre-non-highlight-in-pair">
 KEY masterdata_colourcode_1
@@ -101,7 +101,7 @@ TR  Siyah
 Using the new field-separator, retry the cut:
 
 <pre class="pre-highlight-in-pair">
-<b>mlr --csv --fs semicolon cut -f KEY,PL,TO data/colours.csv </b>
+<b>mlr --csv --fs semicolon cut -f KEY,PL,TO data/colours.csv</b>
 </pre>
 <pre class="pre-non-highlight-in-pair">
 KEY;PL;TO

diff --git a/docs/src/statistics-examples.md b/docs/src/statistics-examples.md
@@ -23,7 +23,7 @@ For one or more specified field names, simply compute p25 and p75, then write th
 <pre class="pre-highlight-in-pair">
 <b>mlr --oxtab stats1 -f x -a p25,p75 \</b>
 <b>    then put '$x_iqr = $x_p75 - $x_p25' \</b>
-<b>    data/medium </b>
+<b>    data/medium</b>
 </pre>
 <pre class="pre-non-highlight-in-pair">
 x_p25 0.24667037823231752
@@ -40,7 +40,7 @@ For wildcarded field names, first compute p25 and p75, then loop over field name
 <b>        $["\1_iqr"] = $["\1_p75"] - $["\1_p25"]</b>
 <b>      }</b>
 <b>    }' \</b>
-<b>    data/medium </b>
+<b>    data/medium</b>
 </pre>
 <pre class="pre-non-highlight-in-pair">
 i_p25 2501

diff --git a/docs/src/why.md b/docs/src/why.md
@@ -48,7 +48,7 @@ Eighth thing: It's an **awful lot of fun to write**. In my experience I didn't f
 
 Miller is command-line-only by design. People who want a graphical user interface won't find it here.  This is in part (a) accommodating my personal preferences, and in part (b) guided by my experience/belief that the command line is very expressive. Steeper learning curve than a GUI, yes. I consider that price worth paying for the tool-niche which Miller occupies.
 
-Another tradeoff: supporting lists of records keeps me supporting only what can be expressed in *all* of those formats. For example, `[1,2,3,4,5]` is valid but unmillerable JSON: the list elements are not records.  So Miller can't (and won't) handle arbitrary JSON -- because Miller only handles tabular data which can be expressed in a variety of formats. 
+Another tradeoff: supporting lists of records keeps me supporting only what can be expressed in *all* of those formats. For example, `[1,2,3,4,5]` is valid but unmillerable JSON: the list elements are not records.  So Miller can't (and won't) handle arbitrary JSON -- because Miller only handles tabular data which can be expressed in a variety of formats.
 
 A third tradeoff is doing build-from-scratch in a low-level language. It'd be quicker to write (but slower to run) if written in a high-level language. If Miller were written in Python, it would be implemented in significantly fewer lines of code than its current Go implementation. The DSL would just be an `eval` of Python code. And it would run slower, but maybe not enough slower to be a problem for most folks. Later I found out about the [rows](https://github.com/turicas/rows) tool -- if you find Miller useful, you should check out `rows` as well.
 

diff --git a/man/manpage.txt b/man/manpage.txt
@@ -967,6 +967,7 @@ MILLER(1)                                                            MILLER(1)
 
        Options:
        -f {a,b,c}    Field names for distinct count.
+       -x {a,b,c}    Field names to exclude for distinct count: use each record's others instead.
        -n            Show only the number of distinct values. Not compatible with -u.
        -o {name}     Field name for output count. Default "count".
                      Ignored with -u.
@@ -2133,6 +2134,7 @@ MILLER(1)                                                            MILLER(1)
 
        Options:
        -g {d,e,f}    Group-by-field names for uniq counts.
+       -x {a,b,c}    Field names to exclude for uniq: use each record's others instead.
        -c            Show repeat counts in addition to unique values.
        -n            Show only the number of distinct values.
        -o {name}     Field name for output count. Default "count".
@@ -3664,4 +3666,4 @@ MILLER(1)                                                            MILLER(1)
 
 
 
-                                  2023-12-19                         MILLER(1)
+                                  2023-12-23                         MILLER(1)
diff --git a/man/mlr.1 b/man/mlr.1
@@ -2,12 +2,12 @@
 .\"     Title: mlr
 .\"    Author: [see the "AUTHOR" section]
 .\" Generator: ./mkman.rb
-.\"      Date: 2023-12-19
+.\"      Date: 2023-12-23
 .\"    Manual: \ \&
 .\"    Source: \ \&
 .\"  Language: English
 .\"
-.TH "MILLER" "1" "2023-12-19" "\ \&" "\ \&"
+.TH "MILLER" "1" "2023-12-23" "\ \&" "\ \&"
 .\" -----------------------------------------------------------------
 .\" * Portability definitions
 .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1186,6 +1186,7 @@ Same as uniq -c.
 
 Options:
 -f {a,b,c}    Field names for distinct count.
+-x {a,b,c}    Field names to exclude for distinct count: use each record's others instead.
 -n            Show only the number of distinct values. Not compatible with -u.
 -o {name}     Field name for output count. Default "count".
               Ignored with -u.
@@ -2700,6 +2701,7 @@ count-distinct. For uniq, -f is a synonym for -g.
 
 Options:
 -g {d,e,f}    Group-by-field names for uniq counts.
+-x {a,b,c}    Field names to exclude for uniq: use each record's others instead.
 -c            Show repeat counts in addition to unique values.
 -n            Show only the number of distinct values.
 -o {name}     Field name for output count. Default "count".

diff --git a/test/cases/cli-help/0001/expout b/test/cases/cli-help/0001/expout
@@ -96,6 +96,7 @@ Same as uniq -c.
 
 Options:
 -f {a,b,c}    Field names for distinct count.
+-x {a,b,c}    Field names to exclude for distinct count: use each record's others instead.
 -n            Show only the number of distinct values. Not compatible with -u.
 -o {name}     Field name for output count. Default "count".
               Ignored with -u.
@@ -1320,6 +1321,7 @@ count-distinct. For uniq, -f is a synonym for -g.
 
 Options:
 -g {d,e,f}    Group-by-field names for uniq counts.
+-x {a,b,c}    Field names to exclude for uniq: use each record's others instead.
 -c            Show repeat counts in addition to unique values.
 -n            Show only the number of distinct values.
 -o {name}     Field name for output count. Default "count".
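
Note on the new `-x` option: the help text added in this commit says the named fields are excluded and each record's remaining fields are used instead. A minimal Python sketch of that reading follows. It is not Miller's implementation; the function name `uniq_excluding`, the sample records, and the `(fields, count)` output shape are all assumptions made for illustration.

```python
def uniq_excluding(records, exclude):
    """Count distinct records after dropping the excluded fields.

    Hypothetical helper: models one reading of "use each record's
    others instead"; the real mlr output format may differ.
    """
    excluded = set(exclude)
    counts = {}
    for record in records:
        # Keep every field except the excluded ones, preserving field order.
        kept = tuple((k, v) for k, v in record.items() if k not in excluded)
        counts[kept] = counts.get(kept, 0) + 1
    # Dicts preserve insertion order, so results come out in first-seen order.
    return [(dict(kept), n) for kept, n in counts.items()]


# Made-up sample data, not taken from the Miller repository.
records = [
    {"a": "pan", "b": "wye", "i": 1},
    {"a": "pan", "b": "wye", "i": 2},
    {"a": "eks", "b": "pan", "i": 3},
]
result = uniq_excluding(records, ["i"])
for fields, count in result:
    print(count, fields)
```

Under this reading, excluding `i` makes the first two sample records identical, so they collapse into one group with a count of 2.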