forked from hadley/adv-r
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathTranslation.Rmd
741 lines (561 loc) · 25.5 KB
/
Translation.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
# Translating R code {#translation}
```{r, include = FALSE}
source("common.R")
library(dbplyr) # to supress startup messages below
```
## Introduction
The combination of first-class environments, lexical scoping, and metaprogramming gives us a powerful toolkit for translating R code into other languages. One fully-fledged example of this idea is dbplyr, which powers the database backends for dplyr, allowing you to express data manipulation in R and automatically translate it into SQL. You can see the key idea in `translate_sql()` which takes R code and returns the equivalent SQL:
```{r}
library(dbplyr)
translate_sql(x ^ 2)
translate_sql(x < 5 & !is.na(x))
translate_sql(!first %in% c("John", "Roger", "Robert"))
translate_sql(select == 7)
```
Translating R to SQL is complex because of the many idiosyncrasies of SQL dialects, so here I'll develop two simple, but useful, domain specific languages (DSL): one to generate HTML, and the other to generate mathematical equations in LaTeX.
If you're interested in learning more about domain specific languages in general, I highly recommend _Domain Specific Languages_ [@dsls]. It discusses many options for creating a DSL and provides many examples of different languages.
### Outline {-}
* Section \@ref(html) creates a DSL for generating HTML, using quasiquotation
and purrr to generate a function for each HTML tag, then tidy evaluation to
easily access them.
* Section \@ref(latex) transforms mathematically R code into its LaTeX
equivalent using a combination of tidy evaluation and expression walking.
### Prerequisites {-}
This chapter pulls together many techniques discussed elsewhere in the book. In particular, you'll need to understand environments, expressions, tidy evaluation, and a little functional programming and S3. We'll use [rlang](https://rlang.r-lib.org) for metaprogramming tools, and [purrr](https://purrr.tidyverse.org) for functional programming.
```{r setup, message = FALSE}
library(rlang)
library(purrr)
```
## HTML {#html}
\index{HTML}
HTML (HyperText Markup Language) underlies the majority of the web. It's a special case of SGML (Standard Generalised Markup Language), and it's similar but not identical to XML (eXtensible Markup Language). HTML looks like this:
```html
<body>
<h1 id='first'>A heading</h1>
<p>Some text & <b>some bold text.</b></p>
<img src='myimg.png' width='100' height='100' />
</body>
```
Even if you've never looked at HTML before, you can still see that the key component of its coding structure is tags, which look like `<tag></tag>` or `<tag />`. Tags can be nested within other tags and intermingled with text. There are over 100 HTML tags, but in this chapter we'll focus on just a handful:
* `<body>` is the top-level tag that contains all content.
* `<h1>` defines a top level heading.
* `<p>` defines a paragraph.
* `<b>` emboldens text.
* `<img>` embeds an image.
Tags can have named __attributes__ which look like `<tag name1='value1' name2='value2'></tag>`. Two of the most important attributes are `id` and `class`, which are used in conjunction with CSS (Cascading Style Sheets) to control the visual appearance of the page.
__Void tags__, like `<img>`, don't have any children, and are written `<img />`, not `<img></img>`. Since they have no content, attributes are more important, and `img` has three that are used with almost every image: `src` (where the image lives), `width`, and `height`.
Because `<` and `>` have special meanings in HTML, you can't write them directly. Instead you have to use the HTML __escapes__: `>` and `<`. And since those escapes use `&`, if you want a literal ampersand you have to escape it as `&`.
### Goal
Our goal is to make it easy to generate HTML from R. To give a concrete example, we want to generate the following HTML:
```html
<body>
<h1 id='first'>A heading</h1>
<p>Some text & <b>some bold text.</b></p>
<img src='myimg.png' width='100' height='100' />
</body>
```
Using the following code that matches the structure of the HTML as closely as possible:
```{r, eval = FALSE}
with_html(
body(
h1("A heading", id = "first"),
p("Some text &", b("some bold text.")),
img(src = "myimg.png", width = 100, height = 100)
)
)
```
This DSL has the following three properties:
* The nesting of function calls matches the nesting of tags.
* Unnamed arguments become the content of the tag, and named arguments
become their attributes.
* `&` and other special characters are automatically escaped.
### Escaping
\index{escaping}
Escaping is so fundamental to translation that it'll be our first topic. There are two related challenges:
* In user input, we need to automatically escape `&`, `<` and `>`.
* At the same time we need to make sure that the `&`, `<` and `>` we generate
are not double-escaped (i.e. that we don't accidentally generate `&amp;`, `&lt;` and `&gt;`).
The easiest way to do this is to create an S3 class (Section \@ref(s3-classes)) that distinguishes between regular text (that needs escaping) and HTML (that doesn't).
```{r escape}
html <- function(x) structure(x, class = "advr_html")
print.advr_html <- function(x, ...) {
out <- paste0("<HTML> ", x)
cat(paste(strwrap(out), collapse = "\n"), "\n", sep = "")
}
```
We then write an escape generic. It has two important methods:
* `escape.character()` takes a regular character vector and returns an HTML
vector with special characters (`&`, `<`, `>`) escaped.
* `escape.advr_html()` leaves already escaped HTML alone.
```{r}
escape <- function(x) UseMethod("escape")
escape.character <- function(x) {
x <- gsub("&", "&", x)
x <- gsub("<", "<", x)
x <- gsub(">", ">", x)
html(x)
}
escape.advr_html <- function(x) x
```
Now we check that it works
```{r}
escape("This is some text.")
escape("x > 1 & y < 2")
# Double escaping is not a problem
escape(escape("This is some text. 1 > 2"))
# And text we know is HTML doesn't get escaped.
escape(html("<hr />"))
```
Conveniently, this also allows a user to opt out of our escaping if they know the content is already escaped.
### Basic tag functions
Next, we'll write a one-tag function by hand, then figure out how to generalise it so we can generate a function for every tag with code.
Let's start with `<p>`. HTML tags can have both attributes (e.g., id or class) and children (like `<b>` or `<i>`). We need some way of separating these in the function call. Given that attributes are named and children are not, it seems natural to use named and unnamed arguments for them respectively. For example, a call to `p()` might look like:
```{r, eval = FALSE}
p("Some text. ", b(i("some bold italic text")), class = "mypara")
```
We could list all the possible attributes of the `<p>` tag in the function definition, but that's hard because there are many attributes, and because it's possible to use [custom attributes](http://html5doctor.com/html5-custom-data-attributes/). Instead, we'll use `...` and separate the components based on whether or not they are named. With this in mind, we create a helper function that wraps around `rlang::list2()` (Section \@ref(tidy-dots)) and returns named and unnamed components separately:
```{r named}
dots_partition <- function(...) {
dots <- list2(...)
if (is.null(names(dots))) {
is_named <- rep(FALSE, length(dots))
} else {
is_named <- names(dots) != ""
}
list(
named = dots[is_named],
unnamed = dots[!is_named]
)
}
str(dots_partition(a = 1, 2, b = 3, 4))
```
We can now create our `p()` function. Notice that there's one new function here: `html_attributes()`. It takes a named list and returns the HTML attribute specification as a string. It's a little complicated (in part, because it deals with some idiosyncrasies of HTML that I haven't mentioned here), but it's not that important and doesn't introduce any new programming ideas, so I won't discuss it in detail. You can find the [source online](https://github.com/hadley/adv-r/blob/master/dsl-html-attributes.r) if you want to work through it yourself.
<!-- GVW: possible/useful to show a very simple version of `html_attributes`, then point out one or two cases for which it fails, then tell them to read the source? I'm always nervous when someone tells me "you don't need to worry about the details of this" -->
```{r p}
source("dsl-html-attributes.r")
p <- function(...) {
dots <- dots_partition(...)
attribs <- html_attributes(dots$named)
children <- map_chr(dots$unnamed, escape)
html(paste0(
"<p", attribs, ">",
paste(children, collapse = ""),
"</p>"
))
}
p("Some text")
p("Some text", id = "myid")
p("Some text", class = "important", `data-value` = 10)
```
### Tag functions
It's straightforward to adapt `p()` to other tags: we just need to replace `"p"` with the name of the tag. One elegant way to do that is to create a function with `rlang::new_function()` (Section \@ref(new-function)), using unquoting and `paste0()` to generate the starting and ending tags.
```{r}
tag <- function(tag) {
new_function(
exprs(... = ),
expr({
dots <- dots_partition(...)
attribs <- html_attributes(dots$named)
children <- map_chr(dots$unnamed, escape)
html(paste0(
!!paste0("<", tag), attribs, ">",
paste(children, collapse = ""),
!!paste0("</", tag, ">")
))
}),
caller_env()
)
}
tag("b")
```
We need the weird `exprs(... = )` syntax to generate the empty `...` argument in the tag function. See Section \@ref(empty-symbol) for more details.
Now we can run our earlier example:
```{r}
p <- tag("p")
b <- tag("b")
i <- tag("i")
p("Some text. ", b(i("some bold italic text")), class = "mypara")
```
Before we generate functions for every possible HTML tag, we need to create a variant that handles void tags. `void_tag()` is quite similar to `tag()`, but it throws an error if there are any unnamed tags, and the tag itself looks a little different.
```{r}
void_tag <- function(tag) {
new_function(
exprs(... = ),
expr({
dots <- dots_partition(...)
if (length(dots$unnamed) > 0) {
abort(!!paste0("<", tag, "> must not have unnamed arguments"))
}
attribs <- html_attributes(dots$named)
html(paste0(!!paste0("<", tag), attribs, " />"))
}),
caller_env()
)
}
img <- void_tag("img")
img
img(src = "myimage.png", width = 100, height = 100)
```
### Processing all tags {#html-env}
Next we need to generate these functions for every tag. We'll start with a list of all HTML tags:
```{r}
tags <- c("a", "abbr", "address", "article", "aside", "audio",
"b","bdi", "bdo", "blockquote", "body", "button", "canvas",
"caption","cite", "code", "colgroup", "data", "datalist",
"dd", "del","details", "dfn", "div", "dl", "dt", "em",
"eventsource","fieldset", "figcaption", "figure", "footer",
"form", "h1", "h2", "h3", "h4", "h5", "h6", "head", "header",
"hgroup", "html", "i","iframe", "ins", "kbd", "label",
"legend", "li", "mark", "map","menu", "meter", "nav",
"noscript", "object", "ol", "optgroup", "option", "output",
"p", "pre", "progress", "q", "ruby", "rp","rt", "s", "samp",
"script", "section", "select", "small", "span", "strong",
"style", "sub", "summary", "sup", "table", "tbody", "td",
"textarea", "tfoot", "th", "thead", "time", "title", "tr",
"u", "ul", "var", "video"
)
void_tags <- c("area", "base", "br", "col", "command", "embed",
"hr", "img", "input", "keygen", "link", "meta", "param",
"source", "track", "wbr"
)
```
If you look at this list carefully, you'll see there are quite a few tags that have the same name as base R functions (`body`, `col`, `q`, `source`, `sub`, `summary`, `table`). This means we don't want to make all the functions available by default, either in the global environment or in a package. Instead, we'll put them in a list (like in Section \@ref(functional-factories)) and then provide a helper to make it easy to use them when desired. First, we make a named list containing all the tag functions:
```{r}
html_tags <- c(
tags %>% set_names() %>% map(tag),
void_tags %>% set_names() %>% map(void_tag)
)
```
This gives us an explicit (but verbose) way to create HTML:
```{r}
html_tags$p(
"Some text. ",
html_tags$b(html_tags$i("some bold italic text")),
class = "mypara"
)
```
We can then finish off our HTML DSL with a function that allows us to evaluate code in the context of that list. Here we slightly abuse the data mask, passing it a list of functions rather than a data frame. This is quick hack to mingle the execution environment of `code` with the functions in `html_tags`.
\indexc{eval\_tidy()}
```{r}
with_html <- function(code) {
code <- enquo(code)
eval_tidy(code, html_tags)
}
```
This gives us a succinct API which allows us to write HTML when we need it but doesn't clutter up the namespace when we don't.
```{r}
with_html(
body(
h1("A heading", id = "first"),
p("Some text &", b("some bold text.")),
img(src = "myimg.png", width = 100, height = 100)
)
)
```
If you want to access the R function overridden by an HTML tag with the same name inside `with_html()`, you can use the full `package::function` specification.
### Exercises
1. The escaping rules for `<script>` tags are different because they contain
JavaScript, not HTML. Instead of escaping angle brackets or ampersands,
you need to escape `</script>` so that the tag isn't closed too early.
For example, `script("'</script>'")`, shouldn't generate this:
```html
<script>'</script>'</script>
```
But
```html
<script>'<\/script>'</script>
```
Adapt the `escape()` to follow these rules when a new argument `script`
is set to `TRUE`.
1. The use of `...` for all functions has some big downsides. There's no
input validation and there will be little information in the
documentation or autocomplete about how they are used in the function.
Create a new function that, when given a named list of tags and their
attribute names (like below), creates tag functions with named arguments.
```{r, eval = FALSE}
list(
a = c("href"),
img = c("src", "width", "height")
)
```
All tags should get `class` and `id` attributes.
1. Reason about the following code that calls `with_html()` referencing objects
from the environment. Will it work or fail? Why? Run the code to
verify your predictions.
```{r, eval = FALSE}
greeting <- "Hello!"
with_html(p(greeting))
p <- function() "p"
address <- "123 anywhere street"
with_html(p(address))
```
1. Currently the HTML doesn't look terribly pretty, and it's hard to see the
structure. How could you adapt `tag()` to do indenting and formatting?
(You may need to do some research into block and inline tags.)
## LaTeX {#latex}
\index{LaTeX}
The next DSL will convert R expressions into their LaTeX math equivalents. (This is a bit like `?plotmath`, but for text instead of plots.) LaTeX is the lingua franca of mathematicians and statisticians: it's common to use LaTeX notation whenever you want to express an equation in text, like in email. Since many reports are produced using both R and LaTeX, it might be useful to be able to automatically convert mathematical expressions from one language to the other.
Because we need to convert both functions and names, this mathematical DSL will be more complicated than the HTML DSL. We'll also need to create a default conversion, so that symbols that we don't know about get a standard conversion. This means that we can no longer use just evaluation: we also need to walk the abstract syntax tree (AST).
### LaTeX mathematics
Before we begin, let's quickly cover how formulas are expressed in LaTeX. The full standard is very complex, but fortunately is [well documented](http://en.wikibooks.org/wiki/LaTeX/Mathematics), and the most common commands have a fairly simple structure:
* Most simple mathematical equations are written in the same way you'd type
them in R: `x * y`, `z ^ 5`. Subscripts are written using `_` (e.g., `x_1`).
* Special characters start with a `\`: `\pi` = $\pi$, `\pm` = $\pm$, and so on.
There are a huge number of symbols available in LaTeX: searching online for
`latex math symbols` returns many
[lists](http://www.sunilpatel.co.uk/latex-type/latex-math-symbols/).
There's even [a service](http://detexify.kirelabs.org/classify.html) that
will look up the symbol you sketch in the browser.
* More complicated functions look like `\name{arg1}{arg2}`. For example, to
write a fraction you'd use `\frac{a}{b}`. To write a square root, you'd use
`\sqrt{a}`.
* To group elements together use `{}`: i.e., `x ^ a + b` versus `x ^ {a + b}`.
* In good math typesetting, a distinction is made between variables and
functions. But without extra information, LaTeX doesn't know whether
`f(a * b)` represents calling the function `f` with input `a * b`,
or is shorthand for `f * (a * b)`. If `f` is a function, you can tell
LaTeX to typeset it using an upright font with `\textrm{f}(a * b)`.
(The `rm` stands for "Roman", the opposite of italics.)
### Goal
Our goal is to use these rules to automatically convert an R expression to its appropriate LaTeX representation. We'll tackle this in four stages:
* Convert known symbols: `pi` → `\pi`
* Leave other symbols unchanged: `x` → `x`, `y` → `y`
* Convert known functions to their special forms: `sqrt(frac(a, b))` →
`\sqrt{\frac{a, b}}`
* Wrap unknown functions with `\textrm`: `f(a)` → `\textrm{f}(a)`
We'll code this translation in the opposite direction of what we did with the HTML DSL. We'll start with infrastructure, because that makes it easy to experiment with our DSL, and then work our way back down to generate the desired output.
### `to_math`()
To begin, we need a wrapper function that will convert R expressions into LaTeX math expressions. This will work like `to_html()` by capturing the unevaluated expression and evaluating it in a special environment. There are two main differences:
* The evaluation environment is no longer constant, as it has to vary depending on
the input. This is necessary to handle unknown symbols and functions.
* We never evaluate in the argument environment because we're translating every
function to a LaTeX expression. The user will need to use explicitly `!!` in
order to evaluate normally.
This gives us:
```{r}
to_math <- function(x) {
expr <- enexpr(x)
out <- eval_bare(expr, latex_env(expr))
latex(out)
}
latex <- function(x) structure(x, class = "advr_latex")
print.advr_latex <- function(x) {
cat("<LATEX> ", x, "\n", sep = "")
}
```
Next we'll build up `latex_env()`, starting simply and getting progressively more complex.
### Known symbols
Our first step is to create an environment that will convert the special LaTeX symbols used for Greek characters, e.g., `pi` to `\pi`. We'll use the trick from Section \@ref(subset) to bind the symbol `pi` to the value `"\pi"`.
```{r}
greek <- c(
"alpha", "theta", "tau", "beta", "vartheta", "pi", "upsilon",
"gamma", "varpi", "phi", "delta", "kappa", "rho",
"varphi", "epsilon", "lambda", "varrho", "chi", "varepsilon",
"mu", "sigma", "psi", "zeta", "nu", "varsigma", "omega", "eta",
"xi", "Gamma", "Lambda", "Sigma", "Psi", "Delta", "Xi",
"Upsilon", "Omega", "Theta", "Pi", "Phi"
)
greek_list <- set_names(paste0("\\", greek), greek)
greek_env <- as_environment(greek_list)
```
We can then check it:
```{r}
latex_env <- function(expr) {
greek_env
}
to_math(pi)
to_math(beta)
```
Looks good so far!
### Unknown symbols
If a symbol isn't Greek, we want to leave it as is. This is tricky because we don't know in advance what symbols will be used, and we can't possibly generate them all. Instead, we'll use the approach described in Section \@ref(ast-funs): walking the AST to find all symbols. This gives us `all_names_rec()` and helper `all_names()`:
```{r, include = FALSE}
expr_type <- function(x) {
if (rlang::is_syntactic_literal(x)) {
"constant"
} else if (is.symbol(x)) {
"symbol"
} else if (is.call(x)) {
"call"
} else if (is.pairlist(x)) {
"pairlist"
} else {
typeof(x)
}
}
switch_expr <- function(x, ...) {
switch(expr_type(x),
...,
stop("Don't know how to handle type ", typeof(x), call. = FALSE)
)
}
flat_map_chr <- function(.x, .f, ...) {
purrr::flatten_chr(purrr::map(.x, .f, ...))
}
```
<!-- GVW: on first reading I wondered why you bothered to define `switch_expr`, since it only appears to be used once. Then I saw that there's a second call much further down. Highlight this somehow? -->
```{r}
all_names_rec <- function(x) {
switch_expr(x,
constant = character(),
symbol = as.character(x),
call = flat_map_chr(as.list(x[-1]), all_names)
)
}
all_names <- function(x) {
unique(all_names_rec(x))
}
all_names(expr(x + y + f(a, b, c, 10)))
```
We now want to take that list of symbols and convert it to an environment so that each symbol is mapped to its corresponding string representation (e.g., so `eval(quote(x), env)` yields `"x"`). We again use the pattern of converting a named character vector to a list, then converting the list to an environment.
```{r}
latex_env <- function(expr) {
names <- all_names(expr)
symbol_env <- as_environment(set_names(names))
symbol_env
}
to_math(x)
to_math(longvariablename)
to_math(pi)
```
This works, but we need to combine it with the Greek symbols environment. Since we want to give preference to Greek over defaults (e.g., `to_math(pi)` should give `"\\pi"`, not `"pi"`), `symbol_env` needs to be the parent of `greek_env`. To do that, we need to make a copy of `greek_env` with a new parent. This gives us a function that can convert both known (Greek) and unknown symbols.
```{r}
latex_env <- function(expr) {
# Unknown symbols
names <- all_names(expr)
symbol_env <- as_environment(set_names(names))
# Known symbols
env_clone(greek_env, parent = symbol_env)
}
to_math(x)
to_math(longvariablename)
to_math(pi)
```
### Known functions
Next we'll add functions to our DSL. We'll start with a couple of helpers that make it easy to add new unary and binary operators. These functions are very simple: they only assemble strings.
```{r}
unary_op <- function(left, right) {
new_function(
exprs(e1 = ),
expr(
paste0(!!left, e1, !!right)
),
caller_env()
)
}
binary_op <- function(sep) {
new_function(
exprs(e1 = , e2 = ),
expr(
paste0(e1, !!sep, e2)
),
caller_env()
)
}
unary_op("\\sqrt{", "}")
binary_op("+")
```
Using these helpers, we can map a few illustrative examples of converting R to LaTeX. Note that with R's lexical scoping rules helping us, we can easily provide new meanings for standard functions like `+`, `-`, and `*`, and even `(` and `{`.
```{r}
# Binary operators
f_env <- child_env(
.parent = empty_env(),
`+` = binary_op(" + "),
`-` = binary_op(" - "),
`*` = binary_op(" * "),
`/` = binary_op(" / "),
`^` = binary_op("^"),
`[` = binary_op("_"),
# Grouping
`{` = unary_op("\\left{ ", " \\right}"),
`(` = unary_op("\\left( ", " \\right)"),
paste = paste,
# Other math functions
sqrt = unary_op("\\sqrt{", "}"),
sin = unary_op("\\sin(", ")"),
log = unary_op("\\log(", ")"),
abs = unary_op("\\left| ", "\\right| "),
frac = function(a, b) {
paste0("\\frac{", a, "}{", b, "}")
},
# Labelling
hat = unary_op("\\hat{", "}"),
tilde = unary_op("\\tilde{", "}")
)
```
We again modify `latex_env()` to include this environment. It should be the last environment R looks for names in so that expressions like `sin(sin)` will work.
```{r}
latex_env <- function(expr) {
# Known functions
f_env
# Default symbols
names <- all_names(expr)
symbol_env <- as_environment(set_names(names), parent = f_env)
# Known symbols
greek_env <- env_clone(greek_env, parent = symbol_env)
greek_env
}
to_math(sin(x + pi))
to_math(log(x[i]^2))
to_math(sin(sin))
```
### Unknown functions
Finally, we'll add a default for functions that we don't yet know about. We can't know in advance what the unknown funtions will be so we again walk the AST to find them:
```{r}
all_calls_rec <- function(x) {
switch_expr(x,
constant = ,
symbol = character(),
call = {
fname <- as.character(x[[1]])
children <- flat_map_chr(as.list(x[-1]), all_calls)
c(fname, children)
}
)
}
all_calls <- function(x) {
unique(all_calls_rec(x))
}
all_calls(expr(f(g + b, c, d(a))))
```
We need a closure that will generate the functions for each unknown call:
```{r}
unknown_op <- function(op) {
new_function(
exprs(... = ),
expr({
contents <- paste(..., collapse = ", ")
paste0(!!paste0("\\mathrm{", op, "}("), contents, ")")
})
)
}
unknown_op("foo")
```
And again we update `latex_env()`:
```{r}
latex_env <- function(expr) {
calls <- all_calls(expr)
call_list <- map(set_names(calls), unknown_op)
call_env <- as_environment(call_list)
# Known functions
f_env <- env_clone(f_env, call_env)
# Default symbols
names <- all_names(expr)
symbol_env <- as_environment(set_names(names), parent = f_env)
# Known symbols
greek_env <- env_clone(greek_env, parent = symbol_env)
greek_env
}
```
This completes our original requirements:
```{r}
to_math(sin(pi) + f(a))
```
You could certainly take this idea further and translate types of mathematical expression, but you should not need any additional metaprogramming tools.
### Exercises
1. Add escaping. The special symbols that should be escaped by adding a backslash
in front of them are `\`, `$`, and `%`. Just as with HTML, you'll need to
make sure you don't end up double-escaping. So you'll need to create a small
S3 class and then use that in function operators. That will also allow you
to embed arbitrary LaTeX if needed.
1. Complete the DSL to support all the functions that `plotmath` supports.