A grammar of graphics details? #7
Comments
This is a seriously cool idea. I'd lean towards having the zillion helper functions and the 'ggbot' chat lexer. Looking at https://github.com/systemincloud/rly I don't think I could immediately build that, so if you're more familiar with the idea then by all means please have a crack at it. I think it would be of great benefit! We could make a fairly simple approximation to this with a heap of
The text expression is a command in which the subject is implicit (the subject is "ggbot"), the verb is implicit ("make"), the object of the command is one of the entities listed under Arguments here, and the attribute of the named entity to be modified is inferred from the attribute value ("blue" must be a colour, "2" must be an absolute size/thickness scalar, "2cm" is a scalar with units, "+25%" is a relative scalar, "-45deg" is an angular quantity, etc). In some cases the attribute to be modified might need to be named explicitly.

One command per string, but a vector of strings or multiple ellipsis string arguments could be passed to one call of ggbot(). We probably want to allow modification of multiple attributes per command string, but strictly only one object entity per command. Thus, "text blue" would make all text blue, and "text blue 15" would make all text blue and size 15. But "text line blue" would be illegal because there are two object entities (targets): "text" and "line". (Aside: this restriction of one target entity per command is just to keep it simple to start with.)

Now, a minor complication is how the object entities should be specified. "text" or "line" are easy and, as per the ggplot2 theme model, these apply to all text or to all lines. But what about, say, the x-axis text? Well, adding "x" or "y" (or "z"?) implies that modification of some axis attribute is being requested, in other words, that "axis" is implied. If modification of both axes is desired, then any or all of the following could be supported: "axis blue", "axes blue", "x y blue". Except we haven't specified which aspect of the axis or axes we want to modify, so we need a qualifier: for axes, the valid ones are "title", "text", "ticks" and "line" (and pluralised forms of those etc).

What about suppressing elements?
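To make the "infer the attribute from the value" idea concrete, here is a minimal base-R sketch. The names `classify_value()`, `parse_command()` and the `targets` vector are hypothetical, the list of targets is a tiny illustrative subset, and the set of unit suffixes is an assumption. It only covers value classification and the one-target-per-command rule, not the "x"/"y" axis qualifiers.

```r
# Hypothetical helper: infer the class of a token purely from its form.
classify_value <- function(token) {
  if (grepl("^[+-][0-9]+(\\.[0-9]+)?%$", token)) return("relative")      # "+25%"
  if (grepl("^[+-]?[0-9]+(\\.[0-9]+)?deg$", token)) return("angle")      # "-45deg"
  if (grepl("^[0-9]+(\\.[0-9]+)?(cm|mm|in|pt)$", token)) return("length") # "2cm" (units assumed)
  if (grepl("^[0-9]+(\\.[0-9]+)?$", token)) return("absolute")           # "2"
  if (grepl("^#[0-9A-Fa-f]{6}$", token) ||
      token %in% grDevices::colours()) return("colour")                  # "blue", "#FF0000"
  "unknown"
}

# Illustrative subset of target entities; the real list would come from theme().
targets <- c("text", "line", "rect", "title")

# Hypothetical parser: one target per command, any number of attribute values.
parse_command <- function(command) {
  tokens <- strsplit(trimws(command), "\\s+")[[1]]
  is_target <- tokens %in% targets
  if (sum(is_target) != 1) {
    stop("exactly one target entity is allowed per command: ", command, call. = FALSE)
  }
  list(
    target = tokens[is_target],
    values = vapply(tokens[!is_target], classify_value, character(1))
  )
}

parse_command("text blue 15")       # target "text"; one colour and one absolute scalar
try(parse_command("text line blue")) # error: two targets
```

Whether regular expressions like these are enough, or whether some values stay ambiguous, is exactly what the tables discussed in this thread would need to confirm.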
I think the solution is to recognise some special-case attributes, such as "invisible", or verb forms such as "disappear", "begone" or just "no" or "none" or "zap" or "ditch" etc.

OK, is this language model adequate? The way to find out is to build a table of all the entity target types which theme() supports, and a separate table for each entity type, enumerating all the attributes that entity type can have set, and then specify an example command string for each and check that it can be unambiguously parsed. Then check that there is no overlap between any of the words used as specifications in both tables. If there is any overlap (i.e. the sets of words are not disjoint) then there will be ambiguity which can only be resolved by word order, which means using a more complex language model. Creating the tables is a slightly tedious task, but if split up it shouldn't take too long. Once the adequacy of the language model is confirmed, or it is tweaked until adequate, then coding should commence. Such a table can also provide the basis for the ggbot() tests, of course.

Obviously, lots of synonyms can be included in the language model. The question is whether an unambiguous model can be constructed with just single-word tokens, or not. On a quick scan, I think it can, but that needs to be thoroughly checked. If not, then a smarter tokeniser may be needed. A lemmatiser could also handle synonyms and alternative spellings etc. But I agree, the aim should be to keep it as lightweight as possible.

The aim of designing the language model first, before coding it up, is to check whether a bunch of if/else statements is enough, or whether a proper lexer and lemmatiser is needed or worthwhile. Or whether it is better to build a formal domain-specific language, in which case yacc (via rly) needs to be used to build a parse tree. However, I don't think we want a formal DSL.
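The disjointness check itself is easy to automate once the tables exist. A rough base-R sketch follows; `find_overlaps()` and the word lists are hypothetical and deliberately tiny. The "positions" table is an assumption, added only to show what a clash looks like ("none" is also a valid `legend.position` value in ggplot2).

```r
# Hypothetical helper: report every word that appears in more than one table.
find_overlaps <- function(vocab) {
  pairs <- utils::combn(names(vocab), 2, simplify = FALSE)
  overlaps <- lapply(pairs, function(p) intersect(vocab[[p[1]]], vocab[[p[2]]]))
  names(overlaps) <- vapply(pairs, paste, character(1), collapse = "/")
  overlaps[lengths(overlaps) > 0]  # keep only the pairs that actually clash
}

# Illustrative, incomplete word tables.
vocab <- list(
  entities    = c("text", "line", "axis", "legend"),
  colours     = c("blue", "orange"),
  suppressors = c("invisible", "none", "no", "zap", "ditch")
)

length(find_overlaps(vocab)) == 0  # TRUE: these three tables are disjoint

# Adding a table of legend positions introduces an ambiguous word.
vocab$positions <- c("top", "bottom", "left", "right", "none")
find_overlaps(vocab)               # reports "none" under "suppressors/positions"
```

Any non-empty result is a word whose meaning could only be resolved by word order or an explicit attribute name, so it doubles as a to-do list for tweaking the vocabulary.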
...and the rest of these, and then we need to consider the settable attributes for each element type, in a separate table. The main thing is to ensure that there is no overlap (i.e. ambiguity) between the way entities are specified and the way attributes and quantities/values are specified.
I was just wondering if it is worth designing and defining a formal or semi-formal grammar for these function names, so that they are readily guessable without having to look them up? In other words, a naming convention such as, based on what has been done so far, [prefix][axis-qualifier][attribute|verb].
Or, rather than having to define a zillion easy_ functions, what about one function that uses a little DSL to set the theme attributes? No need to mess with lex and yacc (which are available in R via the rly package btw), it might be enough just to pass commands and scalars as ellipsis arguments eg
ggdetails("x", "axis", "blue")
or, equivalently,
ggdetails("blue", "x", "axis")
or, using a lexer (eg rly) to tokenise a single string argument:
ggdetails("blue x axis")
or
ggdetails("x axis blue")
This also makes it easier for users who are non-native English speakers, whose natural word-ordering assumptions may be different - order would not matter.
The order of the ellipsis arguments or tokens wouldn't matter because the class of the argument or token can be inferred from its value in the very constrained context of the ggdetails() function. "axis", "legend" and "text" can only refer to plot elements; "blue", "orange" or a hex RGB value can only refer to colours; "2" or "5" are scalars (for font size or rotation); and "+25%" or "-33%" means increase or decrease the current size (or whatever is specified) by 25% or 33% respectively. That way argument order doesn't need to be remembered.
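A quick way to test the order-independence claim is to bucket tokens by inferred class and compare the results for different orderings. This is only a sketch in base R: `parse_details()` is a hypothetical stand-in for the parsing step inside ggdetails(), the word lists are illustrative, and plain whitespace splitting stands in for a real lexer.

```r
# Hypothetical parser: accepts several arguments or one space-separated string.
parse_details <- function(...) {
  tokens <- unlist(strsplit(trimws(c(...)), "\\s+"))
  classify <- function(tok) {
    if (tok %in% c("x", "y")) "axis_qualifier"
    else if (tok %in% c("axis", "legend", "text")) "element"
    else if (grepl("^[+-][0-9]+%$", tok)) "relative"
    else if (grepl("^[0-9]+(\\.[0-9]+)?$", tok)) "scalar"
    else if (grepl("^#[0-9A-Fa-f]{6}$", tok) ||
             tok %in% grDevices::colours()) "colour"
    else stop("unrecognised token: ", tok, call. = FALSE)
  }
  classes <- vapply(tokens, classify, character(1))
  # Group tokens by class; sorting within each group removes any trace of order.
  lapply(split(unname(tokens), classes), sort)
}

a <- parse_details("x", "axis", "blue")
b <- parse_details("blue", "x", "axis")
c1 <- parse_details("blue x axis")
d <- parse_details("x axis blue")

identical(a, b) && identical(a, c1) && identical(a, d)  # TRUE
```

The same structure could then be mapped onto a theme() call in a later step; that mapping is not attempted here.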
Actually, using yacc and lex via rly to build a simple DSL might be the best option, but the utility of the concept could be tested using individual ellipsis arguments to start with.