Programming over R

来源：互联网发布：比比多味豆淘宝编辑：程序博客网时间：2024/06/05 04:52

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

138

ShareTweet

R is a very fluid language amenable to meta-programming, or alterations of the language itself. This has allowed the late user-driven introduction of a number of powerful features such as magrittr pipes, the foreach system, futures, data.table, and dplyr. Please read on for some small meta-programming effects we have been experimenting with.

NewImage

Meta-Programming

Meta-programming is a powerful tool that allows one to re-shape a programming language or write programs that automate parts of working with a programming language.

Meta-programming itself has the central contradiction that one hopes nobody else is doing meta-programming, but that they are instead dutifully writing referentially transparent code that is safe to perform transformations over, so that one can safely introduce their own clever meta-programming. For example: one would hate to lose the ability to use a powerful package such as future because we already “used up all the referential transparency” for some minor notational effect or convenience.

That being said, R is an open system and it is fun to play with the notation. I have been experimenting with different notations for programming over R for a while, and thought I would demonstrate a few of them here.

Let Blocks

We have been using let to code over non-standard evaluation (NSE) packages in R for a while now. This allows code such as the following:

library("dplyr")library("wrapr")d <- data.frame(x = c(1, NA))cname <- 'x'rname <- paste(cname, 'isNA', sep = '_')let(list(COL = cname, RES = rname),    d %>% mutate(RES = is.na(COL))) #    x x_isNA # 1  1  FALSE # 2 NA   TRUE

let is in fact quite handy notation that will work in a non-deprecated manner with both dplyr 0.5 and dplyr 0.6. It is how we are future-proofing our current dplyr workflows.

Unquoting

dplyr 0.6 is introducing a new execution system (alternately called rlang or tidyeval, see here) which uses a notation more like the following (but fewer parenthesis, and with the ability to control left-hand side of an in-argument assignment):

beval(d %>% mutate(x_isNA = is.na((!!cname))))

The inability to re-map the right-hand side of the apparent assignment is because the “(!! )” notation doesn’t successfully masquerade as a lexical token valid on the left-hand side of assignments or function argument bindings.

And there was an R language proposal for a notation like the following (but without the quotes, and with some care to keep it syntactically distinct from other uses of “@”):

ateval('d %>% mutate(@rname = is.na(@cname))')

beval and ateval are just curiosities implemented to try and get a taste of the new dplyr notation, and we don’t recommend using them in production — their ad-hoc demonstration implementations are just not powerful enough to supply a uniform interface. dplyritself seems to be replacing a lot of R‘s execution framework to achieve stronger effects.

Write Arrow

We are experimenting with “write arrow” (a deliberate homophone of “right arrow”). It allows the convenient storing of a pipe result into a variable chosen by name.

library("dplyr")library("replyr")'x' -> whereToStoreResult7 %>% sin %>% cos %->_% whereToStoreResultprint(x) ## [1] 0.7918362

Notice, the value “7” is stored in the variable “x” not in a variable named “whereToStoreResult”. “whereToStoreResult” was able to name where to store the value parametrically.

This allows code such as the following:

for(i in 1:3) {   i %->_% paste0('x',i)}

(Please run the above to see the automatic creation of variables named “x1”, “x2”, and “x3”, storing values 1,2, and 3 respectively.)

We know left to right assignment is heterodox; but the notation is very slick if you are consistent with it, and add in some formatting rules (such as insisting on a line break after each pipe stage).

Conclusion

One wants to use meta-programming with care. In addition to bringing in desired convenience it can have unexpected effects and interactions deeper in a language or when exposed to other meta-programming systems. This is one reason why a “seemingly harmless” proposal such as “user defined unary functions” or “at unquoting” takes so long to consider. This is also why new language features are best tried in small packages first (so users can easily chose to include them or not in their larger workflow) to drive public request for comments (RFC) processes or allow the ideas to evolve (and not be frozen at their first good idea, a great example of community accepted change being Haskel’s switch from request chaining IO to monadic IO; the first IO system “seemed inevitable” until it was completely replaced).

0 0