Non-Standard Evaluation (NSE hereafter) occurs when R expressions are
captured and evaluated in a manner different than if they had been
executed without intervention. subset is a canonical example, which we
use here with the built-in iris data set:
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
16          5.7         4.4          1.5         0.4  setosa
34          5.5         4.2          1.4         0.2  setosa
Sepal.Width does not exist in the global environment, yet this works
because subset captures the expression and evaluates it within iris.
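For reference, a call along the following lines reproduces the two rows shown above. The threshold 4.1 is an assumption inferred from the output, so the exact call may have differed.

```r
# Assumed threshold: 4.1 is inferred from the printed rows, not stated in the text.
# subset() captures `Sepal.Width > 4.1` unevaluated and evaluates it with the
# columns of iris in scope, so no global `Sepal.Width` is needed.
res <- subset(iris, Sepal.Width > 4.1)
res
```

Only base R is used here; iris ships with the datasets package.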
A limitation of NSE is that it is difficult to use programmatically:
Error in subset.data.frame(iris, exp.a): 'subset' must be logical
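A minimal sketch of how this error arises, assuming exp.a holds a quoted condition (the definition below is an assumption). subset captures the symbol exp.a itself, and evaluating that symbol yields a language object rather than a logical vector.

```r
# Assumed definition: a quoted condition stored in a variable.
exp.a <- quote(Sepal.Width > 4.1)

# subset() sees the symbol `exp.a`, evaluates it, and gets back a call object
# instead of TRUE/FALSE values, so it refuses to proceed.
res <- tryCatch(
  subset(iris, exp.a),
  error = function(e) conditionMessage(e)
)
res
```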
oshka::expand facilitates programmable NSE, as with this simplified
version of subset:
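library(oshka)  # provides expand()

subset2 <- function(x, subset) {
  # expand any symbols in the captured expression that point to quoted language
  sub.exp <- expand(substitute(subset), x, parent.frame())
  sub.val <- eval(sub.exp, x, parent.frame())
  x[!is.na(sub.val) & sub.val, ]
}
subset2(iris, exp.a)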
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
16          5.7         4.4          1.5         0.4  setosa
34          5.5         4.2          1.4         0.2  setosa
expand is recursive:
exp.b <- quote(Species == 'virginica')
exp.c <- quote(Sepal.Width > 3.6)
exp.d <- quote(exp.b & exp.c)
subset2(iris, exp.d)
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
118          7.7         3.8          6.7         2.2 virginica
132          7.9         3.8          6.4         2.0 virginica
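The effect of recursive expansion can be pictured with base R alone. The sketch below substitutes exp.b and exp.c into exp.d by hand using substitute. It illustrates the idea only and is not a description of how expand is implemented; the names manual and res are ours.

```r
exp.b <- quote(Species == 'virginica')
exp.c <- quote(Sepal.Width > 3.6)
exp.d <- quote(exp.b & exp.c)

# Replace the symbols exp.b and exp.c inside exp.d with the language they hold.
manual <- do.call(
  "substitute",
  list(exp.d, list(exp.b = exp.b, exp.c = exp.c))
)
manual  # the combined condition on Species and Sepal.Width, as a single call

# Evaluate the fully expanded expression with the columns of iris in scope.
res <- iris[eval(manual, iris), ]
res
```

Here the substitution list is written out explicitly, whereas expand is described above as resolving such symbols for you.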
We abide by R semantics so that programmable NSE functions are almost identical to normal NSE functions, with programmability as a bonus.
If you wish to write a function that uses a programmable NSE function
and forwards its NSE arguments to it, you must ensure the NSE
expressions are evaluated in the correct environment, typically the
parent.frame(). This is no different than with normal NSE functions. An
example:
subset3 <- function(x, subset, select, drop=FALSE) {
  frm <- parent.frame()  # as per note in ?parent.frame, better to call here
  sub.q <- expand(substitute(subset), x, frm)
  sel.q <- expand(substitute(select), x, frm)
  eval(bquote(base::subset(.(x), .(sub.q), .(sel.q), drop=.(drop))), frm)
}
We use bquote to assemble our substituted call and eval to evaluate it
in the correct frame. The parts of the call that should evaluate in
subset3 are escaped with .(). This requires some work from the
programmer, but the user reaps the benefits:
col <- quote(Sepal.Length)
sub <- quote(Species == 'setosa')
subset3(iris, sub & col > 5.5, col:Petal.Length)
   Sepal.Length Sepal.Width Petal.Length
15          5.8         4.0          1.2
16          5.7         4.4          1.5
19          5.7         3.8          1.7
Notice that we used expand with the base NSE function subset. Because
expand just generates language objects, you can use it with any NSE
function.
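To make the bquote and .() mechanics concrete, here is a small base R sketch of assembling a call to an NSE function from pieces. The variable names threshold, cond and cl are illustrative, and the value 4.1 is an arbitrary choice.

```r
threshold <- 4.1  # illustrative value; .() evaluates it when the call is built

# .() splices the *value* of threshold into the quoted expression.
cond <- bquote(Sepal.Width > .(threshold))

# Splice the language object itself into a call to base subset().
cl <- bquote(subset(iris, .(cond)))
cl  # subset(iris, Sepal.Width > 4.1)

# The assembled call is ordinary R language and can be evaluated as usual.
res <- eval(cl)
res
```

Anything wrapped in .() is evaluated at assembly time, while everything else stays as unevaluated language until eval runs.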
The forwarding is robust to unusual evaluation:
col.a <- quote(I_dont_exist)
col.b <- quote(Sepal.Length)
sub.a <- quote(stop("all hell broke loose"))
threshold <- 3.35
local({
  col.a <- quote(Sepal.Width)
  sub.a <- quote(Species == 'virginica')
  subs <- list(sub.a, quote(Species == 'versicolor'))

  lapply(
    subs,
    function(x) subset3(iris, x & col.a > threshold, col.b:Petal.Length)
  )
})
[[1]]
    Sepal.Length Sepal.Width Petal.Length
110          7.2         3.6          6.1
118          7.7         3.8          6.7
132          7.9         3.8          6.4
137          6.3         3.4          5.6
149          6.2         3.4          5.4

[[2]]
   Sepal.Length Sepal.Width Petal.Length
86            6         3.4          4.5
One drawback of the eval/bquote/.() pattern is that the actual objects
inside .() are placed on the call stack. This is not an issue with
symbols, but can be bothersome with data or functions. For example, in:
my_fun_inner <- function(x) {
  # ... bunch of code
  stop("end")
}
my_fun_outer <- function(x) {
  # both the function object and the data are spliced into the call
  eval(bquote(.(my_fun_inner)(.(x))), parent.frame())
}
my_fun_outer(mtcars)
traceback()
The entire deparsed function definition and data frame will be displayed in the traceback, which makes it difficult to see what is happening. A simple work-around is to use:
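One way to keep the traceback readable, sketched here as an assumption rather than as the package's recommended approach, is to bind the function and the data to names in a child of the calling environment, so that only symbols appear in the evaluated call. The names fun, dat and env are illustrative.

```r
my_fun_inner <- function(x) {
  # ... bunch of code
  stop("end")
}
my_fun_outer <- function(x) {
  # Child of the caller's frame: lookups still fall through to the caller,
  # but the call itself refers to the objects by name only.
  env <- new.env(parent = parent.frame())
  env$fun <- my_fun_inner
  env$dat <- x
  eval(quote(fun(dat)), env)
}

# Capture the call recorded with the error instead of printing a traceback.
recorded <- tryCatch(
  my_fun_outer(mtcars),
  error = function(e) deparse(conditionCall(e))
)
recorded
```

The recorded call is the short symbolic form rather than the deparsed function body and data frame.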
rlang
oshka is simple in design and purpose. It exports a single function
that substitutes expressions into other expressions. It hews closely to
R semantics. rlang is more ambitious and more complex as a result. To
use it you must learn new concepts and semantics.
One manifestation of the additional complexity in rlang is that you
must unquote expressions to use them:
library(rlang)  # provides quo() and the !! unquote operator

rlang.b <- quo(Species == 'virginica')
rlang.c <- quo(Sepal.Width > 3.6)
rlang.d <- quo(!!rlang.b & !!rlang.c)
dplyr::filter(iris, !!rlang.d)
As shown earlier, the expand version is more straightforward as it uses
the standard quote function and does not require unquoting:
exp.b <- quote(Species == 'virginica')
exp.c <- quote(Sepal.Width > 3.6)
exp.d <- quote(exp.b & exp.c)
subset2(iris, exp.d)
On the other hand, forwarding of NSE arguments to NSE functions is
simpler in rlang due to the environment-capture feature of quosures:
rlang_virginica <- function(subset) {
  subset <- enquo(subset)
  dplyr::filter(iris, Species == 'virginica' & !!subset)
}
Because oshka does not capture environments, we must resort to the
eval/bquote pattern:
oshka_virginica <- function(subset) {
  subset <- bquote(Species == 'virginica' & .(substitute(subset)))
  eval(bquote(.(subset2)(iris, .(subset))), parent.frame())
}
oshka minimizes the complexity in what we see as the most common use
case, and sticks to R semantics for the more complicated ones.
For additional discussion on rlang see the following presentations: