--- title: "NSE Functions with `oshka`" author: "Brodie Gaslam" output: rmarkdown::html_vignette: toc: true css: styles.css vignette: > %\VignetteIndexEntry{NSE Functions with oshka} %\VignetteEngine{knitr::rmarkdown} %\usepackage[utf8]{inputenc} --- ```{r global_options, echo=FALSE} knitr::opts_chunk$set(error=TRUE, comment=NA) library(oshka) knitr::read_chunk('../tests/helper/ersatz.R') ``` ## Overview We will implement simplified versions of `dplyr` and `data.table` to illustrate how to write programmable NSE functions with `oshka`. The implementations are intentionally limited in functionality, robustness, and speed for the sake of simplicity. ## An Ersatz `dplyr` ### Interface The interface is as follows: ```{r, eval=FALSE} group_r <- function(x, ...) {...} # similar to dplyr::group_by filter_r <- function(x, subset) {...} # similar to dplyr::filter summarize_r <- function(x, ...) {...} # similar to dplyr::summarise `%$%` <- function(x, y) {...} # similar to the magrittr pipe ``` ```{r, echo=FALSE} <> <> <> ``` Our functions mimic the corresponding `dplyr` ones: ```{r} CO2 %$% # built-in dataset filter_r(grepl("[12]", Plant)) %$% group_r(Type, Treatment) %$% summarize_r(mean(conc), mean(uptake)) ``` ### Implementation Most of the implementation is not directly related to `oshka` NSE, but we will go over `summarize_r` to highlight how those parts integrate with the rest. `summarize_r` is just a forwarding function: ```{r eval=FALSE} <> ``` We use the `eval`/`bquote` pattern to [forward `NSE` arguments][1]. We retrieve `summarize_r_l` from the current function frame with `.()`, because there is no guarantee we would find it on the search path starting from the parent frame. In this case it happens to be available, but it would not be if these functions were in a package. We present `summarize_r_l` in full for reference, but feel free to skip as we highlight the interesting bits next: ```{r eval=FALSE} <> ``` The only `oshka` specific line is the second one: ```{r eval=FALSE} exps.sub <- expand(substitute(els), x, frm) ``` `els` is the language captured and forwarded by `summarize_r`. We run `expand` on that language with our data `x` as the environment and the parent frame as the enclosure. We then compute the groups: ```{r eval=FALSE} grps <- make_grps(x) # see appendix splits <- lapply(grps, eval, x, frm) ``` `make_grps` extracts the grouping expressions generating by `group_r`. These have already been substituted so we evaluate each one with `x` as the environment and the parent frame as the enclosure. We use this to split our data into groups: ```{r eval=FALSE} dat.split <- split(x, splits, drop=TRUE) ``` Finally we can evaluate our `expand`ed expressions within each of the groups: ```{r eval=FALSE} # aggregate res.list <- lapply( dot_list(exps.sub), # see appendix function(exp) lapply(dat.split, eval, expr=exp, enclos=frm) ) list_to_df(res.list, grp.split, splits) # see appendix ``` `dot.list` turns `exps.sub` into a list of expressions. Each expression is then evaluated with each data chunk as the environment and the parent frame as the enclosure. Finally `list_to_df` turns our lists of vectors into a data frame. You can see the rest of the implementation in the [appendix](#appendix). ### Examples That single `expand` line enables a programmable NSE: ```{r} f.exp <- quote(grepl("[12]", Plant)) s.exp <- quote(mean(uptake)) CO2 %$% filter_r(f.exp & conc > 500) %$% group_r(Type, Treatment) %$% summarize_r(round(s.exp)) ``` Because `%$%` uses `expand` you can even do the following: ```{r} f.exp.b <- quote(filter_r(grepl("[12]", Plant) & conc > 500)) g.exp.b <- quote(group_r(Type, Treatment)) s.exp.b <- quote(summarize_r(mean(conc), mean(uptake))) exp <- quote(f.exp.b %$% g.exp.b %$% s.exp.b) CO2 %$% exp ``` ## An Ersatz `data.table` ### Implementation We wish to re-use our ersatz `dplyr` functions to create a `data.table`-like interface: ```{r} <> ``` Again, we use the `eval`/`bquote` pattern to forward the NSE arguments to our NSE functions `filter_r`, `group_r_l`, and `summarize_r_l`. The pattern is not trivial, but it only took six lines of code to transmogrify our faux-`dplyr` into a faux-`data.table`. ### Examples After we add the `super_df` class to our data we can start using it with `data.table` semantics, but with programmable NSE: ```{r} co2 <- as.super_df(CO2) co2[f.exp, s.exp, by=Type] exp.a <- quote(max(conc)) exp.b <- quote(min(conc)) co2[f.exp, list(exp.a, exp.b), by=list(Type, Treatment)][1:3,] exp.c <- quote(list(exp.a, exp.b)) exp.d <- quote(list(Type, Treatment)) co2[f.exp, exp.c, by=exp.d][1:3,] ``` Despite the forwarding layers the symbols resolve as expected in complex circumstances: ```{r} exps <- quote(list(stop("boo"), stop("ya"))) # don't use this g.exp <- quote(Whatever) # nor this local({ summarize_r_l <- function(x, y) stop("boom") # nor this max.upt <- quote(max(uptake)) # use this min.upt <- quote(min(uptake)) # and this exps <- list(max.upt, min.upt) g.exp <- quote(Treatment) # and this lapply(exps, function(y) co2[f.exp, y, by=g.exp]) }) ``` And we can even nest our `dplyr` and `data.table` for an unholy abomination: ```{r} exp <- quote(data.frame(upt=uptake) %$% summarize_r(new.upt=upt * 1.2)) local({ exps <- list(quote(sum(exp$new.upt)), quote(sum(uptake))) g.exp <- quote(Treatment) lapply(exps, function(y) co2[f.exp, y, by=g.exp]) }) ``` ## Appendix Ersatz `dplyr` implementation: ```{r eval=FALSE} ## - Summarize ----------------------------------------------------------------- <> <> <> ``` [1]: ./Introduction.html#forwarding-nse-arguments-to-nse-functions