One function to run them all… Or just eval
Source: Internet   Published under: wide applications of the C language   Editor: 程序博客网 (Programmer Blog Network)   Time: 2024/05/16 07:14
When I was writing a sysBio package, I needed a function that could be customized based on the user-specified model. Such a function would include user-specified variables (parameters, rates, etc.) with assigned values, functions/expressions that use those variables, and the ordinary differential equations that describe the model. After some reading, the eval function seemed like a logical choice. It was pretty easy to figure out how to use it and to implement the custom function I needed (if interested, you can see it in makeODEs.R and simulateModel.R).
This made me think about how easy it would be to use eval to create a general-purpose function. Before I go any further, let me say that I am aware of the issues with using eval (security, harder readability and debugging, extra complexity, etc.). If the hypothetical situation described below were a real-life problem I was working on, I would weigh the pros and cons before choosing the eval approach presented here. But this is just me being curious, so let's pretend it is OK.
So, let's say that we are working with people (users) whose goal is to create reports from different real-life data. Let's also assume that the data are represented in tabular format and that the reports are summaries computed by applying mathematical functions to the data in those tables. Additionally, let's assume that our users have a phobia of writing or learning any kind of code (I have to add this because my significant other took a very strong stand against eval and advocated teaching the users to write the code, or even to copy and paste it, instead).
Clearly, using R, we could address the users' needs pretty easily: we could load and store the users' data in data frames (and check the data for consistency), use some *ply function (such as plyr's ddply) to summarize the data, and print the reports to output files. Ignoring the read/check/print parts, we could create a summary with a single-line command.
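As a rough illustration of such a one-line summary, here is a base-R sketch that uses aggregate() instead of a *ply function; the choice of aggregate() and of the summarized column is an assumption made for illustration, not the code used later in the post:

```r
# One-line group-wise summary in base R: mean mpg for each number of gears.
# The formula interface splits mtcars by gear and applies FUN to mpg in each group.
avg_by_gear <- aggregate(mpg ~ gear, data = datasets::mtcars, FUN = mean)
print(avg_by_gear)
```

The limitation is the same one discussed below: the summary (here mean of mpg) is hard-coded, so every change in the users' requests means editing this line.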
Now, let's assume that users often want to change their summaries; for example, every few months they remove one of the old summary functions and add a new one. To keep up with the users' demands, we would have to change our summary commands quite often.
This is where the eval approach comes into play. Can we create a single, general function that is able to handle different, specific data-summary requests from users?
The answer is yes.
However, note that the approach presented here does not evaluate or check the data, nor does it perform any security check on user-passed strings (which, in real life, would be a pretty big security concern).
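To give a rough idea of what a minimal check could look like (the approach below does not do this), a user string could be rejected unless every name it uses appears on an allowlist. The helper is_allowed_expr and the allowlist contents are hypothetical, and this is far from a complete security measure:

```r
# Hypothetical helper: accept an expression string only if every name it uses is allowlisted.
# A minimal illustration, NOT a complete security solution.
is_allowed_expr <- function(expr_text, allowed_names) {
  parsed <- tryCatch(parse(text = expr_text), error = function(e) NULL)
  # reject text that does not parse or that contains more than one expression
  if (is.null(parsed) || length(parsed) != 1L) return(FALSE)
  used <- all.names(parsed[[1]])   # every function and variable name in the expression
  length(setdiff(used, allowed_names)) == 0L
}

allowed <- c(names(datasets::mtcars), "+", "-", "*", "/", "(", "sum", "mean")
is_allowed_expr("mpg+2*carb", allowed)     # TRUE
is_allowed_expr("system('ls')", allowed)   # FALSE: system is not on the allowlist
```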
We will assume that the user's data is already in the form of a data frame (with stringsAsFactors set to FALSE). Furthermore, we will assume that users provide a character vector of variables (corresponding to table column names) over which they want to split and summarize the data, as well as a two-column table (data frame) containing the user-specified name of each summary (first column) and the expression used to compute it (second column).
As example data, we can use the mtcars data set from the datasets package. We will assume that the user is interested in information about gears and, for each gear, wants to calculate a custom value (defined as mpg+2*carb), the total hp (defined as sum(hp)), and the average mpg (defined as mean(mpg)):
ex1_df <- datasets::mtcars
ex1_vars <- "gear"
ex1_formulas <- data.frame(labs=c("cust_val", "total_hp", "avg_mpg"), ress=c("mpg+2*carb", "sum(hp)", "mean(mpg)"), stringsAsFactors = FALSE)
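Each entry in the second column of ex1_formulas is just a string. A minimal base-R sketch of how one such string can be evaluated against the data: parse() turns the text into an expression, and eval() evaluates it with the data frame as the environment, so column names such as hp behave like variables. The variable names below are illustrative only:

```r
dat <- datasets::mtcars
user_expr <- "sum(hp)"                       # one entry from the second column of the formula table
parsed <- parse(text = user_expr)            # expression vector of length 1
total_hp <- eval(parsed[[1]], envir = dat)   # hp is looked up as a column of dat
print(total_hp)                              # same value as sum(dat$hp), i.e. 4694 for mtcars
```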
If we were hard-coding the user's request, we could use the ddply function from the plyr package as follows:
ex1_trueResult <- plyr::ddply(ex1_df, .variables = c("gear"), function(x) data.frame(cust_val = x$mpg+2*x$carb, total_hp = sum(x$hp), avg_mpg = mean(x$mpg)))
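If plyr is not available, the same hard-coded summary can be sketched in base R. This is an illustrative alternative, not code from the original post; summarise_gear is a hypothetical name, and split()/lapply()/rbind() stand in for ddply:

```r
# Base-R sketch of the hard-coded summary (split / lapply / rbind instead of ddply).
summarise_gear <- function(dat) {
  groups <- split(dat, dat$gear)              # one data frame per gear value, in order 3, 4, 5
  pieces <- lapply(groups, function(x) data.frame(
    gear     = x$gear[1],
    cust_val = x$mpg + 2 * x$carb,            # one value per row
    total_hp = sum(x$hp),                     # single value, recycled to the group's row count
    avg_mpg  = mean(x$mpg)                    # single value, recycled as well
  ))
  out <- do.call(rbind, unname(pieces))
  rownames(out) <- NULL                       # reset to plain 1..n row names
  out
}

ex1_base <- summarise_gear(datasets::mtcars)
head(ex1_base)
```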
The first thing I tried was to create a function that recreates the ddply call:
Running this function will create a summary, but only for avg_mpg:
ex1_res01 <- oneForMany_first(ex1_df, ex1_vars, ex1_formulas)
This is because when an expression vector is passed to eval, the result is the value of the last element evaluated. To get around this issue, we can use sapply:
This time, all expressions have been evaluated and all calculations have been performed. However, because the function does not assign the proper names to the columns, the data is not in the desired format and would need to be parsed before it can be used (which can be done, but it is not what we want).
ex1_res02 <- oneForMany_second(ex1_df, ex1_vars, ex1_formulas)
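The behavior described above can be reproduced with a few lines of base R. The sketch below is an illustration, not the author's oneForMany_first or oneForMany_second; it evaluates the three formula strings from ex1_formulas against mtcars, first as a single expression vector and then element by element with sapply():

```r
dat <- datasets::mtcars
exprs <- parse(text = c("mpg+2*carb", "sum(hp)", "mean(mpg)"))  # expression vector of length 3

# eval() on the whole vector evaluates every element but returns only the last value.
last_only <- eval(exprs, envir = dat)          # equals mean(dat$mpg)

# sapply() evaluates each element separately and keeps every result.
all_results <- sapply(exprs, eval, envir = dat)
str(all_results)
```

Because the three results have different lengths (32, 1 and 1), sapply() cannot simplify them and returns an unnamed list, which is why the output still has to be reshaped.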
Similarly, if we use a list instead of a vector, all expressions are evaluated and all values are calculated, but the resulting data is again not in the desired format and would need to be parsed.
ex1_res03 <- oneForMany_third(ex1_df, ex1_vars, ex1_formulas)
Now, let's try something slightly different: let's use eval to create a function that returns a data frame with the desired columns (the user-specified functions):
ex1_res04 <- oneForMany_fourth(ex1_df, ex1_vars, ex1_formulas)
This seems to work. We can confirm that the obtained results are the same as the hard-coded ones:
testthat::expect_equal(ex1_res04, ex1_trueResult)
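For readers without plyr, the idea of generating a summary function from the user's table can be sketched in base R. This is a hedged illustration rather than the author's oneForMany_fourth: the helper names make_summary_fun and one_for_many are hypothetical, split() stands in for ddply's grouping, and the formula table is pasted into the text of a function definition that eval(parse(...)) then turns into a real function:

```r
# Hypothetical helper: turn the two-column formula table into a real R function.
make_summary_fun <- function(formulas) {
  # e.g. "cust_val = mpg+2*carb, total_hp = sum(hp), avg_mpg = mean(mpg)"
  cols <- paste(formulas[[1]], "=", formulas[[2]], collapse = ", ")
  fun_text <- paste0("function(x) with(x, data.frame(", cols, "))")
  eval(parse(text = fun_text))   # evaluating the parsed text yields a function object
}

# Hypothetical general summary: split by the user's variables, apply the generated function.
one_for_many <- function(dat, vars, formulas) {
  summary_fun <- make_summary_fun(formulas)
  groups <- split(dat, dat[vars], drop = TRUE, lex.order = TRUE)  # groups sorted by vars
  pieces <- lapply(groups, function(g) {
    res <- summary_fun(g)
    # repeat the (constant) grouping values so they line up with the result rows
    data.frame(g[rep(1L, nrow(res)), vars, drop = FALSE], res, row.names = NULL)
  })
  out <- do.call(rbind, unname(pieces))
  rownames(out) <- NULL
  out
}

ex1_formulas <- data.frame(labs = c("cust_val", "total_hp", "avg_mpg"),
                           ress = c("mpg+2*carb", "sum(hp)", "mean(mpg)"),
                           stringsAsFactors = FALSE)
ex1_res <- one_for_many(datasets::mtcars, "gear", ex1_formulas)
head(ex1_res)
```

Pasting user strings into code text is exactly where the security concerns mentioned earlier apply.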
Alternatively, instead of trying to replicate the form of the hard-coded function, we could use the do.call function. For do.call, we need to specify the function to call and provide a list of its arguments.
Note that we used mutate instead of summarise, even though we want a data summary, and that this choice required an additional filtering step. While summarise performs group-wise summaries and creates a new data frame based on the specified variables and functions, it fails when the functions return results of different lengths (in our case, sum(hp) and mean(mpg) return single values, while mpg+2*carb returns one value per row). The mutate function adds columns to the existing data frame, so unequal lengths are not an issue. Another important thing to mention is that the argument list has to be a named list.
We can run it and test the results:
ex1_res05 <- oneForMany_fifth(ex1_df, ex1_vars, ex1_formulas)
testthat::expect_equal(ex1_res05, ex1_trueResult)
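The do.call pattern can also be sketched in base R with transform(), which, like mutate, adds columns to an existing data frame, so single-value results are recycled rather than rejected. This is an assumption-based analogue, not the author's oneForMany_fifth: transform() stands in for mutate and split() for the grouping. do.call() turns the named list into named arguments, so the list names become the column names, and the unevaluated expressions in the list are evaluated inside each group's data frame:

```r
dat <- datasets::mtcars
formulas <- data.frame(labs = c("cust_val", "total_hp", "avg_mpg"),
                       ress = c("mpg+2*carb", "sum(hp)", "mean(mpg)"),
                       stringsAsFactors = FALSE)

# Named list of unevaluated expressions; the names become the new column names.
exprs <- lapply(formulas$ress, function(txt) parse(text = txt)[[1]])
names(exprs) <- formulas$labs

# Hypothetical per-group step: do.call() builds transform(g, cust_val = mpg+2*carb, ...)
summarise_group <- function(g) {
  added <- do.call(transform, c(list(g), exprs))
  added[, c("gear", formulas$labs)]   # filtering step: keep only the grouping and new columns
}

pieces <- lapply(split(dat, dat$gear), summarise_group)
res_base <- do.call(rbind, unname(pieces))
rownames(res_base) <- NULL
head(res_base)
```

The final column selection plays the role of the additional filtering step mentioned above.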
This and another example are also available at: