swirl 10: lapply and sapply

Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 06:18

lapply() and sapply(), along with their close relatives (vapply() and tapply(), among others), offer a concise and convenient means of implementing the Split-Apply-Combine strategy for data analysis.

Throughout this lesson, we'll use the Flags dataset from the UCI Machine Learning Repository. This dataset contains details of various nations and their flags. More information may be found here: http://archive.ics.uci.edu/ml/datasets/Flags

viewinfo(): To open a more complete description of the dataset in a separate text file, type viewinfo().

lapply(): The lapply() function takes a list as input, applies a function to each element of the list, then returns a list of the same length as the original one.

Type cls_list <- lapply(flags, class) to apply the class() function to each column of the flags dataset and store the result in a variable called cls_list. Note that you just supply the name of the function you want to apply (i.e. class), without the usual parentheses after it.

The 'l' in 'lapply' stands for 'list'. Type class(cls_list) to confirm that lapply() returned a list.
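The call above can be tried without the real dataset. The following is a minimal sketch that assumes a small, made-up data frame named toy_flags standing in for flags; its column names and values are hypothetical and are not taken from the UCI data.

```r
# Hypothetical stand-in for the flags dataset (made-up values)
toy_flags <- data.frame(
  name = c("Alpha", "Beta", "Gamma"),
  population = c(8L, 119L, 5L),
  area = c(84.5, 8512.25, 1284.75),
  stringsAsFactors = FALSE
)

# Pass the function itself (class), not a call to it (class())
cls_list <- lapply(toy_flags, class)

class(cls_list)      # "list"
length(cls_list)     # 3, one element per column
cls_list$population  # "integer"
```

Because a data frame is itself a list of columns, lapply() visits one column at a time and the result keeps the column names.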

as.character(): In this case, since every element of the list returned by lapply() is a character vector of length one (e.g. "integer"), cls_list can be simplified to a character vector. To do this manually, type as.character(cls_list).
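As a rough illustration of this manual simplification, the sketch below builds a small list by hand; the element names and class strings are assumptions chosen for the example, not the actual contents of cls_list for the flags data.

```r
# Hand-built list of length-one character vectors (hypothetical contents)
cls_list <- list(name = "character", population = "integer", area = "numeric")

# Collapse the list into a plain character vector
cls_vect <- as.character(cls_list)

cls_vect         # "character" "integer" "numeric"
class(cls_vect)  # "character"
names(cls_vect)  # NULL, because as.character() drops the names
```

Note that the element names are lost in the conversion, which is one practical difference from the named vector that sapply() would return.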

sapply() allows you to automate this process by calling lapply() behind the scenes, but then attempting to simplify (hence the 's' in 'sapply') the result for you. In general, if the result is a list where every element is of length one, then sapply() returns a vector. If the result is a list where every element is a vector of the same length (> 1), sapply() returns a matrix. If sapply() can't figure things out, then it just returns a list, no different from what lapply() would give you.
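The three outcomes described above can be sketched with small made-up lists (toy and ragged are hypothetical names used only for this example):

```r
toy <- list(a = 1:3, b = 4:6, c = 7:9)

# Every result has length one, so sapply() returns a named vector
means <- sapply(toy, mean)

# Every result has the same length (> 1), so sapply() returns a matrix
# with one column per list element
ranges <- sapply(toy, range)

# Results differ in length, so sapply() falls back to a list
ragged <- list(a = c(1, 1, 2), b = c(3, 4, 5))
uniques <- sapply(ragged, unique)

class(means)    # "numeric"
dim(ranges)     # 2 3
class(uniques)  # "list"
```

When the shape of the result matters to later code, vapply() is worth considering, since it makes you state the expected type and length up front.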

unique(): When given a vector, the unique() function returns a vector with all duplicate elements removed. In other words, unique() returns a vector of only the 'unique' elements.
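A quick sketch of this behaviour, using made-up vectors:

```r
x <- c(3, 4, 5, 5, 5, 6, 6)
ux <- unique(x)
ux  # 3 4 5 6

# Elements keep the order in which they first appear
colours <- c("red", "green", "red", "blue", "green")
uc <- unique(colours)
uc  # "red" "green" "blue"
```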

function(elem) elem[2]: The call lapply(unique_vals, function(elem) elem[2]) will return a list containing the second item from each element of the unique_vals list. Note that our function takes one argument, elem, which is just a 'dummy variable' that takes on the value of each element of unique_vals, in turn.

Anonymous functions: The only difference between previous examples and this one is that we are defining and using our own function right in the call to lapply(). Our function has no name and disappears as soon as lapply() is done using it. So-called 'anonymous functions' can be very useful when one of R's built-in functions isn't an option.
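To make the anonymous-function call concrete, here is a minimal sketch. It assumes that unique_vals is built by applying unique() to each column, and it uses a made-up data frame (toy_flags, with hypothetical columns colour and stripes) in place of the real flags data.

```r
# Hypothetical stand-in for the flags dataset (made-up values)
toy_flags <- data.frame(
  colour = c("red", "green", "red", "blue"),
  stripes = c(3L, 0L, 3L, 2L),
  stringsAsFactors = FALSE
)

# Assumed construction of unique_vals: the unique values of each column
unique_vals <- lapply(toy_flags, unique)

# Anonymous function: elem takes on each element of unique_vals in turn
second_items <- lapply(unique_vals, function(elem) elem[2])

second_items$colour   # "green"
second_items$stripes  # 0
```

The same result could be had by first defining a named helper and passing its name to lapply(); the anonymous form simply avoids keeping a one-off function around.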
