R apply Functions


The apply() function applies a function to the margins of an array or matrix. Related functions include lapply(), sapply(), mapply() and tapply(). They are usually more concise than explicit loops when processing data in batch, although not necessarily faster.

apply(x,margin,func, ...)


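• x: an array or matrix (a data frame is coerced to a matrix)
• margin: the subscripts to apply the function over; for a matrix, 1 means rows and 2 means columns
• func: the function to apply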
...
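
As a minimal sketch of how extra arguments placed after the function are forwarded to it, the example below uses a small made-up matrix m (not part of the original text) and passes na.rm = TRUE through to sum(). The explicit loop is included only for comparison.

```r
# m is a hypothetical 2 x 3 matrix with one missing value
m <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 2)

# Arguments after the function are forwarded to it: sum(row, na.rm = TRUE)
row_sums <- apply(m, 1, sum, na.rm = TRUE)

# The same computation written as an explicit loop over rows
loop_sums <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  loop_sums[i] <- sum(m[i, ], na.rm = TRUE)
}

print(row_sums)
print(isTRUE(all.equal(row_sums, loop_sums)))
```

Without na.rm = TRUE, the first row sum would be NA because that row contains a missing value.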

> BOD #R built-in dataset, Biochemical Oxygen Demand

  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8


Sum each row:

> apply(BOD,1,sum)

[1] 9.3 12.3 22.0 20.0 20.6 26.8


Sum each column:

> apply(BOD,2,sum)

  Time demand
    22     89


Multiply all values by 10:

> apply(BOD,1:2,function(x) 10 * x)

     Time demand
[1,]   10     83
[2,]   20    103
[3,]   30    190
[4,]   40    160
[5,]   50    156
[6,]   70    198


For a one-dimensional array, set margin to 1:

> x <- array(1:9)
> apply(x,1,function(x) x * 10)

[1] 10 20 30 40 50 60 70 80 90


For a two-dimensional array, margin can be 1 or 2:

> x <- array(1:9,c(3,3))
> x

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9


> apply(x,1,function(x) x * 10) #apply(x,2,function(x) x * 10) gives the transpose of this result

     [,1] [,2] [,3]
[1,]   10   20   30
[2,]   40   50   60
[3,]   70   80   90
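
When the applied function returns a vector rather than a single value, apply() places each per-margin result in a column of the output, so margin 1 and margin 2 give transposed matrices for this array. The following sketch checks that for the same 3 x 3 array; times10 is a helper name introduced here for illustration only.

```r
x <- array(1:9, c(3, 3))
times10 <- function(v) v * 10  # hypothetical helper, returns a vector

by_row <- apply(x, 1, times10)  # each row's result becomes a column
by_col <- apply(x, 2, times10)  # each column's result becomes a column

print(by_row)
print(by_col)

# The two results are transposes of each other
print(isTRUE(all.equal(by_row, t(by_col))))
```

If the row-wise result is needed in the original orientation, wrapping the call in t() is a common way to restore it.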


The lapply() function applies a function to each column of a data frame (or each element of a list) and returns the results as a list:

> lapply(BOD,sum)

$Time
[1] 22
$demand
[1] 89


> lapply(BOD,mean)

$Time
[1] 3.666667
$demand
[1] 14.83333
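
lapply() is not limited to data frames. As a small sketch, assuming a made-up named list called scores (not from the original text), the example below shows that the result is always a list with one element per input element, and that unlist() can flatten it when every element is a single value.

```r
# scores is a hypothetical named list of numeric vectors of different lengths
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# lapply() returns a list with the same length and names as its input
means <- lapply(scores, mean)
print(means)

# unlist() flattens the list into a named numeric vector
print(unlist(means))
```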


sapply() works like lapply(), but it sets simplify=TRUE by default and therefore returns a vector where possible:

> sapply(BOD,sum)

  Time demand
    22     89

> sapply(BOD,sum,simplify=FALSE)

$Time
[1] 22
$demand
[1] 89
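
Simplification does not always produce a vector. As a hedged sketch, the example below applies range(), which returns two values per column, so sapply() simplifies to a matrix instead. It also shows vapply(), a related base R function that is not covered in the text above, which checks each result against a declared template.

```r
# range() returns two values per column, so the simplified result is a matrix
r <- sapply(BOD, range)
print(r)

# vapply() behaves like sapply() but requires the expected shape of each result
s <- vapply(BOD, sum, FUN.VALUE = numeric(1))
print(s)
```

vapply() raises an error if a result does not match FUN.VALUE, which can make scripts more predictable than relying on automatic simplification.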


mapply() is the multivariate version of sapply(): it applies a function to the corresponding elements of multiple list or vector arguments.
mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)


> mapply(sum,BOD$Time,BOD$demand)
[1]  9.3 12.3 22.0 20.0 20.6 26.8
> mapply(sum,BOD$Time)
[1] 1 2 3 4 5 7
> mapply(sum,BOD$demand)
[1]  8.3 10.3 19.0 16.0 15.6 19.8
> mapply(sum, trees)
 Girth Height Volume
410.7 2356.0  935.3
> f <- function(x,y) (x * 12 + y) * 0.0254 #convert feet and inches to meters
> mapply(f, c(5,6,5),c(3,1,9))
[1] 1.6002 1.8542 1.7526
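
The MoreArgs and SIMPLIFY behaviour in the signature can be sketched as follows. The function to_length and its parameter m_per_inch are hypothetical names introduced for this illustration; the second call uses rep() to show a case where the results have different lengths.

```r
# Hypothetical converter: feet and inches to meters, with the factor as a parameter
to_length <- function(ft, inch, m_per_inch) (ft * 12 + inch) * m_per_inch

# ft and inch are iterated in parallel; m_per_inch is held constant via MoreArgs
heights <- mapply(to_length, c(5, 6, 5), c(3, 1, 9),
                  MoreArgs = list(m_per_inch = 0.0254))
print(heights)

# Results of different lengths cannot be simplified, so a list is returned
reps <- mapply(rep, 1:3, 3:1)
print(reps)
```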


tapply() applies a function to each cell of a ragged array, that is to each (non-empty) group of values given by a unique combination of the levels of certain factors.
tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

This example uses the built-in dataset CO2 and sums the uptake for each plant.

> tapply(CO2$uptake,CO2$Plant, sum)
  Qn1   Qn2   Qn3   Qc1   Qc3   Qc2   Mn3   Mn2   Mn1   Mc2   Mc3   Mc1
232.6 246.1 263.3 209.8 228.1 228.9 168.8 191.4 184.8  85.0 121.1 126.0
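
INDEX can also be a list of factors, in which case each cell corresponds to one combination of levels. The sketch below, which assumes the Type and Treatment columns of the same CO2 dataset, groups first by one factor and then by two.

```r
# Group by a single factor: total uptake per plant origin (Type)
by_type <- tapply(CO2$uptake, CO2$Type, sum)
print(by_type)

# Group by two factors at once: mean uptake for each Type x Treatment cell
by_cell <- tapply(CO2$uptake, list(CO2$Type, CO2$Treatment), mean)
print(by_cell)
```

With two factors the result is a matrix whose rows and columns are labelled by the levels of the first and second factor respectively.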

The built-in dataset CO2 is shown below: