R Data Frame Data Type


R's data.frame is a powerful data type, especially for processing tabular data such as .csv files. It stores the data in rows and columns, just like the table. The difference between a data frame and a matrix is that all elements of a matrix must have the same mode, while the columns of a data frame may have different modes and attributes.
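The contrast with a matrix can be seen in a small sketch. The names m and df and their values are made up for illustration; stringsAsFactors = FALSE is passed explicitly so the result does not depend on the R version.

```r
# A matrix holds a single mode: mixing numbers and strings
# coerces every element to character.
m <- cbind(id = c(1, 2, 3), label = c("a", "b", "c"))
mode(m)            # "character"

# A data frame keeps a separate mode for each column.
df <- data.frame(id = c(1, 2, 3), label = c("a", "b", "c"),
                 stringsAsFactors = FALSE)
sapply(df, class)  # id is "numeric", label is "character"
```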


Let's use BOD (Biochemical Oxygen Demand) from the R datasets package, which is a data frame:

>x <- BOD
>is.matrix(x)

[1] FALSE

>is.data.frame(x)

[1] TRUE

>class(x)

[1] "data.frame"

>x

  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8


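as.data.frame() can coerce a list into a data frame, provided that the components of the list conform to the restrictions of a data frame (for example, they must all have the same length, or lengths that can be recycled to a common number of rows).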
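A minimal sketch of this coercion, using hypothetical lists lst and bad that are not part of the original example:

```r
# Hypothetical list: every component has the same length (3).
lst <- list(sample = c("s1", "s2", "s3"), value = c(0.5, 1.2, 3.4))
df <- as.data.frame(lst, stringsAsFactors = FALSE)
is.data.frame(df)            # TRUE
dim(df)                      # 3 2

# Components whose lengths cannot be recycled to a common
# number of rows are rejected with an error.
bad <- list(a = 1:3, b = 1:2)
res <- try(as.data.frame(bad), silent = TRUE)
inherits(res, "try-error")   # TRUE
```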


Selecting a single row of a data frame returns a data frame with one row, which is also a list:

>y <- x[2,]
>is.list(y)

[1] TRUE

>is.data.frame(y)

[1] TRUE

row.names() or rownames() returns all row names as a vector. colnames() or names() returns all column names (variables) as a vector.

> row.names(BOD)
[1] "1" "2" "3" "4" "5" "6"
> col <- colnames(BOD)
> col
[1] "Time"   "demand"
> col[1]
[1] "Time"


Access a column (variable) of the data frame with the $ operator:

>x$Time

[1] 1 2 3 4 5 7

>x$demand

[1] 8.3 10.3 19.0 16.0 15.6 19.8
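Besides $, columns and rows can also be selected with [[ ]] and [ , ]. The following is a short sketch on the same BOD data; datasets::BOD is used so the example does not depend on earlier changes in the session.

```r
x <- datasets::BOD

v1 <- x[["demand"]]     # numeric vector, same as x$demand
v2 <- x[, "demand"]     # also a numeric vector
d1 <- x["demand"]       # a data frame with one column
late <- x[x$Time > 3, ] # rows where Time is greater than 3

identical(v1, x$demand) # TRUE
is.data.frame(d1)       # TRUE
nrow(late)              # 3
```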


A convenient way to access the columns of a data frame is the attach() and detach() pair. For example, after attach(x), the column x$demand can be accessed by simply typing demand.

>attach(x)
>demand

[1] 8.3 10.3 19.0 16.0 15.6 19.8


In other words, attach() makes the columns of the data frame visible by name. Assigning to demand creates a new variable in the workspace that masks the attached copy; the demand column of the data frame itself is not changed.

>demand <- demand + 10
>demand

[1] 18.3 20.3 29.0 26.0 25.6 29.8

>x$demand

[1] 8.3 10.3 19.0 16.0 15.6 19.8


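detach() reverses attach(). After detaching, the columns of x are no longer visible by name. Note that the variable demand created by the assignment above still exists in the workspace, so the untouched column Time is used here to show the effect:

>detach(x)
>Time

Error: object 'Time' not found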
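Because attach() can leave masking variables behind, as seen above, base R's with() and within() are a commonly used alternative. This is a sketch of that alternative, not part of the original walkthrough; the names m and x2 are chosen here for illustration.

```r
x <- datasets::BOD

# with() evaluates an expression using the columns of x,
# without attaching anything to the search path.
m <- with(x, mean(demand))

# within() returns a modified copy; x itself is left unchanged.
x2 <- within(x, demand <- demand + 10)

x2$demand   # 18.3 20.3 29.0 26.0 25.6 29.8
x$demand    #  8.3 10.3 19.0 16.0 15.6 19.8
```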

The row and column names can be changed.

> rownames(BOD)[2] <- "core"
> BOD
     Time demand
1       1    8.3
core    2   10.3
3       3   19.0
4       4   16.0
5       5   15.6
6       7   19.8
> colnames(BOD)[1] <- "days"
> BOD
     days demand
1       1    8.3
core    2   10.3
3       3   19.0
4       4   16.0
5       5   15.6
6       7   19.8
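Renaming by position breaks if the column order changes. A small sketch of renaming by matching the old name instead, done on a copy b (a name chosen here for illustration) with made-up row labels:

```r
b <- datasets::BOD                            # work on a copy
names(b)[names(b) == "Time"] <- "days"        # rename by old name
rownames(b) <- paste0("r", seq_len(nrow(b)))  # hypothetical row labels

names(b)      # "days" "demand"
rownames(b)   # "r1" "r2" "r3" "r4" "r5" "r6"
```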


A data frame is what you get by default when you read in a table. The table file dataframe.csv used here holds "Expression" values and the Subtype ("A", "B" and "C") in column 1 and column 2.


Let's read in the data from the file:

>x <- read.csv("dataframe.csv",header=TRUE,sep="\t")
>is.data.frame(x)

[1] TRUE
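Since the contents of dataframe.csv are not shown here, the following self-contained sketch writes a small tab-separated stand-in file first. The values, and the assumption that Expression is column 1 and Subtype is column 2, are hypothetical.

```r
# Hypothetical stand-in for dataframe.csv: the values are made up.
path <- tempfile(fileext = ".csv")
writeLines(c("Expression\tSubtype",
             "1.5\tA",
             "2.0\tB",
             "3.5\tC"), path)

# sep = "\t" tells read.csv() that the fields are tab-separated.
d <- read.csv(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
unlink(path)         # remove the temporary file

is.data.frame(d)     # TRUE
names(d)             # "Expression" "Subtype"
sapply(d, class)     # Expression is "numeric", Subtype is "character"
```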