Coursera - R Programming - Week 1 - Data Types

Objects and Attributes

R has five basic or "atomic" classes of objects: character, numeric (real numbers), integer, complex, and logical.

The most basic object is a vector.

A vector can only contain objects of the same class. The one exception to this rule is a list, which is represented as a vector but can contain objects of different classes.

vector() creates an empty vector.

It takes two arguments: the class of the objects in the vector and the length of the vector.
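As a small sketch of those two arguments, the calls below create empty vectors of a few classes. The fill value depends on the class requested; the variable names are illustrative only.

```r
# vector(mode, length): mode names the class, length the number of elements
chr <- vector("character", length = 3)  # filled with empty strings
lgl <- vector("logical", length = 2)    # filled with FALSE
num <- vector("numeric", length = 4)    # filled with 0

print(chr)
print(lgl)
print(num)
print(class(chr))
```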

Numbers

R objects can have attributes.

Attributes of an object can be accessed using the attributes() function.
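A minimal sketch of attributes() in use: a plain vector has no attributes, and entries appear once names or dimensions are set. The objects x and y are illustrative only.

```r
x <- 1:6
print(attributes(x))   # a plain vector carries no attributes: NULL

names(x) <- c("a", "b", "c", "d", "e", "f")
print(attributes(x))   # now a list with a $names entry

y <- 1:6
dim(y) <- c(2, 3)      # adding a dim attribute turns y into a matrix
print(attributes(y))   # a list with a $dim entry
```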

Vectors and Lists

Creating Vectors

c() can be used to create vectors of objects

    > x <- c(0.5, 0.6) # numeric
    > x <- c(T, F) # logical
    > x <- 9:29 # integer
    > x <- c(1+0i, 2+4i) # complex
    
    > x <- vector("numeric", length = 10)
    > x
    [1] 0 0 0 0 0 0 0 0 0 0
    

Mixing Objects

    > y <- c(1.7, "a") # character
    > y <- c(TRUE, 2) # numeric
    > y <- c("a", TRUE) # character
    

coercion - when objects of different classes are mixed in a vector, R implicitly converts every element to the same class
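One way to confirm what implicit coercion did is to inspect the result with class(). The sketch below repeats the three mixed vectors under illustrative names and prints both the class and the converted values.

```r
y1 <- c(1.7, "a")    # number and string  -> character
y2 <- c(TRUE, 2)     # logical and number -> numeric (TRUE becomes 1)
y3 <- c("a", TRUE)   # string and logical -> character

print(class(y1)); print(y1)
print(class(y2)); print(y2)
print(class(y3)); print(y3)
```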

Explicit Coercion

Objects can be explicitly coerced from one class to another using the as.* functions, if available.

    > x <- 0:6
    > class(x)
    [1] "integer"
    > as.numeric(x)
    [1] 0 1 2 3 4 5 6
    > as.logical(x)
    [1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE 
    > as.character(x)
    [1] "0" "1" "2" "3" "4" "5" "6"
    

Nonsensical coercion results in NAs.

    > x <- c("a", "b", "c")
    > as.numeric(x)
    [1] NA NA NA
    Warning message:
    NAs introduced by coercion
    > as.logical(x)
    [1] NA NA NA
    

Lists

    > x <- list(1, "a", TRUE, 1+4i)
    > x
    [[1]]
    [1] 1
    
    [[2]]
    [1] "a"
    
    [[3]]
    [1] TRUE
    
    [[4]]
    [1] 1+4i
    

When a list is printed, each element is labeled with double brackets ([[1]], [[2]], ...). Elements of an atomic vector are labeled with single brackets ([1]).
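The two bracket styles also behave differently when indexing a list. As a short sketch with illustrative variable names: double brackets extract the element itself, while single brackets return a list of length one that contains it.

```r
x <- list(1, "a", TRUE, 1+4i)

elem <- x[[2]]   # double brackets extract the element itself
sub  <- x[2]     # single brackets return a list containing that element

print(class(elem))  # "character"
print(class(sub))   # "list"
print(length(sub))  # 1
```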

Matrices

matrix - a vector with a dimension attribute

dimension - an attribute with an integer vector of length 2 (nrow, ncol)

    > m <- matrix(nrow = 2, ncol = 3)
    > dim(m)
    [1] 2 3
    > attributes(m)
    $dim
    [1] 2 3
    

Matrices are constructed column-wise, so entries start in the top-left of the first column and run down the columns.

    > m <- matrix(1:6, nrow = 2, ncol = 3)
    > m
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6
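For contrast, matrix() also accepts a byrow argument that fills across rows instead of down columns. The sketch below builds the same values both ways; the names by_col and by_row are illustrative.

```r
by_col <- matrix(1:6, nrow = 2, ncol = 3)                # default: fill down columns
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # fill across rows instead

print(by_col)
print(by_row)
print(by_col[2, 1])  # the second value of 1:6 lands in row 2, column 1
print(by_row[1, 2])  # with byrow = TRUE it lands in row 1, column 2
```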
    

Matrices can also be created directly from vectors by adding a dimension attribute.

    > m <- 1:10
    > m
     [1]  1  2  3  4  5  6  7  8  9 10
    > dim(m) <- c(2, 5)
    > m
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    3    5    7    9
    [2,]    2    4    6    8   10
    

cbind() - column-binding method of creating a matrix

rbind() - row-binding method of creating a matrix

    > x <- 1:3
    > y <- 10:12
    > cbind(x, y)
         x  y
    [1,] 1 10
    [2,] 2 11
    [3,] 3 12
    > rbind(x, y)
      [,1] [,2] [,3]
    x    1    2    3
    y   10   11   12
    

Factors

factor - used to represent categorical data, either ordered or unordered. A factor is an integer vector in which each integer has a label. The input to factor() is typically a character vector.

    > x <- factor(c("yes", "no", "yes", "yes", "no"))
    > x
    [1] yes no yes yes no
    Levels: no yes
    > table(x)
    x
     no yes
      2   3
    > unclass(x)
    [1] 2 1 2 2 1
    attr(,"levels")
    [1] "no" "yes"
    

The order of the levels can be set using the levels argument to factor(). This can be important in linear modeling because the first level is used as the baseline level.

    > x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
    > x
    [1] yes yes no yes no
    Levels: yes no
    

The default ordering of levels is by alphabetical order.
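A small sketch comparing the default and explicit level orders, using illustrative names. The underlying integer codes follow whichever level order is in effect.

```r
answers <- c("yes", "yes", "no", "yes", "no")

f_default  <- factor(answers)                           # levels sorted alphabetically
f_explicit <- factor(answers, levels = c("yes", "no"))  # levels in the order given

print(levels(f_default))       # "no" "yes"
print(levels(f_explicit))      # "yes" "no"

# The integer codes index into the level vector
print(as.integer(f_default))   # 2 2 1 2 1
print(as.integer(f_explicit))  # 1 1 2 1 2
```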

Missing Values

Missing values are denoted by NA, or by NaN for undefined mathematical operations.

is.na() tests whether the elements of an object are NA.

is.nan() tests whether the elements of an object are NaN.

NA values have a class too, so there are integer NAs, character NAs, etc. While an NaN value is also NA, an NA value is not NaN.
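The relationship between NA and NaN can be checked directly. The sketch below uses an illustrative vector containing one of each, then prints the class of a few typed NA constants.

```r
x <- c(1, NA, 0 / 0, 4)   # 0 / 0 is an undefined operation and yields NaN

print(is.na(x))    # FALSE  TRUE  TRUE FALSE  (NaN counts as NA)
print(is.nan(x))   # FALSE FALSE  TRUE FALSE  (NA is not NaN)

# NA carries a class: the typed constants below are all missing values
print(class(NA))             # "logical"
print(class(NA_integer_))    # "integer"
print(class(NA_character_))  # "character"
```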

Data Frames

    > x <- data.frame(foo = 1:4, bar = c(T, T, F, F))
    > x
      foo   bar
    1   1  TRUE
    2   2  TRUE
    3   3 FALSE
    4   4 FALSE
    > nrow(x)
    [1] 4
    > ncol(x)
    [1] 2
    

Names Attribute

R objects can also have names, which is very useful for writing readable code and self-describing objects.

    > x <- 1:3
    > names(x)
    NULL
    > names(x) <- c("foo", "bar", "baz")
    > x
    foo bar baz
      1   2   3
    > names(x)
    [1] "foo" "bar" "baz"
    

Lists can also have names.

    > x <- list(a = 1, b = 2, c = 3)
    > x
    $a
    [1] 1
    
    $b
    [1] 2
    
    $c
    [1] 3
    

Matrices can also have names.

    > m <- matrix(1:4, nrow = 2, ncol = 2)
    > dimnames(m) <- list(c("a", "b"), c("c", "d"))
    > m
      c d
    a 1 3
    b 2 4
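One practical use of names is indexing by name rather than by position. The sketch below rebuilds a named vector, list, and matrix under illustrative variable names and reads one entry from each.

```r
v <- c(foo = 1, bar = 2, baz = 3)
l <- list(a = 1, b = 2, c = 3)
m <- matrix(1:4, nrow = 2, ncol = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d"))

print(v["bar"])     # vector element selected by name
print(l$b)          # list element selected by name
print(m["a", "d"])  # matrix entry selected by row and column name
```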
    

Summary of Data Types in R

Published January 18, 2015