Coursera - R Programming - Week 1 - Data Types

Objects and Attributes

R has five basic or “atomic” classes of objects:

  • character
  • numeric (real numbers)
  • integer
  • complex
  • logical (TRUE/FALSE)

The most basic object is a vector.

A vector can only contain objects of the same class. The one exception to this rule is a list, which is represented as a vector but can contain objects of different classes.

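vector() creates an empty vector, with every element set to the default value for its class (0 for numeric, FALSE for logical, an empty string for character).

It takes two arguments: the class of the objects in the vector and the length of the vector.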

Numbers

  • generally treated as double-precision real numbers
  • to explicitly define an integer, add the “L” suffix (e.g., 1L)
  • Inf represents infinity (e.g., 1 / 0) and can be used in ordinary calculations
  • NaN (“not a number”) represents an undefined value (e.g., 0 / 0)
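
The number rules above can be sketched as follows. This is a minimal illustration; the variable names are made up for the example.

```r
a <- 1              # no suffix: stored as a double-precision numeric
b <- 1L             # the L suffix makes it an integer
class(a)            # "numeric"
class(b)            # "integer"

pos_inf <- 1 / 0    # Inf
not_num <- 0 / 0    # NaN
1 / Inf             # 0: Inf behaves like a number in arithmetic
is.nan(not_num)     # TRUE
```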

R objects can have attributes.

  • names, dimnames
  • dimensions (e.g., matrices, arrays)
  • class
  • length
  • other user-defined attributes/metadata

Attributes of an object can be accessed using the attributes() function.
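
As a rough sketch of how attributes can be inspected and set: the object names and the "units" attribute below are hypothetical, chosen only for illustration.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # a matrix carries a dim attribute
attributes(m)                          # a list whose $dim element is 2 3

v <- c(a = 1, b = 2)                   # a named vector carries a names attribute
attributes(v)                          # a list whose $names element is "a" "b"

attr(v, "units") <- "cm"               # hypothetical user-defined metadata
attr(v, "units")                       # "cm"

attributes(1:3)                        # NULL: a plain vector has no attributes set
```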

Vectors and Lists

Creating Vectors

c() can be used to create vectors of objects

> x <- c(0.5, 0.6) # numeric
> x <- c(T, F) # logical
> x <- 9:29 # integer
> x <- c(1+0i, 2+4i) # complex
> x <- vector("numeric", length = 10)
> x
[1] 0 0 0 0 0 0 0 0 0 0

Mixing Objects

> y <- c(1.7, "a") # character
> y <- c(TRUE, 2) # numeric
> y <- c("a", TRUE) # character

coercion - when objects of different classes are mixed in a vector, R implicitly converts every element to the same class
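
To see what implicit coercion actually produces, the results can be checked with class(). A small sketch, using illustrative variable names:

```r
y1 <- c(1.7, "a")    # numeric + character -> character
y2 <- c(TRUE, 2)     # logical + numeric   -> numeric (TRUE becomes 1)
y3 <- c("a", TRUE)   # character + logical -> character (TRUE becomes "TRUE")

class(y1)            # "character"
class(y2)            # "numeric"
class(y3)            # "character"
y2                   # 1 2
y3                   # "a" "TRUE"
```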

Explicit Coercion

Objects can be explicitly coerced from one class to another using the as.* functions, if available.

> x <- 0:6
> class(x)
[1] "integer"
> as.numeric(x)
[1] 0 1 2 3 4 5 6
> as.logical(x)
[1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> as.character(x)
[1] "0" "1" "2" "3" "4" "5" "6"

Nonsensical coercion results in NAs.

> x <- c("a", "b", "c")
> as.numeric(x)
[1] NA NA NA
Warning message:
NAs introduced by coercion
> as.logical(x)
[1] NA NA NA

Lists

> x <- list(1, "a", TRUE, 1+4i)
> x
[[1]]
[1] 1

[[2]]
[1] "a"

[[3]]
[1] TRUE

[[4]]
[1] 1+4i

When a list is printed, the index of each element appears in double brackets ([[1]]), whereas the elements of a vector are indexed with single brackets ([1]).
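
The same bracket distinction applies when subsetting. A brief sketch, reusing the list from above (the vector v is illustrative):

```r
x <- list(1, "a", TRUE, 1+4i)

x[[2]]           # "a": double brackets extract the element itself
class(x[[2]])    # "character"

x[2]             # single brackets return a list of length 1
class(x[2])      # "list"

v <- c(10, 20, 30)
v[2]             # 20: for an atomic vector, single brackets return the element
```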

Matrices

matrix - a vector with a dimension attribute

dimension - an attribute with an integer vector of length 2 (nrow, ncol)

> m <- matrix(nrow = 2, ncol = 3)
> dim(m)
[1] 2 3
> attributes(m)
$dim
[1] 2 3

Matrices are constructed column-wise, so entries start in the top-left of the first column and run down the columns.

> m <- matrix(1:6, nrow = 2, ncol = 3)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

Matrices can also be created directly from vectors by adding a dimension attribute.

> m <- 1:10
> m
 [1]  1  2  3  4  5  6  7  8  9 10
> dim(m) <- c(2, 5)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10

cbind() - column-binding method of creating a matrix

rbind() - row-binding method of creating a matrix

> x <- 1:3
> y <- 10:12
> cbind(x, y)
     x  y
[1,] 1 10
[2,] 2 11
[3,] 3 12
> rbind(x, y)
  [,1] [,2] [,3]
x    1    2    3
y   10   11   12

Factors

factor - used to represent categorical data, either ordered or unordered. A factor is stored as an integer vector in which each integer has a label. The input to factor() is typically a character vector.

> x <- factor(c("yes", "no", "yes", "yes", "no"))
> x
[1] yes no  yes yes no
Levels: no yes
> table(x)
x
 no yes
  2   3
> unclass(x)
[1] 2 1 2 2 1
attr(,"levels")
[1] "no"  "yes"

The order of the levels can be set using the levels argument to factor(). This can be important in linear modeling because the first level is used as the baseline level.

> x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
> x
[1] yes yes no  yes no
Levels: yes no

By default, levels are ordered alphabetically.
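
The effect of the levels argument on the underlying integer codes can be sketched like this. The names answers, f_default, and f_custom are illustrative only.

```r
answers <- c("yes", "yes", "no", "yes", "no")

f_default <- factor(answers)                            # levels sorted: "no", "yes"
f_custom  <- factor(answers, levels = c("yes", "no"))   # explicit order

levels(f_default)        # "no" "yes"
levels(f_custom)         # "yes" "no"
as.integer(f_default)    # 2 2 1 2 1
as.integer(f_custom)     # 1 1 2 1 2
levels(f_custom)[1]      # "yes": the first level, used as the baseline in modeling
```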

Missing Values

Missing values are denoted by NA; NaN denotes the result of an undefined mathematical operation.

is.na() tests whether the elements of an object are NA

is.nan() tests whether the elements of an object are NaN

NA values have a class too, so there are integer NAs, character NAs, etc. A NaN value is also NA, but an NA value is not NaN.
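
A short sketch of how the two tests differ, using made-up example vectors:

```r
x <- c(1, 2, NA, 10, 3)
is.na(x)               # FALSE FALSE  TRUE FALSE FALSE
is.nan(x)              # FALSE FALSE FALSE FALSE FALSE

y <- c(1, 2, NaN, NA, 4)
is.na(y)               # FALSE FALSE  TRUE  TRUE FALSE  (NaN counts as NA)
is.nan(y)              # FALSE FALSE  TRUE FALSE FALSE  (NA is not NaN)

class(NA)              # "logical"
class(NA_integer_)     # "integer"
class(NA_character_)   # "character"
```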

Data Frames

  • used to store tabular data; unlike a matrix, each column can hold a different class of object
  • represented as a special type of list, where every element of the list has the same length
  • each element of the list can be thought of as a column, and the length of each element is the number of rows
  • have a special attribute called row.names
  • usually created by calling read.table() or read.csv()
  • can be converted to a matrix using data.matrix()

> x <- data.frame(foo = 1:4, bar = c(T, T, F, F))
> x
  foo   bar
1   1  TRUE
2   2  TRUE
3   3 FALSE
4   4 FALSE
> nrow(x)
[1] 4
> ncol(x)
[1] 2
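
The list-like nature of a data frame, its row.names attribute, and the data.matrix() conversion mentioned above can be sketched as follows. The data frame repeats the one from the example; m is an illustrative name.

```r
x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

is.list(x)            # TRUE: a data frame is a special kind of list
sapply(x, class)      # foo is "integer", bar is "logical"
row.names(x)          # "1" "2" "3" "4"

m <- data.matrix(x)   # all columns converted to numbers in a single matrix
m[, "bar"]            # 1 1 0 0: TRUE/FALSE become 1/0
dim(m)                # 4 2
```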

Names Attribute

R objects can also have names, which is very useful for writing readable code and self-describing objects.

> x <- 1:3
> names(x)
NULL
> names(x) <- c("foo", "bar", "baz")
> x
foo bar baz
  1   2   3
> names(x)
[1] "foo" "bar" "baz"

Lists can also have names.

> x <- list(a = 1, b = 2, c = 3)
> x
$a
[1] 1

$b
[1] 2

$c
[1] 3

Matrices can also have names.

> m <- matrix(1:4, nrow = 2, ncol = 2)
> dimnames(m) <- list(c("a", "b"), c("c", "d"))
> m
  c d
a 1 3
b 2 4

Summary of Data Types in R

  • atomic classes: numeric, logical, character, integer, complex
  • vectors, lists
  • factors
  • missing values
  • data frames
  • names