R Programming Part 2 - Data Types: Objects and Attributes

Getting started with R


R Console: Input

Assignment operator (<-):


> x <- 5
> x ## auto-printing
[1] 5

> print(x) ## explicit printing
[1] 5
The [1] indicates that x is a vector and 5 is its first element.

The # (hash) character starts a comment: R ignores everything after it on that line.

> x <- 1:20 ## : operator is used to create integer sequences.
> x
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

x is an integer vector of length 20.
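To confirm what the : operator produced, class() and length() can be queried directly. A minimal sketch:

```r
x <- 1:20              # the : operator builds an integer sequence
print(class(x))        # "integer"
print(length(x))       # 20
print(x[c(1, 20)])     # first and last elements: 1 20
```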


R Data Types: Objects and Attributes

Data Types:
  1. atomic classes: numeric, logical, character, integer, complex
  2. vectors and lists
  3. factors
  4. missing values
  5. data frames
  6. names


Vectors:
The most basic object is a vector. A vector can only contain objects of the same class. Empty vectors can be created with the vector() function.
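As a sketch of how vector() behaves: it takes a mode (class) and a length, and fills the new vector with that class's default value. The variable names below are only illustrative.

```r
num <- vector("numeric", length = 3)     # filled with 0
print(num)                               # 0 0 0
chr <- vector("character", length = 2)   # filled with ""
print(chr)                               # "" ""
lgl <- vector("logical", length = 2)     # filled with FALSE
print(lgl)                               # FALSE FALSE
empty <- vector()                        # default mode is "logical", length 0
print(length(empty))                     # 0
```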

Numbers:
Numbers in R are generally treated as numeric objects (double-precision real numbers). If you explicitly want an integer, you need to specify the L suffix.
e.g. 1L
There is also a special number, Inf, which represents infinity, and the value NaN ("not a number"), which represents an undefined value.
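A short sketch of the points above: the L suffix changes the class, dividing by zero gives Inf, and 0/0 is undefined (NaN).

```r
a <- 1             # numeric (double) by default
b <- 1L            # the L suffix makes an integer
print(class(a))    # "numeric"
print(class(b))    # "integer"

print(1 / 0)       # Inf
print(1 / Inf)     # 0: Inf can be used in ordinary calculations
print(0 / 0)       # NaN, an undefined value
```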

Attributes:
  1. names, dimnames
  2. dimensions (e.g. matrices, arrays)
  3. class
  4. length
  5. other user-defined attributes
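Attributes can be inspected with attributes() and read or set one at a time with attr(). A minimal sketch; the "units" attribute is a hypothetical user-defined name chosen for illustration:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
print(attributes(m))          # $dim: 2 3

v <- c(a = 1, b = 2)          # names are stored as an attribute
print(attributes(v))          # $names: "a" "b"

# attr() can attach any user-defined attribute ("units" is illustrative)
attr(v, "units") <- "cm"
print(attr(v, "units"))       # "cm"
print(class(v))               # "numeric"
print(length(v))              # 2
```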

Creating Vectors:
  • The c() function can be used to create vectors of objects.
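For example, c() can build a vector of each atomic class (the variable names are illustrative):

```r
num <- c(0.5, 0.6)         # numeric
lgl <- c(TRUE, FALSE)      # logical
tf  <- c(T, F)             # logical; T and F are shorthand for TRUE and FALSE
chr <- c("a", "b", "c")    # character
int <- 9:29                # integer (built with :, not c())
cpx <- c(1+0i, 2+4i)       # complex
print(sapply(list(num, lgl, tf, chr, int, cpx), class))
```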

Mixing Objects:

> y <- c(1.7,'a') ##Character
> y
[1] "1.7" "a"

> y <- c(TRUE,'a') ##Character
> y
[1] "TRUE" "a"

> y <- c(TRUE,2) ##Numeric
> y
[1] 1 2

When objects of different classes are mixed in a vector, coercion occurs so that every element in the vector is of the same class.
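Implicit coercion follows a fixed order, logical < integer < numeric < complex < character, and the vector takes the class of the "highest" element present. A quick check, as a sketch:

```r
print(class(c(TRUE, 2L)))     # "integer"
print(class(c(2L, 1.5)))      # "numeric"
print(class(c(1.5, 1+2i)))    # "complex"
print(class(c(1+2i, "a")))    # "character"
print(c(TRUE, FALSE, 2))      # 1 0 2: TRUE becomes 1, FALSE becomes 0
```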

Explicit Coercion:
  • Objects can be explicitly coerced from one class to another using the as.* functions, if available.
> x <- 0:6
> x
[1] 0 1 2 3 4 5 6

> class(x)
[1] "integer"

> as.numeric(x)
[1] 0 1 2 3 4 5 6

> as.logical(x)
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE

> as.character(x)
[1] "0" "1" "2" "3" "4" "5" "6"

> x <- c("a","b","c")
> as.numeric(x)
[1] NA NA NA

Warning message:
NAs introduced by coercion

> as.logical(x)
[1] NA NA NA

> as.complex(x)
[1] NA NA NA

Warning message:
NAs introduced by coercion 
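Coercion from character only fails for strings that cannot be interpreted; strings that look like numbers or logical values convert cleanly. A small sketch (suppressWarnings() just hides the expected warning for "abc"):

```r
nums <- suppressWarnings(as.numeric(c("3.14", "42", "abc")))
print(nums)              # 3.14, 42 and NA: only "abc" fails to parse
flags <- as.logical(c("TRUE", "T", "false", "yes"))
print(flags)             # TRUE TRUE FALSE NA
print(as.integer(3.9))   # 3: the fractional part is dropped
```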


Lists:
  • Lists are a special type of vector that can contain elements of different classes.
> x <- list(1,"a",T,1+4i)
> x

[[1]]
[1] 1

[[2]]
[1] "a"

[[3]]
[1] TRUE

[[4]]
[1] 1+4i 
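Each element of the list keeps its own class. A sketch of how elements are pulled back out: single brackets return a sub-list, double brackets return the element itself.

```r
x <- list(1, "a", TRUE, 1+4i)
print(class(x[2]))       # "list": single brackets return a sub-list
print(x[[2]])            # "a": double brackets extract the element
print(sapply(x, class))  # "numeric" "character" "logical" "complex"
```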

Matrices:
  • Matrices are vectors with a dimension attribute.
  • The dimension attribute is itself an integer vector of length 2 (nrow, ncol).
> m <- matrix(nrow = 2, ncol = 3)
> m
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA

> dim(m)
[1] 2 3

> attributes(m)
$dim
[1] 2 3

> m <- matrix(1:6, nrow=2,ncol=3)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

> m <- 1:10
> m
[1] 1 2 3 4 5 6 7 8 9 10

> dim(m) <- c(2,5)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
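Matrices are filled column-wise, starting in the upper-left corner and running down each column, so building a matrix with matrix() and adding a dim attribute to a plain vector give the same object. A sketch (the byrow = TRUE line shows the row-by-row alternative):

```r
a <- matrix(1:10, nrow = 2, ncol = 5)   # built with matrix()
b <- 1:10
dim(b) <- c(2, 5)                       # built by adding a dim attribute
print(identical(a, b))                  # TRUE: same values, same layout
print(a[, 1])                           # 1 2: the first column holds the first two values
print(matrix(1:6, nrow = 2, byrow = TRUE))  # byrow = TRUE fills row by row instead
```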

cbind-ing and rbind-ing:
  • Matrices can also be created by column-binding or row-binding vectors with cbind() and rbind().

> x <- 1:3
> y <- 10:12
> cbind(x,y)
     x  y
[1,] 1 10
[2,] 2 11
[3,] 3 12

> rbind(x,y)
  [,1] [,2] [,3]
x    1    2    3
y   10   11   12
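The shape of the result depends on which function is used, and the argument names become the row or column names. A minimal check:

```r
x <- 1:3
y <- 10:12
cb <- cbind(x, y)     # vectors become columns
rb <- rbind(x, y)     # vectors become rows
print(dim(cb))        # 3 2
print(dim(rb))        # 2 3
print(colnames(cb))   # "x" "y"
print(rownames(rb))   # "x" "y"
```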


Factors:
  • Factors are used to represent categorical data. 
  • Factors can be unordered or ordered. 
  • One can think of a factor as an integer vector where each integer has a label.
  • Factors are treated specially by modeling functions like lm() and glm()
  • Factors with labels are better than integers because factors are self-describing;
  • e.g. a variable with values “Male” and “Female” is better than one with values 1 and 2.
> x <- factor(c("yes","yes","no","yes","no"))
> x
[1] yes yes no yes no
Levels: no yes

> table(x)
x
 no yes
  2   3

> unclass(x)
[1] 2 2 1 2 1
attr(,"levels")
[1] "no" "yes"


Order of the levels can be set using the levels argument to factor().

> x <- factor(c("yes","yes","no","yes","no"),levels = c("yes","no"))
> x
[1] yes yes no yes no
Levels: yes no
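Setting the level order also changes the underlying integer codes, and an ordered factor allows comparisons between levels. A sketch; the "sizes" factor and its labels are hypothetical examples:

```r
x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
print(levels(x))        # "yes" "no"
print(as.integer(x))    # 1 1 2 1 2: "yes" is now coded as 1

# an ordered factor (illustrative labels) supports < and max()
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"), ordered = TRUE)
print(sizes < "large")  # TRUE FALSE TRUE
print(max(sizes))       # large
```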


Missing Values:
  • Missing values are denoted by NA, or by NaN for undefined mathematical operations.
  • is.na() is used to test objects if they are NA
  • is.nan() is used to test for NaN
  • NA values have a class also, so there are integer NA, character NA, etc.
  • A NaN value is also NA but the converse is not true.
> x <- c(1,2,NA,10,3)
> is.na(x)
[1] FALSE FALSE TRUE FALSE FALSE

> is.nan(x)
[1] FALSE FALSE FALSE FALSE FALSE

> x <- c(1,2,NaN,NA,4)
> is.na(x)
[1] FALSE FALSE TRUE TRUE FALSE

> is.nan(x)
[1] FALSE FALSE TRUE FALSE FALSE 
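The bullets about NA classes and the NA/NaN relationship can be checked directly. A short sketch:

```r
print(class(NA))              # "logical": a bare NA is logical
print(class(NA_integer_))     # "integer"
print(class(NA_character_))   # "character"
print(class(c(1L, NA)))       # "integer": NA takes the class of its vector
print(is.na(NaN))             # TRUE: NaN counts as NA
print(is.nan(NA))             # FALSE: but NA is not NaN
```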

Data Frames:
  • Data frames are used to store tabular data.
  • They are represented as a special type of list where every element of the list has to have the same length.
  • Each element of the list can be thought of as a column, and the length of each element is the number of rows.
  • Unlike matrices, data frames can store different classes of objects in each column (just like lists);
  • matrices require every element to be of the same class.
  • Data frames also have a special attribute called row.names.
  • Data frames are usually created by calling read.table() or read.csv().
  • A data frame can be converted to a matrix by calling data.matrix().
> x <- data.frame(foo = 1:4, bar = c(T,T,F,F))
> x
  foo   bar
1   1  TRUE
2   2  TRUE
3   3 FALSE
4   4 FALSE

> nrow(x)
[1] 4

> ncol(x)
[1] 2 
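A sketch of the properties listed above, using the same data frame: it is a list, each column keeps its own class, it carries row.names, and data.matrix() turns it into a matrix (logical columns become 1/0).

```r
x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
print(is.list(x))          # TRUE: a data frame is a list of columns
print(sapply(x, class))    # foo "integer", bar "logical"
print(row.names(x))        # "1" "2" "3" "4"
m <- data.matrix(x)        # TRUE -> 1, FALSE -> 0
print(m)
```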


Names:
  • R objects can also have names, which is very useful for writing readable code and self-describing objects.
> x <- 1:3
> names(x)
NULL

> names(x) <- c("foo","bar","nrof")
> x
 foo  bar nrof
   1    2    3

> names(x)
[1] "foo" "bar" "nrof"
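Names can then be used to pick out elements, and they can also be supplied directly inside c(). A minimal sketch:

```r
x <- 1:3
names(x) <- c("foo", "bar", "nrof")
print(x["bar"])           # bar 2: single brackets keep the name
print(x[["bar"]])         # 2: double brackets drop it
y <- c(foo = 1, bar = 2)  # names given directly in c()
print(names(y))           # "foo" "bar"
```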

Lists can also have names.
> x <- list(a=1,b=2,c=3)
> x
$a
[1] 1

$b
[1] 2

$c
[1] 3
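With a named list, elements can be extracted by name with $ or [[ ]]. A short sketch:

```r
x <- list(a = 1, b = 2, c = 3)
print(names(x))    # "a" "b" "c"
print(x$b)         # 2: $ extracts an element by name
print(x[["c"]])    # 3: same result as x$c
```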

Matrices can also have names, set through the dimnames attribute:

> m <- matrix(1:4,nrow=2,ncol=2)
> dimnames(m) <-list(c("a","b"),c("c","d"))
> m
  c d
a 1 3
b 2 4  
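Row and column names can also be set separately with rownames() and colnames(), which produce the same dimnames as the example above, and elements can then be indexed by name. A sketch:

```r
m <- matrix(1:4, nrow = 2, ncol = 2)
rownames(m) <- c("a", "b")   # set row names only
colnames(m) <- c("c", "d")   # set column names only
print(identical(dimnames(m), list(c("a", "b"), c("c", "d"))))  # TRUE
print(m["b", "d"])           # 4: row "b", column "d"
```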





  • Reference: "R Programming for Data Science" by Roger D. Peng