This chapter delves deeper into the concepts of arrays and vectors that were introduced in the ‘Basics’ chapter. In R, everything is included in the base package. The main advantage of using vectors is that it allows for vectorized operations, rather than using loops to perform operations on each element of an object. Vectorized operations allow for more efficient processing of blocks of data.
Let’s first take a look at an example to understand the benefits of using vectorized operations:
# R
matrix(rep(0,2*3), 2, 3)
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
matrix(rep(1,2*3), 2, 3)
## [,1] [,2] [,3]
## [1,] 1 1 1
## [2,] 1 1 1
seq(1,6)
## [1] 1 2 3 4 5 6
matrix(seq(1,6),2,3)
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
seq(1, 4, length.out = 10)
## [1] 1.000000 1.333333 1.666667 2.000000 2.333333 2.666667 3.000000 3.333333
## [9] 3.666667 4.000000# R
#generate a matrix
mat_rd <- matrix(runif(4*5),4,5)
mat_rd
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.6696282 0.07153775 0.05746437 0.4326588 0.1453250
## [2,] 0.2238805 0.86741404 0.97930694 0.1879463 0.7482478
## [3,] 0.3727053 0.66634864 0.18959797 0.5517893 0.7755731
## [4,] 0.5109259 0.63471833 0.42242279 0.2897789 0.6480215# R
mat0 = matrix(0,2,3)
mat1 = matrix(1,2,3)
mat01 = rbind(mat0,mat1)
mat01
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
## [3,] 1 1 1
## [4,] 1 1 1
cbind(mat0,mat1)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0 0 0 1 1 1
## [2,] 0 0 0 1 1 1
# horizontal
asplit(mat01,2)
## [[1]]
## [1] 0 0 1 1
##
## [[2]]
## [1] 0 0 1 1
##
## [[3]]
## [1] 0 0 1 1
#vertical
asplit(mat01,1)
## [[1]]
## [1] 0 0 0
##
## [[2]]
## [1] 0 0 0
##
## [[3]]
## [1] 1 1 1
##
## [[4]]
## [1] 1 1 1# R
arr <- runif(10)
arr
## [1] 0.6475949 0.6746447 0.6815953 0.4588817 0.3197958 0.4068844 0.7710430
## [8] 0.9336547 0.6939622 0.8531313
sort(arr)
## [1] 0.3197958 0.4068844 0.4588817 0.6475949 0.6746447 0.6815953 0.6939622
## [8] 0.7710430 0.8531313 0.9336547# R
arr <- matrix(runif(4*3),4,3)
arr
## [,1] [,2] [,3]
## [1,] 0.3160677 0.3019781 0.1507181
## [2,] 0.7567486 0.3147556 0.7988745
## [3,] 0.2330486 0.4533393 0.1934973
## [4,] 0.3121988 0.9166245 0.6655837
apply(arr,MARGIN = 2, FUN = sort)
## [,1] [,2] [,3]
## [1,] 0.2330486 0.3019781 0.1507181
## [2,] 0.3121988 0.3147556 0.1934973
## [3,] 0.3160677 0.4533393 0.6655837
## [4,] 0.7567486 0.9166245 0.7988745Imagine you have two square matrices representing interactions between entities. Each matrix represents a basic network, with a value of 1 indicating interaction and 0 indicating no interaction. You want to determine if two entities that are linked in one network are also linked in another network. Instead of checking each entity’s interaction sequentially in a loop, you can use matrix operations to perform element-wise calculations and get the same result in a more efficient manner.
# R
#generate a sequence
list_1 <- list(c(0,1,0,1),c(1,0,1,0),c(0,1,0,1),c(1,0,1,0))
network_1 <- do.call(rbind,list_1)
#generate a sequence
list_2 = list(c(0,1,0,0),c(1,0,0,0),c(0,0,0,1),c(0,0,1,0))
network_2 <- do.call(rbind,list_2)
network_12 = network_1*network_2
network_12
## [,1] [,2] [,3] [,4]
## [1,] 0 1 0 0
## [2,] 1 0 0 0
## [3,] 0 0 0 1
## [4,] 0 0 1 0One thing that is commonly needed is to modify specific values in an
array that meet a certain condition. The first step is to identify which
cells meet the condition. In R, the which function is
used.
It is important to note that the execution time can change depending on the size of the vector.
# R
vec <- seq(1,200000)
t <- Sys.time()
results <- rep(0,length(vec))
for(i in 1:length(vec)){
if(vec[i]%%2!=0){
results[i] = 0
} else {
results[i] = vec[i]
}
}
Sys.time() - t
## Time difference of 0.01962686 secs
t <- Sys.time()
results <- ifelse(vec%%2!=0,vec,0)
Sys.time() - t
## Time difference of 0.003929853 secs# R
vec <- seq(1,20000000)
t <- Sys.time()
results <- rep(0,length(vec))
for(i in 1:length(vec)){
if(vec[i]%%2!=0){
results[i] = 0
} else {
results[i] = vec[i]
}
}
Sys.time() - t
## Time difference of 1.743016 secs
t <- Sys.time()
results <- ifelse(vec%%2!=0,vec,0)
Sys.time() - t
## Time difference of 0.4276791 secs
## other way using which
t <- Sys.time()
vec[which(vec%%2==0)] = 0
Sys.time() - t
## Time difference of 0.1182339 secs# R
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Fast version of ifelse()
x <- c(-3:3, NA)
if_else(condition = x < 0,
true = "neg",
false = "pos",
missing = "NA")
## [1] "neg" "neg" "neg" "pos" "pos" "pos" "pos" "NA"
# Vectorised ifelse statements
x <- 1:10
case_when(
x %% 6 == 0 ~ "fizz buzz",
x %% 2 == 0 ~ "fizz",
x %% 3 == 0 ~ "buzz",
TRUE ~ as.character(x)
)
## [1] "1" "fizz" "buzz" "fizz" "5" "fizz buzz"
## [7] "7" "fizz" "buzz" "fizz"# R
X <- do.call(rbind,list(c(0,1,5,1),c(2,1,3,1),c(2,1,9,6),c(7,2,1,0),c(8,3,5,5)))
Y <- c(0,1,2,5,4)
t(X)%*%X
## [,1] [,2] [,3] [,4]
## [1,] 121 42 71 54
## [2,] 42 16 34 23
## [3,] 71 34 141 87
## [4,] 54 23 87 63
XtX <- t(X)%*%X
# inverse Matrix
solve(XtX)
## [,1] [,2] [,3] [,4]
## [1,] 0.23199214 -0.7142480 0.1163677 -0.09879148
## [2,] -0.71424803 2.3348120 -0.3728419 0.27469787
## [3,] 0.11636773 -0.3728419 0.1078793 -0.11260311
## [4,] -0.09879148 0.2746979 -0.1126031 0.15576444| Operator | Description |
|---|---|
| + | Addition (e.g., 1 + 1 = 2) |
| - | Subtraction (e.g., 3 - 2 = 1) |
| - | Unary negation (e.g., -2) |
| * | Multiplication (e.g., 2 * 3 = 6) |
| / | Division (e.g., 3 / 2 = 1.5) |
| %/% | Floor division (e.g., 3 %/% 2 = 1) |
| ^ | Exponentiation (e.g., 2 ^ 3 = 8) |
| %% | Modulus/remainder (e.g., 9 %% 4 = 1) |
# R
diag(network_12) = NA
network_12
## [,1] [,2] [,3] [,4]
## [1,] NA 1 0 0
## [2,] 1 NA 0 0
## [3,] 0 0 NA 1
## [4,] 0 0 1 NA
# nb of link
sum(network_12, na.rm = T)
## [1] 4
# same since there is only 0 and 1
sum(network_12[which(network_12>0,arr.ind = T)]) # see the arr.ind to get the two coordinates
## [1] 4
# share of link
mean(network_12, na.rm = T)
## [1] 0.3333333
# nb link by entities
apply(network_12,MARGIN = 2,FUN = sum, na.rm = T)
## [1] 1 1 1 1
# share of link across entities (by columns)
apply(network_12,MARGIN = 2,FUN = mean, na.rm = T)
## [1] 0.3333333 0.3333333 0.3333333 0.3333333The apply() family of functions allows for the manipulation of slices of data from matrices, arrays, lists, and dataframes in a repetitive manner, without the need for explicit use of loop constructs. These functions act on an input list, matrix or array and apply a function with one or several optional arguments.
apply takes a matrix as input, transform it by row or by columns and returns a matrix
lapply takes a list as input, transform it and returns a list.
# R
list_1 = list(mat_1,seq(1,8,0.5))
list_1
## [[1]]
## [,1] [,2] [,3] [,4]
## [1,] 1 5 9 13
## [2,] 2 6 10 14
## [3,] 3 7 11 15
## [4,] 4 8 12 16
##
## [[2]]
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
# by row
lapply(list_1,FUN = function(x){x**2})
## [[1]]
## [,1] [,2] [,3] [,4]
## [1,] 1 25 81 169
## [2,] 4 36 100 196
## [3,] 9 49 121 225
## [4,] 16 64 144 256
##
## [[2]]
## [1] 1.00 2.25 4.00 6.25 9.00 12.25 16.00 20.25 25.00 30.25 36.00 42.25
## [13] 49.00 56.25 64.00
# by columns
lapply(list_1,FUN = sum)
## [[1]]
## [1] 136
##
## [[2]]
## [1] 67.5sapply takes a list as input, transform it and returns a matrix.
# R
list_1 = list(mat_1,seq(1,8,length.out = length(mat_1)))
list_1
## [[1]]
## [,1] [,2] [,3] [,4]
## [1,] 1 5 9 13
## [2,] 2 6 10 14
## [3,] 3 7 11 15
## [4,] 4 8 12 16
##
## [[2]]
## [1] 1.000000 1.466667 1.933333 2.400000 2.866667 3.333333 3.800000 4.266667
## [9] 4.733333 5.200000 5.666667 6.133333 6.600000 7.066667 7.533333 8.000000
# by row
sapply(list_1,FUN = function(x){x**2})
## [,1] [,2]
## [1,] 1 1.000000
## [2,] 4 2.151111
## [3,] 9 3.737778
## [4,] 16 5.760000
## [5,] 25 8.217778
## [6,] 36 11.111111
## [7,] 49 14.440000
## [8,] 64 18.204444
## [9,] 81 22.404444
## [10,] 100 27.040000
## [11,] 121 32.111111
## [12,] 144 37.617778
## [13,] 169 43.560000
## [14,] 196 49.937778
## [15,] 225 56.751111
## [16,] 256 64.000000
# by columns
sapply(list_1,FUN = sum)
## [1] 136 72mapply is used for ‘multivariate’ apply, which is used to vectorize arguments to a function that does not typically accept vector arguments. Depending on the size of the outputs, it will return a matrix or a list.
# R
# returns a matrix (all length.out = 5)
mapply(FUN = function(x,y,z){seq(x,y,length.out = z)},1,1:5,5)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 1.00 1.0 1.00 1
## [2,] 1 1.25 1.5 1.75 2
## [3,] 1 1.50 2.0 2.50 3
## [4,] 1 1.75 2.5 3.25 4
## [5,] 1 2.00 3.0 4.00 5
# returns a list (length.out goes from 1 to 5)
mapply(FUN = function(x,y,z){seq(x,y,length.out = z)},1,1:5,1:5)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 1 2
##
## [[3]]
## [1] 1 2 3
##
## [[4]]
## [1] 1 2 3 4
##
## [[5]]
## [1] 1 2 3 4 5
Exercise 1
The series, \(1^{1} + 2^{2} + 3^{3} + ... + 10^{10} = 10405071317\).
Find the last ten digits of the series, \(1^{1} + 2^{2} + 3^{3} + ... + 1000^{1000}\).
Exercise 2
Try to vectorize exercices 1 and 2 from chapter 1, you can also compare it with apply/map functions.
Exercise 3 A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 × 99.
Find the largest palindrome made from the product of two 3-digit numbers.