This chapter delves deeper into the concepts of arrays and vectors that were introduced in the ‘Basics’ chapter. In R, everything is included in the base package. The main advantage of using vectors is that it allows for vectorized operations, rather than using loops to perform operations on each element of an object. Vectorized operations allow for more efficient processing of blocks of data.

Let’s first take a look at an example to understand the benefits of using vectorized operations:

# R

vec = c(1:2000000)

t = Sys.time()
vec_2 = vec*2
Sys.time() - t
## Time difference of 0.002454996 secs

t = Sys.time()
vec_2_loop <- rep(0,length(vec))
for(i in 1:length(vec)){
  vec_2_loop[i] = vec[i]*2
    }
Sys.time() - t
## Time difference of 0.1054912 secs

1 Create arrays

Starting from existing list or vector :

# R

arr = matrix(c(1, 2, 3, 4), 4, 1)
arr
##      [,1]
## [1,]    1
## [2,]    2
## [3,]    3
## [4,]    4

Create arrays with zero, one, sequence..

# R

matrix(rep(0,2*3), 2, 3)
##      [,1] [,2] [,3]
## [1,]    0    0    0
## [2,]    0    0    0

matrix(rep(1,2*3), 2, 3)
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    1    1    1

seq(1,6)
## [1] 1 2 3 4 5 6

matrix(seq(1,6),2,3)
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

seq(1, 4, length.out = 10)
##  [1] 1.000000 1.333333 1.666667 2.000000 2.333333 2.666667 3.000000 3.333333
##  [9] 3.666667 4.000000

1.1 Random array

# R

#generate a matrix
mat_rd <- matrix(runif(4*5),4,5)

mat_rd
##           [,1]       [,2]       [,3]      [,4]      [,5]
## [1,] 0.6696282 0.07153775 0.05746437 0.4326588 0.1453250
## [2,] 0.2238805 0.86741404 0.97930694 0.1879463 0.7482478
## [3,] 0.3727053 0.66634864 0.18959797 0.5517893 0.7755731
## [4,] 0.5109259 0.63471833 0.42242279 0.2897789 0.6480215

1.2 Indexing and slicing

# R


mat <- matrix(1:6,2,3,byrow = T)
mat[1,]
## [1] 1 2 3
mat[1:2,]
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
mat[2:dim(mat)[1],]
## [1] 4 5 6

1.3 Shape and size

# R

# length gives the total number of elements of the array.
length(mat_rd)
## [1] 20

#R

# dim display a vector of integers with the number of elements stored along each dimension of the array
dim(mat_rd)
## [1] 4 5

2 Modifiying arrays

2.1 Add elements

# R
mat
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

rbind(mat,7:9)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9

cbind(mat,7:8)
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    7
## [2,]    4    5    6    8

Combining/splitting arrays

# R

mat0 = matrix(0,2,3)
mat1 = matrix(1,2,3)

mat01 = rbind(mat0,mat1)
mat01
##      [,1] [,2] [,3]
## [1,]    0    0    0
## [2,]    0    0    0
## [3,]    1    1    1
## [4,]    1    1    1

cbind(mat0,mat1)
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    0    0    0    1    1    1
## [2,]    0    0    0    1    1    1

# horizontal
asplit(mat01,2)
## [[1]]
## [1] 0 0 1 1
## 
## [[2]]
## [1] 0 0 1 1
## 
## [[3]]
## [1] 0 0 1 1

#vertical
asplit(mat01,1)
## [[1]]
## [1] 0 0 0
## 
## [[2]]
## [1] 0 0 0
## 
## [[3]]
## [1] 1 1 1
## 
## [[4]]
## [1] 1 1 1

2.2 Delete elements

# R

mat[,-2]
##      [,1] [,2]
## [1,]    1    3
## [2,]    4    6

mat[-2,]
## [1] 1 2 3

2.3 Sorting

# R

arr <- runif(10)

arr
##  [1] 0.6475949 0.6746447 0.6815953 0.4588817 0.3197958 0.4068844 0.7710430
##  [8] 0.9336547 0.6939622 0.8531313
sort(arr)
##  [1] 0.3197958 0.4068844 0.4588817 0.6475949 0.6746447 0.6815953 0.6939622
##  [8] 0.7710430 0.8531313 0.9336547

# R

arr <- matrix(runif(4*3),4,3)

arr
##           [,1]      [,2]      [,3]
## [1,] 0.3160677 0.3019781 0.1507181
## [2,] 0.7567486 0.3147556 0.7988745
## [3,] 0.2330486 0.4533393 0.1934973
## [4,] 0.3121988 0.9166245 0.6655837
apply(arr,MARGIN = 2, FUN = sort)
##           [,1]      [,2]      [,3]
## [1,] 0.2330486 0.3019781 0.1507181
## [2,] 0.3121988 0.3147556 0.1934973
## [3,] 0.3160677 0.4533393 0.6655837
## [4,] 0.7567486 0.9166245 0.7988745

3 Conditional Logic on array

Imagine you have two square matrices representing interactions between entities. Each matrix represents a basic network, with a value of 1 indicating interaction and 0 indicating no interaction. You want to determine if two entities that are linked in one network are also linked in another network. Instead of checking each entity’s interaction sequentially in a loop, you can use matrix operations to perform element-wise calculations and get the same result in a more efficient manner.

# R

#generate a sequence
list_1 <- list(c(0,1,0,1),c(1,0,1,0),c(0,1,0,1),c(1,0,1,0))

network_1 <- do.call(rbind,list_1)

#generate a sequence
list_2 = list(c(0,1,0,0),c(1,0,0,0),c(0,0,0,1),c(0,0,1,0))

network_2 <- do.call(rbind,list_2)

network_12 = network_1*network_2
network_12
##      [,1] [,2] [,3] [,4]
## [1,]    0    1    0    0
## [2,]    1    0    0    0
## [3,]    0    0    0    1
## [4,]    0    0    1    0

3.1 Select specific elements

# R
 
network_12 > 0
##       [,1]  [,2]  [,3]  [,4]
## [1,] FALSE  TRUE FALSE FALSE
## [2,]  TRUE FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE  TRUE
## [4,] FALSE FALSE  TRUE FALSE

network_12[network_12>0]
## [1] 1 1 1 1

One thing that is commonly needed is to modify specific values in an array that meet a certain condition. The first step is to identify which cells meet the condition. In R, the which function is used.

It is important to note that the execution time can change depending on the size of the vector.

# R

vec <- seq(1,200000)

t <- Sys.time()
results <- rep(0,length(vec))
for(i in 1:length(vec)){
  if(vec[i]%%2!=0){
    results[i] = 0
  } else {
    results[i] = vec[i]
  }
}
Sys.time() - t
## Time difference of 0.01962686 secs


t <- Sys.time()
results <- ifelse(vec%%2!=0,vec,0)
Sys.time() - t
## Time difference of 0.003929853 secs

# R

vec <- seq(1,20000000)

t <- Sys.time()
results <- rep(0,length(vec))
for(i in 1:length(vec)){
  if(vec[i]%%2!=0){
    results[i] = 0
  } else {
    results[i] = vec[i]
  }
}
Sys.time() - t
## Time difference of 1.743016 secs


t <- Sys.time()
results <- ifelse(vec%%2!=0,vec,0)
Sys.time() - t
## Time difference of 0.4276791 secs


## other way using which

t <- Sys.time()
vec[which(vec%%2==0)] = 0
Sys.time() - t 
## Time difference of 0.1182339 secs

3.2 Other conditional vectorization

# R
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Fast version of ifelse()
x <- c(-3:3, NA)
if_else(condition = x < 0,
        true      = "neg",
        false     = "pos",
        missing   = "NA")
## [1] "neg" "neg" "neg" "pos" "pos" "pos" "pos" "NA"


# Vectorised ifelse statements


x <- 1:10
case_when(
  x %% 6 == 0 ~ "fizz buzz",
  x %% 2 == 0 ~ "fizz",
  x %% 3 == 0 ~ "buzz",
  TRUE ~ as.character(x)
)
##  [1] "1"         "fizz"      "buzz"      "fizz"      "5"         "fizz buzz"
##  [7] "7"         "fizz"      "buzz"      "fizz"

4 Algebra

# R

 
X <- do.call(rbind,list(c(0,1,5,1),c(2,1,3,1),c(2,1,9,6),c(7,2,1,0),c(8,3,5,5))) 
Y <- c(0,1,2,5,4)
 
t(X)%*%X
##      [,1] [,2] [,3] [,4]
## [1,]  121   42   71   54
## [2,]   42   16   34   23
## [3,]   71   34  141   87
## [4,]   54   23   87   63

XtX <- t(X)%*%X

# inverse Matrix
solve(XtX)
##             [,1]       [,2]       [,3]        [,4]
## [1,]  0.23199214 -0.7142480  0.1163677 -0.09879148
## [2,] -0.71424803  2.3348120 -0.3728419  0.27469787
## [3,]  0.11636773 -0.3728419  0.1078793 -0.11260311
## [4,] -0.09879148  0.2746979 -0.1126031  0.15576444

# R


XtY = t(X)%*%Y

# OLS 
Beta = solve(XtX) %*% XtY
Beta
##            [,1]
## [1,]  1.1720219
## [2,] -1.8555055
## [3,]  0.4203434
## [4,] -0.3838481

5 Operation on array

Operator	Description
+	Addition (e.g., 1 + 1 = 2)
-	Subtraction (e.g., 3 - 2 = 1)
-	Unary negation (e.g., -2)
*	Multiplication (e.g., 2 * 3 = 6)
/	Division (e.g., 3 / 2 = 1.5)
%/%	Floor division (e.g., 3 %/% 2 = 1)
^	Exponentiation (e.g., 2 ^ 3 = 8)
%%	Modulus/remainder (e.g., 9 %% 4 = 1)

# R
diag(network_12) = NA
network_12
##      [,1] [,2] [,3] [,4]
## [1,]   NA    1    0    0
## [2,]    1   NA    0    0
## [3,]    0    0   NA    1
## [4,]    0    0    1   NA
# nb of link
sum(network_12, na.rm = T)
## [1] 4
# same since there is only 0 and 1 
sum(network_12[which(network_12>0,arr.ind = T)]) # see the arr.ind to get the two coordinates
## [1] 4

# share of link 
mean(network_12, na.rm = T)
## [1] 0.3333333


# nb link by entities
apply(network_12,MARGIN = 2,FUN = sum, na.rm = T)
## [1] 1 1 1 1

# share of link across entities (by columns)
apply(network_12,MARGIN = 2,FUN = mean, na.rm = T)
## [1] 0.3333333 0.3333333 0.3333333 0.3333333

5.1 Apply familly (R)

The apply() family of functions allows for the manipulation of slices of data from matrices, arrays, lists, and dataframes in a repetitive manner, without the need for explicit use of loop constructs. These functions act on an input list, matrix or array and apply a function with one or several optional arguments.

5.1.1 apply

apply takes a matrix as input, transform it by row or by columns and returns a matrix

# R

mat_1 = matrix(1:(4*4),4,4)

# by row
apply(mat_1,MARGIN = 1,FUN = sum)
## [1] 28 32 36 40

# by columns
apply(mat_1,MARGIN = 2,FUN = function(x){x**2})
##      [,1] [,2] [,3] [,4]
## [1,]    1   25   81  169
## [2,]    4   36  100  196
## [3,]    9   49  121  225
## [4,]   16   64  144  256

5.1.2 lapply

lapply takes a list as input, transform it and returns a list.

# R

list_1 = list(mat_1,seq(1,8,0.5))
list_1
## [[1]]
##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
## [3,]    3    7   11   15
## [4,]    4    8   12   16
## 
## [[2]]
##  [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0

# by row
lapply(list_1,FUN = function(x){x**2})
## [[1]]
##      [,1] [,2] [,3] [,4]
## [1,]    1   25   81  169
## [2,]    4   36  100  196
## [3,]    9   49  121  225
## [4,]   16   64  144  256
## 
## [[2]]
##  [1]  1.00  2.25  4.00  6.25  9.00 12.25 16.00 20.25 25.00 30.25 36.00 42.25
## [13] 49.00 56.25 64.00

# by columns
lapply(list_1,FUN = sum)
## [[1]]
## [1] 136
## 
## [[2]]
## [1] 67.5

5.1.3 sapply

sapply takes a list as input, transform it and returns a matrix.

# R

list_1 = list(mat_1,seq(1,8,length.out = length(mat_1)))
list_1
## [[1]]
##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
## [3,]    3    7   11   15
## [4,]    4    8   12   16
## 
## [[2]]
##  [1] 1.000000 1.466667 1.933333 2.400000 2.866667 3.333333 3.800000 4.266667
##  [9] 4.733333 5.200000 5.666667 6.133333 6.600000 7.066667 7.533333 8.000000

# by row
sapply(list_1,FUN = function(x){x**2})
##       [,1]      [,2]
##  [1,]    1  1.000000
##  [2,]    4  2.151111
##  [3,]    9  3.737778
##  [4,]   16  5.760000
##  [5,]   25  8.217778
##  [6,]   36 11.111111
##  [7,]   49 14.440000
##  [8,]   64 18.204444
##  [9,]   81 22.404444
## [10,]  100 27.040000
## [11,]  121 32.111111
## [12,]  144 37.617778
## [13,]  169 43.560000
## [14,]  196 49.937778
## [15,]  225 56.751111
## [16,]  256 64.000000

# by columns
sapply(list_1,FUN = sum)
## [1] 136  72

5.1.4 mapply

mapply is used for ‘multivariate’ apply, which is used to vectorize arguments to a function that does not typically accept vector arguments. Depending on the size of the outputs, it will return a matrix or a list.

# R

# returns a matrix (all length.out = 5)
mapply(FUN = function(x,y,z){seq(x,y,length.out = z)},1,1:5,5)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1 1.00  1.0 1.00    1
## [2,]    1 1.25  1.5 1.75    2
## [3,]    1 1.50  2.0 2.50    3
## [4,]    1 1.75  2.5 3.25    4
## [5,]    1 2.00  3.0 4.00    5

# returns a list (length.out goes from 1 to 5)
mapply(FUN = function(x,y,z){seq(x,y,length.out = z)},1,1:5,1:5)
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 1 2
## 
## [[3]]
## [1] 1 2 3
## 
## [[4]]
## [1] 1 2 3 4
## 
## [[5]]
## [1] 1 2 3 4 5

6 Exercises

Exercise 1

The series, \(1^{1} + 2^{2} + 3^{3} + ... + 10^{10} = 10405071317\).

Find the last ten digits of the series, \(1^{1} + 2^{2} + 3^{3} + ... + 1000^{1000}\).

Exercise 2

Try to vectorize exercices 1 and 2 from chapter 1, you can also compare it with apply/map functions.

Exercise 3 A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 × 99.

Find the largest palindrome made from the product of two 3-digit numbers.

Solutions

Arrays