This chapter deepens the notions of arrays and vectors seen in the ‘Basics’ chapter. In Python, NumPy is the main package for working with arrays, while in R everything we need is available in base R. The main advantage of vectors is that operations can be vectorized: instead of using a loop to apply an operation to each element of an object, we apply it to a whole block of data at once, which is much more efficient.
Let’s first look at an example that shows the benefit of vectorized operations:
# Python
import numpy as np
import time
arr = np.arange(2000000)
lst = list(range(2000000))
t = time.time()
arr_2 = arr*2
time.time() - t
## 0.006739139556884766
t = time.time()
lst_2 = [i*2 for i in lst]
time.time() - t
## 0.021797895431518555
# R
vec = c(1:2000000)
t = Sys.time()
vec_2 = vec*2
Sys.time() - t
## Time difference of 0.00257206 secs
t = Sys.time()
vec_2_loop <- rep(0,length(vec))
for(i in 1:length(vec)){
vec_2_loop[i] = vec[i]*2
}
Sys.time() - t
## Time difference of 0.1197219 secs
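Timings based on a single time.time() call are noisy and depend on whatever else the machine is doing. As a sketch of a slightly more careful comparison, the standard-library timeit module can repeat each version and keep the fastest run; the example also checks that both versions compute the same values. The exact numbers will vary from one machine to another.

```python
# Python
import timeit

import numpy as np

arr = np.arange(2_000_000)
lst = list(range(2_000_000))

# run each version once per repetition, 3 repetitions, keep the fastest run
t_vec = min(timeit.repeat(lambda: arr * 2, number=1, repeat=3))
t_loop = min(timeit.repeat(lambda: [i * 2 for i in lst], number=1, repeat=3))

# check that both approaches produce the same values
same = np.array_equal(arr * 2, np.array([i * 2 for i in lst]))
print(f"vectorized: {t_vec:.4f} s, list comprehension: {t_loop:.4f} s, same result: {same}")
```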
The NumPy library provides multidimensional array and matrix data structures, together with methods to operate on them efficiently. If you do not have NumPy on your computer, open a terminal and run pip install numpy. To use NumPy you need to load the module by running import numpy as np. You could simply write import numpy, but then you would have to type numpy.something for every call; to keep the code standardized, we will always refer to NumPy as np.
NumPy performs fast and efficient calculations. The main difference between a NumPy array and a Python list is that all elements of a NumPy array must be of the same type (the array is homogeneous); as a consequence, NumPy needs less memory to store the data.
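A minimal sketch of what homogeneity means in practice: when the input mixes types, np.array converts every element to one common dtype. Since each element then takes a fixed number of bytes (itemsize), the memory used by the data is simply size times itemsize, which NumPy reports as nbytes.

```python
# Python
import numpy as np

ints_and_floats = np.array([1, 2.5, 3])  # the integers are upcast to float64
print(ints_and_floats.dtype)             # float64

ints_and_text = np.array([1, "a"])       # everything becomes a unicode string
print(ints_and_text.dtype.kind)          # U (unicode string)

big = np.arange(1000, dtype=np.int64)
# 1000 elements * 8 bytes each
print(big.itemsize, big.nbytes)          # 8 8000
```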
| Function / attribute | Task |
|---|---|
| array | Create a NumPy array |
| ndim | Number of dimensions (axes) of the array |
| shape | Size along each dimension (e.g. number of rows and columns) |
| size | Total number of elements in the array |
| dtype | Type of the elements in the array, e.g. int64, float64 |
| reshape | Returns an array with a new shape; the original array is unchanged |
| resize | Changes the shape of the array in place |
| arange | Create an array holding a sequence of numbers |
| itemsize | Size in bytes of each element |
| diag | Create a diagonal matrix (or extract the diagonal of a matrix) |
| vstack | Stack arrays vertically (row-wise) |
| hstack | Stack arrays horizontally (column-wise) |
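A short sketch exercising some entries of the table. Note that reshape leaves the original array untouched. The resize method, which changes the array in place, is left out here because by default it raises an error when another array (such as the reshaped view below) shares the same data.

```python
# Python
import numpy as np

arr = np.arange(6)            # array([0, 1, 2, 3, 4, 5])
mat = arr.reshape(2, 3)       # new 2 x 3 array; arr keeps its shape
print(arr.shape, mat.shape)   # (6,) (2, 3)
print(mat.ndim, mat.size)     # 2 6

d = np.diag([1, 2, 3])        # 3 x 3 matrix with 1, 2, 3 on the diagonal
print(np.diag(d))             # applied to a matrix, diag extracts the diagonal: [1 2 3]

top = np.vstack([mat, mat])   # stacked vertically
side = np.hstack([mat, mat])  # stacked horizontally
print(top.shape, side.shape)  # (4, 3) (2, 6)
```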
# Python
import numpy as np
arr = np.array([1, 2, 3, 4])
arr
## array([1, 2, 3, 4])
# R
arr = matrix(c(1, 2, 3, 4), 4, 1)
arr
## [,1]
## [1,] 1
## [2,] 2
## [3,] 3
## [4,] 4
# Python
np.zeros([2, 3])
## array([[0., 0., 0.],
## [0., 0., 0.]])
np.ones((2, 3))
## array([[1., 1., 1.],
## [1., 1., 1.]])
np.arange(1, 7)
## array([1, 2, 3, 4, 5, 6])
np.arange(1, 7).reshape(2, 3)
## array([[1, 2, 3],
## [4, 5, 6]])
# Remember from the first chapter that R fills arrays column by column while NumPy fills them row by row by default; set order = 'F' to get the same result as R
np.arange(1, 7).reshape([2, 3], order = 'F')
## array([[1, 3, 5],
## [2, 4, 6]])
np.linspace(1, 4,num = 10)
## array([1. , 1.33333333, 1.66666667, 2. , 2.33333333,
## 2.66666667, 3. , 3.33333333, 3.66666667, 4. ])
# R
matrix(rep(0,2*3), 2, 3)
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
matrix(rep(1,2*3), 2, 3)
## [,1] [,2] [,3]
## [1,] 1 1 1
## [2,] 1 1 1
seq(1,6)
## [1] 1 2 3 4 5 6
matrix(seq(1,6),2,3)
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
seq(1, 4, length.out = 10)
## [1] 1.000000 1.333333 1.666667 2.000000 2.333333 2.666667 3.000000 3.333333
## [9] 3.666667 4.000000
# Python
#generate an array
arr_rd = np.random.randn(4,5)
arr_rd
## array([[-1.33952175e+00, 1.22071591e+00, 5.76721935e-01,
## -6.26253351e-01, -2.61185704e-03],
## [ 7.05617478e-01, 1.13717427e-01, 1.74653248e-01,
## -6.82634913e-01, -2.68689462e-01],
## [ 1.12199458e+00, 3.49450329e-01, 2.66535086e-01,
## 2.50150232e-01, -2.58821150e-01],
## [-5.52368746e-01, 1.05893184e-01, -9.28211938e-01,
## -2.68808278e+00, 4.82327953e-01]])
# R
#generate a matrix
mat_rd <- matrix(rnorm(4*5),4,5)
mat_rd
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.2843290 -1.9360957 -1.73290934 -0.7167748 -1.81599807
## [2,] 1.2245251 2.6357024 -2.30157759 -0.7038928 0.74748374
## [3,] 0.3941497 1.3683239 -0.20105003 -0.1347477 -0.04645494
## [4,] -2.2583160 0.5984619 -0.01868813 -0.8490810 -1.03206005
# Python
arr = np.arange(1,7).reshape(2,3)
arr[0]
## array([1, 2, 3])
arr[:2]
## array([[1, 2, 3],
## [4, 5, 6]])
arr[1:]
## array([[4, 5, 6]])
# R
mat <- matrix(1:6,2,3,byrow = T)
mat[1,]
## [1] 1 2 3
mat[1:2,]
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
mat[2:dim(mat)[1],]
## [1] 4 5 6
# Python
# .ndim gives the number of axes, or dimensions, of the array.
arr_rd.ndim
## 2
# Python
# .size gives the total number of elements of the array.
arr_rd.size
## 20
# Python
# .shape displays a tuple of integers giving the number of elements along each dimension of the array
arr_rd.shape
## (4, 5)
# R
# length gives the total number of elements of the array.
length(mat_rd)
## [1] 20
# R
# dim displays a vector of integers giving the number of elements along each dimension of the array
dim(mat_rd)
## [1] 4 5
# Python
arr
## array([[1, 2, 3],
## [4, 5, 6]])
# /!\ without an axis argument, np.append flattens the array to 1-D
arr_1d = np.append(arr, [7, 8, 9])
arr_1d
## array([1, 2, 3, 4, 5, 6, 7, 8, 9])
# using insert you can add rows and columns
np.insert(arr, len(arr), [7, 8, 9], axis = 0)
## array([[1, 2, 3],
## [4, 5, 6],
## [7, 8, 9]])
np.insert(arr, 2, [7, 8], axis = 1)
## array([[1, 2, 7, 3],
## [4, 5, 8, 6]])
# R
mat
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
rbind(mat,7:9)
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
cbind(mat,7:8)
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 7
## [2,] 4 5 6 8
# Python
arr0 = np.zeros([2,3])
arr1 = np.ones([2,3])
arr01 = np.vstack([arr0,arr1])
arr01
## array([[0., 0., 0.],
## [0., 0., 0.],
## [1., 1., 1.],
## [1., 1., 1.]])
#or
np.concatenate([arr0,arr1], axis = 0)
## array([[0., 0., 0.],
## [0., 0., 0.],
## [1., 1., 1.],
## [1., 1., 1.]])
np.hstack([arr0,arr1])
## array([[0., 0., 0., 1., 1., 1.],
## [0., 0., 0., 1., 1., 1.]])
#or
np.concatenate([arr0,arr1], axis = 1)
## array([[0., 0., 0., 1., 1., 1.],
## [0., 0., 0., 1., 1., 1.]])
np.hsplit(arr01,3)
## [array([[0.],
## [0.],
## [1.],
## [1.]]), array([[0.],
## [0.],
## [1.],
## [1.]]), array([[0.],
## [0.],
## [1.],
## [1.]])]
np.vsplit(arr01,4)
## [array([[0., 0., 0.]]), array([[0., 0., 0.]]), array([[1., 1., 1.]]), array([[1., 1., 1.]])]
# R
mat0 = matrix(0,2,3)
mat1 = matrix(1,2,3)
mat01 = rbind(mat0,mat1)
mat01
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
## [3,] 1 1 1
## [4,] 1 1 1
cbind(mat0,mat1)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0 0 0 1 1 1
## [2,] 0 0 0 1 1 1
# split into columns (like np.hsplit); each piece is returned as a vector
asplit(mat01,2)
## [[1]]
## [1] 0 0 1 1
##
## [[2]]
## [1] 0 0 1 1
##
## [[3]]
## [1] 0 0 1 1
# split into rows (like np.vsplit)
asplit(mat01,1)
## [[1]]
## [1] 0 0 0
##
## [[2]]
## [1] 0 0 0
##
## [[3]]
## [1] 1 1 1
##
## [[4]]
## [1] 1 1 1
# Python
np.delete(arr,1 , axis = 1)
## array([[1, 3],
## [4, 6]])
np.delete(arr,1 , axis = 0)
## array([[1, 2, 3]])
# R
mat[,-2]
## [,1] [,2]
## [1,] 1 3
## [2,] 4 6
mat[-2,]
## [1] 1 2 3
# Python
arr = np.random.randn(10)
arr
## array([ 0.55362894, -0.92812522, -1.06856489, 0.99160095, 0.88314684,
## -1.47997287, 0.15545915, -1.50116766, 0.48026097, 0.61767938])
arr.sort()  # sorts in place and returns None; np.sort(arr) returns a sorted copy
# Python
arr = np.random.randn(4,3)
arr
## array([[-0.48539526, -0.06614056, 0.78217971],
## [-0.0455197 , -0.73038539, 1.63142109],
## [-0.96818455, 1.92365276, 2.28668514],
## [-0.15303005, 0.98473509, -0.47546953]])
arr.sort(axis=0)  # sorts each column in place, as apply(arr, MARGIN = 2, FUN = sort) does in R
# R
arr <- rnorm(10)
arr
## [1] 0.8549672 -0.4093803 0.7205933 0.9980313 0.9626051 -0.5713735
## [7] -0.2263444 -0.6893314 -0.7362330 -0.6039860
sort(arr)
## [1] -0.7362330 -0.6893314 -0.6039860 -0.5713735 -0.4093803 -0.2263444
## [7] 0.7205933 0.8549672 0.9626051 0.9980313
# R
arr <- matrix(rnorm(4*3),4,3)
arr
## [,1] [,2] [,3]
## [1,] -0.63021358 -1.5681169 -0.3527171
## [2,] 0.06050562 1.9075538 0.5152889
## [3,] -0.07583089 -0.8443039 -0.4779911
## [4,] 0.88827136 -0.5283868 2.0469722
apply(arr,MARGIN = 2, FUN = sort)
## [,1] [,2] [,3]
## [1,] -0.63021358 -1.5681169 -0.4779911
## [2,] -0.07583089 -0.8443039 -0.3527171
## [3,] 0.06050562 -0.5283868 0.5152889
## [4,] 0.88827136 1.9075538 2.0469722
Imagine two square matrices, each describing the interactions between the same set of entities. Each matrix is a basic network: 1 means the two entities interact, 0 means they do not. You want to know whether two entities that are linked in one network are also linked in the other. Instead of checking every pair sequentially in a loop, you can multiply the matrices element-wise: the product is 1 exactly where both networks have a link.
# Python
# first network, built from a nested list (one inner list per row)
list_1 = [[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]]
network_1 = np.array(list_1)
# second network
list_2 = [[0,1,0,0],[1,0,0,0],[0,0,0,1],[0,0,1,0]]
network_2 = np.array(list_2)
network_12 = network_1*network_2
network_12
## array([[0, 1, 0, 0],
## [1, 0, 0, 0],
## [0, 0, 0, 1],
## [0, 0, 1, 0]])
# R
# first network: bind the row vectors of the list into a matrix
list_1 <- list(c(0,1,0,1),c(1,0,1,0),c(0,1,0,1),c(1,0,1,0))
network_1 <- do.call(rbind,list_1)
# second network
list_2 = list(c(0,1,0,0),c(1,0,0,0),c(0,0,0,1),c(0,0,1,0))
network_2 <- do.call(rbind,list_2)
network_12 = network_1*network_2
network_12
## [,1] [,2] [,3] [,4]
## [1,] 0 1 0 0
## [2,] 1 0 0 0
## [3,] 0 0 0 1
## [4,] 0 0 1 0
# Python
network_12 > 0
## array([[False, True, False, False],
## [ True, False, False, False],
## [False, False, False, True],
## [False, False, True, False]])
network_12[network_12>0]
## array([1, 1, 1, 1])
# R
network_12 > 0
## [,1] [,2] [,3] [,4]
## [1,] FALSE TRUE FALSE FALSE
## [2,] TRUE FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE TRUE
## [4,] FALSE FALSE TRUE FALSE
network_12[network_12>0]
## [1] 1 1 1 1
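Because the networks only contain 0s and 1s, element-wise multiplication is the same as a logical AND. The Python sketch below makes that explicit with np.logical_and and uses np.argwhere to list the coordinates of the shared links, roughly what which(..., arr.ind = TRUE) returns in R (0-based in Python, 1-based in R).

```python
# Python
import numpy as np

network_1 = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
network_2 = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])

# True where both networks have a link; same as (network_1 * network_2) > 0 for 0/1 data
shared = np.logical_and(network_1, network_2)
print(np.array_equal(shared, (network_1 * network_2) > 0))  # True

# (row, column) coordinates of the shared links, 0-based
pairs = np.argwhere(shared)
# keep each undirected link once by looking at the upper triangle only
unique_links = pairs[pairs[:, 0] < pairs[:, 1]]
print(unique_links)
```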
One thing we often need is to modify the values of an array that satisfy a condition. The first step is to find which cells satisfy it. In Python np.where is very common, while in R we use the which function (and ifelse to build a new vector from two alternatives). With a single condition, np.where returns the indices where the condition holds, like which; with three arguments it works like ifelse.
See how the execution time changes with the size of the vector:
# Python
arr = np.arange(200000)
t = time.time()
results = [(i if i%2==0 else 0) for i in arr]
time.time() - t
## 0.009582757949829102
t = time.time()
results = np.where(arr%2!=0,0,arr)
time.time() - t
## 0.0020558834075927734
# Python
arr = np.arange(20000000)
t = time.time()
results = [(i if i%2==0 else 0) for i in arr]
time.time() - t
## 0.841174840927124
t = time.time()
results = np.where(arr%2!=0,0,arr)
time.time() - t
## 0.09565281867980957
# R
vec <- seq(1,200000)
t <- Sys.time()
results <- rep(0,length(vec))
for(i in 1:length(vec)){
if(vec[i]%%2!=0){
results[i] = 0
} else {
results[i] = vec[i]
}
}
Sys.time() - t
## Time difference of 0.02257609 secs
t <- Sys.time()
results <- ifelse(vec%%2!=0,0,vec)
Sys.time() - t
## Time difference of 0.00338006 secs
# R
vec <- seq(1,20000000)
t <- Sys.time()
results <- rep(0,length(vec))
for(i in 1:length(vec)){
if(vec[i]%%2!=0){
results[i] = 0
} else {
results[i] = vec[i]
}
}
Sys.time() - t
## Time difference of 1.775931 secs
t <- Sys.time()
results <- ifelse(vec%%2!=0,0,vec)
Sys.time() - t
## Time difference of 0.4151349 secs
# other way, using which (modifies vec in place)
t <- Sys.time()
vec[which(vec%%2!=0)] = 0
Sys.time() - t
## Time difference of 0.172961 secs
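The same which-style logic in Python, as a sketch: np.where with a single condition returns the indices where it holds (0-based, wrapped in a tuple with one entry per dimension), and a boolean mask can also be used directly on the left-hand side of an assignment, which modifies the array in place like the R line above.

```python
# Python
import numpy as np

arr = np.arange(10)

# indices of the odd values, like which(arr %% 2 != 0) in R but 0-based
idx = np.where(arr % 2 != 0)[0]
print(idx)  # [1 3 5 7 9]

# index-based assignment, like vec[which(...)] = 0
arr_idx = arr.copy()
arr_idx[idx] = 0

# boolean mask: no index vector needed
arr_mask = arr.copy()
arr_mask[arr_mask % 2 != 0] = 0
print(arr_mask)  # [0 0 2 0 4 0 6 0 8 0]
```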
# R
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Stricter (type-safe) version of ifelse(), with a missing argument for NA conditions
x <- c(-3:3, NA)
if_else(condition = x < 0,
true = "neg",
false = "pos",
missing = "NA")
## [1] "neg" "neg" "neg" "pos" "pos" "pos" "pos" "NA"
# Vectorised ifelse statements
x <- 1:10
case_when(
x %% 6 == 0 ~ "fizz buzz",
x %% 2 == 0 ~ "fizz",
x %% 3 == 0 ~ "buzz",
TRUE ~ as.character(x)
)
## [1] "1" "fizz" "buzz" "fizz" "5" "fizz buzz"
## [7] "7" "fizz" "buzz" "fizz"
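NumPy has close counterparts of these two dplyr functions. As a sketch: np.where plays the role of if_else (with np.isnan handling the missing value explicitly), and np.select, which checks a list of conditions in order and keeps the first match, plays the role of case_when. An all-True last condition stands in for TRUE ~ as.character(x).

```python
# Python
import numpy as np

# if_else(x < 0, "neg", "pos", missing = "NA")
x = np.array([-3, -2, -1, 0, 1, 2, 3, np.nan])
signs = np.where(np.isnan(x), "NA", np.where(x < 0, "neg", "pos"))
print(signs)

# case_when(...): conditions are checked in order and the first match wins;
# the final all-True condition plays the role of TRUE ~ as.character(x)
x = np.arange(1, 11)
conditions = [x % 6 == 0, x % 2 == 0, x % 3 == 0, np.full(x.shape, True)]
choices = ["fizz buzz", "fizz", "buzz", x.astype(str)]
fizz = np.select(conditions, choices, default="")
print(fizz)
```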
# Python
from numpy.linalg import inv
X = np.array([[0,1,5,1],[2,1,3,1],[2,1,9,6],[7,2,1,0],[8,3,5,5]])
Y = np.array([0,1,2,5,4])
X
## array([[0, 1, 5, 1],
## [2, 1, 3, 1],
## [2, 1, 9, 6],
## [7, 2, 1, 0],
## [8, 3, 5, 5]])
X.T.dot(X)
## array([[121, 42, 71, 54],
## [ 42, 16, 34, 23],
## [ 71, 34, 141, 87],
## [ 54, 23, 87, 63]])
# same as
XtX = np.dot(X.T,X)
# same as
X.T @ X
## array([[121, 42, 71, 54],
## [ 42, 16, 34, 23],
## [ 71, 34, 141, 87],
## [ 54, 23, 87, 63]])
# inverse Matrix
inv(XtX)
## array([[ 0.23199214, -0.71424803, 0.11636773, -0.09879148],
## [-0.71424803, 2.33481201, -0.37284193, 0.27469787],
## [ 0.11636773, -0.37284193, 0.10787934, -0.11260311],
## [-0.09879148, 0.27469787, -0.11260311, 0.15576444]])
# Identity matrix
np.diag(np.ones(3))
## array([[1., 0., 0.],
## [0., 1., 0.],
## [0., 0., 1.]])
# Python
XtY = np.dot(X.T,Y)
# OLS
Beta = inv(XtX).dot(XtY)
Beta
## array([ 1.17202187, -1.85550547, 0.42034337, -0.38384807])
# R
X <- do.call(rbind,list(c(0,1,5,1),c(2,1,3,1),c(2,1,9,6),c(7,2,1,0),c(8,3,5,5)))
Y <- c(0,1,2,5,4)
t(X)%*%X
## [,1] [,2] [,3] [,4]
## [1,] 121 42 71 54
## [2,] 42 16 34 23
## [3,] 71 34 141 87
## [4,] 54 23 87 63
XtX <- t(X)%*%X
# inverse Matrix
solve(XtX)
## [,1] [,2] [,3] [,4]
## [1,] 0.23199214 -0.7142480 0.1163677 -0.09879148
## [2,] -0.71424803 2.3348120 -0.3728419 0.27469787
## [3,] 0.11636773 -0.3728419 0.1078793 -0.11260311
## [4,] -0.09879148 0.2746979 -0.1126031 0.15576444
# Identity matrix
diag(3)
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 1 0
## [3,] 0 0 1
# R
XtY = t(X)%*%Y
# OLS
Beta = solve(XtX) %*% XtY
Beta
## [,1]
## [1,] 1.1720219
## [2,] -1.8555055
## [3,] 0.4203434
## [4,] -0.3838481
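Computing the inverse of X'X explicitly is fine for a small example, but solving the normal equations (X'X) beta = X'Y directly, or handing X and Y to a least-squares routine, is generally the more numerically stable choice. A Python sketch comparing the three routes on the same data (in R, solve(XtX, XtY) and qr.solve(X, Y) play the same roles):

```python
# Python
import numpy as np

X = np.array([[0, 1, 5, 1], [2, 1, 3, 1], [2, 1, 9, 6], [7, 2, 1, 0], [8, 3, 5, 5]])
Y = np.array([0, 1, 2, 5, 4])

XtX = X.T @ X
XtY = X.T @ Y

beta_inv = np.linalg.inv(XtX) @ XtY                 # explicit inverse, as in the text
beta_solve = np.linalg.solve(XtX, XtY)              # solves XtX @ beta = XtY without forming the inverse
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)  # least squares on X directly

print(np.allclose(beta_inv, beta_solve), np.allclose(beta_inv, beta_lstsq))  # True True
```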
| Operator | NumPy equivalent | Description |
|---|---|---|
| + | np.add | Addition (e.g., 1 + 1 = 2) |
| - | np.subtract | Subtraction (e.g., 3 - 2 = 1) |
| - | np.negative | Unary negation (e.g., -2) |
| * | np.multiply | Multiplication (e.g., 2 * 3 = 6) |
| / | np.divide | Division (e.g., 3 / 2 = 1.5) |
| // | np.floor_divide | Floor division (e.g., 3 // 2 = 1) |
| ** | np.power | Exponentiation (e.g., 2 ** 3 = 8) |
| % | np.mod | Modulus/remainder (e.g., 9 % 4 = 1) |
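Each operator in the table is shorthand for the corresponding NumPy function, applied element-wise. A small sketch checking a few of them, including the fact that // and % follow Python's convention for negative numbers: floor division rounds towards minus infinity, and the remainder takes the sign of the divisor.

```python
# Python
import numpy as np

a = np.array([9, -9, 7])
b = np.array([4, 4, -2])

print(np.array_equal(a + b, np.add(a, b)))     # True
print(np.array_equal(a ** 2, np.power(a, 2)))  # True
print(np.array_equal(-a, np.negative(a)))      # True

# floor division rounds towards minus infinity: 2, -3, -4
print(a // b, np.floor_divide(a, b))
# the remainder takes the sign of the divisor: 1, 3, -1
print(a % b, np.mod(a, b))
```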
# Python
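# number of links (each link appears twice in a symmetric matrix, hence / 2)
network_12.sum()/2
## np.float64(2.0)
# same result, since the matrix only contains 0s and 1s
(network_12>0).sum()/2
## np.float64(2.0)
# share of cells with a link
network_12.mean()
## np.float64(0.25)
# number of links per entity (axis=1 sums each row)
network_12.sum(axis=1)
## array([1, 1, 1, 1])
# share of links per entity (by rows; same as by columns here because the matrix is symmetric)
network_12.mean(1)
## array([0.25, 0.25, 0.25, 0.25])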
# R
# number of links (each link appears twice in a symmetric matrix, hence / 2)
sum(network_12)/2
## [1] 2
# same result, since the matrix only contains 0s and 1s
sum(network_12[which(network_12>0,arr.ind = T)])/2 # arr.ind = T returns the (row, column) coordinates
## [1] 2
# share of cells with a link
mean(network_12)
## [1] 0.25
# number of links per entity (MARGIN = 2 sums each column)
apply(network_12,MARGIN = 2,FUN = sum)
## [1] 1 1 1 1
# share of links per entity (by columns)
apply(network_12,MARGIN = 2,FUN = mean)
## [1] 0.25 0.25 0.25 0.25
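The apply() family lets you manipulate slices of data from matrices, arrays, lists and data frames repeatedly, without writing an explicit loop. These functions take a list, matrix or array as input and apply a function, with one or several optional arguments, to each piece. They hide the loop rather than remove it, so they make the code more concise but are not necessarily faster than a well-written loop.
apply takes a matrix (or array) as input, applies a function to each row (MARGIN = 1) or each column (MARGIN = 2), and simplifies the result to a vector or a matrix when possible.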
# R
mat_1 = matrix(1:(4*4),4,4)
# by row
apply(mat_1,MARGIN = 1,FUN = sum)
## [1] 28 32 36 40
# by columns
apply(mat_1,MARGIN = 2,FUN = function(x){x**2})
## [,1] [,2] [,3] [,4]
## [1,] 1 25 81 169
## [2,] 4 36 100 196
## [3,] 9 49 121 225
## [4,] 16 64 144 256
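On the Python side, a close analogue of apply is np.apply_along_axis(func, axis, arr). Note the mapping: axis=1 corresponds to MARGIN = 1 (rows) and axis=0 to MARGIN = 2 (columns). For built-in reductions such as sum, the axis argument of the method itself is simpler and faster. A sketch with the same 4 x 4 matrix:

```python
# Python
import numpy as np

# same values as mat_1 in R, which fills the matrix column by column
arr_1 = np.arange(1, 17).reshape([4, 4], order="F")

# like apply(mat_1, MARGIN = 1, FUN = sum): axis=1 works row by row
row_sums = np.apply_along_axis(np.sum, 1, arr_1)
print(row_sums)  # [28 32 36 40]

# for a reduction like sum, the axis argument of the method gives the same result directly
print(arr_1.sum(axis=1))  # [28 32 36 40]

# like MARGIN = 2: axis=0 works column by column
col_sums = np.apply_along_axis(np.sum, 0, arr_1)
print(col_sums)  # [10 26 42 58]
```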
lapply takes a list as input, applies a function to each element and returns a list of the same length.
# R
list_1 = list(mat_1,seq(1,8,0.5))
list_1
## [[1]]
## [,1] [,2] [,3] [,4]
## [1,] 1 5 9 13
## [2,] 2 6 10 14
## [3,] 3 7 11 15
## [4,] 4 8 12 16
##
## [[2]]
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
# square every element of each list item
lapply(list_1,FUN = function(x){x**2})
## [[1]]
## [,1] [,2] [,3] [,4]
## [1,] 1 25 81 169
## [2,] 4 36 100 196
## [3,] 9 49 121 225
## [4,] 16 64 144 256
##
## [[2]]
## [1] 1.00 2.25 4.00 6.25 9.00 12.25 16.00 20.25 25.00 30.25 36.00 42.25
## [13] 49.00 56.25 64.00
# sum of each list item
lapply(list_1,FUN = sum)
## [[1]]
## [1] 136
##
## [[2]]
## [1] 67.5
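sapply works like lapply but simplifies the output when it can: a vector when every call returns a single value, a matrix when every call returns a vector of the same length (which is why the second item below is built with length.out = length(mat_1)), and a list otherwise.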
# R
list_1 = list(mat_1,seq(1,8,length.out = length(mat_1)))
list_1
## [[1]]
## [,1] [,2] [,3] [,4]
## [1,] 1 5 9 13
## [2,] 2 6 10 14
## [3,] 3 7 11 15
## [4,] 4 8 12 16
##
## [[2]]
## [1] 1.000000 1.466667 1.933333 2.400000 2.866667 3.333333 3.800000 4.266667
## [9] 4.733333 5.200000 5.666667 6.133333 6.600000 7.066667 7.533333 8.000000
# square each list item; both have 16 elements, so the result is a 16 x 2 matrix
sapply(list_1,FUN = function(x){x**2})
## [,1] [,2]
## [1,] 1 1.000000
## [2,] 4 2.151111
## [3,] 9 3.737778
## [4,] 16 5.760000
## [5,] 25 8.217778
## [6,] 36 11.111111
## [7,] 49 14.440000
## [8,] 64 18.204444
## [9,] 81 22.404444
## [10,] 100 27.040000
## [11,] 121 32.111111
## [12,] 144 37.617778
## [13,] 169 43.560000
## [14,] 196 49.937778
## [15,] 225 56.751111
## [16,] 256 64.000000
# sum of each list item, simplified to a vector
sapply(list_1,FUN = sum)
## [1] 136 72
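Python has no direct sapply, but a rough analogue, assuming we do the simplification by hand, is to collect the results in a list comprehension, then turn them into a vector when each result is a single number, or stack them as columns when each result has the same length. R flattens a matrix column by column, so ravel(order="F") is used here to match it.

```python
# Python
import numpy as np

arr_1 = np.arange(1, 17).reshape([4, 4], order="F")
lst_1 = [arr_1, np.linspace(1, 8, num=arr_1.size)]

# sapply(list_1, FUN = sum): one number per item, simplified to a vector
sums = np.array([np.sum(x) for x in lst_1])
print(sums)  # 136 and 72 (as floats)

# sapply(list_1, FUN = function(x){x**2}): each result has 16 values, stacked as columns;
# ravel(order="F") flattens column by column, as R does with a matrix
squares = np.column_stack([np.ravel(x, order="F") ** 2 for x in lst_1])
print(squares.shape)  # (16, 2)
```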
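mapply is a ‘multivariate’ apply: it calls a function with the first elements of each argument, then with the second elements, and so on, recycling shorter arguments. Its main use is to vectorize a function over arguments that do not normally accept vectors. Depending on the size of the outputs, it returns a matrix or a list.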
# R
# returns a matrix (all length.out = 5)
mapply(FUN = function(x,y,z){seq(x,y,length.out = z)},1,1:5,5)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 1.00 1.0 1.00 1
## [2,] 1 1.25 1.5 1.75 2
## [3,] 1 1.50 2.0 2.50 3
## [4,] 1 1.75 2.5 3.25 4
## [5,] 1 2.00 3.0 4.00 5
# returns a list (length.out goes from 1 to 5)
mapply(FUN = function(x,y,z){seq(x,y,length.out = z)},1,1:5,1:5)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 1 2
##
## [[3]]
## [1] 1 2 3
##
## [[4]]
## [1] 1 2 3 4
##
## [[5]]
## [1] 1 2 3 4 5
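Python's built-in map also accepts several iterables and calls the function with one element from each, which is the closest match to mapply. Unlike R, map does not recycle shorter arguments: it stops at the shortest iterable, so scalar arguments have to be repeated explicitly (here with itertools.repeat). A sketch reproducing both calls above, where seq_length_out is a small helper named only for this example:

```python
# Python
from itertools import repeat

import numpy as np


def seq_length_out(x, y, z):
    # hypothetical helper: behaves like seq(x, y, length.out = z) in R
    return np.linspace(x, y, num=z)


# mapply(..., 1, 1:5, 5): repeat the scalar arguments, since map stops at the shortest input
same_length = list(map(seq_length_out, repeat(1), range(1, 6), repeat(5)))
mat = np.column_stack(same_length)  # every result has 5 values -> a 5 x 5 matrix
print(mat)

# mapply(..., 1, 1:5, 1:5): results of different lengths stay in a list
ragged = list(map(seq_length_out, repeat(1), range(1, 6), range(1, 6)))
print([len(r) for r in ragged])  # [1, 2, 3, 4, 5]
```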
In Python, the built-in map can do the same job as the apply family in R, with a bit of extra manipulation.
map accepts any iterable; iterating over a 2-D array yields its rows, so map works row by row on an array. It returns an iterator (a map object), which you can then convert into a list or an array.
# Python
arr_1 = np.arange(1,17).reshape([4,4], order = 'F')
arr_1
## array([[ 1, 5, 9, 13],
## [ 2, 6, 10, 14],
## [ 3, 7, 11, 15],
## [ 4, 8, 12, 16]])
# map iterates over the rows; to work by column, map over the transpose arr_1.T
map(np.sum,arr_1)
## <map object at 0x148296b30>
list(map(np.sum,arr_1))
## [np.int64(28), np.int64(32), np.int64(36), np.int64(40)]
np.fromiter(map(sum,arr_1), dtype = int)
## array([28, 32, 36, 40])
# with a list
lst_1 = [arr_1,np.linspace(1,8,num=arr_1.size)]
list(map(lambda x: x**2,lst_1))
## [array([[ 1, 25, 81, 169],
## [ 4, 36, 100, 196],
## [ 9, 49, 121, 225],
## [ 16, 64, 144, 256]]), array([ 1. , 2.15111111, 3.73777778, 5.76 , 8.21777778,
## 11.11111111, 14.44 , 18.20444444, 22.40444444, 27.04 ,
## 32.11111111, 37.61777778, 43.56 , 49.93777778, 56.75111111,
## 64. ])]
pandas DataFrames also have an apply method. Unlike map, DataFrame.apply works by column by default, since each variable of a data frame is stored in a column.
# Python
import pandas as pd
arr_1 = np.arange(1,17).reshape([4,4], order = 'F')
#transform the array into a DataFrame
df_1 = pd.DataFrame(arr_1)
df_1
## 0 1 2 3
## 0 1 5 9 13
## 1 2 6 10 14
## 2 3 7 11 15
## 3 4 8 12 16
# Apply the function by column
df_1.apply(sum)
## 0 10
## 1 26
## 2 42
## 3 58
## dtype: int64
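To work row by row instead, pass axis=1 to DataFrame.apply. A sketch with the same df_1, also showing the vectorized sum method that gives the same result:

```python
# Python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame(np.arange(1, 17).reshape([4, 4], order="F"))

# default axis=0: the function receives one column at a time
print(df_1.apply(sum).tolist())          # [10, 26, 42, 58]
# axis=1: the function receives one row at a time
print(df_1.apply(sum, axis=1).tolist())  # [28, 32, 36, 40]
# vectorized equivalent of the row-wise version
print(df_1.sum(axis=1).tolist())         # [28, 32, 36, 40]
```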
The series, \(1^{1} + 2^{2} + 3^{3} + ... + 10^{10} = 10405071317\).
Find the last ten digits of the series, \(1^{1} + 2^{2} + 3^{3} + ... + 1000^{1000}\).
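As a warm-up, the 10-term value quoted above can be checked with a vectorized power. This only works because 10^10 fits in a 64-bit integer: NumPy integer arrays silently wrap around on overflow, so the same approach cannot be used as-is when the exponents go up to 1000.

```python
# Python
import numpy as np

n = np.arange(1, 11, dtype=np.int64)
total = (n ** n).sum()  # 1**1 + 2**2 + ... + 10**10
print(total)            # 10405071317
```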
Try to vectorize exercises 1 and 2 from chapter 1; you can also compare the result with the apply/map functions.
A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 × 99.
Find the largest palindrome made from the product of two 3-digit numbers.