Exercises

ISLR labs and exercises

From the textbook An Introduction to Statistical Learning [JWHT13], you can work through the following labs and exercises:

Chapter 3 labs

  • 3.6.1 Libraries

  • 3.6.2 Simple Linear Regression

  • 3.6.3 Multiple Linear Regression

  • 3.6.4 Interaction Terms

  • 3.6.5 Non-linear Transformations of the Predictors

  • 3.6.6 Qualitative Predictors

  • 3.6.7 Writing Functions

Chapter 4 labs

  • 4.7.1 The Stock Market Data

  • 4.7.2 Logistic Regression

  • 4.7.7 Poisson Regression

ISLR exercises, Chapter 3

1, 3, 8, 12, 13, 14

ISLR exercises, Chapter 4

6, 7, 13 (a, b, c, d), 14 (a, b, c, f), 16 (logistic regression only)

Exercise 1

Consider the following data:

| Period | \(y\) | \(x_1\) | \(x_2\) |
|--------|-------|---------|---------|
| 1      | 1.3   | 6       | 4.5     |
| 2      | 1.5   | 7       | 4.6     |
| 3      | 1.8   | 7       | 4.5     |
| 4      | 1.6   | 8       | 4.7     |
| 5      | 1.7   | 8       | 4.6     |

Using the following regression model for \(i=1,2,\dots,5\),

\[y_i=\beta_{0}+\beta_{1} x_{i1}+\beta_{2} x_{i2}+\varepsilon_{i}\]

with \(\left(\mathbf{X}^{\prime} \mathbf{X}\right)^{-1}\) given by

\[\begin{split}\left(\mathbf{X}^{\prime} \mathbf{X}\right)^{-1}=\left(\begin{array}{ccc} 1522.73 & 26.87 & -374.67 \\ 26.87 & 0.93 & -7.33 \\ -374.67 & -7.33 & 93.33 \end{array}\right)\end{split}\]

compute \(\hat{\varepsilon}_{2}\).
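
For reference, these are the standard least-squares relations used in the solutions below (they are not restated in the exercise itself):

\[\hat{\boldsymbol{\beta}}=\left(\mathbf{X}^{\prime} \mathbf{X}\right)^{-1} \mathbf{X}^{\prime} \mathbf{y}, \qquad \hat{\varepsilon}_{i}=y_{i}-\hat{y}_{i}=y_{i}-\mathbf{x}_{i}^{\prime} \hat{\boldsymbol{\beta}}.\]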

Exercise 2

You fit the model \(y_i=\beta_{0}+\beta_{1} x_{i1}+\beta_{2} x_{i2}+\beta_{3} x_{i3}+\varepsilon_{i}\) to the following data:

| \(y\) | \(x_1\) | \(x_2\) | \(x_3\) |
|-------|---------|---------|---------|
| 8     | 1       | 1       | 0       |
| 7     | 0       | 0       | 1       |
| 6     | 1       | 0       | 0       |
| 8     | 1       | 1       | 1       |
| 9     | 0       | 0       | 0       |
| 3     | 0       | 1       | 1       |

After computing \(\left(\mathbf{X}^{\prime} \mathbf{X}\right)^{-1}\), you obtain:

\[\begin{split}\left(\mathbf{X}^{\prime} \mathbf{X}\right)^{-1}=\frac{1}{30}\left(\begin{array}{cccc} 26 & -10 & -18 & -12 \\ -10 & 20 & 0 & 0 \\ -18 & 0 & 24 & 6 \\ -12 & 0 & 6 & 24 \end{array}\right)\end{split}\]

Compute \(\hat{\beta}_1\).

Solution 1

# Exercise 1 data and design matrix (intercept column plus x_1 and x_2)
x_1 <- c(6,7,7,8,8)
x_2 <- c(4.5,4.6,4.5,4.7,4.6)
X <- cbind(Intercept = 1, x_1, x_2)
print(X)
     Intercept x_1 x_2
[1,]         1   6 4.5
[2,]         1   7 4.6
[3,]         1   7 4.5
[4,]         1   8 4.7
[5,]         1   8 4.6
# Components of X'y, computed term by term
y <- c(1.3,1.5,1.8,1.6,1.7)
sum(c(1,1,1,1,1)*y)
sum(x_1*y)
sum(x_2*y)
7.9
57.3
36.19
print(t(cbind(sum(c(1,1,1,1,1)*y),
      sum(x_1*y),
      sum(x_2*y))))
      [,1]
[1,]  7.90
[2,] 57.30
[3,] 36.19
# The same vector, obtained directly as X'y
print(t(X) %*% y)
           [,1]
Intercept  7.90
x_1       57.30
x_2       36.19
# (X'X)^{-1} as given in the exercise statement
XX_1 <- matrix(c(1522.73, 26.87, -374.67,
                 26.87, 0.93, -7.33,
                 -374.67, -7.33, 93.33),
               nrow = 3, byrow = TRUE)
print(XX_1)
        [,1]  [,2]    [,3]
[1,] 1522.73 26.87 -374.67
[2,]   26.87  0.93   -7.33
[3,] -374.67 -7.33   93.33
# beta-hat = (X'X)^{-1} X'y
print(XX_1 %*% (t(X) %*% y))
        [,1]
[1,]  9.9107
[2,]  0.2893
[3,] -2.2893
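
The exercise asks for \(\hat{\varepsilon}_{2}\); a minimal continuation of the code above (the names beta_hat and y_hat are introduced here only for illustration) gives the residual for period 2:

# Residual for period 2: eps_hat_2 = y_2 - y_hat_2
beta_hat <- XX_1 %*% (t(X) %*% y)   # (9.9107, 0.2893, -2.2893), as printed above
y_hat    <- X %*% beta_hat          # fitted values
y[2] - y_hat[2]                     # roughly 0.095 (precision limited by the rounded inverse)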

Solution 2

# Exercise 2 data and design matrix
y <- c(5,3,10,4,3,5)
x_1 <- c(0,1,0,1,0,1)
x_2 <- c(1,0,1,1,0,1)
x_3 <- c(0,1,1,0,0,0)

X <- cbind(Intercept = 1, x_1, x_2, x_3)
X
     Intercept x_1 x_2 x_3
[1,]         1   0   1   0
[2,]         1   1   0   1
[3,]         1   0   1   1
[4,]         1   1   1   0
[5,]         1   0   0   0
[6,]         1   1   1   0
t(X)
          [,1] [,2] [,3] [,4] [,5] [,6]
Intercept    1    1    1    1    1    1
x_1          0    1    0    1    0    1
x_2          1    0    1    1    0    1
x_3          0    1    1    0    0    0
# X'y
t(X) %*% y
          [,1]
Intercept   30
x_1         12
x_2         24
x_3         13

# (X'X)^{-1} as given in the exercise statement
XX_1 <- (1/30)*matrix(c(26, -10, -18, -12,
                        -10, 20, 0, 0,
                        -18, 0, 24, 6,
                        -12, 0, 6, 24),
                      nrow = 4, byrow = TRUE)
XX_1
           [,1]       [,2]       [,3]       [,4]
[1,]  0.8666667 -0.3333333 -0.6000000 -0.4000000
[2,] -0.3333333  0.6666667  0.0000000  0.0000000
[3,] -0.6000000  0.0000000  0.8000000  0.2000000
[4,] -0.4000000  0.0000000  0.2000000  0.8000000

# beta-hat = (X'X)^{-1} X'y; beta_1 is its second component
beta_i <- XX_1 %*% (t(X) %*% y)
beta_1 <- beta_i[2]
beta_1
[1] -2

print(XX_1 %*% (t(X) %*% y))
     [,1]
[1,]  2.4
[2,] -2.0
[3,]  3.8
[4,]  3.2

# Check against lm()
donne <- data.frame(cbind(y, X))
reg_model <- lm(y ~ x_1 + x_2 + x_3, data = donne)
reg_model$coefficients
(Intercept)         x_1         x_2         x_3 
        2.4        -2.0         3.8         3.2 
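
As a sanity check (not part of the original solution), the inverse given in the statement can be verified directly from the design matrix built above:

# Recompute (X'X)^{-1} numerically; it should reproduce XX_1
solve(t(X) %*% X)
round(30 * solve(t(X) %*% X))   # integer form: 26, -10, -18, ..., matching the (1/30)-scaled matrix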

JWHT13

Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An introduction to statistical learning. Volume 112. Springer, 2013.