{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Arrays, lists & data frames" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Arrays" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Arrays generalize matrices to any number of dimensions. The number of dimensions of an array equals the length of its `dim` attribute, and its class is \"array\"." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "x <- array(1:24, dim = c(3, 4, 2))" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", "
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 1\n", "\\item 2\n", "\\item 3\n", "\\item 4\n", "\\item 5\n", "\\item 6\n", "\\item 7\n", "\\item 8\n", "\\item 9\n", "\\item 10\n", "\\item 11\n", "\\item 12\n", "\\item 13\n", "\\item 14\n", "\\item 15\n", "\\item 16\n", "\\item 17\n", "\\item 18\n", "\\item 19\n", "\\item 20\n", "\\item 21\n", "\\item 22\n", "\\item 23\n", "\\item 24\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 1\n", "2. 2\n", "3. 3\n", "4. 4\n", "5. 5\n", "6. 6\n", "7. 7\n", "8. 8\n", "9. 9\n", "10. 10\n", "11. 11\n", "12. 12\n", "13. 13\n", "14. 14\n", "15. 15\n", "16. 16\n", "17. 17\n", "18. 18\n", "19. 19\n", "20. 20\n", "21. 21\n", "22. 22\n", "23. 23\n", "24. 24\n", "\n", "\n" ], "text/plain": [ ", , 1\n", "\n", " [,1] [,2] [,3] [,4]\n", "[1,] 1 4 7 10\n", "[2,] 2 5 8 11\n", "[3,] 3 6 9 12\n", "\n", ", , 2\n", "\n", " [,1] [,2] [,3] [,4]\n", "[1,] 13 16 19 22\n", "[2,] 14 17 20 23\n", "[3,] 15 18 21 24\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Matrices are the two-dimensional special case of arrays. Like a matrix, an array is a vector that carries additional attributes, chiefly `dim`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Lists" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Lists are a special kind of vector whose elements can be of any class (numeric, character, logical, or even another list)." ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\t
$taille
\n", "\t\t
    \n", "\t
  1. 1
  2. \n", "\t
  3. 5
  4. \n", "\t
  5. 2
  6. \n", "
\n", "
\n", "\t
$utilisateur
\n", "\t\t
'Mike'
\n", "\t
$new
\n", "\t\t
TRUE
\n", "
\n" ], "text/latex": [ "\\begin{description}\n", "\\item[\\$taille] \\begin{enumerate*}\n", "\\item 1\n", "\\item 5\n", "\\item 2\n", "\\end{enumerate*}\n", "\n", "\\item[\\$utilisateur] 'Mike'\n", "\\item[\\$new] TRUE\n", "\\end{description}\n" ], "text/markdown": [ "$taille\n", ": 1. 1\n", "2. 5\n", "3. 2\n", "\n", "\n", "\n", "$utilisateur\n", ": 'Mike'\n", "$new\n", ": TRUE\n", "\n", "\n" ], "text/plain": [ "$taille\n", "[1] 1 5 2\n", "\n", "$utilisateur\n", "[1] \"Mike\"\n", "\n", "$new\n", "[1] TRUE\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "(x <- list(taille = c(1, 5, 2), utilisateur = \"Mike\", new = TRUE))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Because a list is a vector, its elements can be extracted with brackets: `[ ]` returns a sub-list, while `[[ ]]` and `$` return the element itself." ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "$taille =
    \n", "\t
  1. 1
  2. \n", "\t
  3. 5
  4. \n", "\t
  5. 2
  6. \n", "
\n" ], "text/latex": [ "\\textbf{\\$taille} = \\begin{enumerate*}\n", "\\item 1\n", "\\item 5\n", "\\item 2\n", "\\end{enumerate*}\n" ], "text/markdown": [ "**$taille** = 1. 1\n", "2. 5\n", "3. 2\n", "\n", "\n" ], "text/plain": [ "$taille\n", "[1] 1 5 2\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x[1]" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
    \n", "\t
  1. 1
  2. \n", "\t
  3. 5
  4. \n", "\t
  5. 2
  6. \n", "
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 1\n", "\\item 5\n", "\\item 2\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 1\n", "2. 5\n", "3. 2\n", "\n", "\n" ], "text/plain": [ "[1] 1 5 2" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x[[1]]" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
    \n", "\t
  1. 1
  2. \n", "\t
  3. 5
  4. \n", "\t
  5. 2
  6. \n", "
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 1\n", "\\item 5\n", "\\item 2\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 1\n", "2. 5\n", "3. 2\n", "\n", "\n" ], "text/plain": [ "[1] 1 5 2" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x$taille" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "'Mike'" ], "text/latex": [ "'Mike'" ], "text/markdown": [ "'Mike'" ], "text/plain": [ "[1] \"Mike\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x$utilisateur" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Data Frames" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "A _data frame_ is a **list** of vectors of equal length. Conceptually, it is a table whose columns correspond to the variables and whose rows hold the measured values (observations) of those variables." ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "villes <- c(\"Montréal\", \"Québec\", \"Laval\")\n", "Population <- c(1942044, 585485, 430077)\n", "village <- c(FALSE, TRUE, TRUE)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "donnees_ville <- data.frame(villes, Population, village)" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
villesPopulationvillage
Montréal1942044 FALSE
Québec 585485 TRUE
Laval 430077 TRUE
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " villes & Population & village\\\\\n", "\\hline\n", "\t Montréal & 1942044 & FALSE \\\\\n", "\t Québec & 585485 & TRUE \\\\\n", "\t Laval & 430077 & TRUE \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "villes | Population | village | \n", "|---|---|---|\n", "| Montréal | 1942044 | FALSE | \n", "| Québec | 585485 | TRUE | \n", "| Laval | 430077 | TRUE | \n", "\n", "\n" ], "text/plain": [ " villes Population village\n", "1 Montréal 1942044 FALSE \n", "2 Québec 585485 TRUE \n", "3 Laval 430077 TRUE " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "donnees_ville" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let us inspect the attributes of this _data frame_:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\t
$names
\n", "\t\t
    \n", "\t
  1. 'villes'
  2. \n", "\t
  3. 'Population'
  4. \n", "\t
  5. 'village'
  6. \n", "
\n", "
\n", "\t
$row.names
\n", "\t\t
    \n", "\t
  1. 1
  2. \n", "\t
  3. 2
  4. \n", "\t
  5. 3
  6. \n", "
\n", "
\n", "\t
$class
\n", "\t\t
'data.frame'
\n", "
\n" ], "text/latex": [ "\\begin{description}\n", "\\item[\\$names] \\begin{enumerate*}\n", "\\item 'villes'\n", "\\item 'Population'\n", "\\item 'village'\n", "\\end{enumerate*}\n", "\n", "\\item[\\$row.names] \\begin{enumerate*}\n", "\\item 1\n", "\\item 2\n", "\\item 3\n", "\\end{enumerate*}\n", "\n", "\\item[\\$class] 'data.frame'\n", "\\end{description}\n" ], "text/markdown": [ "$names\n", ": 1. 'villes'\n", "2. 'Population'\n", "3. 'village'\n", "\n", "\n", "\n", "$row.names\n", ": 1. 1\n", "2. 2\n", "3. 3\n", "\n", "\n", "\n", "$class\n", ": 'data.frame'\n", "\n", "\n" ], "text/plain": [ "$names\n", "[1] \"villes\" \"Population\" \"village\" \n", "\n", "$row.names\n", "[1] 1 2 3\n", "\n", "$class\n", "[1] \"data.frame\"\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "attributes(donnees_ville)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This returns the column labels (`names`), the row names (`row.names`, here simply the row numbers), and the class. Elements can then be extracted by position, as with a matrix:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
    \n", "\t
  1. Montréal
  2. \n", "\t
  3. Québec
  4. \n", "\t
  5. Laval
  6. \n", "
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item Montréal\n", "\\item Québec\n", "\\item Laval\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. Montréal\n", "2. Québec\n", "3. Laval\n", "\n", "\n" ], "text/plain": [ "[1] Montréal Québec Laval \n", "Levels: Laval Montréal Québec" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ " donnees_ville[,1]" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
    \n", "\t
  1. 1942044
  2. \n", "\t
  3. 585485
  4. \n", "\t
  5. 430077
  6. \n", "
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 1942044\n", "\\item 585485\n", "\\item 430077\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 1942044\n", "2. 585485\n", "3. 430077\n", "\n", "\n" ], "text/plain": [ "[1] 1942044 585485 430077" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ " donnees_ville[,2]" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "Québec" ], "text/latex": [ "Québec" ], "text/markdown": [ "Québec" ], "text/plain": [ "[1] Québec\n", "Levels: Laval Montréal Québec" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ " donnees_ville[2,1]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Or by name:" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
    \n", "\t
  1. 1942044
  2. \n", "\t
  3. 585485
  4. \n", "\t
  5. 430077
  6. \n", "
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 1942044\n", "\\item 585485\n", "\\item 430077\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 1942044\n", "2. 585485\n", "3. 430077\n", "\n", "\n" ], "text/plain": [ "[1] 1942044 585485 430077" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "donnees_ville$Population" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Note that the cities are not printed in quotes like strings but as the `levels` of a factor. To keep them as strings, add the argument `stringsAsFactors = FALSE`. Since R 4.0.0 this is already the default, so the factor output above reflects the behaviour of earlier versions." ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "donnees_ville <- data.frame(villes, Population, village, stringsAsFactors = FALSE)" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
villesPopulationvillage
Montréal1942044 FALSE
Québec 585485 TRUE
Laval 430077 TRUE
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " villes & Population & village\\\\\n", "\\hline\n", "\t Montréal & 1942044 & FALSE \\\\\n", "\t Québec & 585485 & TRUE \\\\\n", "\t Laval & 430077 & TRUE \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "villes | Population | village | \n", "|---|---|---|\n", "| Montréal | 1942044 | FALSE | \n", "| Québec | 585485 | TRUE | \n", "| Laval | 430077 | TRUE | \n", "\n", "\n" ], "text/plain": [ " villes Population village\n", "1 Montréal 1942044 FALSE \n", "2 Québec 585485 TRUE \n", "3 Laval 430077 TRUE " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "donnees_ville" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
    \n", "\t
  1. 'Montréal'
  2. \n", "\t
  3. 'Québec'
  4. \n", "\t
  5. 'Laval'
  6. \n", "
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 'Montréal'\n", "\\item 'Québec'\n", "\\item 'Laval'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 'Montréal'\n", "2. 'Québec'\n", "3. 'Laval'\n", "\n", "\n" ], "text/plain": [ "[1] \"Montréal\" \"Québec\" \"Laval\" " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ " donnees_ville$villes " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The `str` function is very useful for getting a compact summary of the elements of a data frame:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "'data.frame':\t3 obs. of 3 variables:\n", " $ villes : chr \"Montréal\" \"Québec\" \"Laval\"\n", " $ Population: num 1942044 585485 430077\n", " $ village : logi FALSE TRUE TRUE\n" ] } ], "source": [ "str(donnees_ville)" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "df <- donnees_ville" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ " villes Population village \n", " Length:3 Min. : 430077 Mode :logical \n", " Class :character 1st Qu.: 507781 FALSE:1 \n", " Mode :character Median : 585485 TRUE :2 \n", " Mean : 985869 NA's :0 \n", " 3rd Qu.:1263764 \n", " Max. :1942044 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "summary(df)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Recall the `Cars93` data set: it is stored as a data frame. Let us load it to work through a few examples." 
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "require(MASS)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "data(Cars93)" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "'data.frame'" ], "text/latex": [ "'data.frame'" ], "text/markdown": [ "'data.frame'" ], "text/plain": [ "[1] \"data.frame\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "class(Cars93)" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "'data.frame':\t93 obs. of 27 variables:\n", " $ Manufacturer : Factor w/ 32 levels \"Acura\",\"Audi\",..: 1 1 2 2 3 4 4 4 4 5 ...\n", " $ Model : Factor w/ 93 levels \"100\",\"190E\",\"240\",..: 49 56 9 1 6 24 54 74 73 35 ...\n", " $ Type : Factor w/ 6 levels \"Compact\",\"Large\",..: 4 3 1 3 3 3 2 2 3 2 ...\n", " $ Min.Price : num 12.9 29.2 25.9 30.8 23.7 14.2 19.9 22.6 26.3 33 ...\n", " $ Price : num 15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...\n", " $ Max.Price : num 18.8 38.7 32.3 44.6 36.2 17.3 21.7 24.9 26.3 36.3 ...\n", " $ MPG.city : int 25 18 20 19 22 22 19 16 19 16 ...\n", " $ MPG.highway : int 31 25 26 26 30 31 28 25 27 25 ...\n", " $ AirBags : Factor w/ 3 levels \"Driver & Passenger\",..: 3 1 2 1 2 2 2 2 2 2 ...\n", " $ DriveTrain : Factor w/ 3 levels \"4WD\",\"Front\",..: 2 2 2 2 3 2 2 3 2 2 ...\n", " $ Cylinders : Factor w/ 6 levels \"3\",\"4\",\"5\",\"6\",..: 2 4 4 4 2 2 4 4 4 5 ...\n", " $ EngineSize : num 1.8 3.2 2.8 2.8 3.5 2.2 3.8 5.7 3.8 4.9 ...\n", " $ Horsepower : int 140 200 172 172 208 110 170 180 170 200 ...\n", " $ RPM : int 6300 5500 5500 5500 5700 5200 4800 4000 4800 4100 ...\n", " $ Rev.per.mile : int 
2890 2335 2280 2535 2545 2565 1570 1320 1690 1510 ...\n", " $ Man.trans.avail : Factor w/ 2 levels \"No\",\"Yes\": 2 2 2 2 2 1 1 1 1 1 ...\n", " $ Fuel.tank.capacity: num 13.2 18 16.9 21.1 21.1 16.4 18 23 18.8 18 ...\n", " $ Passengers : int 5 5 5 6 4 6 6 6 5 6 ...\n", " $ Length : int 177 195 180 193 186 189 200 216 198 206 ...\n", " $ Wheelbase : int 102 115 102 106 109 105 111 116 108 114 ...\n", " $ Width : int 68 71 67 70 69 69 74 78 73 73 ...\n", " $ Turn.circle : int 37 38 37 37 39 41 42 45 41 43 ...\n", " $ Rear.seat.room : num 26.5 30 28 31 27 28 30.5 30.5 26.5 35 ...\n", " $ Luggage.room : int 11 15 14 17 13 16 17 21 14 18 ...\n", " $ Weight : int 2705 3560 3375 3405 3640 2880 3470 4105 3495 3620 ...\n", " $ Origin : Factor w/ 2 levels \"USA\",\"non-USA\": 2 2 2 2 2 1 1 1 1 1 ...\n", " $ Make : Factor w/ 93 levels \"Acura Integra\",..: 1 2 4 3 5 6 7 9 8 10 ...\n" ] } ], "source": [ "str(Cars93)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The output shows 93 observations of 27 variables." ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "4.1.2" }, "toc-autonumbering": false }, "nbformat": 4, "nbformat_minor": 4 }