{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Importation des données"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nous avons vu dans le dernier cours les _data frames_, nous l'avions créée manuellement. Toutefois, nous allons souvent importer des données en pratique sous plusieurs formats. Ces fichiers que nous allons importer seront souvent dans des fichiers `.csv` (_Comma-separated values_). Ces fichiers sont très populaires et ils sont générés par Excel ."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## read.csv"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [],
"source": [
"options(repr.matrix.max.cols=8, repr.matrix.max.rows=8) #seulement pour afficher 8 lignes et 8 colonnes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On peut lire les fichiers `.csv` localement en précisant le chemin exacte menant vers du fichier en question."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"Segment | VitesseM | PuissanceEstim |
\n",
"\n",
"\tKm1 | 31.9 | 130 |
\n",
"\tKm2 | 33.3 | 165 |
\n",
"\tKm3 | 28.1 | 130 |
\n",
"\tKm4 | 30.8 | 133 |
\n",
"\tKm5 | 27.7 | 103 |
\n",
"\tKm6 | 31.2 | 154 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lll}\n",
" Segment & VitesseM & PuissanceEstim\\\\\n",
"\\hline\n",
"\t Km1 & 31.9 & 130 \\\\\n",
"\t Km2 & 33.3 & 165 \\\\\n",
"\t Km3 & 28.1 & 130 \\\\\n",
"\t Km4 & 30.8 & 133 \\\\\n",
"\t Km5 & 27.7 & 103 \\\\\n",
"\t Km6 & 31.2 & 154 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Segment | VitesseM | PuissanceEstim | \n",
"|---|---|---|---|---|---|\n",
"| Km1 | 31.9 | 130 | \n",
"| Km2 | 33.3 | 165 | \n",
"| Km3 | 28.1 | 130 | \n",
"| Km4 | 30.8 | 133 | \n",
"| Km5 | 27.7 | 103 | \n",
"| Km6 | 31.2 | 154 | \n",
"\n",
"\n"
],
"text/plain": [
" Segment VitesseM PuissanceEstim\n",
"1 Km1 31.9 130 \n",
"2 Km2 33.3 165 \n",
"3 Km3 28.1 130 \n",
"4 Km4 30.8 133 \n",
"5 Km5 27.7 103 \n",
"6 Km6 31.2 154 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# test_csv <-read.csv(\"https://raw.githubusercontent.com/nmeraihi/data/master/exemple_1.csv\")\n",
"test_csv <-read.csv(\"exemple_1.csv\")\n",
"test_csv"
]
},
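{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of reading from an explicit path, the example below writes a tiny file into R's temporary folder and reads it back. The file name `demo_segments.csv` and its contents are made up for the illustration; `file.path`, `tempdir`, `writeLines` and `file.exists` are base R functions.\n",
"\n",
"```r\n",
"# Hypothetical file, created only for this illustration\n",
"path <- file.path(tempdir(), \"demo_segments.csv\")\n",
"writeLines(c(\"Segment,VitesseM\", \"Km1,31.9\", \"Km2,33.3\"), path)\n",
"\n",
"file.exists(path)        # check the path before reading\n",
"demo <- read.csv(path)   # same function as above, with a full path\n",
"str(demo)\n",
"```\n",
"\n",
"Building the path with `file.path` avoids hard-coding the path separator of a particular operating system."
]
},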
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ou directement à partir du web:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"Segment | VitesseM | PuissanceEstim |
\n",
"\n",
"\tKm1 | 31.9 | 130 |
\n",
"\tKm2 | 33.3 | 165 |
\n",
"\tKm3 | 28.1 | 130 |
\n",
"\tKm4 | 30.8 | 133 |
\n",
"\tKm5 | 27.7 | 103 |
\n",
"\tKm6 | 31.2 | 154 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lll}\n",
" Segment & VitesseM & PuissanceEstim\\\\\n",
"\\hline\n",
"\t Km1 & 31.9 & 130 \\\\\n",
"\t Km2 & 33.3 & 165 \\\\\n",
"\t Km3 & 28.1 & 130 \\\\\n",
"\t Km4 & 30.8 & 133 \\\\\n",
"\t Km5 & 27.7 & 103 \\\\\n",
"\t Km6 & 31.2 & 154 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Segment | VitesseM | PuissanceEstim | \n",
"|---|---|---|---|---|---|\n",
"| Km1 | 31.9 | 130 | \n",
"| Km2 | 33.3 | 165 | \n",
"| Km3 | 28.1 | 130 | \n",
"| Km4 | 30.8 | 133 | \n",
"| Km5 | 27.7 | 103 | \n",
"| Km6 | 31.2 | 154 | \n",
"\n",
"\n"
],
"text/plain": [
" Segment VitesseM PuissanceEstim\n",
"1 Km1 31.9 130 \n",
"2 Km2 33.3 165 \n",
"3 Km3 28.1 130 \n",
"4 Km4 30.8 133 \n",
"5 Km5 27.7 103 \n",
"6 Km6 31.2 154 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"test_csv <-read.csv(\"https://raw.githubusercontent.com/nmeraihi/data/master/exemple_1.csv\")\n",
"test_csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lorsque nous écrivons `read.csv`, R traite importe ce fichier sous format `data frame`, il nous retourne les noms de colonnes, les lignes ainsi que la classe du df"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\t- $names
\n",
"\t\t\n",
"\t- 'Segment'
\n",
"\t- 'VitesseM'
\n",
"\t- 'PuissanceEstim'
\n",
"
\n",
" \n",
"\t- $class
\n",
"\t\t- 'data.frame'
\n",
"\t- $row.names
\n",
"\t\t\n",
"\t- 1
\n",
"\t- 2
\n",
"\t- 3
\n",
"\t- 4
\n",
"\t- 5
\n",
"\t- 6
\n",
"
\n",
" \n",
"
\n"
],
"text/latex": [
"\\begin{description}\n",
"\\item[\\$names] \\begin{enumerate*}\n",
"\\item 'Segment'\n",
"\\item 'VitesseM'\n",
"\\item 'PuissanceEstim'\n",
"\\end{enumerate*}\n",
"\n",
"\\item[\\$class] 'data.frame'\n",
"\\item[\\$row.names] \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 2\n",
"\\item 3\n",
"\\item 4\n",
"\\item 5\n",
"\\item 6\n",
"\\end{enumerate*}\n",
"\n",
"\\end{description}\n"
],
"text/markdown": [
"$names\n",
": 1. 'Segment'\n",
"2. 'VitesseM'\n",
"3. 'PuissanceEstim'\n",
"\n",
"\n",
"\n",
"$class\n",
": 'data.frame'\n",
"$row.names\n",
": 1. 1\n",
"2. 2\n",
"3. 3\n",
"4. 4\n",
"5. 5\n",
"6. 6\n",
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"$names\n",
"[1] \"Segment\" \"VitesseM\" \"PuissanceEstim\"\n",
"\n",
"$class\n",
"[1] \"data.frame\"\n",
"\n",
"$row.names\n",
"[1] 1 2 3 4 5 6\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"attributes(test_csv)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dans la méthode `read.csv`, il existe un argument optionnel `_header_` qui est par défaut `header=T`. Cet argument spécifie si les données que nous voulons importer possèdent des noms de colonne (`header=TRUE` ~ `header=T`.) ou pas (`header=FALSE` ~ `header=F`.). Regardons ce que ça donnerait si nous changeons la valeur `header=F`; "
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"V1 | V2 | V3 |
\n",
"\n",
"\tSegment | VitesseM | PuissanceEstim |
\n",
"\tKm1 | 31.9 | 130 |
\n",
"\tKm2 | 33.3 | 165 |
\n",
"\tKm3 | 28.1 | 130 |
\n",
"\tKm4 | 30.8 | 133 |
\n",
"\tKm5 | 27.7 | 103 |
\n",
"\tKm6 | 31.2 | 154 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lll}\n",
" V1 & V2 & V3\\\\\n",
"\\hline\n",
"\t Segment & VitesseM & PuissanceEstim\\\\\n",
"\t Km1 & 31.9 & 130 \\\\\n",
"\t Km2 & 33.3 & 165 \\\\\n",
"\t Km3 & 28.1 & 130 \\\\\n",
"\t Km4 & 30.8 & 133 \\\\\n",
"\t Km5 & 27.7 & 103 \\\\\n",
"\t Km6 & 31.2 & 154 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"V1 | V2 | V3 | \n",
"|---|---|---|---|---|---|---|\n",
"| Segment | VitesseM | PuissanceEstim | \n",
"| Km1 | 31.9 | 130 | \n",
"| Km2 | 33.3 | 165 | \n",
"| Km3 | 28.1 | 130 | \n",
"| Km4 | 30.8 | 133 | \n",
"| Km5 | 27.7 | 103 | \n",
"| Km6 | 31.2 | 154 | \n",
"\n",
"\n"
],
"text/plain": [
" V1 V2 V3 \n",
"1 Segment VitesseM PuissanceEstim\n",
"2 Km1 31.9 130 \n",
"3 Km2 33.3 165 \n",
"4 Km3 28.1 130 \n",
"5 Km4 30.8 133 \n",
"6 Km5 27.7 103 \n",
"7 Km6 31.2 154 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"test_csv <-read.csv(\"exemple_1.csv\", header = F)\n",
"test_csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On remarque que R crée des noms de colonnes appelés `V1, V2...etc.`"
]
},
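{
"cell_type": "markdown",
"metadata": {},
"source": [
"If a file really has no header line, one option is to supply the names yourself instead of keeping the automatic `V1`, `V2`, ... names. This is a minimal sketch with a made-up headerless file; `col.names` is an argument of `read.table` that `read.csv` passes through.\n",
"\n",
"```r\n",
"# Hypothetical headerless file, created only for this illustration\n",
"path <- tempfile(fileext = \".csv\")\n",
"writeLines(c(\"Km1,31.9,130\", \"Km2,33.3,165\"), path)\n",
"\n",
"# header = FALSE keeps the first line as data; col.names sets the names\n",
"no_header <- read.csv(path, header = FALSE,\n",
"                      col.names = c(\"Segment\", \"VitesseM\", \"PuissanceEstim\"))\n",
"names(no_header)\n",
"```"
]
},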
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"Segment | VitesseM | PuissanceEstim |
\n",
"\n",
"\tKm1 | 31.9 | 130 |
\n",
"\tKm2 | 33.3 | 165 |
\n",
"\tKm3 | 28.1 | 130 |
\n",
"\tKm4 | 30.8 | 133 |
\n",
"\tKm5 | 27.7 | 103 |
\n",
"\tKm6 | 31.2 | 154 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lll}\n",
" Segment & VitesseM & PuissanceEstim\\\\\n",
"\\hline\n",
"\t Km1 & 31.9 & 130 \\\\\n",
"\t Km2 & 33.3 & 165 \\\\\n",
"\t Km3 & 28.1 & 130 \\\\\n",
"\t Km4 & 30.8 & 133 \\\\\n",
"\t Km5 & 27.7 & 103 \\\\\n",
"\t Km6 & 31.2 & 154 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Segment | VitesseM | PuissanceEstim | \n",
"|---|---|---|---|---|---|\n",
"| Km1 | 31.9 | 130 | \n",
"| Km2 | 33.3 | 165 | \n",
"| Km3 | 28.1 | 130 | \n",
"| Km4 | 30.8 | 133 | \n",
"| Km5 | 27.7 | 103 | \n",
"| Km6 | 31.2 | 154 | \n",
"\n",
"\n"
],
"text/plain": [
" Segment VitesseM PuissanceEstim\n",
"1 Km1 31.9 130 \n",
"2 Km2 33.3 165 \n",
"3 Km3 28.1 130 \n",
"4 Km4 30.8 133 \n",
"5 Km5 27.7 103 \n",
"6 Km6 31.2 154 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"exemple <-read.csv(\"exemple_1.csv\", header = T)\n",
"exemple"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Regardons la classe de la variable \"Segment\";"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"'factor'"
],
"text/latex": [
"'factor'"
],
"text/markdown": [
"'factor'"
],
"text/plain": [
"[1] \"factor\""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"class(exemple$Segment)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Surprise!**. En effet, il existe une autre option dans la méthode `read.csv` qui permet de traiter les catégories en type caractère."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"Segment | VitesseM | PuissanceEstim |
\n",
"\n",
"\tKm1 | 31.9 | 130 |
\n",
"\tKm2 | 33.3 | 165 |
\n",
"\tKm3 | 28.1 | 130 |
\n",
"\tKm4 | 30.8 | 133 |
\n",
"\tKm5 | 27.7 | 103 |
\n",
"\tKm6 | 31.2 | 154 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lll}\n",
" Segment & VitesseM & PuissanceEstim\\\\\n",
"\\hline\n",
"\t Km1 & 31.9 & 130 \\\\\n",
"\t Km2 & 33.3 & 165 \\\\\n",
"\t Km3 & 28.1 & 130 \\\\\n",
"\t Km4 & 30.8 & 133 \\\\\n",
"\t Km5 & 27.7 & 103 \\\\\n",
"\t Km6 & 31.2 & 154 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Segment | VitesseM | PuissanceEstim | \n",
"|---|---|---|---|---|---|\n",
"| Km1 | 31.9 | 130 | \n",
"| Km2 | 33.3 | 165 | \n",
"| Km3 | 28.1 | 130 | \n",
"| Km4 | 30.8 | 133 | \n",
"| Km5 | 27.7 | 103 | \n",
"| Km6 | 31.2 | 154 | \n",
"\n",
"\n"
],
"text/plain": [
" Segment VitesseM PuissanceEstim\n",
"1 Km1 31.9 130 \n",
"2 Km2 33.3 165 \n",
"3 Km3 28.1 130 \n",
"4 Km4 30.8 133 \n",
"5 Km5 27.7 103 \n",
"6 Km6 31.2 154 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"exemple <-read.csv(\"exemple_1.csv\", header = T, stringsAsFactors=F)\n",
"exemple"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Maintenant, regardons la classe de la variable \"Segment\""
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"'character'"
],
"text/latex": [
"'character'"
],
"text/markdown": [
"'character'"
],
"text/plain": [
"[1] \"character\""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"class(exemple$Segment)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Une partie seulement du df\n",
"\n",
"Il est possible de lire seulement certaines colonnes du fichier csv qu'on veut importer;"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"VitesseM | PuissanceEstim |
\n",
"\n",
"\t31.9 | 130 |
\n",
"\t33.3 | 165 |
\n",
"\t28.1 | 130 |
\n",
"\t30.8 | 133 |
\n",
"\t27.7 | 103 |
\n",
"\t31.2 | 154 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|ll}\n",
" VitesseM & PuissanceEstim\\\\\n",
"\\hline\n",
"\t 31.9 & 130 \\\\\n",
"\t 33.3 & 165 \\\\\n",
"\t 28.1 & 130 \\\\\n",
"\t 30.8 & 133 \\\\\n",
"\t 27.7 & 103 \\\\\n",
"\t 31.2 & 154 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"VitesseM | PuissanceEstim | \n",
"|---|---|---|---|---|---|\n",
"| 31.9 | 130 | \n",
"| 33.3 | 165 | \n",
"| 28.1 | 130 | \n",
"| 30.8 | 133 | \n",
"| 27.7 | 103 | \n",
"| 31.2 | 154 | \n",
"\n",
"\n"
],
"text/plain": [
" VitesseM PuissanceEstim\n",
"1 31.9 130 \n",
"2 33.3 165 \n",
"3 28.1 130 \n",
"4 30.8 133 \n",
"5 27.7 103 \n",
"6 31.2 154 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"exemple <-read.csv(\"exemple_1.csv\", header = T, stringsAsFactors=F)[,2:3]\n",
"exemple"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"Segment | PuissanceEstim |
\n",
"\n",
"\tKm1 | 130 |
\n",
"\tKm2 | 165 |
\n",
"\tKm3 | 130 |
\n",
"\tKm4 | 133 |
\n",
"\tKm5 | 103 |
\n",
"\tKm6 | 154 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|ll}\n",
" Segment & PuissanceEstim\\\\\n",
"\\hline\n",
"\t Km1 & 130\\\\\n",
"\t Km2 & 165\\\\\n",
"\t Km3 & 130\\\\\n",
"\t Km4 & 133\\\\\n",
"\t Km5 & 103\\\\\n",
"\t Km6 & 154\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Segment | PuissanceEstim | \n",
"|---|---|---|---|---|---|\n",
"| Km1 | 130 | \n",
"| Km2 | 165 | \n",
"| Km3 | 130 | \n",
"| Km4 | 133 | \n",
"| Km5 | 103 | \n",
"| Km6 | 154 | \n",
"\n",
"\n"
],
"text/plain": [
" Segment PuissanceEstim\n",
"1 Km1 130 \n",
"2 Km2 165 \n",
"3 Km3 130 \n",
"4 Km4 133 \n",
"5 Km5 103 \n",
"6 Km6 154 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"exemple <-read.csv(\"exemple_1.csv\", header = T, stringsAsFactors=F)[,c(1,3)]\n",
"exemple"
]
},
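{
"cell_type": "markdown",
"metadata": {},
"source": [
"The calls above read the whole file and then keep a subset of the columns. As a sketch of an alternative, `colClasses` can skip a column while reading: the string `\"NULL\"` drops the column and `NA` lets R guess its type. The temporary file below is a made-up copy of the first rows of the example.\n",
"\n",
"```r\n",
"# Hypothetical file, created only for this illustration\n",
"path <- tempfile(fileext = \".csv\")\n",
"writeLines(c(\"Segment,VitesseM,PuissanceEstim\",\n",
"             \"Km1,31.9,130\",\n",
"             \"Km2,33.3,165\"), path)\n",
"\n",
"# Keep columns 1 and 3; the second column is never stored\n",
"partial <- read.csv(path, colClasses = c(NA, \"NULL\", NA))\n",
"names(partial)\n",
"```\n",
"\n",
"This can matter for large files, since the skipped columns are not kept in memory."
]
},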
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## read.table\n",
"\n",
"Une autre façon d'importer des données à l'intérieur des df est d'utiliser la méthode `read.table` qui traite les fichiers `text`;"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"Segment.VitesseM.PuissanceEstim |
\n",
"\n",
"\tKm1,31.9,130 |
\n",
"\tKm2,33.3,165 |
\n",
"\tKm3,28.1,130 |
\n",
"\tKm4,30.8,133 |
\n",
"\tKm5,27.7,103 |
\n",
"\tKm6,31.2,154 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|l}\n",
" Segment.VitesseM.PuissanceEstim\\\\\n",
"\\hline\n",
"\t Km1,31.9,130\\\\\n",
"\t Km2,33.3,165\\\\\n",
"\t Km3,28.1,130\\\\\n",
"\t Km4,30.8,133\\\\\n",
"\t Km5,27.7,103\\\\\n",
"\t Km6,31.2,154\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Segment.VitesseM.PuissanceEstim | \n",
"|---|---|---|---|---|---|\n",
"| Km1,31.9,130 | \n",
"| Km2,33.3,165 | \n",
"| Km3,28.1,130 | \n",
"| Km4,30.8,133 | \n",
"| Km5,27.7,103 | \n",
"| Km6,31.2,154 | \n",
"\n",
"\n"
],
"text/plain": [
" Segment.VitesseM.PuissanceEstim\n",
"1 Km1,31.9,130 \n",
"2 Km2,33.3,165 \n",
"3 Km3,28.1,130 \n",
"4 Km4,30.8,133 \n",
"5 Km5,27.7,103 \n",
"6 Km6,31.2,154 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"read.table(\"exemple_1.txt\", header=T)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On voit bien que les colonnes n'ont pas été séparées comme il faut. Nous devons spécifier les caractères qui séparent ces variables."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"Segment | VitesseM | PuissanceEstim |
\n",
"\n",
"\tKm1 | 31.9 | 130 |
\n",
"\tKm2 | 33.3 | 165 |
\n",
"\tKm3 | 28.1 | 130 |
\n",
"\tKm4 | 30.8 | 133 |
\n",
"\tKm5 | 27.7 | 103 |
\n",
"\tKm6 | 31.2 | 154 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lll}\n",
" Segment & VitesseM & PuissanceEstim\\\\\n",
"\\hline\n",
"\t Km1 & 31.9 & 130 \\\\\n",
"\t Km2 & 33.3 & 165 \\\\\n",
"\t Km3 & 28.1 & 130 \\\\\n",
"\t Km4 & 30.8 & 133 \\\\\n",
"\t Km5 & 27.7 & 103 \\\\\n",
"\t Km6 & 31.2 & 154 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Segment | VitesseM | PuissanceEstim | \n",
"|---|---|---|---|---|---|\n",
"| Km1 | 31.9 | 130 | \n",
"| Km2 | 33.3 | 165 | \n",
"| Km3 | 28.1 | 130 | \n",
"| Km4 | 30.8 | 133 | \n",
"| Km5 | 27.7 | 103 | \n",
"| Km6 | 31.2 | 154 | \n",
"\n",
"\n"
],
"text/plain": [
" Segment VitesseM PuissanceEstim\n",
"1 Km1 31.9 130 \n",
"2 Km2 33.3 165 \n",
"3 Km3 28.1 130 \n",
"4 Km4 30.8 133 \n",
"5 Km5 27.7 103 \n",
"6 Km6 31.2 154 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"read.table(\"exemple_1.txt\", header=T, sep = \",\")"
]
},
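{
"cell_type": "markdown",
"metadata": {},
"source": [
"`read.csv` is a wrapper around `read.table` with different defaults, notably `header = TRUE` and `sep = \",\"`. The sketch below, which uses a made-up temporary file, compares the two calls on the same data.\n",
"\n",
"```r\n",
"# Hypothetical file, created only for this illustration\n",
"path <- tempfile(fileext = \".txt\")\n",
"writeLines(c(\"Segment,VitesseM,PuissanceEstim\",\n",
"             \"Km1,31.9,130\",\n",
"             \"Km2,33.3,165\"), path)\n",
"\n",
"via_table <- read.table(path, header = TRUE, sep = \",\")\n",
"via_csv   <- read.csv(path)\n",
"\n",
"# Compare the two data frames\n",
"all.equal(via_table, via_csv)\n",
"```"
]
},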
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## En utilisant le package `RCurl`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Il est possible d'utiliser la bibliothèque `RCurl` qui offre plus d'options. Dans ce cours, nous nous limitons à l'utilisation de `read.csv`. Pour plus d'informations sur cette bibliothèque, vous pouvez lire plus de détails [la documentation de ce _package._](https://cran.r-project.org/web/packages/RCurl/RCurl.pdf)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Updating HTML index of packages in '.Library'\n",
"Making 'packages.html' ... done\n"
]
}
],
"source": [
"install.packages(\"RCurl\")"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Loading required package: bitops\n"
]
}
],
"source": [
"library(RCurl)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [],
"source": [
"x <- getURL(\"https://raw.githubusercontent.com/aronlindberg/latent_growth_classes/master/LGC_data.csv\")\n",
"y <- read.csv(text = x)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"top100_repository_name | month | monthly_increase | monthly_begin_at | monthly_end_with |
\n",
"\n",
"\tBukkit | 2012-03 | 9 | 431 | 440 |
\n",
"\tBukkit | 2012-04 | 19 | 438 | 457 |
\n",
"\tBukkit | 2012-05 | 19 | 455 | 474 |
\n",
"\tBukkit | 2012-06 | 18 | 475 | 493 |
\n",
"\tBukkit | 2012-07 | 15 | 492 | 507 |
\n",
"\tBukkit | 2012-08 | 50 | 506 | 556 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" top100\\_repository\\_name & month & monthly\\_increase & monthly\\_begin\\_at & monthly\\_end\\_with\\\\\n",
"\\hline\n",
"\t Bukkit & 2012-03 & 9 & 431 & 440 \\\\\n",
"\t Bukkit & 2012-04 & 19 & 438 & 457 \\\\\n",
"\t Bukkit & 2012-05 & 19 & 455 & 474 \\\\\n",
"\t Bukkit & 2012-06 & 18 & 475 & 493 \\\\\n",
"\t Bukkit & 2012-07 & 15 & 492 & 507 \\\\\n",
"\t Bukkit & 2012-08 & 50 & 506 & 556 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"top100_repository_name | month | monthly_increase | monthly_begin_at | monthly_end_with | \n",
"|---|---|---|---|---|---|\n",
"| Bukkit | 2012-03 | 9 | 431 | 440 | \n",
"| Bukkit | 2012-04 | 19 | 438 | 457 | \n",
"| Bukkit | 2012-05 | 19 | 455 | 474 | \n",
"| Bukkit | 2012-06 | 18 | 475 | 493 | \n",
"| Bukkit | 2012-07 | 15 | 492 | 507 | \n",
"| Bukkit | 2012-08 | 50 | 506 | 556 | \n",
"\n",
"\n"
],
"text/plain": [
" top100_repository_name month monthly_increase monthly_begin_at\n",
"1 Bukkit 2012-03 9 431 \n",
"2 Bukkit 2012-04 19 438 \n",
"3 Bukkit 2012-05 19 455 \n",
"4 Bukkit 2012-06 18 475 \n",
"5 Bukkit 2012-07 15 492 \n",
"6 Bukkit 2012-08 50 506 \n",
" monthly_end_with\n",
"1 440 \n",
"2 457 \n",
"3 474 \n",
"4 493 \n",
"5 507 \n",
"6 556 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"head(y)"
]
},
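{
"cell_type": "markdown",
"metadata": {},
"source": [
"The argument `text` used above does not require `RCurl`: it accepts any character vector, each element being read as one line. A minimal sketch, with made-up lines standing in for downloaded content:\n",
"\n",
"```r\n",
"# Hypothetical CSV content; each element of the vector is one line\n",
"csv_lines <- c(\"month,monthly_increase\",\n",
"               \"2012-03,9\",\n",
"               \"2012-04,19\")\n",
"\n",
"from_text <- read.csv(text = csv_lines)\n",
"from_text\n",
"```\n",
"\n",
"This is handy for small reproducible examples, since no file has to exist on disk."
]
},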
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Traiter les valeurs manquantes\n",
"\n",
"Afin d'illustrer le traitement des valeurs manquantes dans R, importons les données de l'exemple 2_2"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"km | temps | vitesseMoyenne | puissanceMoyenne | bpm |
\n",
"\n",
"\t 1.24 | 4:01 | 19.1 | 160 | 134 |
\n",
"\t NA | 9:42 | 30.2 | 133 | 146 |
\n",
"\t 1.02 | 1:57 | 30.8 | 141 | 139 |
\n",
"\t17.61 | 36:11 | 29.2 | NA | 144 |
\n",
"\t 9.27 | 19:10 | 29.0 | 121 | 143 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" km & temps & vitesseMoyenne & puissanceMoyenne & bpm\\\\\n",
"\\hline\n",
"\t 1.24 & 4:01 & 19.1 & 160 & 134 \\\\\n",
"\t NA & 9:42 & 30.2 & 133 & 146 \\\\\n",
"\t 1.02 & 1:57 & 30.8 & 141 & 139 \\\\\n",
"\t 17.61 & 36:11 & 29.2 & NA & 144 \\\\\n",
"\t 9.27 & 19:10 & 29.0 & 121 & 143 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"km | temps | vitesseMoyenne | puissanceMoyenne | bpm | \n",
"|---|---|---|---|---|\n",
"| 1.24 | 4:01 | 19.1 | 160 | 134 | \n",
"| NA | 9:42 | 30.2 | 133 | 146 | \n",
"| 1.02 | 1:57 | 30.8 | 141 | 139 | \n",
"| 17.61 | 36:11 | 29.2 | NA | 144 | \n",
"| 9.27 | 19:10 | 29.0 | 121 | 143 | \n",
"\n",
"\n"
],
"text/plain": [
" km temps vitesseMoyenne puissanceMoyenne bpm\n",
"1 1.24 4:01 19.1 160 134\n",
"2 NA 9:42 30.2 133 146\n",
"3 1.02 1:57 30.8 141 139\n",
"4 17.61 36:11 29.2 NA 144\n",
"5 9.27 19:10 29.0 121 143"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"read.table(\"https://raw.githubusercontent.com/nmeraihi/data/master/exemple_2_2.txt\", header=T, sep = \",\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On peut appliquer un test booléen afin de vérifier l'existance des valeurs manquantes comme suit;"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"km | temps | vitesseMoyenne | puissanceMoyenne | bpm |
\n",
"\n",
"\tFALSE | FALSE | FALSE | FALSE | FALSE |
\n",
"\t TRUE | FALSE | FALSE | FALSE | FALSE |
\n",
"\tFALSE | FALSE | FALSE | FALSE | FALSE |
\n",
"\tFALSE | FALSE | FALSE | TRUE | FALSE |
\n",
"\tFALSE | FALSE | FALSE | FALSE | FALSE |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{lllll}\n",
" km & temps & vitesseMoyenne & puissanceMoyenne & bpm\\\\\n",
"\\hline\n",
"\t FALSE & FALSE & FALSE & FALSE & FALSE\\\\\n",
"\t TRUE & FALSE & FALSE & FALSE & FALSE\\\\\n",
"\t FALSE & FALSE & FALSE & FALSE & FALSE\\\\\n",
"\t FALSE & FALSE & FALSE & TRUE & FALSE\\\\\n",
"\t FALSE & FALSE & FALSE & FALSE & FALSE\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"km | temps | vitesseMoyenne | puissanceMoyenne | bpm | \n",
"|---|---|---|---|---|\n",
"| FALSE | FALSE | FALSE | FALSE | FALSE | \n",
"| TRUE | FALSE | FALSE | FALSE | FALSE | \n",
"| FALSE | FALSE | FALSE | FALSE | FALSE | \n",
"| FALSE | FALSE | FALSE | TRUE | FALSE | \n",
"| FALSE | FALSE | FALSE | FALSE | FALSE | \n",
"\n",
"\n"
],
"text/plain": [
" km temps vitesseMoyenne puissanceMoyenne bpm \n",
"[1,] FALSE FALSE FALSE FALSE FALSE\n",
"[2,] TRUE FALSE FALSE FALSE FALSE\n",
"[3,] FALSE FALSE FALSE FALSE FALSE\n",
"[4,] FALSE FALSE FALSE TRUE FALSE\n",
"[5,] FALSE FALSE FALSE FALSE FALSE"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df<-read.table(\"https://raw.githubusercontent.com/nmeraihi/data/master/exemple_2_2.txt\", header=T, sep = \",\")\n",
"is.na(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Cette fonction nous retourne un _data frame_ du même format que le _df_ test. Le résultat obtrenu sont valeurs `TRUE` sur les éléments manquants, et des `FALSE` sur les valeurs existantes.\n",
"\n",
"On peut faire le test sur une partie précise du _df_;"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"FALSE"
],
"text/latex": [
"FALSE"
],
"text/markdown": [
"FALSE"
],
"text/plain": [
"[1] FALSE"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"is.na(df[1,1])"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"TRUE"
],
"text/latex": [
"TRUE"
],
"text/markdown": [
"TRUE"
],
"text/plain": [
"[1] TRUE"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"is.na(df[2,1])"
]
},
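{
"cell_type": "markdown",
"metadata": {},
"source": [
"For larger tables, reading a whole matrix of `TRUE` and `FALSE` quickly becomes impractical. The sketch below, on a small made-up data frame, shows a few base R helpers that summarise missing values instead.\n",
"\n",
"```r\n",
"# Small made-up data frame with the same kind of gaps as the example\n",
"demo <- data.frame(km = c(1.24, NA, 1.02),\n",
"                   puissance = c(160, 133, NA))\n",
"\n",
"anyNA(demo)             # is there at least one missing value?\n",
"colSums(is.na(demo))    # number of missing values per column\n",
"which(is.na(demo$km))   # positions of the missing values in one column\n",
"complete.cases(demo)    # TRUE for the rows without any missing value\n",
"```"
]
},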
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Mais pourquoi préocupe t-on tant des valeurs manquantes? Eh bien, les valeurs manquantes sont le cauchemar #1 de toute personne qui manipule les données, que ce soit en entreprise ou pour un utilisation personnelle. \n",
"\n",
"Essayons de faire un calcul de la moyenne du nombre de km;"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1] NA"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"mean(df$km)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**`R`** nous retourne `NA` même si nous avons une seule observation qui est manquante\n",
"\n",
"On peut régler ce problème avec la fonction `na.omit()`\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"7.285"
],
"text/latex": [
"7.285"
],
"text/markdown": [
"7.285"
],
"text/plain": [
"[1] 7.285"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"mean(na.omit(df$km))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Le calcul de la moyenne a été fait sur les variables;"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\t- 1.24
\n",
"\t- 1.02
\n",
"\t- 17.61
\n",
"\t- 9.27
\n",
"
\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 1.24\n",
"\\item 1.02\n",
"\\item 17.61\n",
"\\item 9.27\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 1.24\n",
"2. 1.02\n",
"3. 17.61\n",
"4. 9.27\n",
"\n",
"\n"
],
"text/plain": [
"[1] 1.24 1.02 17.61 9.27\n",
"attr(,\"na.action\")\n",
"[1] 2\n",
"attr(,\"class\")\n",
"[1] \"omit\""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"na.omit(df$km)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"La fonction `mean` possède un argument optionnel appelé `na.rm = ` qui ignore les valeurs manquantes;"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"7.285"
],
"text/latex": [
"7.285"
],
"text/markdown": [
"7.285"
],
"text/plain": [
"[1] 7.285"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"mean(df$km,na.rm = T)"
]
},
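{
"cell_type": "markdown",
"metadata": {},
"source": [
"The argument `na.rm` is not specific to `mean`; several other base R summary functions accept it too. A short sketch, using the `km` values shown above, typed in by hand:\n",
"\n",
"```r\n",
"# Values copied by hand from the km column of the example\n",
"km <- c(1.24, NA, 1.02, 17.61, 9.27)\n",
"\n",
"sum(km)                 # NA, because one value is missing\n",
"sum(km, na.rm = TRUE)   # total of the observed values\n",
"max(km, na.rm = TRUE)   # largest observed value\n",
"sd(km, na.rm = TRUE)    # standard deviation of the observed values\n",
"```"
]
},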
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lorsque nous utilisons la fonction `ns.omit`, le df se réduit à un df qui ne possède aucune ligne contenant les valeurs manquantes;"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
" | km | temps | vitesseMoyenne | puissanceMoyenne | bpm |
\n",
"\n",
"\t1 | 1.24 | 4:01 | 19.1 | 160 | 134 |
\n",
"\t3 | 1.02 | 1:57 | 30.8 | 141 | 139 |
\n",
"\t5 | 9.27 | 19:10 | 29.0 | 121 | 143 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" & km & temps & vitesseMoyenne & puissanceMoyenne & bpm\\\\\n",
"\\hline\n",
"\t1 & 1.24 & 4:01 & 19.1 & 160 & 134 \\\\\n",
"\t3 & 1.02 & 1:57 & 30.8 & 141 & 139 \\\\\n",
"\t5 & 9.27 & 19:10 & 29.0 & 121 & 143 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| | km | temps | vitesseMoyenne | puissanceMoyenne | bpm | \n",
"|---|---|---|\n",
"| 1 | 1.24 | 4:01 | 19.1 | 160 | 134 | \n",
"| 3 | 1.02 | 1:57 | 30.8 | 141 | 139 | \n",
"| 5 | 9.27 | 19:10 | 29.0 | 121 | 143 | \n",
"\n",
"\n"
],
"text/plain": [
" km temps vitesseMoyenne puissanceMoyenne bpm\n",
"1 1.24 4:01 19.1 160 134\n",
"3 1.02 1:57 30.8 141 139\n",
"5 9.27 19:10 29.0 121 143"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"na.omit(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On peut aller modifier directement la valeur de cet élément"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
" df[2,1]<-4.84\n",
" df[4,4]<-125"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"km | temps | vitesseMoyenne | puissanceMoyenne | bpm |
\n",
"\n",
"\t 1.24 | 4:01 | 19.1 | 160 | 134 |
\n",
"\t 4.84 | 9:42 | 30.2 | 133 | 146 |
\n",
"\t 1.02 | 1:57 | 30.8 | 141 | 139 |
\n",
"\t17.61 | 36:11 | 29.2 | 125 | 144 |
\n",
"\t 9.27 | 19:10 | 29.0 | 121 | 143 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" km & temps & vitesseMoyenne & puissanceMoyenne & bpm\\\\\n",
"\\hline\n",
"\t 1.24 & 4:01 & 19.1 & 160 & 134 \\\\\n",
"\t 4.84 & 9:42 & 30.2 & 133 & 146 \\\\\n",
"\t 1.02 & 1:57 & 30.8 & 141 & 139 \\\\\n",
"\t 17.61 & 36:11 & 29.2 & 125 & 144 \\\\\n",
"\t 9.27 & 19:10 & 29.0 & 121 & 143 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"km | temps | vitesseMoyenne | puissanceMoyenne | bpm | \n",
"|---|---|---|---|---|\n",
"| 1.24 | 4:01 | 19.1 | 160 | 134 | \n",
"| 4.84 | 9:42 | 30.2 | 133 | 146 | \n",
"| 1.02 | 1:57 | 30.8 | 141 | 139 | \n",
"| 17.61 | 36:11 | 29.2 | 125 | 144 | \n",
"| 9.27 | 19:10 | 29.0 | 121 | 143 | \n",
"\n",
"\n"
],
"text/plain": [
" km temps vitesseMoyenne puissanceMoyenne bpm\n",
"1 1.24 4:01 19.1 160 134\n",
"2 4.84 9:42 30.2 133 146\n",
"3 1.02 1:57 30.8 141 139\n",
"4 17.61 36:11 29.2 125 144\n",
"5 9.27 19:10 29.0 121 143"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Remplacer toutes les valeurs manquantes:\n",
"Des fois, il peut être utile de remplacer toutes les valeurs manquantes par des 0. Je dis bien des fois, car les valeurs manquantes sont absentes et non des 0."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"km | temps | vitesseMoyenne | puissanceMoyenne | bpm |
\n",
"\n",
"\t 1.24 | 4:01 | 19.1 | 160 | 134 |
\n",
"\t NA | 9:42 | 30.2 | 133 | 146 |
\n",
"\t 1.02 | 1:57 | 30.8 | 141 | 139 |
\n",
"\t17.61 | 36:11 | 29.2 | NA | 144 |
\n",
"\t 9.27 | 19:10 | 29.0 | 121 | 143 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" km & temps & vitesseMoyenne & puissanceMoyenne & bpm\\\\\n",
"\\hline\n",
"\t 1.24 & 4:01 & 19.1 & 160 & 134 \\\\\n",
"\t NA & 9:42 & 30.2 & 133 & 146 \\\\\n",
"\t 1.02 & 1:57 & 30.8 & 141 & 139 \\\\\n",
"\t 17.61 & 36:11 & 29.2 & NA & 144 \\\\\n",
"\t 9.27 & 19:10 & 29.0 & 121 & 143 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"km | temps | vitesseMoyenne | puissanceMoyenne | bpm | \n",
"|---|---|---|---|---|\n",
"| 1.24 | 4:01 | 19.1 | 160 | 134 | \n",
"| NA | 9:42 | 30.2 | 133 | 146 | \n",
"| 1.02 | 1:57 | 30.8 | 141 | 139 | \n",
"| 17.61 | 36:11 | 29.2 | NA | 144 | \n",
"| 9.27 | 19:10 | 29.0 | 121 | 143 | \n",
"\n",
"\n"
],
"text/plain": [
" km temps vitesseMoyenne puissanceMoyenne bpm\n",
"1 1.24 4:01 19.1 160 134\n",
"2 NA 9:42 30.2 133 146\n",
"3 1.02 1:57 30.8 141 139\n",
"4 17.61 36:11 29.2 NA 144\n",
"5 9.27 19:10 29.0 121 143"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df<-read.table(\"https://raw.githubusercontent.com/nmeraihi/data/master/exemple_2_2.txt\", header=T, sep = \",\")\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nous remplaçons alors les `NA` par `0` ou par toute autre valeur comme suit;"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"df[is.na(df)] <- 0"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"km | temps | vitesseMoyenne | puissanceMoyenne | bpm |
\n",
"\n",
"\t 1.24 | 4:01 | 19.1 | 160 | 134 |
\n",
"\t 0.00 | 9:42 | 30.2 | 133 | 146 |
\n",
"\t 1.02 | 1:57 | 30.8 | 141 | 139 |
\n",
"\t17.61 | 36:11 | 29.2 | 0 | 144 |
\n",
"\t 9.27 | 19:10 | 29.0 | 121 | 143 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" km & temps & vitesseMoyenne & puissanceMoyenne & bpm\\\\\n",
"\\hline\n",
"\t 1.24 & 4:01 & 19.1 & 160 & 134 \\\\\n",
"\t 0.00 & 9:42 & 30.2 & 133 & 146 \\\\\n",
"\t 1.02 & 1:57 & 30.8 & 141 & 139 \\\\\n",
"\t 17.61 & 36:11 & 29.2 & 0 & 144 \\\\\n",
"\t 9.27 & 19:10 & 29.0 & 121 & 143 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"km | temps | vitesseMoyenne | puissanceMoyenne | bpm | \n",
"|---|---|---|---|---|\n",
"| 1.24 | 4:01 | 19.1 | 160 | 134 | \n",
"| 0.00 | 9:42 | 30.2 | 133 | 146 | \n",
"| 1.02 | 1:57 | 30.8 | 141 | 139 | \n",
"| 17.61 | 36:11 | 29.2 | 0 | 144 | \n",
"| 9.27 | 19:10 | 29.0 | 121 | 143 | \n",
"\n",
"\n"
],
"text/plain": [
" km temps vitesseMoyenne puissanceMoyenne bpm\n",
"1 1.24 4:01 19.1 160 134\n",
"2 0.00 9:42 30.2 133 146\n",
"3 1.02 1:57 30.8 141 139\n",
"4 17.61 36:11 29.2 0 144\n",
"5 9.27 19:10 29.0 121 143"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df"
]
}
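,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see why replacing by 0 deserves caution, the sketch below compares it with one simple alternative, replacing by the mean of the observed values. The `km` values are typed in by hand from the table above; mean imputation is shown only as an illustration, not as a recommendation.\n",
"\n",
"```r\n",
"# Values copied by hand from the km column of the example\n",
"km <- c(1.24, NA, 1.02, 17.61, 9.27)\n",
"\n",
"km_zero <- km\n",
"km_zero[is.na(km_zero)] <- 0                        # replacement by 0\n",
"\n",
"km_mean <- km\n",
"km_mean[is.na(km_mean)] <- mean(km, na.rm = TRUE)   # replacement by the mean\n",
"\n",
"mean(km_zero)   # pulled down by the artificial 0\n",
"mean(km_mean)   # equal to the mean of the observed values\n",
"```\n",
"\n",
"Which replacement is appropriate, if any, depends on why the values are missing in the first place."
]
}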
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "4.1.2"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autoclose": false,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
},
"toc": {
"base_numbering": 1,
"nav_menu": {
"height": "444px",
"width": "252px"
},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": true,
"toc_position": {
"height": "1089px",
"left": "0px",
"right": "20px",
"top": "159px",
"width": "212px"
},
"toc_section_display": false,
"toc_window_display": true
}
},
"nbformat": 4,
"nbformat_minor": 4
}