{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data et Set"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Nous avons vu dans le chapitre précédent qu'il est possible d'**afficher** uniquement les variables qui nous intéressent. Nous avons vu également qu'il est possible d'**afficher** que les observations qui nous intéressent. Par exemple dans un ensemble de données sur des clients, nous pouvons **afficher** seulement les noms, prénoms et âge des clients habitant à Montréal."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Toutefois, avec la procédure `Format` nous modifions seulement l'**affichage**. Si nous voulons effectuer une opération quelconque (calcul, régression...etc.) sur les variables et les observations d'intérêt, il faudrait extraire (ou dupliquer) un sous-ensemble de données."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"\n",
"```\n",
"DATA sous-ensemble-données;\n",
" SET ensemble-données;\n",
" WHERE variable_1='ce_qui_nous_interesse_1'\n",
" and variable_2='ce_qui_nous_interesse_2'\n",
"RUN;\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Le sous-ensemble de données peut-être temporaire avec la bibliothèque `WORK.sous_sensemble` ou dans une bibliothèque permanente `LIBNAME.sous_sensemble`."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"On peut extraire les observations d'intérêt avec la déclaration `WHERE`. Nous pouvons spécifier ce qui nous intéresse avec la description **exacte** si nous la connaissons, par exemple `WHRE type_prof='Ingénieur`. Lorsque nous ne sommes pas sûrs de l’écriture exacte de ce que nous cherchons, ou par exemple nous voulons extraire tout ce qui contient les deux lettres `IN`, nous utilisons les `%...%`. Par exemple `WHRE type_prof='%In%`. Dans ce dernier cas, nous aurons tous les types de professions contenant les deux lettres `In`."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Exemple;"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Reprenons notre base de données d'assurance;"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"SAS Connection established. Subprocess id is 23973\n",
"\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"numeropol | \n",
"\n",
"\n",
"cout1 | \n",
"cout2 | \n",
"
\n",
"\n",
"\n",
"\n",
"52484 | \n",
"Ingénieur | \n",
"Carnivore | \n",
". | \n",
". | \n",
"
\n",
"\n",
"27817 | \n",
"Ingénieur | \n",
"Carnivore | \n",
". | \n",
". | \n",
"
\n",
"\n",
"32986 | \n",
"Technicien | \n",
"Carnivore | \n",
"230.280 | \n",
"287.217 | \n",
"
\n",
"\n",
"45769 | \n",
"Technicien | \n",
"Carnivore | \n",
". | \n",
". | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"libname assu 'data';\n",
"Data assu.cours_2;\n",
" set assu.eq_2;\n",
"/* nous verrons un peu plus loin à quoi sert keep */\n",
" keep numeropol type_prof alimentation cout1 cout2; \n",
"run;\n",
"proc print data=assu.cours_2 (firstobs = 2 obs = 5) noobs;\n",
"run;"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"On se rappelle des types de professions que nous avions dans notre base de données originale."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"type_prof |
\n",
" Actuaire |
\n",
" Autre |
\n",
" Avocat |
\n",
" Hockeyeur |
\n",
" Infirmière |
\n",
" Informaticien |
\n",
" Ingénieur |
\n",
" Médecin |
\n",
" Professeur |
\n",
" Technicien |
\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Nous voudrions créer un sous-ensemble de données **temporaire** appelé `bd_ingenieurs` sur lequel on veut travailler. Ce dernier provient de l'ensemble de données `assu.eq_2` et contient seulement les données sur les ingénieurs."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"Obs | \n",
"numeropol | \n",
"\n",
"\n",
"cout1 | \n",
"cout2 | \n",
"
\n",
"\n",
"\n",
"\n",
"2 | \n",
"27817 | \n",
"Ingénieur | \n",
"Carnivore | \n",
". | \n",
". | \n",
"
\n",
"\n",
"3 | \n",
"19651 | \n",
"Ingénieur | \n",
"Végétalien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"4 | \n",
"12507 | \n",
"Ingénieur | \n",
"Carnivore | \n",
". | \n",
". | \n",
"
\n",
"\n",
"5 | \n",
"8670 | \n",
"Ingénieur | \n",
"Carnivore | \n",
". | \n",
". | \n",
"
\n",
"\n",
"6 | \n",
"17867 | \n",
"Ingénieur | \n",
"Carnivore | \n",
". | \n",
". | \n",
"
\n",
"\n",
"7 | \n",
"5127 | \n",
"Ingénieur | \n",
"Carnivore | \n",
". | \n",
". | \n",
"
\n",
"\n",
"8 | \n",
"53543 | \n",
"Ingénieur | \n",
"Carnivore | \n",
". | \n",
". | \n",
"
\n",
"\n",
"9 | \n",
"27131 | \n",
"Ingénieur | \n",
"Carnivore | \n",
"289.311 | \n",
". | \n",
"
\n",
"\n",
"10 | \n",
"16186 | \n",
"Ingénieur | \n",
"Carnivore | \n",
". | \n",
". | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Data work.bd_ingenieurs;\n",
" set assu.cours_2;\n",
" where type_prof='Ingénieur';\n",
"run;\n",
"proc print data=work.bd_ingenieurs (firstobs = 2 obs = 10);\n",
"run;\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Maintenant, on veut les types de professions qui ont les deux **In** comme **Infirmière** ou **Ingénieur**"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"Obs | \n",
"numeropol | \n",
"\n",
"\n",
"cout1 | \n",
"cout2 | \n",
"
\n",
"\n",
"\n",
"\n",
"2 | \n",
"27817 | \n",
"Ingénieur | \n",
"Carnivore | \n",
". | \n",
". | \n",
"
\n",
"\n",
"3 | \n",
"13590 | \n",
"Infirmière | \n",
"Carnivore | \n",
"509.261 | \n",
"2665.25 | \n",
"
\n",
"\n",
"4 | \n",
"76422 | \n",
"Infirmière | \n",
"Carnivore | \n",
". | \n",
". | \n",
"
\n",
"\n",
"5 | \n",
"20860 | \n",
"Infirmière | \n",
"Végétarien | \n",
"487.640 | \n",
". | \n",
"
\n",
"\n",
"6 | \n",
"49010 | \n",
"Infirmière | \n",
"Végétarien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"7 | \n",
"30089 | \n",
"Informaticien | \n",
"Carnivore | \n",
". | \n",
". | \n",
"
\n",
"\n",
"8 | \n",
"17174 | \n",
"Informaticien | \n",
"Végétarien | \n",
"458.789 | \n",
". | \n",
"
\n",
"\n",
"9 | \n",
"38332 | \n",
"Infirmière | \n",
"Végétalien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"10 | \n",
"19651 | \n",
"Ingénieur | \n",
"Végétalien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Data work.bd_in;\n",
" set assu.cours_2;\n",
" where type_prof like '%In%';\n",
"run;\n",
"proc print data=work.bd_in (firstobs = 2 obs = 10);\n",
"run;\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"On peut vérifier que nous avons les types de proffessions avec `In`\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"type_prof |
\n",
" Infirmière |
\n",
" Informaticien |
\n",
" Ingénieur |
\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Regardons maintenant les types de proffesions se terminant par `eur`"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"Obs | \n",
"numeropol | \n",
"\n",
"\n",
"cout1 | \n",
"cout2 | \n",
"
\n",
"\n",
"\n",
"\n",
"2 | \n",
"27817 | \n",
"Ingénieur | \n",
"Carnivore | \n",
". | \n",
". | \n",
"
\n",
"\n",
"3 | \n",
"56107 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"4 | \n",
"19651 | \n",
"Ingénieur | \n",
"Végétalien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"5 | \n",
"44235 | \n",
"Professeur | \n",
"Carnivore | \n",
". | \n",
". | \n",
"
\n",
"\n",
"6 | \n",
"24013 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"7 | \n",
"12507 | \n",
"Ingénieur | \n",
"Carnivore | \n",
". | \n",
". | \n",
"
\n",
"\n",
"8 | \n",
"28741 | \n",
"Professeur | \n",
"Carnivore | \n",
". | \n",
". | \n",
"
\n",
"\n",
"9 | \n",
"14731 | \n",
"Hockeyeur | \n",
"Carnivore | \n",
"67.901 | \n",
"1639.48 | \n",
"
\n",
"\n",
"10 | \n",
"3977 | \n",
"Hockeyeur | \n",
"Carnivore | \n",
"263.075 | \n",
". | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Data work.bd_eur;\n",
" set assu.cours_2;\n",
" where type_prof like '%eur';\n",
"run;\n",
"proc print data=work.bd_eur (firstobs = 2 obs = 10);\n",
"run;"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Nous obtenons alors que: "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"type_prof |
\n",
" Hockeyeur |
\n",
" Ingénieur |
\n",
" Professeur |
\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Si vous êtes curieux de voir comment nous obtenons ce résultat, voici la commande `PROC sql` que nous verrons plus en détail dans les prochains cours"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"Hockeyeur | \n",
"
\n",
"\n",
"Ingénieur | \n",
"
\n",
"\n",
"Professeur | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"proc sql;\n",
" create table eur as\n",
" select type_prof, count(*) as type_prof from work.bd_eur \n",
" group by type_prof;\n",
"quit;\n",
"proc print data=eur noobs;\n",
"run;"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Maintenant, essayons d'extraire un sous-ensemble de données avec les critères suivants;\n",
"1. Hokeyeur\n",
"2. Qui est végétarien ou végétalien"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"Obs | \n",
"numeropol | \n",
"\n",
"\n",
"cout1 | \n",
"cout2 | \n",
"
\n",
"\n",
"\n",
"\n",
"2 | \n",
"24013 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"3 | \n",
"24660 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"4 | \n",
"56436 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"5 | \n",
"18535 | \n",
"Hockeyeur | \n",
"Végétalien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"6 | \n",
"21672 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"7 | \n",
"12342 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"8 | \n",
"74410 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"9 | \n",
"15159 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"456.261 | \n",
". | \n",
"
\n",
"\n",
"10 | \n",
"32260 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Data work.bd_hk_veg;\n",
" set assu.cours_2;\n",
" where type_prof='Hockeyeur'\n",
" and alimentation like 'Vé%';\n",
"run;\n",
"proc print data=work.bd_hk_veg (firstobs = 2 obs = 10);\n",
"run;"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Création d'une nouvelle variable:"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Lorsque nous produisons un sous-ensemble de données, nous pouvons aussi créer une toute nouvelle variable à partir d'une opération quelconque sur les variables existantes. Par exemple, calculer le total des coûts de sinistre."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Comme dans l'exemple précédent, pour les Hokeyeurs qui ne mangent pas de viande, calculons le total des coûts de sinistres cout1-cout7;"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"**Remarquez** dans le code ci-dessous que nous avons un montant à la variable `cout1`. Toutefois, le total n'est pas affiché à la neuvième observation. En effet, quand on somme avec des `+`, SAS considère les valeurs manquantes dans le calcul."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"numeropol | \n",
"\n",
"\n",
"cout1 | \n",
"cout_total | \n",
"
\n",
"\n",
"\n",
"\n",
"24013 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"24660 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"56436 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"18535 | \n",
"Hockeyeur | \n",
"Végétalien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"21672 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"12342 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"74410 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"15159 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"456.261 | \n",
". | \n",
"
\n",
"\n",
"32260 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
". | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Data work.bd_hk_veg;\n",
" set assu.cours_2;\n",
" cout_total=cout1+cout2+cout3+cout4+cout5+cout6+cout7;\n",
" where type_prof='Hockeyeur'\n",
" and alimentation like 'Vé%'; \n",
" keep numeropol\ttype_prof\talimentation cout1 cout_total; \n",
"run;\n",
"proc print data=work.bd_hk_veg (firstobs = 2 obs = 10) noobs;\n",
"format cout1-cout7 cout_total dollar12.2;\n",
"run;"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"La fonction `sum` règle ce problème;"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"numeropol | \n",
"\n",
"\n",
"cout_total | \n",
"
\n",
"\n",
"\n",
"\n",
"24013 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"24660 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"56436 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"18535 | \n",
"Hockeyeur | \n",
"Végétalien | \n",
". | \n",
"
\n",
"\n",
"21672 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"12342 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"74410 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"15159 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"$456.26 | \n",
"
\n",
"\n",
"32260 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Data work.bd_hk_veg;\n",
" set assu.cours_2;\n",
" cout_total=sum(cout1,cout2, cout3, cout4, cout5, cout6, cout7);\n",
" where type_prof='Hockeyeur'\n",
" and alimentation like 'Vé%';\n",
" keep numeropol\ttype_prof\talimentation cout_total; \n",
"run;\n",
"proc print data=work.bd_hk_veg (firstobs = 2 obs = 10) noobs;\n",
"format cout1-cout7 cout_total dollar12.2;\n",
"run;"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Maintenant que nous avons vu comment extraire les données, nous voudrions garder dans notre nouvelle base de données seulement les variables d’intérêt. Autrement dit, nous voulons **garder** ou **ignorer** certaines variables. "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"SAS possède l'option **Keep** ou **Drop** à cet effet.\n",
"\n",
"```\n",
"DATA sous-ensemble-données;\n",
" SET ensemble-données;\n",
" WHERE variable_1='ce_qui_nous_interesse_1'\n",
" and variable_2='ce_qui_nous_interesse_2'\n",
" drop variable_1 variable_6 variable_7-variable_99\n",
"RUN;\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"ou"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"```\n",
"DATA sous-ensemble-données;\n",
" SET ensemble-données;\n",
" WHERE variable_1='ce_qui_nous_interesse_1'\n",
" and variable_2='ce_qui_nous_interesse_2'\n",
" keep variable_2 variable_3-variable_5 variable_7-variable_99\n",
"RUN;\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Comment choisit-on entre les deux options?\n",
"\n",
"Tout dépend bien sûr des variables dont nous avons besoin et du nombre de variables que nous avons à ignorer."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Exemple avec `keep`"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"Obs | \n",
"numeropol | \n",
"\n",
"\n",
"cout_total | \n",
"
\n",
"\n",
"\n",
"\n",
"2 | \n",
"24013 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"3 | \n",
"24660 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"4 | \n",
"56436 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"5 | \n",
"18535 | \n",
"Hockeyeur | \n",
"Végétalien | \n",
". | \n",
"
\n",
"\n",
"6 | \n",
"21672 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"7 | \n",
"12342 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"8 | \n",
"74410 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"9 | \n",
"15159 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"$456.26 | \n",
"
\n",
"\n",
"10 | \n",
"32260 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Data work.bd_hk_veg;\n",
" set assu.cours_2;\n",
" cout_total=sum(cout1,cout2);\n",
" where type_prof='Hockeyeur'\n",
" and alimentation like 'Vé%';\n",
" keep numeropol type_prof alimentation cout_total;\n",
"run;\n",
"proc print data=work.bd_hk_veg (firstobs = 2 obs = 10);\n",
"format cout1-cout2 cout_total dollar12.2;\n",
"run;"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Exemple avec `drop`"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Si nos variables d'intérêt sont (numeropol, type_prof, alimentation, cout_total), on voit que l'option la plus courte serait d'utiliser **`keep`**"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"numeropol | \n",
"\n",
"\n",
"cout_total | \n",
"
\n",
"\n",
"\n",
"\n",
"24013 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"24660 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"56436 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"18535 | \n",
"Hockeyeur | \n",
"Végétalien | \n",
". | \n",
"
\n",
"\n",
"21672 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"12342 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"74410 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"15159 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"$456.26 | \n",
"
\n",
"\n",
"32260 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
". | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Data work.bd_hk_veg;\n",
" set assu.cours_2;\n",
" cout_total=sum(cout1,cout2);\n",
" where type_prof='Hockeyeur'\n",
" and alimentation like 'Vé%';\n",
" drop debut_pol\tfin_pol\tfreq_paiement\tlangue\ttype_territoire\tutilisation\t\n",
" presence_alarme\tannee_veh\tmarque_voiture\tsexe\tdate_naissance\t\n",
" couleur_voiture\tannee_permis\tmyopie\tcheveux\tcout1-cout7\tnbsin\tequipe\t;\n",
"run;\n",
"proc print data=work.bd_hk_veg (firstobs = 2 obs = 10) noobs;\n",
"format cout1-cout2 cout_total dollar12.2;\n",
"run;"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Extraire un sous-ensemble avec de `IF`"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Imaginons que nous voulons maintenant extraire un sous-ensemble de données avec les caractéristiques suivantes (Hokeyeur, végétarien ou végétalien, et que les coûts de sinistre est supérieur à 1000$)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Bien évidemment, on ne peut pas faire un `WHERE` car la variable cout_total n'existe pas dans l'ensemble de données originales. Cette dernière a été créée seulement dans le sous-ensemble de données."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"La fonction `IF` sert justement à résoudre ce genre de situation;"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"numeropol | \n",
"\n",
"\n",
"cout_total | \n",
"
\n",
"\n",
"\n",
"\n",
"84568 | \n",
"Hockeyeur | \n",
"Végétalien | \n",
"$1,000.81 | \n",
"
\n",
"\n",
"51085 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"$70,040.85 | \n",
"
\n",
"\n",
"12443 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"$1,536.96 | \n",
"
\n",
"\n",
"60675 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"$3,105.94 | \n",
"
\n",
"\n",
"1095 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"$22,753.50 | \n",
"
\n",
"\n",
"312 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"$1,157.76 | \n",
"
\n",
"\n",
"312 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"$38,509.17 | \n",
"
\n",
"\n",
"18535 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"$3,857.07 | \n",
"
\n",
"\n",
"44485 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"$1,106.05 | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Data work.bd_hk_veg;\n",
" set assu.cours_2;\n",
" cout_total=sum(cout1,cout2);\n",
" where type_prof='Hockeyeur'\n",
" and alimentation like 'Vé%';\n",
" keep numeropol type_prof alimentation cout_total;\n",
" if cout_total>1000;\n",
"run;\n",
"proc print data=work.bd_hk_veg (firstobs = 2 obs = 10) noobs;\n",
"format cout1-cout2 cout_total dollar12.2;\n",
"run;"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Maintenant, supposons que nous voulons afficher `coût total en CAD` au lieu de `cout_total`"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"On se rappelle que nous avons l'option `Label` à cet effet."
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"numeropol | \n",
"\n",
"\n",
"coût total en CAD | \n",
"
\n",
"\n",
"\n",
"\n",
"84568 | \n",
"Hockeyeur | \n",
"Végétalien | \n",
"$1,000.81 | \n",
"
\n",
"\n",
"51085 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"$70,040.85 | \n",
"
\n",
"\n",
"12443 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"$1,536.96 | \n",
"
\n",
"\n",
"60675 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"$3,105.94 | \n",
"
\n",
"\n",
"1095 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"$22,753.50 | \n",
"
\n",
"\n",
"312 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"$1,157.76 | \n",
"
\n",
"\n",
"312 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"$38,509.17 | \n",
"
\n",
"\n",
"18535 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"$3,857.07 | \n",
"
\n",
"\n",
"44485 | \n",
"Hockeyeur | \n",
"Végétarien | \n",
"$1,106.05 | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Data work.bd_hk_veg;\n",
" set assu.cours_2;\n",
" cout_total=sum(cout1,cout2);\n",
" where type_prof='Hockeyeur'\n",
" and alimentation like 'Vé%';\n",
" keep numeropol type_prof alimentation cout_total;\n",
" if cout_total>1000;\n",
" Label cout_total='coût total en CAD';\n",
"run;\n",
"proc print data=work.bd_hk_veg (firstobs = 2 obs = 10) label noobs;\n",
"format cout1-cout2 cout_total dollar12.2;\n",
"run;"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": true,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": true
}
},
"nbformat": 4,
"nbformat_minor": 4
}