{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data et Set" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Nous avons vu dans le chapitre précédent qu'il est possible d'**afficher** uniquement les variables qui nous intéressent. Nous avons vu également qu'il est possible d'**afficher** que les observations qui nous intéressent. Par exemple dans un ensemble de données sur des clients, nous pouvons **afficher** seulement les noms, prénoms et âge des clients habitant à Montréal." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Toutefois, avec la procédure `Format` nous modifions seulement l'**affichage**. Si nous voulons effectuer une opération quelconque (calcul, régression...etc.) sur les variables et les observations d'intérêt, il faudrait extraire (ou dupliquer) un sous-ensemble de données." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "\n", "```\n", "DATA sous-ensemble-données;\n", " SET ensemble-données;\n", " WHERE variable_1='ce_qui_nous_interesse_1'\n", " and variable_2='ce_qui_nous_interesse_2'\n", "RUN;\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Le sous-ensemble de données peut-être temporaire avec la bibliothèque `WORK.sous_sensemble` ou dans une bibliothèque permanente `LIBNAME.sous_sensemble`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "On peut extraire les observations d'intérêt avec la déclaration `WHERE`. Nous pouvons spécifier ce qui nous intéresse avec la description **exacte** si nous la connaissons, par exemple `WHRE type_prof='Ingénieur`. Lorsque nous ne sommes pas sûrs de l’écriture exacte de ce que nous cherchons, ou par exemple nous voulons extraire tout ce qui contient les deux lettres `IN`, nous utilisons les `%...%`. Par exemple `WHRE type_prof='%In%`. Dans ce dernier cas, nous aurons tous les types de professions contenant les deux lettres `In`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Exemple;" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Reprenons notre base de données d'assurance;" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SAS Connection established. Subprocess id is 23973\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
numeropoltype_profalimentationcout1cout2
52484IngénieurCarnivore..
27817IngénieurCarnivore..
32986TechnicienCarnivore230.280287.217
45769TechnicienCarnivore..
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "libname assu 'data';\n", "Data assu.cours_2;\n", " set assu.eq_2;\n", "/* nous verrons un peu plus loin à quoi sert keep */\n", " keep numeropol type_prof alimentation cout1 cout2; \n", "run;\n", "proc print data=assu.cours_2 (firstobs = 2 obs = 5) noobs;\n", "run;" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "On se rappelle des types de professions que nous avions dans notre base de données originale." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
type_prof
Actuaire
Autre
Avocat
Hockeyeur
Infirmière
Informaticien
Ingénieur
Médecin
Professeur
Technicien
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Nous voudrions créer un sous-ensemble de données **temporaire** appelé `bd_ingenieurs` sur lequel on veut travailler. Ce dernier provient de l'ensemble de données `assu.eq_2` et contient seulement les données sur les ingénieurs." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obsnumeropoltype_profalimentationcout1cout2
227817IngénieurCarnivore..
319651IngénieurVégétalien..
412507IngénieurCarnivore..
58670IngénieurCarnivore..
617867IngénieurCarnivore..
75127IngénieurCarnivore..
853543IngénieurCarnivore..
927131IngénieurCarnivore289.311.
1016186IngénieurCarnivore..
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Data work.bd_ingenieurs;\n", " set assu.cours_2;\n", " where type_prof='Ingénieur';\n", "run;\n", "proc print data=work.bd_ingenieurs (firstobs = 2 obs = 10);\n", "run;\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Maintenant, on veut les types de professions qui ont les deux **In** comme **Infirmière** ou **Ingénieur**" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obsnumeropoltype_profalimentationcout1cout2
227817IngénieurCarnivore..
313590InfirmièreCarnivore509.2612665.25
476422InfirmièreCarnivore..
520860InfirmièreVégétarien487.640.
649010InfirmièreVégétarien..
730089InformaticienCarnivore..
817174InformaticienVégétarien458.789.
938332InfirmièreVégétalien..
1019651IngénieurVégétalien..
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Data work.bd_in;\n", " set assu.cours_2;\n", " where type_prof like '%In%';\n", "run;\n", "proc print data=work.bd_in (firstobs = 2 obs = 10);\n", "run;\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "On peut vérifier que nous avons les types de proffessions avec `In`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", " \n", " \n", " \n", "
type_prof
Infirmière
Informaticien
Ingénieur
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Regardons maintenant les types de proffesions se terminant par `eur`" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obsnumeropoltype_profalimentationcout1cout2
227817IngénieurCarnivore..
356107HockeyeurVégétarien..
419651IngénieurVégétalien..
544235ProfesseurCarnivore..
624013HockeyeurVégétarien..
712507IngénieurCarnivore..
828741ProfesseurCarnivore..
914731HockeyeurCarnivore67.9011639.48
103977HockeyeurCarnivore263.075.
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Data work.bd_eur;\n", " set assu.cours_2;\n", " where type_prof like '%eur';\n", "run;\n", "proc print data=work.bd_eur (firstobs = 2 obs = 10);\n", "run;" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Nous obtenons alors que: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", " \n", " \n", " \n", "
type_prof
Hockeyeur
Ingénieur
Professeur
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Si vous êtes curieux de voir comment nous obtenons ce résultat, voici la commande `PROC sql` que nous verrons plus en détail dans les prochains cours" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
type_prof
Hockeyeur
Ingénieur
Professeur
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "proc sql;\n", " create table eur as\n", " select type_prof, count(*) as type_prof from work.bd_eur \n", " group by type_prof;\n", "quit;\n", "proc print data=eur noobs;\n", "run;" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Maintenant, essayons d'extraire un sous-ensemble de données avec les critères suivants;\n", "1. Hokeyeur\n", "2. Qui est végétarien ou végétalien" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obsnumeropoltype_profalimentationcout1cout2
224013HockeyeurVégétarien..
324660HockeyeurVégétarien..
456436HockeyeurVégétarien..
518535HockeyeurVégétalien..
621672HockeyeurVégétarien..
712342HockeyeurVégétarien..
874410HockeyeurVégétarien..
915159HockeyeurVégétarien456.261.
1032260HockeyeurVégétarien..
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Data work.bd_hk_veg;\n", " set assu.cours_2;\n", " where type_prof='Hockeyeur'\n", " and alimentation like 'Vé%';\n", "run;\n", "proc print data=work.bd_hk_veg (firstobs = 2 obs = 10);\n", "run;" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Création d'une nouvelle variable:" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Lorsque nous produisons un sous-ensemble de données, nous pouvons aussi créer une toute nouvelle variable à partir d'une opération quelconque sur les variables existantes. Par exemple, calculer le total des coûts de sinistre." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Comme dans l'exemple précédent, pour les Hokeyeurs qui ne mangent pas de viande, calculons le total des coûts de sinistres cout1-cout7;" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**Remarquez** dans le code ci-dessous que nous avons un montant à la variable `cout1`. Toutefois, le total n'est pas affiché à la neuvième observation. En effet, quand on somme avec des `+`, SAS considère les valeurs manquantes dans le calcul." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
numeropoltype_profalimentationcout1cout_total
24013HockeyeurVégétarien..
24660HockeyeurVégétarien..
56436HockeyeurVégétarien..
18535HockeyeurVégétalien..
21672HockeyeurVégétarien..
12342HockeyeurVégétarien..
74410HockeyeurVégétarien..
15159HockeyeurVégétarien456.261.
32260HockeyeurVégétarien..
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Data work.bd_hk_veg;\n", " set assu.cours_2;\n", " cout_total=cout1+cout2+cout3+cout4+cout5+cout6+cout7;\n", " where type_prof='Hockeyeur'\n", " and alimentation like 'Vé%'; \n", " keep numeropol\ttype_prof\talimentation cout1 cout_total; \n", "run;\n", "proc print data=work.bd_hk_veg (firstobs = 2 obs = 10) noobs;\n", "format cout1-cout7 cout_total dollar12.2;\n", "run;" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "La fonction `sum` règle ce problème;" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
numeropoltype_profalimentationcout_total
24013HockeyeurVégétarien.
24660HockeyeurVégétarien.
56436HockeyeurVégétarien.
18535HockeyeurVégétalien.
21672HockeyeurVégétarien.
12342HockeyeurVégétarien.
74410HockeyeurVégétarien.
15159HockeyeurVégétarien$456.26
32260HockeyeurVégétarien.
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Data work.bd_hk_veg;\n", " set assu.cours_2;\n", " cout_total=sum(cout1,cout2, cout3, cout4, cout5, cout6, cout7);\n", " where type_prof='Hockeyeur'\n", " and alimentation like 'Vé%';\n", " keep numeropol\ttype_prof\talimentation cout_total; \n", "run;\n", "proc print data=work.bd_hk_veg (firstobs = 2 obs = 10) noobs;\n", "format cout1-cout7 cout_total dollar12.2;\n", "run;" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Maintenant que nous avons vu comment extraire les données, nous voudrions garder dans notre nouvelle base de données seulement les variables d’intérêt. Autrement dit, nous voulons **garder** ou **ignorer** certaines variables. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "SAS possède l'option **Keep** ou **Drop** à cet effet.\n", "\n", "```\n", "DATA sous-ensemble-données;\n", " SET ensemble-données;\n", " WHERE variable_1='ce_qui_nous_interesse_1'\n", " and variable_2='ce_qui_nous_interesse_2'\n", " drop variable_1 variable_6 variable_7-variable_99\n", "RUN;\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "ou" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "```\n", "DATA sous-ensemble-données;\n", " SET ensemble-données;\n", " WHERE variable_1='ce_qui_nous_interesse_1'\n", " and variable_2='ce_qui_nous_interesse_2'\n", " keep variable_2 variable_3-variable_5 variable_7-variable_99\n", "RUN;\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Comment choisit-on entre les deux options?\n", "\n", "Tout dépend bien sûr des variables dont nous avons besoin et du nombre de variables que nous avons à ignorer." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Exemple avec `keep`" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obsnumeropoltype_profalimentationcout_total
224013HockeyeurVégétarien.
324660HockeyeurVégétarien.
456436HockeyeurVégétarien.
518535HockeyeurVégétalien.
621672HockeyeurVégétarien.
712342HockeyeurVégétarien.
874410HockeyeurVégétarien.
915159HockeyeurVégétarien$456.26
1032260HockeyeurVégétarien.
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Data work.bd_hk_veg;\n", " set assu.cours_2;\n", " cout_total=sum(cout1,cout2);\n", " where type_prof='Hockeyeur'\n", " and alimentation like 'Vé%';\n", " keep numeropol type_prof alimentation cout_total;\n", "run;\n", "proc print data=work.bd_hk_veg (firstobs = 2 obs = 10);\n", "format cout1-cout2 cout_total dollar12.2;\n", "run;" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Exemple avec `drop`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Si nos variables d'intérêt sont (numeropol, type_prof, alimentation, cout_total), on voit que l'option la plus courte serait d'utiliser **`keep`**" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
numeropoltype_profalimentationcout_total
24013HockeyeurVégétarien.
24660HockeyeurVégétarien.
56436HockeyeurVégétarien.
18535HockeyeurVégétalien.
21672HockeyeurVégétarien.
12342HockeyeurVégétarien.
74410HockeyeurVégétarien.
15159HockeyeurVégétarien$456.26
32260HockeyeurVégétarien.
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Data work.bd_hk_veg;\n", " set assu.cours_2;\n", " cout_total=sum(cout1,cout2);\n", " where type_prof='Hockeyeur'\n", " and alimentation like 'Vé%';\n", " drop debut_pol\tfin_pol\tfreq_paiement\tlangue\ttype_territoire\tutilisation\t\n", " presence_alarme\tannee_veh\tmarque_voiture\tsexe\tdate_naissance\t\n", " couleur_voiture\tannee_permis\tmyopie\tcheveux\tcout1-cout7\tnbsin\tequipe\t;\n", "run;\n", "proc print data=work.bd_hk_veg (firstobs = 2 obs = 10) noobs;\n", "format cout1-cout2 cout_total dollar12.2;\n", "run;" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Extraire un sous-ensemble avec de `IF`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Imaginons que nous voulons maintenant extraire un sous-ensemble de données avec les caractéristiques suivantes (Hokeyeur, végétarien ou végétalien, et que les coûts de sinistre est supérieur à 1000$)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Bien évidemment, on ne peut pas faire un `WHERE` car la variable cout_total n'existe pas dans l'ensemble de données originales. Cette dernière a été créée seulement dans le sous-ensemble de données." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "La fonction `IF` sert justement à résoudre ce genre de situation;" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
numeropoltype_profalimentationcout_total
84568HockeyeurVégétalien$1,000.81
51085HockeyeurVégétarien$70,040.85
12443HockeyeurVégétarien$1,536.96
60675HockeyeurVégétarien$3,105.94
1095HockeyeurVégétarien$22,753.50
312HockeyeurVégétarien$1,157.76
312HockeyeurVégétarien$38,509.17
18535HockeyeurVégétarien$3,857.07
44485HockeyeurVégétarien$1,106.05
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Data work.bd_hk_veg;\n", " set assu.cours_2;\n", " cout_total=sum(cout1,cout2);\n", " where type_prof='Hockeyeur'\n", " and alimentation like 'Vé%';\n", " keep numeropol type_prof alimentation cout_total;\n", " if cout_total>1000;\n", "run;\n", "proc print data=work.bd_hk_veg (firstobs = 2 obs = 10) noobs;\n", "format cout1-cout2 cout_total dollar12.2;\n", "run;" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Maintenant, supposons que nous voulons afficher `coût total en CAD` au lieu de `cout_total`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "On se rappelle que nous avons l'option `Label` à cet effet." ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
numeropoltype_profalimentationcoût total en CAD
84568HockeyeurVégétalien$1,000.81
51085HockeyeurVégétarien$70,040.85
12443HockeyeurVégétarien$1,536.96
60675HockeyeurVégétarien$3,105.94
1095HockeyeurVégétarien$22,753.50
312HockeyeurVégétarien$1,157.76
312HockeyeurVégétarien$38,509.17
18535HockeyeurVégétarien$3,857.07
44485HockeyeurVégétarien$1,106.05
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Data work.bd_hk_veg;\n", " set assu.cours_2;\n", " cout_total=sum(cout1,cout2);\n", " where type_prof='Hockeyeur'\n", " and alimentation like 'Vé%';\n", " keep numeropol type_prof alimentation cout_total;\n", " if cout_total>1000;\n", " Label cout_total='coût total en CAD';\n", "run;\n", "proc print data=work.bd_hk_veg (firstobs = 2 obs = 10) label noobs;\n", "format cout1-cout2 cout_total dollar12.2;\n", "run;" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 4 }