{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Les jointures de tables partie_2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dans cette partie du cours, nous allons appliquer les procédures `proc sql` que nous avons apprises, sur [une base de données relationnelle d’assurance automobile](https://github.com/nmeraihi/data/tree/master/bdSQL). Il faut noter que ces données proviennent d'une vraie bd d'assurance. Toutefois, le contenu a été modifié et plusieurs variables ont été supprimées à des fins de confidentialité. Nous avons gardé l'essentiel des données qui nous servent de matériel pédagogique pour les cours d'actuariat à l'UQAM." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Analysons d'abord l'architecture de cette base de données présentée ci-dessous:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Base de données d'assurance dans le cadre du cours ACT3035](https://raw.githubusercontent.com/nmeraihi/data/master/bdSQL//diagramm.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On voit bien que nous avons essentiellement quatre tables avec lesquelles nous allons travailler;\n", "1. Tout d'abord, la table `donnes_demo` dans laquelle nous trouvons les informations démographiques des assurés. Cette table contient une **clé primaire** (colonne \"numeropol\") qui est le numéro de police de l'assuré. C'est avec cette colonne (attribut dans le langage `sql`) qui permet de relier les quatre tables entre elles.\n", "2. Ensuite, nous avons la table `police_assurance` qui nous donne l'information sur chaque police d'assurance\n", "3. La table `pmt` nous donne l'information sur les primes payées par les assurés, comme le montant, le mode de paiement ...etc.\n", "4. Enfin, la table `cars_ifo` nous donne l'information sur les véhicules au dossier de l'assuré." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Analysons l'aperçu sur chacune des tables;" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Apperçu de la table donnes_demo

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
nameprovincecompanylanguedate_naissanceageeage_permisnumeropol
Shane RobinsonNova ScotiaMay Ltdfr1944-10-2072241
Courtney NguyenSaskatchewanFoley, Moore and Mitchellen1985-12-0931245
Lori WashingtonYukon TerritoryRobinson-Reyesfr1970-01-27472813
Sarah CastilloAlbertaWood, Brady and Englishfr2000-08-23161616
Jeffrey GarciaNunavutBerger-Thompsonen1969-10-25472022
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "libname bdSQL \"data/bdsql\";\n", "title \"Apperçu de la table donnes_demo\";\n", "proc sql outobs=5;\n", " select * from bdSQL.donnes_demo\n", "quit;" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "
\n", "
\n", "

Apperçu de la table police_assurance

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
numeropoldebut_polfin_polcout1cout2cout3cout4cout5cout6cout7nbsin
11999-11-102000-10-16.......0
12000-10-172000-11-09.......0
12000-11-102001-11-09243.85714286......1
51996-01-031996-03-27.......0
51996-03-281997-01-02.......0
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "title \"Apperçu de la table police_assurance\";\n", "proc sql outobs=5;\n", " select * from bdSQL.police_assurance\n", "quit;" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "
\n", "
\n", "

Apperçu de la table pmt

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
numeropolcout_primecredit_card_numbercredit_card_providercredit_card_expirefreq_pmt
11060.284.4274761E15Voyager01APR23:00:00:0012
51200.895.3033891E15JCB 16 digit01AUG26:00:00:001
13940.543.5285692E15Maestro01AUG22:00:00:0012
16860.756.0115698E15VISA 13 digit01MAR23:00:00:001
22790.175.2624946E15Maestro01AUG20:00:00:001
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "title \"Apperçu de la table pmt\";\n", "proc sql outobs=5;\n", " select * from bdSQL.pmt\n", "quit;" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "
\n", "
\n", "

Apperçu de la table cars_info

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
numeropolmarque_voiturecouleur_voiturepresence_alarmelicense_plate
1AutresAutre0DW 3168
5RENAULTAutre0926 1RL
13RENAULTAutre1SOV 828
16HONDAAutre0ENSK 514
22VOLKSWAGENAutre1453 CFM
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "title \"Apperçu de la table cars_info\";\n", "proc sql outobs=5;\n", " select * from bdSQL.cars_info\n", "quit;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Maintenant que nous avons eu connaissance de notre base de données, on peut remarquer que la table `police_assurance` peut contenir plusieurs informations pour un même assuré, alors que les autres tables en contiennent qu'une seule ligne par assuré. En effet, les tables peuvent avoir différentes relations qu'on appelle:\n", "* On to one\n", "* On to many\n", "* many to many \n", "\n", "Étudions plus en détail les deux premières" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## On To One" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lorsque nous avons deux tables où il existe une seule observation pour un id donné, nous avons alors une relation de type **One to one**. Cette situation s'illustre bien avec les deux tables `donnes_demo`, et `pmt`. Dans les deux tables nous avons une seule observation par id.. Les deux tables peuvent être relié par le id de l'assuré. \n", "\n", "Si nous cherchons par exemple le fournisseur de carte de crédit de chaque assuré;" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Type de carte de crédit pour chaque assuré

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
namecredit_card_provider
Shane RobinsonVoyager
Courtney NguyenJCB 16 digit
Lori WashingtonMaestro
Sarah CastilloVISA 13 digit
Jeffrey GarciaMaestro
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "title \"Type de carte de crédit pour chaque assuré\";\n", "proc sql outobs=5;\n", " select a.name, b.credit_card_provider\n", " from bdSQL.donnes_demo as a left join bdSQL.pmt as b\n", " on a.numeropol = b.numeropol\n", " ; \n", "quit;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Si nous avions voulu faire cette opération avec un _data step_, nous aurons fait ceci:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Type de carte de crédit pour chaque assuré

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
namecredit_card_provider
Shane RobinsonVoyager
Courtney NguyenJCB 16 digit
Lori WashingtonMaestro
Sarah CastilloVISA 13 digit
Jeffrey GarciaMaestro
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "proc sort data = bdSQL.donnes_demo;\n", " by numeropol;\n", "run;\n", "\n", "proc sort data = bdSQL.pmt;\n", " by numeropol;\n", "run;\n", "\n", "data bdSQL.Data_step_one2one ;\n", " merge bdSQL.donnes_demo(in = left) bdSQL.pmt(in = right);\n", " by numeropol;\n", " if left;\n", " keep name credit_card_provider;\n", "run;\n", "proc print data=bdSQL.Data_step_one2one (obs=5) noobs;\n", "run;\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On remarque que c'est beaucoup plus simple d'utiliser les procédures `sql` quand il s'agit de telles situations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## One To Many" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lorsque nous avons une table avec un seul identifiant pour chaque observation (assuré), et une autre table où chaque assuré peut avoir plus qu'une observation (donc un même id qui se répète sur plus qu'une ligne), nous somme alors en présence de relation entre deux tables de type **On To Many**. Par exemple;" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Nombte de sinistre total par assuré

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
namenombreSin
Betty Scott11
Dustin Banks10
Angela Smith9
Jimmy Harrison8
Aaron Tucker8
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "title \"Nombte de sinistre total par assuré\";\n", "proc sql outobs=5;\n", " select a.name, sum(b.nbsin) as nombreSin\n", " from bdSQL.donnes_demo as a left join bdSQL.police_assurance as b\n", " on a.numeropol = b.numeropol\n", " group by a.name\n", " order by nombreSin desc\n", " ; \n", "quit;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Si nous cherchons la liste des noms de tous les assurés et le nombre de sinistres total qu'ils ont eus. Il faut noter que nous cherchons le total du nombre de sinistres de chaque assuré sachant qu'un assuré donné peut avoir zéro ou plus qu'un sinistre.\n", "\n", "Remarquez que cela aurait pu bien fonctionner en utilisant `right` à la place de `left`" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Nombte de sinistre total par assuré

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
namenombreSin
Betty Scott11
Dustin Banks10
Angela Smith9
Jimmy Harrison8
Aaron Tucker8
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "title \"Nombte de sinistre total par assuré\";\n", "proc sql outobs=5;\n", " select a.name, sum(b.nbsin) as nombreSin\n", " from bdSQL.donnes_demo as a right join bdSQL.police_assurance as b\n", " on a.numeropol = b.numeropol\n", " group by a.name\n", " order by nombreSin desc\n", " ; \n", "quit;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "ou " ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Nombte de sinistre total par assuré

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
namenombreSin
Betty Scott11
Dustin Banks10
Angela Smith9
Jimmy Harrison8
Aaron Tucker8
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "title \"Nombre de sinistre total par assuré\";\n", "proc sql outobs=5;\n", " select b.name, sum(a.nbsin) as nombreSin\n", " from bdSQL.police_assurance as a right join bdSQL.donnes_demo as b\n", " on a.numeropol = b.numeropol\n", " group by b.name\n", " order by nombreSin desc\n", " ; \n", "quit;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Jointure de plus que deux table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Imaginons que nous voulons créer une table qui nous donne: le nombre de sinistre par province, en plus d'avoir l'information s'il ya présence d'alarme ou pas (o ou 1)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
provincepresence_alarmenumbSin
Alberta0527
Alberta1481
British Columbia0483
British Columbia1501
Manitoba0435
Manitoba1444
New Brunswick0996
New Brunswick1991
Newfoundland and Labrador0526
Newfoundland and Labrador1484
Northwest Territories0464
Northwest Territories1458
Nova Scotia0457
Nova Scotia1442
Nunavut0426
Nunavut1517
Ontario0523
Ontario1428
Prince Edward Island0459
Prince Edward Island1430
Quebec0431
Quebec1425
Saskatchewan0506
Saskatchewan1461
Yukon Territory0489
Yukon Territory1473
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "/* petit exercice */\n", "title;\n", "proc sql ;\n", " create table bdSQL.prov_alar_sinis as\n", " select a.province, c.presence_alarme, sum(b.nbsin) as numbSin\n", " from bdSQL.donnes_demo a, bdSQL.police_assurance b , bdSQL.cars_info c\n", " where a.numeropol=b.numeropol and a.numeropol=c.numeropol\n", " group by province, presence_alarme\n", " ;\n", " \n", "proc sql;\n", " select * from bdSQL.prov_alar_sinis;\n", "quit;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Mettre à jour une table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Avec les procédures `proc sql`, ils aussi possible de modifier les tables telles que l'ajout d'observations, en supprimer d'autres ...etc." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Supposons que nous créerons une nouvelle table où nous avons le nombre total de sinistres, le coût total par province, pour les assurés ayant une alarme ou pas pour chacune des provinces." ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
provincepresence_alarmenumbSincoutTot
Alberta0527$1,810,648.26
Alberta1481$2,159,837.70
British Columbia0483$1,845,870.63
British Columbia1501$2,251,772.80
Manitoba0435$1,816,059.04
Manitoba1444$2,052,799.22
New Brunswick0996$4,063,876.55
New Brunswick1991$4,910,546.77
Newfoundland and Labrador0526$2,114,873.41
Newfoundland and Labrador1484$2,572,403.03
Northwest Territories0464$1,831,869.64
Northwest Territories1458$2,303,249.95
Nova Scotia0457$1,863,869.16
Nova Scotia1442$2,140,654.70
Nunavut0426$1,767,940.35
Nunavut1517$2,299,574.47
Ontario0523$1,869,489.94
Ontario1428$2,198,819.46
Prince Edward Island0459$1,823,033.74
Prince Edward Island1430$2,302,004.23
Quebec0431$1,791,020.49
Quebec1425$2,099,646.42
Saskatchewan0506$2,078,543.11
Saskatchewan1461$2,380,597.45
Yukon Territory0489$2,090,626.75
Yukon Territory1473$2,271,767.28
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "/* petit exercice */\n", "title;\n", "proc sql ;\n", " create table bdSQL.prov_alar_sinis as\n", " select a.province, c.presence_alarme, sum(b.nbsin) as numbSin, sum(d.cout_prime) as coutTot format=dollar15.2\n", " from bdSQL.donnes_demo a, bdSQL.police_assurance b , bdSQL.cars_info c, bdSQL.pmt d\n", " where a.numeropol=b.numeropol and a.numeropol=c.numeropol and a.numeropol=d.numeropol\n", " group by province, presence_alarme\n", " ;\n", " \n", "proc sql;\n", " select * from bdSQL.prov_alar_sinis;\n", "quit;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Maintenant, supposons que nous voulons augmenter les coûts des primes par 2% pour tous les assurés n'ayant pas une alarme installée sur le véhicule. Mais d'abord, créons une copie de la table `prov_alar_sinis` qu'on appellera `prov_alar_sinis2` afin d'y apporter de modifications" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Selectively Updated presence_alarme Values

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
provincepresence_alarmenumbSincoutTot
Alberta0527$1,810,648.26
Alberta1481$2,159,837.70
British Columbia0483$1,845,870.63
British Columbia1501$2,251,772.80
Manitoba0435$1,816,059.04
Manitoba1444$2,052,799.22
New Brunswick0996$4,063,876.55
New Brunswick1991$4,910,546.77
Newfoundland and Labrador0526$2,114,873.41
Newfoundland and Labrador1484$2,572,403.03
Northwest Territories0464$1,831,869.64
Northwest Territories1458$2,303,249.95
Nova Scotia0457$1,863,869.16
Nova Scotia1442$2,140,654.70
Nunavut0426$1,767,940.35
Nunavut1517$2,299,574.47
Ontario0523$1,869,489.94
Ontario1428$2,198,819.46
Prince Edward Island0459$1,823,033.74
Prince Edward Island1430$2,302,004.23
Quebec0431$1,791,020.49
Quebec1425$2,099,646.42
Saskatchewan0506$2,078,543.11
Saskatchewan1461$2,380,597.45
Yukon Territory0489$2,090,626.75
Yukon Territory1473$2,271,767.28
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "proc sql;\n", " create table bdSQL.prov_alar_sinis2 like bdSQL.prov_alar_sinis;\n", " insert into bdSQL.prov_alar_sinis2\n", " select * from bdSQL.prov_alar_sinis;\n", " select * from bdSQL.prov_alar_sinis2;\n", "quit;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Appliquons maintenant l'augmentation de 2%" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Selectively Updated presence_alarme Values

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
provincepresence_alarmenumbSincoutTot
Alberta0527$1,846,861.23
Alberta1481$2,159,837.70
British Columbia0483$1,882,788.04
British Columbia1501$2,251,772.80
Manitoba0435$1,852,380.22
Manitoba1444$2,052,799.22
New Brunswick0996$4,145,154.08
New Brunswick1991$4,910,546.77
Newfoundland and Labrador0526$2,157,170.88
Newfoundland and Labrador1484$2,572,403.03
Northwest Territories0464$1,868,507.03
Northwest Territories1458$2,303,249.95
Nova Scotia0457$1,901,146.54
Nova Scotia1442$2,140,654.70
Nunavut0426$1,803,299.16
Nunavut1517$2,299,574.47
Ontario0523$1,906,879.74
Ontario1428$2,198,819.46
Prince Edward Island0459$1,859,494.41
Prince Edward Island1430$2,302,004.23
Quebec0431$1,826,840.90
Quebec1425$2,099,646.42
Saskatchewan0506$2,120,113.97
Saskatchewan1461$2,380,597.45
Yukon Territory0489$2,132,439.28
Yukon Territory1473$2,271,767.28
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "proc sql;\n", " update bdSQL.prov_alar_sinis2\n", " set coutTot=coutTot*1.02\n", " where presence_alarme=0;\n", "\n", " title \"Selectively Updated presence_alarme Values\";\n", " select *\n", " from bdSQL.prov_alar_sinis2;\n", "quit;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Supprimer des observations\n", "\n", "Afin d’en faire un exemple seulement, sans aucune pensée politique quelconque, supposons que nous voulons séparer les données sur le Qéubec du reste des autres provinces. Donc, nous aurons deux tables, celle qui contient les informations sur les Québec, et l’autre contient les informations des autres provinces." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Selectively Updated presence_alarme Values

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
provincepresence_alarmenumbSincoutTot
Quebec0431$1,826,840.90
Quebec1425$2,099,646.42
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "proc sql;\n", " create table bdSQL.Quebec like bdSQL.prov_alar_sinis;\n", " insert into bdSQL.Quebec\n", " select * from bdSQL.prov_alar_sinis\n", " where province like 'Q%';\n", " select * from bdSQL.Quebec;\n", "quit;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Supprimons maintenant les données du Québec de notre table principale" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Selectively Updated presence_alarme Values

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
provincepresence_alarmenumbSincoutTot
Alberta0527$1,846,861.23
Alberta1481$2,159,837.70
British Columbia0483$1,882,788.04
British Columbia1501$2,251,772.80
Manitoba0435$1,852,380.22
Manitoba1444$2,052,799.22
New Brunswick0996$4,145,154.08
New Brunswick1991$4,910,546.77
Newfoundland and Labrador0526$2,157,170.88
Newfoundland and Labrador1484$2,572,403.03
Northwest Territories0464$1,868,507.03
Northwest Territories1458$2,303,249.95
Nova Scotia0457$1,901,146.54
Nova Scotia1442$2,140,654.70
Nunavut0426$1,803,299.16
Nunavut1517$2,299,574.47
Ontario0523$1,906,879.74
Ontario1428$2,198,819.46
Prince Edward Island0459$1,859,494.41
Prince Edward Island1430$2,302,004.23
Saskatchewan0506$2,120,113.97
Saskatchewan1461$2,380,597.45
Yukon Territory0489$2,132,439.28
Yukon Territory1473$2,271,767.28
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "proc sql;\n", " delete\n", " from bdSQL.prov_alar_sinis\n", " where province like 'Q%';\n", " select * from bdSQL.prov_alar_sinis;\n", "quit;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ajout d'une colonne\n", "\n", "Supposons qu'on veut ajouter une colonne qui serait simplement le coût par sinistre:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Selectively Updated presence_alarme Values

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
provincepresence_alarmenumbSincoutTotCoût par sinistre
Alberta0527$1,846,861.23$3,504.48
Alberta1481$2,159,837.70$4,490.31
British Columbia0483$1,882,788.04$3,898.11
British Columbia1501$2,251,772.80$4,494.56
Manitoba0435$1,852,380.22$4,258.35
Manitoba1444$2,052,799.22$4,623.42
New Brunswick0996$4,145,154.08$4,161.80
New Brunswick1991$4,910,546.77$4,955.14
Newfoundland and Labrador0526$2,157,170.88$4,101.09
Newfoundland and Labrador1484$2,572,403.03$5,314.88
Northwest Territories0464$1,868,507.03$4,026.95
Northwest Territories1458$2,303,249.95$5,028.93
Nova Scotia0457$1,901,146.54$4,160.06
Nova Scotia1442$2,140,654.70$4,843.11
Nunavut0426$1,803,299.16$4,233.10
Nunavut1517$2,299,574.47$4,447.92
Ontario0523$1,906,879.74$3,646.04
Ontario1428$2,198,819.46$5,137.43
Prince Edward Island0459$1,859,494.41$4,051.19
Prince Edward Island1430$2,302,004.23$5,353.50
Saskatchewan0506$2,120,113.97$4,189.95
Saskatchewan1461$2,380,597.45$5,163.99
Yukon Territory0489$2,132,439.28$4,360.82
Yukon Territory1473$2,271,767.28$4,802.89
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "proc sql;\n", " alter table bdSQL.prov_alar_sinis\n", " add cout_par_sinsn num label='Coût par sinistre' format=dollar15.2;\n", " update bdSQL.prov_alar_sinis\n", " set cout_par_sinsn=coutTot/numbSin;\n", " select *\n", " from bdSQL.prov_alar_sinis; \n", "quit; " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On peut aussi modifier le format, le _label_ ou la largeur d'une colonne avec l'option `MODIFY`. Par exemple, notre nouvelle colonne `cout_par_sinsn` qui avait un format `dollar=15.2`, on peut réduire le format à `dollar=19.0`" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Selectively Updated presence_alarme Values

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
provincepresence_alarmenumbSincoutTotCoût par sinistre
Alberta0527$1,846,861.23$3,504
Alberta1481$2,159,837.70$4,490
British Columbia0483$1,882,788.04$3,898
British Columbia1501$2,251,772.80$4,495
Manitoba0435$1,852,380.22$4,258
Manitoba1444$2,052,799.22$4,623
New Brunswick0996$4,145,154.08$4,162
New Brunswick1991$4,910,546.77$4,955
Newfoundland and Labrador0526$2,157,170.88$4,101
Newfoundland and Labrador1484$2,572,403.03$5,315
Northwest Territories0464$1,868,507.03$4,027
Northwest Territories1458$2,303,249.95$5,029
Nova Scotia0457$1,901,146.54$4,160
Nova Scotia1442$2,140,654.70$4,843
Nunavut0426$1,803,299.16$4,233
Nunavut1517$2,299,574.47$4,448
Ontario0523$1,906,879.74$3,646
Ontario1428$2,198,819.46$5,137
Prince Edward Island0459$1,859,494.41$4,051
Prince Edward Island1430$2,302,004.23$5,353
Saskatchewan0506$2,120,113.97$4,190
Saskatchewan1461$2,380,597.45$5,164
Yukon Territory0489$2,132,439.28$4,361
Yukon Territory1473$2,271,767.28$4,803
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "proc sql;\n", " alter table bdSQL.prov_alar_sinis\n", " modify cout_par_sinsn format=dollar10.0;\n", " select * from bdSQL.prov_alar_sinis; \n", "quit;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Supprimer une colonne:\n", "La clause `DROP` permet de supprimer une colonne d'une table donnée;" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Selectively Updated presence_alarme Values

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
provincepresence_alarmenumbSincoutTot
Alberta0527$1,846,861.23
Alberta1481$2,159,837.70
British Columbia0483$1,882,788.04
British Columbia1501$2,251,772.80
Manitoba0435$1,852,380.22
Manitoba1444$2,052,799.22
New Brunswick0996$4,145,154.08
New Brunswick1991$4,910,546.77
Newfoundland and Labrador0526$2,157,170.88
Newfoundland and Labrador1484$2,572,403.03
Northwest Territories0464$1,868,507.03
Northwest Territories1458$2,303,249.95
Nova Scotia0457$1,901,146.54
Nova Scotia1442$2,140,654.70
Nunavut0426$1,803,299.16
Nunavut1517$2,299,574.47
Ontario0523$1,906,879.74
Ontario1428$2,198,819.46
Prince Edward Island0459$1,859,494.41
Prince Edward Island1430$2,302,004.23
Saskatchewan0506$2,120,113.97
Saskatchewan1461$2,380,597.45
Yukon Territory0489$2,132,439.28
Yukon Territory1473$2,271,767.28
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "proc sql;\n", " alter table bdSQL.prov_alar_sinis\n", " drop cout_par_sinsn;\n", " select * from bdSQL.prov_alar_sinis;\n", "quit;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Supprimer une table!\n", "Même s'il faut se poser cette question plusieurs fois avant d’exécuter la commande sql, on peut toujours supprimer complètement une table comme ceci: " ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "

\n", "\n", "
615  ods listing close;ods html5 (id=saspy_internal) file=stdout options(bitmap_mode='inline') device=svg; ods graphics on /
615! outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: STDOUT
616
617 proc sql;
618 drop table bdSQL.Quebec;
NOTE: Table BDSQL.QUEBEC has been dropped.
619
620 ods html5 (id=saspy_internal) close;ods listing;

621
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "proc sql;\n", " drop table bdSQL.Quebec;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Autres procédures SQL" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compter le nombre de doublons:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compter les doublons est une des premières tâches que vous aurez à faire lorsque vous allez manipuler des bases de données un peu plus importantes\n", "\n", "Soit la table `doublons`" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Duplicate Rows in Duplicates Table

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
idNom
123Noureddine M
124Gabriel A
125JM P
126Mathieu P
127Mathieu B
126Mathieu P
128JP B
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "proc sql;\n", " create table bdSQL.doublons\n", " (id char(12), Nom char(12));\n", " insert into bdSQL.doublons\n", " values(\"123\", \"Noureddine Meraihi\")\n", " values(\"124\", \"Gabriel A\")\n", " values(\"125\", \"JM P\")\n", " values(\"126\", \"Mathieu P\")\n", " values(\"127\", \"Mathieu B\")\n", " values(\"126\", \"Mathieu P\")\n", " values(\"128\", \"JP B\");\n", " select * from bdSQL.doublons;\n", "quit;" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Duplicate Rows in Duplicates Table

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
idNomCount
126Mathieu P2
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "proc sql;\n", " title 'Duplicate Rows in Duplicates Table';\n", " select *, count(*) as Count\n", " from bdSQL.doublons\n", " group by id, Nom\n", " having count(*) > 1;\n", "quit;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ou on peut aussi utiliser la fonction `freq` afin de trouver le nombre de récurrences " ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Duplicate Rows in Duplicates Table

\n", "
\n", "
\n", "

The FREQ Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
NomFrequency
Gabriel A1
JM P1
JP B1
Mathieu B1
Mathieu P2
Noureddine M1
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "proc freq\n", " data = bdSQL.doublons;\n", " tables Nom / nocum nocol norow nopercent\n", " ;\n", "run;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## PROC rank" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cette procédure est très intéressante si vous voulez trouver le rang d'une observation par variable. Dans l'exemple suivant, nous cherchons le rang de chaque candidat aux élections. D'abord, par le nombre de votes. Ensuite, par année." ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Results of City Council Election

\n", "
\n", "
\n", "
\n", "

District=1

\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsCandidateVoteYearsVoteRankYearsRank
1Cardella1689811
2Latham1005232
3Smith1406023
4Walker846043
N = 4 
\n", "
\n", "
\n", "
\n", "
\n", "

District=2

\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsCandidateVoteYearsVoteRankYearsRank
5Hinkley912033
6Kreitemeyer1198023
7Lundell2447611
8Thrash912232
N = 4 
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data elect;\n", " input Candidate $ 1-11 District 13 Vote 15-18 Years 20;\n", " datalines;\n", "Cardella 1 1689 8\n", "Latham 1 1005 2\n", "Smith 1 1406 0\n", "Walker 1 846 0\n", "Hinkley 2 912 0\n", "Kreitemeyer 2 1198 0\n", "Lundell 2 2447 6\n", "Thrash 2 912 2\n", ";\n", "proc rank data=elect out=results ties=low descending;\n", " by district;\n", " var vote years;\n", " ranks VoteRank YearsRank;\n", "run;\n", "proc print data=results n;\n", " by district;\n", " title 'Results of City Council Election';\n", "run; " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compare procedure\n", "\n", "La procédure `Compare` est aussi intéressante lorsque vous voulez comparer les différences deux variables se trouvant dans deux tables différentes. En voici un exemple où l'on veut comparer la variable `coutTot` dans les deux tables `prov_alar_sinis` et `prov_alar_sinis2`. On se rappelle que dans la deuxième, nous avons augmenté les coûts de primes par 2%, seulement pour les véhicules qui ne sont pas munis d'un système d'alarme." ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Comparison of Variables in Different Data Sets

\n", "
\n", "
\n", "
\n",
       "                             The COMPARE Procedure                              \n",
       "        Comparison of BDSQL.PROV_ALAR_SINIS2 with BDSQL.PROV_ALAR_SINIS         \n",
       "                                 (Method=EXACT)                                 \n",
       "                                                                                \n",
       "NOTE: Values of the following 1 variables compare unequal: coutTot^=coutTot     \n",
       "
\n", "
\n", "
\n", "
\n",
       "                                                                                \n",
       "                                                                                \n",
       "                     Value Comparison Results for Variables                     \n",
       "                                                                                \n",
       "           __________________________________________________________           \n",
       "                      ||       Base    Compare                                  \n",
       "                  Obs ||    coutTot    coutTot      Diff.     % Diff            \n",
       "            ________  ||  _________  _________  _________  _________            \n",
       "                      ||                                                        \n",
       "                   1  ||  1846861.2  1810648.3     -36213    -1.9608            \n",
       "                   3  ||  1882788.0  1845870.6     -36917    -1.9608            \n",
       "                   5  ||  1852380.2  1816059.0     -36321    -1.9608            \n",
       "                   7  ||  4145154.1  4063876.5     -81278    -1.9608            \n",
       "                   9  ||  2157170.9  2114873.4     -42297    -1.9608            \n",
       "                  11  ||  1868507.0  1831869.6     -36637    -1.9608            \n",
       "                  13  ||  1901146.5  1863869.2     -37277    -1.9608            \n",
       "                  15  ||  1803299.2  1767940.4     -35359    -1.9608            \n",
       "                  17  ||  1906879.7  1869489.9     -37390    -1.9608            \n",
       "                  19  ||  1859494.4  1823033.7     -36461    -1.9608            \n",
       "                  21  ||  1826840.9  1791020.5     -35820    -1.9608            \n",
       "                  23  ||  2120114.0  2078543.1     -41571    -1.9608            \n",
       "                  25  ||  2132439.3  2090626.7     -41813    -1.9608            \n",
       "           __________________________________________________________           \n",
       "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\n", "options nodate pageno=1 linesize=80 pagesize=40;\n", "proc compare base=bdSQL.prov_alar_sinis2 compare=bdSQL.prov_alar_sinis nosummary;\n", " var coutTot;\n", " with coutTot;\n", " title 'Comparison of Variables in Different Data Sets';\n", "run;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## TRANSPOSE Procedure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "La procédure `Transpose` peut être très utile lorsque nous voulons transposer nos variables en observation et vice versa. Dans l'exemple suivant, nous avons une liste d'étudiants (observation) avec les notes qu'ils ont eues à trois examens (donc trois variables). On voudrait trois observations (trois lignes) qui représentent Test1, Test2 et Test3. Et sept variables (les 7 étudiants)." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Données oriiginales

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
StudentStudentIDSectionTest1Test2Final
Capalleti05451949187
Dubose12522516591
Engles11671959797
Grant12302637580
Krupski25272807671
Lundsford48601924086
McBane06741757872
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

Student Test Scores in Variables

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
_NAME_COL1COL2COL3COL4COL5COL6COL7
Test194519563809275
Test291659775764078
Final87919780718672
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "options nodate pageno=1 linesize=80 pagesize=40;\n", "data score;\n", " input Student $9. StudentID $ Section $ Test1 Test2 Final;\n", " datalines;\n", "Capalleti 0545 1 94 91 87\n", "Dubose 1252 2 51 65 91\n", "Engles 1167 1 95 97 97\n", "Grant 1230 2 63 75 80\n", "Krupski 2527 2 80 76 71\n", "Lundsford 4860 1 92 40 86\n", "McBane 0674 1 75 78 72\n", ";\n", "proc print data=score noobs;\n", " title 'Données oriiginales';\n", "run;\n", "\n", "proc transpose data=score out=score_transposed;\n", "run;\n", "\n", "proc print data=score_transposed noobs;\n", " title 'Student Test Scores in Variables';\n", "run;\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "271.463px" }, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 4 }