{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Extraction d'un sous-ensemble de données avec la poc data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dans ce chapitre, nous verrons comment manipuler un ou plusieurs ensembles de données. Nous allons extraire un sous-ensemble de donné à partir d'une base de données provenant d'une bibliothèque temporaire ou permanente. Enfin, nous verrons comment fusionner deux tables ayant des noms de variables pareils ou différents ainsi que des formats pareils ou différents."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"numeropol | \n",
"\n",
"\n",
"nbsin | \n",
"cout_total | \n",
"
\n",
"\n",
"\n",
"\n",
"67942 | \n",
"Technicien | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"52484 | \n",
"Ingénieur | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"27817 | \n",
"Ingénieur | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"32986 | \n",
"Technicien | \n",
"Femme | \n",
"4 | \n",
"15282.23 | \n",
"
\n",
"\n",
"45769 | \n",
"Technicien | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"proc format;\n",
" value $HF 'M'='Homme'\n",
" 'F'='Femme';\n",
"run;\n",
"proc print data=assu.cours_3 (firstobs = 1 obs = 5) noobs;\n",
"run;"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"Obs | \n",
"numeropol | \n",
"\n",
"\n",
"nbsin | \n",
"cout_total | \n",
"
\n",
"\n",
"\n",
"\n",
"1 | \n",
"67942 | \n",
"Technicien | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"2 | \n",
"52484 | \n",
"Ingénieur | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"3 | \n",
"27817 | \n",
"Ingénieur | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"4 | \n",
"32986 | \n",
"Technicien | \n",
"Femme | \n",
"4 | \n",
"15282.23 | \n",
"
\n",
"\n",
"5 | \n",
"45769 | \n",
"Technicien | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"libname assu 'data';\n",
"Data assu.cours_3;\n",
" set assu.eq_2;\n",
" cout_total=sum(cout1,cout2, cout3, cout4, cout5, cout6, cout7);\n",
" keep numeropol\ttype_prof\tsexe nbsin cout_total; \n",
" format sexe HF.;\n",
"run;\n",
"proc print data=assu.cours_3 (firstobs = 1 obs = 5);\n",
"run;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"D'abord créons le format `$HF` où nous remplaçons le sexe `M` par `Homme` et le sexe `F` par `Femme`\n",
"\n",
"Imaginons que nous voulons faire une étude sur les sinistres causés par les conducteurs masculins dans notre base de données. Nous voudrions alors séparer notre base de données en deux, et ce, selon le sexe de l'assuré."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"Obs | \n",
"numeropol | \n",
"\n",
"\n",
"nbsin | \n",
"cout_total | \n",
"
\n",
"\n",
"\n",
"\n",
"1 | \n",
"67942 | \n",
"Technicien | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"2 | \n",
"52484 | \n",
"Ingénieur | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"3 | \n",
"27817 | \n",
"Ingénieur | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"4 | \n",
"45769 | \n",
"Technicien | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"5 | \n",
"60045 | \n",
"Technicien | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data hommes;\n",
" set assu.cours_3;\n",
" where sexe = 'M';\n",
"run;\n",
"\n",
"title \"BD des hommes seulement\";\n",
"proc print data=hommes (firstobs = 1 obs = 5) ;\n",
"run;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On voit bien que l'assurée du sexe féminin de la table `cours_3` (la quatrième observation) est maintenant absente de la nouvelle table `hommes`\n",
"\n",
"Remarquez qu'on aurait pu créer deux ensembles de données en un seul bloc de code. Une BD pour les hommes et une autre pour les femmes en utilisant la condition `IF`"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"Obs | \n",
"numeropol | \n",
"\n",
"\n",
"nbsin | \n",
"cout_total | \n",
"
\n",
"\n",
"\n",
"\n",
"1 | \n",
"32986 | \n",
"Technicien | \n",
"Femme | \n",
"4 | \n",
"15282.23 | \n",
"
\n",
"\n",
"2 | \n",
"22914 | \n",
"Technicien | \n",
"Femme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"3 | \n",
"76422 | \n",
"Infirmière | \n",
"Femme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"4 | \n",
"15847 | \n",
"Technicien | \n",
"Femme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"5 | \n",
"42487 | \n",
"Technicien | \n",
"Femme | \n",
"1 | \n",
"165.98 | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"Obs | \n",
"numeropol | \n",
"\n",
"\n",
"nbsin | \n",
"cout_total | \n",
"
\n",
"\n",
"\n",
"\n",
"1 | \n",
"67942 | \n",
"Technicien | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"2 | \n",
"52484 | \n",
"Ingénieur | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"3 | \n",
"27817 | \n",
"Ingénieur | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"4 | \n",
"45769 | \n",
"Technicien | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"5 | \n",
"60045 | \n",
"Technicien | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data hommes femmes;\n",
" set assu.cours_3;\n",
" if sexe = 'F' then output femmes;\n",
" else if sexe = 'M' then output hommes;\n",
"run;\n",
"title 'BD des femmes';\n",
"proc print data=femmes (firstobs = 1 obs = 5);\n",
"run;\n",
"title 'BD des hommes';\n",
"proc print data=hommes (firstobs = 1 obs = 5);\n",
"run;\n",
"title;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Avez-vous remarqué que nous appliquons un `if` sur la variable sexe `='M'` et non `='Homme'`?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fusion des ensembles de données"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Maintenant que nous savons comment scinder une base de données, regardons comment nous pouvons les fusionner (ou les joindre)."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"Obs | \n",
"numeropol | \n",
"\n",
"\n",
"nbsin | \n",
"cout_total | \n",
"
\n",
"\n",
"\n",
"\n",
"1 | \n",
"67942 | \n",
"Technicien | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"2 | \n",
"52484 | \n",
"Ingénieur | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"3 | \n",
"27817 | \n",
"Ingénieur | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"4 | \n",
"45769 | \n",
"Technicien | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"5 | \n",
"60045 | \n",
"Technicien | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"Obs | \n",
"numeropol | \n",
"\n",
"\n",
"nbsin | \n",
"cout_total | \n",
"
\n",
"\n",
"\n",
"\n",
"28308 | \n",
"52 | \n",
"Technicien | \n",
"Femme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"28309 | \n",
"89 | \n",
"Informaticien | \n",
"Femme | \n",
"3 | \n",
"33914.83 | \n",
"
\n",
"\n",
"28310 | \n",
"90 | \n",
"Technicien | \n",
"Femme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"28311 | \n",
"1166 | \n",
"Technicien | \n",
"Femme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"28312 | \n",
"4530 | \n",
"Technicien | \n",
"Femme | \n",
"1 | \n",
"38.48 | \n",
"
\n",
"\n",
"28313 | \n",
"9130 | \n",
"Technicien | \n",
"Femme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data homme_femmes;\n",
" set hommes femmes;\n",
"run;\n",
"\n",
"title 'ensembles fusionnés, apperçu du début';\n",
"proc print data=homme_femmes (firstobs = 1 obs = 5);\n",
"run;\n",
"title 'ensembles fusionnés, apperçu de la fin de la bd';\n",
"proc print data=homme_femmes (firstobs = 28308 obs = 28313);\n",
"run;\n",
"title;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Imaginons maintenant que la variable `numeropol` de notre table des `femmes_2` est plutôt nommée `idPolice` tel qu'illustré ci-dessous."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"Obs | \n",
"idPolice | \n",
"\n",
"\n",
"nbsin | \n",
"cout_total | \n",
"
\n",
"\n",
"\n",
"\n",
"1 | \n",
"32986 | \n",
"Technicien | \n",
"Femme | \n",
"4 | \n",
"15282.23 | \n",
"
\n",
"\n",
"2 | \n",
"22914 | \n",
"Technicien | \n",
"Femme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"3 | \n",
"76422 | \n",
"Infirmière | \n",
"Femme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"4 | \n",
"15847 | \n",
"Technicien | \n",
"Femme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"5 | \n",
"42487 | \n",
"Technicien | \n",
"Femme | \n",
"1 | \n",
"165.98 | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"proc print data=assu.femmes_2 (firstobs = 1 obs = 5);\n",
"run;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Et nous voulons fusionner la table d’hommes avec celle des femmes_2 afin d'avoir qu'une seule table. Nous avons deux choix, \n",
"1. Renommer la variable `numeropol` par `idPolice` dans la table des hommes.\n",
"2. ou renommer la variable `idPolice` par `numeropol` dans la table des femmes_2\n",
"\n",
"En utilisant `merge` qui nécessite de faire un **tri** d'abord."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"Obs | \n",
"numeropol | \n",
"\n",
"\n",
"nbsin | \n",
"cout_total | \n",
"
\n",
"\n",
"\n",
"\n",
"1 | \n",
"46 | \n",
"Technicien | \n",
"Homme | \n",
"1 | \n",
"856.248 | \n",
"
\n",
"\n",
"2 | \n",
"46 | \n",
"Technicien | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"3 | \n",
"52 | \n",
"Technicien | \n",
"Femme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"4 | \n",
"52 | \n",
"Technicien | \n",
"Femme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"5 | \n",
"63 | \n",
"Technicien | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"Obs | \n",
"numeropol | \n",
"\n",
"\n",
"nbsin | \n",
"cout_total | \n",
"
\n",
"\n",
"\n",
"\n",
"28161 | \n",
"88997 | \n",
"Infirmière | \n",
"Femme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"28162 | \n",
"88998 | \n",
"Technicien | \n",
"Femme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"28163 | \n",
"89003 | \n",
"Technicien | \n",
"Femme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"28164 | \n",
"89011 | \n",
"Infirmière | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"28165 | \n",
"89028 | \n",
"Infirmière | \n",
"Homme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"28166 | \n",
"89033 | \n",
"Technicien | \n",
"Femme | \n",
"0 | \n",
". | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"/* on applique d'abord le sort */\n",
"proc sort data=hommes;\n",
" by numeropol;\n",
"run;\n",
"\n",
"proc sort data=assu.femmes_2;\n",
" by idPolice;\n",
"run;\n",
"\n",
"/* Ensuite on fusionne */\n",
"\n",
"data merge_hommes_et_femmmes_2;\n",
" merge hommes\n",
" assu.femmes_2(rename=(idPolice = numeropol));\n",
" by numeropol;\n",
"run;\n",
"\n",
"/* Enfin un apperçu dy résultat */\n",
"proc print data=merge_hommes_et_femmmes_2 (firstobs = 1 obs = 5);\n",
"run;\n",
"proc print data=merge_hommes_et_femmmes_2 (firstobs = 28161 obs = 28166);\n",
"run;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Il arrive des fois que les noms de variable sont les mêmes, mais que le format de ces dernières est différent d'une table à une autre qu'on veuille fusionner. Regardons un exemple pour mieux comprendre.\n",
"\n",
"Soit la petite table `NAS_1` qui contient des numéros d'assurance sociale au format `123456789`"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"Obs | \n",
"SS | \n",
"DOB | \n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"1 | \n",
"332211009 | \n",
"11/14/1993 | \n",
"M | \n",
"
\n",
"\n",
"2 | \n",
"123456789 | \n",
"05/17/1983 | \n",
"F | \n",
"
\n",
"\n",
"3 | \n",
"987789321 | \n",
"04/01/1991 | \n",
"F | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data NAS_1;\n",
" input SS: 9.\n",
" DOB: MMDDYY10. \n",
" Gender : $1.;\n",
" FORMAT DOB MMDDYY10.;\n",
" DATALINES;\n",
" 332211009 11/14/1993 M\n",
" 123456789 05/17/1983 F\n",
" 987789321 04/01/1991 F\n",
" ;\n",
"proc print data=NAS_1;\n",
"run; "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"soit la table `NAS_2` qui contient d'autres informations sur les individus de la table `NAS_1` mais que cette fois, le format de la variable `SS` est `123-45-6789`"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"Salary | \n",
"
\n",
"\n",
"\n",
"\n",
"332-21-1009 | \n",
"A10 | \n",
"45123 | \n",
"
\n",
"\n",
"123-45-6789 | \n",
"B5 | \n",
"35400 | \n",
"
\n",
"\n",
"987-78-9321 | \n",
"A20 | \n",
"87900 | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data NAS_2;\n",
" input SS: $11.\n",
" JobCode: $13. \n",
" Salary : 9.;\n",
" DATALINES;\n",
" 332-21-1009 A10 45123\n",
" 123-45-6789 B5 35400\n",
" 987-78-9321 A20 87900\n",
" ; \n",
"\n",
"proc print data=NAS_2 noobs;\n",
"run;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nous créons une nouvelle table appelée `NAS_1B` qui est une réplique de la table `NAS_1` dans laquelle nous changeons le format de la variable `SS` sous le même format que la table `NAS_2`"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"SAS Output\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"SS | \n",
"DOB | \n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"332211009 | \n",
"11/14/1993 | \n",
"M | \n",
"
\n",
"\n",
"123456789 | \n",
"05/17/1983 | \n",
"F | \n",
"
\n",
"\n",
"987789321 | \n",
"04/01/1991 | \n",
"F | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"DOB | \n",
"\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"05/17/1983 | \n",
"F | \n",
"123-45-6789 | \n",
"
\n",
"\n",
"11/14/1993 | \n",
"M | \n",
"332-21-1009 | \n",
"
\n",
"\n",
"04/01/1991 | \n",
"F | \n",
"987-78-9321 | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"Salary | \n",
"
\n",
"\n",
"\n",
"\n",
"123-45-6789 | \n",
"B5 | \n",
"35400 | \n",
"
\n",
"\n",
"332-21-1009 | \n",
"A10 | \n",
"45123 | \n",
"
\n",
"\n",
"987-78-9321 | \n",
"A20 | \n",
"87900 | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"DOB | \n",
"\n",
"\n",
"\n",
"Salary | \n",
"
\n",
"\n",
"\n",
"\n",
"05/17/1983 | \n",
"F | \n",
"123-45-6789 | \n",
"B5 | \n",
"35400 | \n",
"
\n",
"\n",
"11/14/1993 | \n",
"M | \n",
"332-21-1009 | \n",
"A10 | \n",
"45123 | \n",
"
\n",
"\n",
"04/01/1991 | \n",
"F | \n",
"987-78-9321 | \n",
"A20 | \n",
"87900 | \n",
"
\n",
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data NAS_1B;\n",
" set NAS_1(rename=(SS = NumSS));\n",
" SS = put(NumSS,ssn11.); /*ssn c'est un formt de NAS*/\n",
" drop NumSS;\n",
"run;\n",
"/* on applique d'abord le sort */\n",
"proc sort data=NAS_1B;\n",
" by SS;\n",
"run;\n",
"\n",
"proc sort data=NAS_2;\n",
" by SS;\n",
"run;\n",
"\n",
"/* Ensuite on fusionne */\n",
"data les_deux_NAS;\n",
" merge NAS_1B NAS_2;\n",
" by SS;\n",
"run;\n",
"title 'NAS_1';\n",
"proc print data=NAS_1 noobs;\n",
"run;\n",
"title 'NAS_1B';\n",
"proc print data=NAS_1B noobs;\n",
"run;\n",
"title 'NAS_2';\n",
"proc print data=NAS_2 noobs;\n",
"run;\n",
"title 'les_deux_NAS';\n",
"proc print data=les_deux_NAS noobs;\n",
"run;\n",
"title;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dans le prochain cours, nous verrons qu'il est beaucoup plus simple d'appliquer ces concepts en utilisant la procédure `PROC sql` qui ne nécessite pas de tri préalable comme la fonction `merge` que nous venons de voir."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Données manquantes\n",
"\n",
"Lorsqu'une valeur de donnée est inconnue, alors nous avons une donnée manquante. \n",
"Une donnée manquante est une donnée valide, elle est affichée par un espace (vide) dans une variable (colonne) de type caractère, et par un point lorsqu'il s'agit d'une variable (colonne) numérique.\n",
"\n",
"# Nommer des variables dans SAS\n",
"\n",
"SAS possède une convention afin de donner des noms de variables. Nous pouvons nommer nos variables avec une combinaison de 1 à 32 caractères, numériques, _underscore_ ou une combinaison de ces trois derniers. Les lettres peuvent être en minuscule ou majuscule."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": true,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": true
}
},
"nbformat": 4,
"nbformat_minor": 4
}