{ "cells": [ { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "options(repr.matrix.max.cols=8, repr.matrix.max.rows=5)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "path<-\"https://raw.githubusercontent.com/nmeraihi/data/master/\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Question 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import the `qc_hommes_2.csv` data from the [data github](https://github.com/nmeraihi/data) repository into a _data frame_ named `df`." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "df<-read.csv(paste(path,\"qc_hommes_2.csv\",sep = \"\"), sep=\",\")" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " age & lx\\\\\n", "\\hline\n", "\t 0 an & 100000\\\\\n", "\t 1 an & 99501\\\\\n", "\t 2 ans & 99483\\\\\n", "\t 3 ans & 99467\\\\\n", "\t 4 ans & 99454\\\\\n", "\t 5 ans & 99442\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "age | lx | \n", "|---|---|---|---|---|---|\n", "| 0 an | 100000 | \n", "| 1 an | 99501 | \n", "| 2 ans | 99483 | \n", "| 3 ans | 99467 | \n", "| 4 ans | 99454 | \n", "| 5 ans | 99442 | \n", "\n", "\n" ], "text/plain": [ " age lx \n", "1 0 an 100000\n", "2 1 an 99501\n", "3 2 ans 99483\n", "4 3 ans 99467\n", "5 4 ans 99454\n", "6 5 ans 99442" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "head(df)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " & age & lx\\\\\n", "\\hline\n", "\t106 & 105 ans & 96 \\\\\n", "\t107 & 106 ans & 51 \\\\\n", "\t108 & 107 ans & 26 \\\\\n", "\t109 & 108 ans & 13 \\\\\n", "\t110 & 109 ans & 6 \\\\\n", "\t111 & 110 ans et plus & 3 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| | age | lx | \n", "|---|---|---|---|---|---|\n", "| 106 | 105 ans | 96 | \n", "| 107 | 106 ans | 51 | \n", "| 108 | 107 ans | 26 | \n", "| 109 | 108 ans | 13 | \n", "| 110 | 109 ans | 6 | \n", "| 111 | 110 ans et plus | 3 | \n", "\n", "\n" ], "text/plain": [ " age lx\n", "106 105 ans 96\n", "107 106 ans 51\n", "108 107 ans 26\n", "109 108 ans 13\n", "110 109 ans 6\n", "111 110 ans et plus 3" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tail(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the `age` column, keep only the numeric part. You should then obtain age = {0, 1, 2, ...}." ] }, { "cell_type": "code", "execution_count": 153, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Drop every non-digit character (an, ans, ans et plus) and convert to integer\n", "df$age<-as.integer(gsub(\"[^0-9]\", \"\", df$age))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Add a new column `dx` to this df (the number of deaths between ages x and x+1). In other words, dx is the number of deaths occurring in each age interval within an initial cohort of 100,000 live births at age 0."
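, "\n",
"\n",
"As a minimal sketch of this definition (not the notebook's own solution), the lines below apply it to the first three `lx` values of the table; `lx_demo` and `dx_demo` are illustrative names introduced here.\n",
"\n",
"```r\n",
"# First three lx values from the life table (ages 0, 1 and 2)\n",
"lx_demo <- c(100000, 99501, 99483)\n",
"# diff() returns l_{x+1} - l_x, so the sign is flipped to count deaths\n",
"dx_demo <- -diff(lx_demo)\n",
"dx_demo  # 499 18\n",
"```\n",
"\n",
"A vector of n survivor counts yields only n - 1 differences, so the last (open-ended) age group needs separate treatment."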
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$d_x=l_x -l_{x+1}$$" ] }, { "cell_type": "code", "execution_count": 154, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# d_x = l_x - l_{x+1} for every age except the last one\n", "a<-df[-nrow(df), 2]-df[-1, 2]\n", "# The last age group is open-ended: everyone still alive dies in it, so d_x = l_x\n", "a<-c(a, df[nrow(df), 2])\n", "df$dx<-a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute qx (the probability of death between ages x and x+1). That is, qx is the probability that an individual aged x dies before reaching age x+1." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$q_x=\\frac{d_x}{l_x}$$" ] }, { "cell_type": "code", "execution_count": 155, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "df$qx<-round(df$dx/df$lx,5)" ] }, { "cell_type": "code", "execution_count": 156, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|llll}\n", " age & lx & dx & qx\\\\\n", "\\hline\n", "\t 0 & 100000 & 499 & 0.00499\\\\\n", "\t 1 & 99501 & 18 & 0.00018\\\\\n", "\t 2 & 99483 & 16 & 0.00016\\\\\n", "\t 3 & 99467 & 13 & 0.00013\\\\\n", "\t 4 & 99454 & 12 & 0.00012\\\\\n", "\t 5 & 99442 & 11 & 0.00011\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "age | lx | dx | qx | \n", "|---|---|---|---|---|---|\n", "| 0 | 100000 | 499 | 0.00499 | \n", "| 1 | 99501 | 18 | 0.00018 | \n", "| 2 | 99483 | 16 | 0.00016 | \n", "| 3 | 99467 | 13 | 0.00013 | \n", "| 4 | 99454 | 12 | 0.00012 | \n", "| 5 | 99442 | 11 | 0.00011 | \n", "\n", "\n" ], "text/plain": [ " age lx dx qx \n", "1 0 100000 499 0.00499\n", "2 1 99501 18 0.00018\n", "3 2 99483 16 0.00016\n", "4 3 99467 13 0.00013\n", "5 4 99454 12 0.00012\n", "6 5 99442 11 0.00011" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "head(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## e)\n", "Now that all the data are available, we can compute the probability that an individual aged x survives to age x+t."
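, "\n",
"\n",
"A small sketch of this survival ratio, assuming a hypothetical helper `surv_prob()` (not part of the notebook) and the six `lx` values displayed above:\n",
"\n",
"```r\n",
"# lx for ages 0 to 5, copied from the table above\n",
"lx_demo <- c(100000, 99501, 99483, 99467, 99454, 99442)\n",
"# Hypothetical helper: probability that someone aged x survives t more years\n",
"# (position x + 1 holds age x because R vectors are 1-indexed)\n",
"surv_prob <- function(lx, x, t) lx[x + 1 + t] / lx[x + 1]\n",
"surv_prob(lx_demo, x = 0, t = 5)  # 99442 / 100000 = 0.99442\n",
"```"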
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$${}_tp_x=\\frac{l_{x+t}}{l_x}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute the probability that a 22-year-old individual survives the next three years." ] }, { "cell_type": "code", "execution_count": 159, "metadata": {}, "outputs": [ { "data": { "text/html": [ "0.998192831903079" ], "text/latex": [ "0.998192831903079" ], "text/markdown": [ "0.998192831903079" ], "text/plain": [ "[1] 0.9981928" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "age<-22\n", "t<-3\n", "# Rows are 1-indexed while ages start at 0, hence the +1 offset\n", "p<-df[age+1+t,2]/df[age+1,2]\n", "p" ] }, { "cell_type": "code", "execution_count": 160, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "library(formattable)" ] }, { "cell_type": "code", "execution_count": 161, "metadata": {}, "outputs": [ { "data": { "text/html": [ "99.82%" ], "text/latex": [ "99.82\\%" ], "text/markdown": [ "99.82%" ], "text/plain": [ "[1] 99.82%" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "percent(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Question 2" ] }, { "cell_type": "code", "execution_count": 162, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "Id=c(1,2,3,4)\n", "Age=c(14,12,15,10)\n", "Sex=c('F','M','M','F')\n", "Code=c('a','b','c','d')\n", "df1=data.frame(Id,Age)\n", "df2=data.frame(Id,Sex,Code)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given the following data:" ] }, { "cell_type": "code", "execution_count": 163, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " Id & Age\\\\\n", "\\hline\n", "\t 1 & 14\\\\\n", "\t 2 & 12\\\\\n", "\t 3 & 15\\\\\n", "\t 4 & 10\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Id | Age | \n", "|---|---|---|---|\n", "| 1 | 14 | \n", "| 2 | 12 | \n", "| 3 | 15 | \n", "| 4 | 10 | \n", "\n", "\n" ], "text/plain": [ " Id Age\n", "1 1 14 \n", "2 2 12 \n", "3 3 15 \n", "4 4 10 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df1" ] }, { "cell_type": "code", "execution_count": 164, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " Id & Sex & Code\\\\\n", "\\hline\n", "\t 1 & F & a\\\\\n", "\t 2 & M & b\\\\\n", "\t 3 & M & c\\\\\n", "\t 4 & F & d\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Id | Sex | Code | \n", "|---|---|---|---|\n", "| 1 | F | a | \n", "| 2 | M | b | \n", "| 3 | M | c | \n", "| 4 | F | d | \n", "\n", "\n" ], "text/plain": [ " Id Sex Code\n", "1 1 F a \n", "2 2 M b \n", "3 3 M c \n", "4 4 F d " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a _data frame_ `M` that joins `df1` and `df2`." ] }, { "cell_type": "code", "execution_count": 165, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|llll}\n", " Id & Age & Sex & Code\\\\\n", "\\hline\n", "\t 1 & 14 & F & a \\\\\n", "\t 2 & 12 & M & b \\\\\n", "\t 3 & 15 & M & c \\\\\n", "\t 4 & 10 & F & d \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Id | Age | Sex | Code | \n", "|---|---|---|---|\n", "| 1 | 14 | F | a | \n", "| 2 | 12 | M | b | \n", "| 3 | 15 | M | c | \n", "| 4 | 10 | F | d | \n", "\n", "\n" ], "text/plain": [ " Id Age Sex Code\n", "1 1 14 F a \n", "2 2 12 M b \n", "3 3 15 M c \n", "4 4 10 F d " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "M=merge(df1,df2,by='Id')\n", "M" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Question 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "According to a [CNBC journalist](https://www.cnbc.com/video/3000418698), the share price of Apple [(AAPL)](https://ca.finance.yahoo.com/quote/AAPL/history?p=AAPL) is highly correlated with the share price of [Boeing Co (BA)](https://ca.finance.yahoo.com/quote/BA/history?p=BA). \n", "\n", "Compute the correlation of the **monthly** Adj Close prices of these two companies over the period from 2016-11-01 to 2017-10-01.\n", "\n", "**Hint:** create two vectors holding the price values. You can import the data from [finance yahoo](https://ca.finance.yahoo.com/) in the _Historical Data_ section, using the dates and frequency indicated above."
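, "\n",
"\n",
"As a sketch of the hint, the correlation of two equal-length numeric vectors can be obtained with `cor()`. The prices below are invented placeholders, not actual AAPL or BA quotes, and `aapl_demo` and `ba_demo` are illustrative names.\n",
"\n",
"```r\n",
"# Placeholder monthly prices (made-up values, for illustration only)\n",
"aapl_demo <- c(100, 104, 103, 110)\n",
"ba_demo <- c(150, 155, 153, 165)\n",
"# Pearson correlation coefficient between the two series\n",
"cor(aapl_demo, ba_demo)\n",
"```"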
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "path<-\"https://raw.githubusercontent.com/nmeraihi/data/master/\"" ] }, { "cell_type": "code", "execution_count": 170, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "df_app <-read.csv(paste(path,\"AAPL_month.csv\",sep = \"\"), header = TRUE)" ] }, { "cell_type": "code", "execution_count": 176, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "df_ba <-read.csv(paste(path,\"BA_month.csv\",sep = \"\"), header = TRUE)" ] }, { "cell_type": "code", "execution_count": 179, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "a<-cbind(df_app$Adj.Close,df_ba$Adj.Close)" ] }, { "cell_type": "code", "execution_count": 180, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "colnames(a)<-c(\"Apple\", \"Boeing\")" ] }, { "cell_type": "code", "execution_count": 181, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Convert the dates to text so the row names read as dates rather than day counts\n", "rownames(a)<-as.character(seq(as.Date(\"2016/11/1\"), by = \"month\", length.out = 12))" ] }, { "cell_type": "code", "execution_count": 182, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " & Apple & Boeing\\\\\n", "\\hline\n", "\tApple & 1.000000 & 0.872264\\\\\n", "\tBoeing & 0.872264 & 1.000000\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| | Apple | Boeing | \n", "|---|---|\n", "| Apple | 1.000000 | 0.872264 | \n", "| Boeing | 0.872264 | 1.000000 | \n", "\n", "\n" ], "text/plain": [ " Apple Boeing \n", "Apple 1.000000 0.872264\n", "Boeing 0.872264 1.000000" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "cor(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Question 4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a _data frame_ from the [HackerRank-Developer-Survey](https://raw.githubusercontent.com/nmeraihi/data/master/HackerRank-Developer-Survey-2018-Values.csv) data. This dataset contains the answers that [HackerRank](https://www.hackerrank.com/) developers gave to a survey aimed at understanding women's interest in computer science." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "library(dplyr, warn.conflicts = FALSE)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "values <- read.csv(paste(path,\"HackerRank-Developer-Survey-2018-Values.csv\",sep = \"\"), header = TRUE, stringsAsFactors = FALSE)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll}\n", " RespondentID & StartDate & EndDate & CountryNumeric2 & q1AgeBeginCoding & q2Age & q3Gender & q4Education & q0004\\_other & q5DegreeFocus & ⋯ & q30LearnCodeOther & q0030\\_other & q31Level3 & q32RecommendHackerRank & q0032\\_other & q33HackerRankChallforJob & q34PositiveExp & q34IdealLengHackerRankTest & q0035\\_other & q36Level4\\\\\n", "\\hline\n", "\t 6464453728 & 10/19/17 11:51 & 10/20/17 12:05 & South Korea & 16 - 20 years old & 18 - 24 years old & Female & Some college & & Computer Science & ⋯ & Other (please specify) & datacamp & num\\%2 == 0 & Yes & & No & NA & \\#NULL! & & Queue \\\\\n", "\t 6478031510 & 10/26/17 6:18 & 10/26/17 7:49 & Ukraine & 16 - 20 years old & 25 - 34 years old & Male & Post graduate degree (Masters, PhD) & & Other STEM (science, technology, engineering, math) & ⋯ & & & num\\%2 == 0 & Yes & & No & NA & \\#NULL! & & Queue \\\\\n", "\t 6464392829 & 10/19/17 10:44 & 10/19/17 10:56 & Malaysia & 11 - 15 years old & 12 - 18 years old & Female & Some college & & Other STEM (science, technology, engineering, math) & ⋯ & & & num\\%2 == 0 & Yes & & No & NA & \\#NULL! & & Queue \\\\\n", "\t 6481629912 & 10/27/17 1:51 & 10/27/17 2:05 & Curaçao & 11 - 15 years old & 12 - 18 years old & Male & College graduate & & Computer Science & ⋯ & & & num\\%2 == 0 & Yes & & No & NA & \\#NULL! & & Hashmap \\\\\n", "\t 6488385057 & 10/31/17 11:46 & 10/31/17 11:59 & & 16 - 20 years old & 25 - 34 years old & Female & College graduate & & & ⋯ & Other (please specify) & Blogs/articles by industry leaders & num\\%2 == 0 & Yes & & No & NA & \\#NULL! 
& & Hashmap \\\\\n", "\t 6463843138 & 10/19/17 3:02 & 10/19/17 3:18 & United States & 41 - 50 years old & 35 - 44 years old & Male & College graduate & & Computer Science & ⋯ & Other (please specify) & SoloLearn & num\\%2 == 0 & Yes & & No & NA & \\#NULL! & & Queue \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RespondentID | StartDate | EndDate | CountryNumeric2 | q1AgeBeginCoding | q2Age | q3Gender | q4Education | q0004_other | q5DegreeFocus | ⋯ | q30LearnCodeOther | q0030_other | q31Level3 | q32RecommendHackerRank | q0032_other | q33HackerRankChallforJob | q34PositiveExp | q34IdealLengHackerRankTest | q0035_other | q36Level4 | \n", "|---|---|---|---|---|---|\n", "| 6464453728 | 10/19/17 11:51 | 10/20/17 12:05 | South Korea | 16 - 20 years old | 18 - 24 years old | Female | Some college | | Computer Science | ⋯ | Other (please specify) | datacamp | num%2 == 0 | Yes | | No | NA | #NULL! | | Queue | \n", "| 6478031510 | 10/26/17 6:18 | 10/26/17 7:49 | Ukraine | 16 - 20 years old | 25 - 34 years old | Male | Post graduate degree (Masters, PhD) | | Other STEM (science, technology, engineering, math) | ⋯ | | | num%2 == 0 | Yes | | No | NA | #NULL! | | Queue | \n", "| 6464392829 | 10/19/17 10:44 | 10/19/17 10:56 | Malaysia | 11 - 15 years old | 12 - 18 years old | Female | Some college | | Other STEM (science, technology, engineering, math) | ⋯ | | | num%2 == 0 | Yes | | No | NA | #NULL! | | Queue | \n", "| 6481629912 | 10/27/17 1:51 | 10/27/17 2:05 | Curaçao | 11 - 15 years old | 12 - 18 years old | Male | College graduate | | Computer Science | ⋯ | | | num%2 == 0 | Yes | | No | NA | #NULL! | | Hashmap | \n", "| 6488385057 | 10/31/17 11:46 | 10/31/17 11:59 | | 16 - 20 years old | 25 - 34 years old | Female | College graduate | | | ⋯ | Other (please specify) | Blogs/articles by industry leaders | num%2 == 0 | Yes | | No | NA | #NULL! 
| | Hashmap | \n", "| 6463843138 | 10/19/17 3:02 | 10/19/17 3:18 | United States | 41 - 50 years old | 35 - 44 years old | Male | College graduate | | Computer Science | ⋯ | Other (please specify) | SoloLearn | num%2 == 0 | Yes | | No | NA | #NULL! | | Queue | \n", "\n", "\n" ], "text/plain": [ " RespondentID StartDate EndDate CountryNumeric2 q1AgeBeginCoding \n", "1 6464453728 10/19/17 11:51 10/20/17 12:05 South Korea 16 - 20 years old\n", "2 6478031510 10/26/17 6:18 10/26/17 7:49 Ukraine 16 - 20 years old\n", "3 6464392829 10/19/17 10:44 10/19/17 10:56 Malaysia 11 - 15 years old\n", "4 6481629912 10/27/17 1:51 10/27/17 2:05 Curaçao 11 - 15 years old\n", "5 6488385057 10/31/17 11:46 10/31/17 11:59 16 - 20 years old\n", "6 6463843138 10/19/17 3:02 10/19/17 3:18 United States 41 - 50 years old\n", " q2Age q3Gender q4Education q0004_other\n", "1 18 - 24 years old Female Some college \n", "2 25 - 34 years old Male Post graduate degree (Masters, PhD) \n", "3 12 - 18 years old Female Some college \n", "4 12 - 18 years old Male College graduate \n", "5 25 - 34 years old Female College graduate \n", "6 35 - 44 years old Male College graduate \n", " q5DegreeFocus ⋯ q30LearnCodeOther \n", "1 Computer Science ⋯ Other (please specify)\n", "2 Other STEM (science, technology, engineering, math) ⋯ \n", "3 Other STEM (science, technology, engineering, math) ⋯ \n", "4 Computer Science ⋯ \n", "5 ⋯ Other (please specify)\n", "6 Computer Science ⋯ Other (please specify)\n", " q0030_other q31Level3 q32RecommendHackerRank\n", "1 datacamp num%2 == 0 Yes \n", "2 num%2 == 0 Yes \n", "3 num%2 == 0 Yes \n", "4 num%2 == 0 Yes \n", "5 Blogs/articles by industry leaders num%2 == 0 Yes \n", "6 SoloLearn num%2 == 0 Yes \n", " q0032_other q33HackerRankChallforJob q34PositiveExp\n", "1 No NA \n", "2 No NA \n", "3 No NA \n", "4 No NA \n", "5 No NA \n", "6 No NA \n", " q34IdealLengHackerRankTest q0035_other q36Level4\n", "1 #NULL! Queue \n", "2 #NULL! Queue \n", "3 #NULL! Queue \n", "4 #NULL! 
Hashmap \n", "5 #NULL! Hashmap \n", "6 #NULL! Queue " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "head(values)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the `dplyr` package, build a small table giving the proportion of men and women in this _dataset_. \n", "\n", "Use the `q3Gender` variable." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "values_2<-values %>% \n", " group_by(q3Gender) %>%\n", " filter(q3Gender %in% c('Male','Female'))%>% \n", " count()" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " q3Gender & n\\\\\n", "\\hline\n", "\t Female & 16.55688\\\\\n", "\t Male & 83.44312\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "q3Gender | n | \n", "|---|---|\n", "| Female | 16.55688 | \n", "| Male | 83.44312 | \n", "\n", "\n" ], "text/plain": [ " q3Gender n \n", "1 Female 16.55688\n", "2 Male 83.44312" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "values_2$n<-(values_2$n/ sum(values_2$n)) * 100\n", "values_2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the `dplyr` package, build a table giving the proportion of men and women, split by whether or not they are students. \n", "\n", "Use the variables `q3Gender`, `is_student` and `q8Student`." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "values$is_student <- ifelse(values$q8Student == '','Yes','No')" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " q3Gender & is\\_student & n\\\\\n", "\\hline\n", "\t Female & No & 20.82685\\\\\n", "\t Female & Yes & 13.55364\\\\\n", "\t Male & No & 79.17315\\\\\n", "\t Male & Yes & 86.44636\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "q3Gender | is_student | n | \n", "|---|---|---|---|\n", "| Female | No | 20.82685 | \n", "| Female | Yes | 13.55364 | \n", "| Male | No | 79.17315 | \n", "| Male | Yes | 86.44636 | \n", "\n", "\n" ], "text/plain": [ " q3Gender is_student n \n", "1 Female No 20.82685\n", "2 Female Yes 13.55364\n", "3 Male No 79.17315\n", "4 Male Yes 86.44636" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "values %>% group_by(q3Gender, is_student) %>% \n", " filter(q3Gender %in% c('Male','Female')) %>% \n", " count() %>% \n", " ungroup() %>% \n", " group_by(is_student) %>% \n", " mutate(n = (n/ sum(n)) * 100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Build a table giving the number of respondents per country (use the `CountryNumeric2` variable)." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " CountryNumeric2 & n\\\\\n", "\\hline\n", "\t & 3991 \\\\\n", "\t Afghanistan & 3 \\\\\n", "\t Albania & 8 \\\\\n", "\t Algeria & 22 \\\\\n", "\t American Samoa & 1 \\\\\n", "\t Andorra & 1 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "CountryNumeric2 | n | \n", "|---|---|---|---|---|---|\n", "| | 3991 | \n", "| Afghanistan | 3 | \n", "| Albania | 8 | \n", "| Algeria | 22 | \n", "| American Samoa | 1 | \n", "| Andorra | 1 | \n", "\n", "\n" ], "text/plain": [ " CountryNumeric2 n \n", "1 3991\n", "2 Afghanistan 3\n", "3 Albania 8\n", "4 Algeria 22\n", "5 American Samoa 1\n", "6 Andorra 1" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "values %>% group_by(CountryNumeric2) %>% count() %>% head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Build a table giving the number of respondents, grouped by the degree obtained. \n", "\n", "Use the `q4Education` variable." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " q4Education & Total\\\\\n", "\\hline\n", "\t College graduate & 12010 \\\\\n", "\t Post graduate degree (Masters, PhD) & 6030 \\\\\n", "\t Some college & 2499 \\\\\n", "\t Some post graduate work (Masters, PhD) & 2493 \\\\\n", "\t High school graduate & 1289 \\\\\n", "\t Some high school & 316 \\\\\n", "\t \\#NULL! & 305 \\\\\n", "\t Vocational training (like bootcamp) & 148 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "q4Education | Total | \n", "|---|---|---|---|---|---|---|---|\n", "| College graduate | 12010 | \n", "| Post graduate degree (Masters, PhD) | 6030 | \n", "| Some college | 2499 | \n", "| Some post graduate work (Masters, PhD) | 2493 | \n", "| High school graduate | 1289 | \n", "| Some high school | 316 | \n", "| #NULL! | 305 | \n", "| Vocational training (like bootcamp) | 148 | \n", "\n", "\n" ], "text/plain": [ " q4Education Total\n", "1 College graduate 12010\n", "2 Post graduate degree (Masters, PhD) 6030\n", "3 Some college 2499\n", "4 Some post graduate work (Masters, PhD) 2493\n", "5 High school graduate 1289\n", "6 Some high school 316\n", "7 #NULL! 305\n", "8 Vocational training (like bootcamp) 148" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "values %>%\n", " filter(!is.na(q4Education))%>%\n", " group_by(q4Education)%>%\n", " summarise(Total = n())%>%\n", " arrange(desc(Total)) %>%\n", " mutate(q4Education = reorder(q4Education,Total)) %>%\n", " head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Build a table giving the number of developers per age category.\n", "\n", "Use the `q1AgeBeginCoding` variable." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " q1AgeBeginCoding & Total\\\\\n", "\\hline\n", "\t 16 - 20 years old & 14293 \\\\\n", "\t 11 - 15 years old & 5264 \\\\\n", "\t 21 - 25 years old & 3626 \\\\\n", "\t 5 - 10 years old & 933 \\\\\n", "\t 26 - 30 years old & 642 \\\\\n", "\t 31 - 35 years old & 193 \\\\\n", "\t 36 - 40 years old & 67 \\\\\n", "\t 41 - 50 years old & 34 \\\\\n", "\t \\#NULL! & 30 \\\\\n", "\t 50+ years or older & 8 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "q1AgeBeginCoding | Total | \n", "|---|---|---|---|---|---|---|---|---|---|\n", "| 16 - 20 years old | 14293 | \n", "| 11 - 15 years old | 5264 | \n", "| 21 - 25 years old | 3626 | \n", "| 5 - 10 years old | 933 | \n", "| 26 - 30 years old | 642 | \n", "| 31 - 35 years old | 193 | \n", "| 36 - 40 years old | 67 | \n", "| 41 - 50 years old | 34 | \n", "| #NULL! | 30 | \n", "| 50+ years or older | 8 | \n", "\n", "\n" ], "text/plain": [ " q1AgeBeginCoding Total\n", "1 16 - 20 years old 14293\n", "2 11 - 15 years old 5264\n", "3 21 - 25 years old 3626\n", "4 5 - 10 years old 933\n", "5 26 - 30 years old 642\n", "6 31 - 35 years old 193\n", "7 36 - 40 years old 67\n", "8 41 - 50 years old 34\n", "9 #NULL! 
30\n", "10 50+ years or older 8" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "values %>%\n", " filter(!is.na(q1AgeBeginCoding)) %>%\n", " group_by(q1AgeBeginCoding) %>%\n", " summarise(Total = n()) %>%\n", " arrange(desc(Total)) %>%\n", " mutate(q1AgeBeginCoding = reorder(q1AgeBeginCoding,Total)) %>%\n", " head(10)" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "4.1.2" }, "latex_envs": { "LaTeX_envs_menu_present": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": { "height": "174px", "width": "252px" }, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": "block", "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 4 }