In the gapminder dataset, calculate the range of life expectancy, population, and GDP per capita
round(max(gapminder$lifeExp) - min(gapminder$lifeExp),1)
## [1] 59
round(max(gapminder$pop) - min(gapminder$pop),1)
## [1] 1318623085
round(max(gapminder$gdpPercap) - min(gapminder$gdpPercap),1)
## [1] 113282
Make a function for calculating range
max_minus_min <- function(x) {
  round(max(x) - min(x), 1)
}
Test-run your function
max_minus_min(gapminder$lifeExp)
## [1] 59
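As a cross-check on the helper above, the same calculation can be written with base R's range() and diff(). This is a minimal sketch, not part of the original exercise: the name range_width and the toy vector are made up for illustration.

```r
# Helper from the text: max minus min, rounded to one decimal place
max_minus_min <- function(x) {
  round(max(x) - min(x), 1)
}

# Hypothetical alternative: range() returns c(min, max) and diff() subtracts them
range_width <- function(x, digits = 1) {
  round(diff(range(x)), digits)
}

# Made-up values for illustration only (not taken from gapminder)
toy <- c(23.6, 41.25, 82.6)

max_minus_min(toy)
range_width(toy)
```

Both calls should return the same number. Wrapping the logic in a function means the rounding rule is written once instead of being repeated for every column.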
purrr package
Apply the class() function to each column of the gapminder data
gapminder %>%
  map(class)
## $country
## [1] "factor"
##
## $continent
## [1] "factor"
##
## $year
## [1] "integer"
##
## $lifeExp
## [1] "numeric"
##
## $pop
## [1] "integer"
##
## $gdpPercap
## [1] "numeric"
The default output of map() is a list. If you want a character vector as output, use map_chr()
gapminder %>%
  map_chr(class)
##   country continent      year   lifeExp       pop gdpPercap
##  "factor"  "factor" "integer" "numeric" "integer" "numeric"
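The difference between the two return types can be checked directly. The sketch below uses a small made-up data frame (toy_df is a hypothetical name, not part of the course data) so it runs without gapminder.

```r
library(purrr)

# Small made-up data frame, for illustration only
toy_df <- data.frame(
  name  = c("a", "b", "c"),
  count = c(1L, 2L, 3L),
  score = c(0.5, 1.5, 2.5)
)

as_list <- map(toy_df, class)      # list with one element per column
as_chr  <- map_chr(toy_df, class)  # named character vector

is.list(as_list)       # TRUE
is.character(as_chr)   # TRUE
as_chr
```

Note that map_chr() expects every result to be a single string, so it will fail on a column whose class() has more than one element (a date-time column, for example). In that case map() is the safer choice.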
Another example: what is the number of distinct values in each column? Hint: use n_distinct()
gapminder %>%
  map_int(n_distinct) # using map_int for integer output
##   country continent      year   lifeExp       pop gdpPercap
##       142         5        12      1626      1704      1704
What is the median of all numeric columns?
gapminder %>%
  dplyr::select_if(is.numeric) %>%
  map_dbl(median)
##         year      lifeExp          pop    gdpPercap
##    1979.5000      60.7125 7023595.5000    3531.8470
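In recent versions of dplyr (1.0.0 and later), select_if() is superseded by select(where(...)). The sketch below shows the same "median of every numeric column" pattern with that newer spelling, on a made-up data frame (toy_df is a hypothetical name), assuming dplyr and purrr are installed.

```r
library(dplyr)
library(purrr)

# Made-up data for illustration only
toy_df <- data.frame(
  group = c("a", "b", "c", "d"),
  x     = c(1, 2, 3, 10),
  y     = c(4L, 6L, 8L, 10L)
)

medians <- toy_df %>%
  select(where(is.numeric)) %>%  # keep only the numeric columns (x and y)
  map_dbl(median)                # one median per column, as a named double vector

medians
```

The character column group is dropped before map_dbl() runs, which matters because median() is not defined for character data.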
~ with map()
The ~ shortcut reduces the amount of typing when you want to pass a more complex (anonymous) function to map(). Inside the shortcut, . stands for the current element.
Example
my_vector <- c(1, 2, 3)
map_dbl(my_vector, function(x){x+10})
## [1] 11 12 13
Shortcut of the same code using ~
my_vector <- c(1, 2, 3)
map_dbl(my_vector, ~(.+10))
## [1] 11 12 13
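For comparison, here are four ways of writing the same anonymous function, as a minimal sketch. The variable names are illustrative; .x is the pronoun purrr documents for formula shortcuts, and the \(x) form needs R 4.1 or later.

```r
library(purrr)

my_vector <- c(1, 2, 3)

with_function <- map_dbl(my_vector, function(x) x + 10)  # full anonymous function
with_dot      <- map_dbl(my_vector, ~ . + 10)            # formula shortcut, "." is the element
with_dot_x    <- map_dbl(my_vector, ~ .x + 10)           # same shortcut using the .x pronoun
with_lambda   <- map_dbl(my_vector, \(x) x + 10)         # base R lambda (R >= 4.1)

with_function
```

All four return the same vector, so the choice is mostly about readability. The formula shortcut only works inside purrr-style functions, while \(x) works anywhere in R.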
More complex example:
Fitting a separate linear model to each group of the data
gapminder %>%
  split(.$continent) %>%                          # split dataset by continent
  map(function(df) lm(lifeExp ~ pop, data = df))  # linear model for each group
## $Africa
##
## Call:
## lm(formula = lifeExp ~ pop, data = df)
##
## Coefficients:
## (Intercept) pop
## 4.816e+01 7.150e-08
##
##
## $Americas
##
## Call:
## lm(formula = lifeExp ~ pop, data = df)
##
## Coefficients:
## (Intercept) pop
## 6.353e+01 4.587e-08
##
##
## $Asia
##
## Call:
## lm(formula = lifeExp ~ pop, data = df)
##
## Coefficients:
## (Intercept) pop
## 5.992e+01 1.901e-09
##
##
## $Europe
##
## Call:
## lm(formula = lifeExp ~ pop, data = df)
##
## Coefficients:
## (Intercept) pop
## 7.162e+01 1.650e-08
##
##
## $Oceania
##
## Call:
## lm(formula = lifeExp ~ pop, data = df)
##
## Coefficients:
## (Intercept) pop
## 7.207e+01 2.545e-07
Shortcut of the same code using ~
gapminder %>%
  split(.$continent) %>%             # split dataset by continent
  map(~lm(lifeExp ~ pop, data = .))  # linear model for each group
## $Africa
##
## Call:
## lm(formula = lifeExp ~ pop, data = .)
##
## Coefficients:
## (Intercept) pop
## 4.816e+01 7.150e-08
##
##
## $Americas
##
## Call:
## lm(formula = lifeExp ~ pop, data = .)
##
## Coefficients:
## (Intercept) pop
## 6.353e+01 4.587e-08
##
##
## $Asia
##
## Call:
## lm(formula = lifeExp ~ pop, data = .)
##
## Coefficients:
## (Intercept) pop
## 5.992e+01 1.901e-09
##
##
## $Europe
##
## Call:
## lm(formula = lifeExp ~ pop, data = .)
##
## Coefficients:
## (Intercept) pop
## 7.162e+01 1.650e-08
##
##
## $Oceania
##
## Call:
## lm(formula = lifeExp ~ pop, data = .)
##
## Coefficients:
## (Intercept) pop
## 7.207e+01 2.545e-07
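A list of fitted models is usually an intermediate step. A second map call can pull one number out of each model. The sketch below illustrates this split, fit, extract pattern on R's built-in mtcars data rather than gapminder, so it runs without extra downloads; the variable names are illustrative.

```r
library(purrr)

# Split the built-in mtcars data by number of cylinders (4, 6, 8)
by_cyl <- split(mtcars, mtcars$cyl)

# Fit one linear model per group, using the ~ shortcut
models <- map(by_cyl, ~ lm(mpg ~ wt, data = .x))

# Extract the slope for wt from each model as a named numeric vector
slopes <- map_dbl(models, ~ coef(.x)[["wt"]])

slopes
```

The same two extra lines should work on the continent models above by swapping in coef(.x)[["pop"]].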
Reading multiple files with purrr
Download the data here - Google drive link
This is data from different countries. The “_gm” suffix stands for gapminder, from which the data is borrowed. In the example below, the files are in a folder called “data”.
Step 1 and 2: Make a list of all .csv files with the _gm suffix
my_files <- dir(here("data"),            # specify file path (here() comes from the here package)
                pattern = "_gm\\.csv$",  # regular expression: file names ending in _gm.csv
                full.names = TRUE)       # preserve file path
my_files
## [1] "/Users/meenakshikushwaha/Dropbox/R projects/github/CSTEP_R_course/data/china_gm.csv"
## [2] "/Users/meenakshikushwaha/Dropbox/R projects/github/CSTEP_R_course/data/india_gm.csv"
## [3] "/Users/meenakshikushwaha/Dropbox/R projects/github/CSTEP_R_course/data/japan_gm.csv"
## [4] "/Users/meenakshikushwaha/Dropbox/R projects/github/CSTEP_R_course/data/nepal_gm.csv"
Step 3: Read and combine all files using map_dfr()
my_df <- my_files %>%
  map_dfr(read_csv)
my_df
## # A tibble: 16 × 6
##    country continent  year lifeExp       pop gdpPercap
##    <chr>   <chr>     <dbl>   <dbl>     <dbl>     <dbl>
##  1 China   Asia       1952    44   556263527      400.
##  2 China   Asia       1957    50.5 637408000      576.
##  3 China   Asia       1962    44.5 665770000      488.
##  4 China   Asia       1967    58.4 754550000      613.
##  5 India   Asia       1952    37.4 372000000      547.
##  6 India   Asia       1957    40.2 409000000      590.
##  7 India   Asia       1962    43.6 454000000      658.
##  8 India   Asia       1967    47.2 506000000      701.
##  9 Japan   Asia       1952    63.0  86459025     3217.
## 10 Japan   Asia       1957    65.5  91563009     4318.
## 11 Japan   Asia       1962    68.7  95831757     6577.
## 12 Japan   Asia       1967    71.4 100825279     9848.
## 13 Nepal   Asia       1952    36.2   9182536      546.
## 14 Nepal   Asia       1957    37.7   9682338      598.
## 15 Nepal   Asia       1962    39.4  10332057      652.
## 16 Nepal   Asia       1967    41.5  11261690      676.
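To try the list-then-read workflow without downloading anything, the sketch below writes a few tiny made-up CSV files to a temporary folder and combines them. The file names, folder name, and contents are hypothetical. It uses base read.csv() instead of read_csv(), and map_dfr() needs the dplyr package to be installed.

```r
library(purrr)

# Hypothetical demo folder inside the session's temporary directory
tmp_dir <- file.path(tempdir(), "gm_demo")
dir.create(tmp_dir, showWarnings = FALSE)

# Two made-up files with the _gm suffix, plus one file that should be ignored
write.csv(data.frame(country = "A", year = c(1952, 1957)),
          file.path(tmp_dir, "a_gm.csv"), row.names = FALSE)
write.csv(data.frame(country = "B", year = c(1952, 1957)),
          file.path(tmp_dir, "b_gm.csv"), row.names = FALSE)
write.csv(data.frame(x = 1),
          file.path(tmp_dir, "other.csv"), row.names = FALSE)

# Steps 1 and 2: list only the files whose names end in _gm.csv
my_files <- dir(tmp_dir, pattern = "_gm\\.csv$", full.names = TRUE)

# Step 3: read each file and row-bind the results into one data frame
combined <- map_dfr(my_files, read.csv)

combined
```

The pattern argument of dir() is a regular expression rather than a shell glob, which is why the dot is escaped and the end of the name is anchored with $.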