Homework #3 Data aggregation

Homework #3 Data aggregation

Follow the existing homework submission instructions UP TO THE LINK SUBMISSION PART. Submit your links by editing this Google doc -- WARNING: I turned off sharing ~3pm Mon Sept 23 because almost everyone had submitted their links. Contact JB if you need to submit or edit a link.

Start with the Gapminder data as provided in gapminderDataFiveYear.txt.
Use the plyr package to tackle some, not all, tasks from the list below. You decide what to work on but spend ~2 hours on this. Of course you are welcome to do more if you wish.
Write up with R Markdown.
- Include a narrative, written in English prose, i.e. don't just show code and results
- Expose your code, i.e. use echo = FALSE very sparingly
- If you tackle a tricky one, first give a clean account of your data aggregation results. You can describe your pain and suffering at the end.
- Don't use figures! It's an artificial constraint, I know. Why hold back? Focus on data aggregation, we'll talk figures next week and revisit in the next homework. Plus you will get a visceral understanding of how ridiculous it is to do exploration without making lots of figures.
DUE: Before class begins 9:30am Monday September 23.
- Compile into an HTML report. Follow the naming convention. Publish the HTML report on the web somewhere, such as on RPubs. Make the slug follow the naming convention.
- Publish the R Markdown file as a Gist. Follow the naming convention.
- Share your links by editing the Google doc mentioned above.

Tips

Perform a superficial check that data import went OK.
Give informative, short variable names.
Attend to row and column order in reported results.
Don't be satisfied merely with correct results; examine your code and see if it can be simpler, more readable, more general.

Load the Gapminder data and the `plyr` and `xtable` packages

## data import from URL
gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.delim(file = gdURL)
library(plyr)
library(xtable)
str(gDat)

## 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ..
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 ..
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ gdpPercap: num  779 821 853 836 740 ...

## define a function for converting and printing to HTML table
htmlPrint <- function(x, ...,
                      digits = 0, include.rownames = FALSE) {
  print(xtable(x, digits = digits, ...), type = 'html',
        include.rownames = include.rownames, ...)
  }

I wrote a function htmlPrint() to create an HTML table and print it.

Create your own adventure.

You are welcome to create your own exercises by riffing on what you see below. In fact, that's great -- you won't be copying from your office mate and it emulates real life where you first need to think up interesting questions, then solve them. The easiest sort of variation is to do what I ask below but with a different variable, different summary statistic, different tuning parameter, etc.

Get the maximum and minimum of GDP per capita for all continents in a "wide" format. Easy.

Produce a data.frame with 3 variables: a continent factor, the minimum GDP per capita, the maximum GDP per capita. There will be one row per continent. Sort on either the minimum or the maximum and assess whether the continents rank similarly w/r/t the other extreme. Discuss.

foo <- ddply(gDat, ~ continent, summarize,
             minGdpPercap = min(gdpPercap), maxGdpPercap = max(gdpPercap))
htmlPrint(arrange(foo, minGdpPercap))

continent	minGdpPercap	maxGdpPercap
Africa	241	21951
Asia	331	113523
Europe	974	49357
Americas	1202	42952
Oceania	10040	34435

Get the maximum and minimum of GDP per capita for all continents in a "tall" format. Medium/hard.

Produce a data.frame with 3 variables: a continent factor, a GDP per capita summary statistic, a factor that describes the statistic (i.e. "min" vs "max"). There will be one row per continent * statistic combination. Don't use reshaping -- get this directly with data aggregation.

foo <- ddply(gDat, ~ continent, function(x) {
  gdpPercap <- range(x$gdpPercap)
  return(data.frame(gdpPercap, stat = c("min", "max")))
  })
htmlPrint(foo)

continent	gdpPercap	stat
Africa	241	min
Africa	21951	max
Americas	1202	min
Americas	42952	max
Asia	331	min
Asia	113523	max
Europe	974	min
Europe	49357	max
Oceania	10040	min
Oceania	34435	max

Look at the spread of GDP per capita within the continents. Easy.

Measures of spread (we will discuss in class soon, just try them out!):

standard deviation or variance -- see sd() and var()
median absolute deviation -- see mad()
interquartile range -- see IQR()

Produce a data.frame with one row per continent and variables containing the different measures of spread for GDP/capita. Sort on one of those and assess whether the continents rank similarly w/r/t heterogeneity of GDP/capita by the other measures. Discuss.

foo <- ddply(gDat, ~ continent, summarize,
             sdGdpPercap = sd(gdpPercap), madGdpPercap = mad(gdpPercap),
             iqrGdpPercap = IQR(gdpPercap))
htmlPrint(arrange(foo, sdGdpPercap))

continent	sdGdpPercap	madGdpPercap	iqrGdpPercap
Africa	2828	775	1616
Oceania	6359	6459	8072
Americas	6397	3269	4402
Europe	9355	8846	13248
Asia	14045	2821	7492

## spot checks!
IQR(gDat$gdpPercap[gDat$continent == "Europe"])

[1] 13248

mad(gDat$gdpPercap[gDat$continent == "Asia"])

[1] 2821

If you don't get an error but you are still worried you aren't doing the right computation, pick a few cases in the middle at random and check "by hand".

Compute a trimmed mean of life expectancy for different years. Easy/medium.

Compute the average life expectancy by year, as a warm-up exercise. Then consider an alternative: a trimmed mean. There is a trim argument in the built-in function mean(). Pick a modest trim fraction that is greater than 0. Report your sorted results. Make sure your trim level is well-documented.

jTrim <- 0.2
foo <- ddply(gDat, ~ year, summarize, tMean = mean(lifeExp, trim = jTrim))

## Error: object 'jTrim' not found

htmlPrint(arrange(foo, tMean))

## Error: object 'tMean' not found

Hear ye, hear ye: the trim level for the trimmed means above is 0.2. Let it be known that I did not type that number in; rather it was printed from the value assigned in the previous R chunk.

If you haven't used it yet, you should experiment with the backtick method of including inline R code. Look at the R Markdown file to see how I do this. If I change jTrim to a different value in the R code, my prose will automatically update.

How is life expectancy changing over time on different continents? "Tall" format. Easy/medium.

Get a data.frame with 3 variables: continent, year, and mean or median life expectancy. What are the trends over time? Are they the same or different across the continents? Is this easy to assess from this data.frame?

htmlPrint(ddply(gDat, ~ continent + year,
                summarize, medLifeExp = median(lifeExp)))

continent	year	medLifeExp
Africa	1952	39
Africa	1957	41
Africa	1962	43
Africa	1967	45
Africa	1972	47
Africa	1977	49
Africa	1982	51
Africa	1987	52
Africa	1992	52
Africa	1997	53
Africa	2002	51
Africa	2007	53
Americas	1952	55
Americas	1957	56
Americas	1962	58
Americas	1967	61
Americas	1972	63
Americas	1977	66
Americas	1982	67
Americas	1987	69
Americas	1992	70
Americas	1997	72
Americas	2002	72
Americas	2007	73
Asia	1952	45
Asia	1957	48
Asia	1962	49
Asia	1967	54
Asia	1972	57
Asia	1977	61
Asia	1982	64
Asia	1987	66
Asia	1992	69
Asia	1997	70
Asia	2002	71
Asia	2007	72
Europe	1952	66
Europe	1957	68
Europe	1962	70
Europe	1967	71
Europe	1972	71
Europe	1977	72
Europe	1982	73
Europe	1987	75
Europe	1992	75
Europe	1997	76
Europe	2002	78
Europe	2007	79
Oceania	1952	69
Oceania	1957	70
Oceania	1962	71
Oceania	1967	71
Oceania	1972	72
Oceania	1977	73
Oceania	1982	74
Oceania	1987	75
Oceania	1992	77
Oceania	1997	78
Oceania	2002	80
Oceania	2007	81

How is life expectancy changing over time on different continents? "Wide" format. Hard (or possibly silly).

Get a data.frame with 1 + 5 = 6 variables: year and then one variable per continent, giving mean or median life expectancy. Is there a natural way to do this in plyr without hard-wiring the continent-specific variables and without reshaping an intermediate result? Since I am new-ish to plyr, I don't know yet if this is possible and/or a totally stupid way to go about this. So this one is for people who are happy to play around a bit. Hint: I've made some progress here and advise you look into daply().

foo <- daply(gDat, ~ year + continent,
             summarize, medLifeExp = median(lifeExp))
#str(foo)
foo <- as.data.frame(foo)
#str(foo) ## there's still some goofiness, ie the variables are lists .... hmmmm
htmlPrint(foo, include.rownames = TRUE)

	Africa	Americas	Asia	Europe	Oceania
1952	39	55	45	66	69
1957	41	56	48	68	70
1962	43	58	49	70	71
1967	45	61	54	71	71
1972	47	63	57	71	72
1977	49	66	61	72	73
1982	51	67	64	73	74
1987	52	69	66	75	75
1992	52	70	69	75	77
1997	53	72	70	76	78
2002	51	72	71	78	80
2007	53	73	72	79	81

Arguably, this table is better transposed. So just switch the order of continent and year in the plyr call.

foo <- daply(gDat, ~ continent + year,
             summarize, medLifeExp = median(lifeExp))
htmlPrint(as.data.frame(foo), include.rownames = TRUE)

	1952	1957	1962	1967	1972	1977	1982	1987	1992	1997	2002	2007
Africa	39	41	43	45	47	49	51	52	52	53	51	53
Americas	55	56	58	61	63	66	67	69	70	72	72	73
Asia	45	48	49	54	57	61	64	66	69	70	71	72
Europe	66	68	70	71	71	72	73	75	75	76	78	79
Oceania	69	70	71	71	72	73	74	75	77	78	80	81

Count the number of countries with low life expectancy over time by continent. "Tall" format. Medium/hard.

Compute some measure of worldwide life expectancy -- you decide -- a mean or median or some other quantile or perhaps your current age. The determine how many countries on each continent have a life expectancy less than this benchmark, for each year. Produce a data.frame with 3 variables: continent, year, and a country count. What's the trend over time? Is that trend the same for all continents? How much fun is it to assess this from this tall table?

#(bMark <- quantile(gDat$lifeExp, 0.2))
bMark <- 43 # JB wrote this on her 43rd b'day!
htmlPrint(ddply(gDat, ~ continent + year,
                function(x) c(lowLifeExp = sum(x$lifeExp <= bMark))))

continent	year	lowLifeExp
Africa	1952	43
Africa	1957	34
Africa	1962	28
Africa	1967	20
Africa	1972	13
Africa	1977	10
Africa	1982	7
Africa	1987	4
Africa	1992	5
Africa	1997	6
Africa	2002	4
Africa	2007	6
Americas	1952	5
Americas	1957	2
Americas	1962	0
Americas	1967	0
Americas	1972	0
Americas	1977	0
Americas	1982	0
Americas	1987	0
Americas	1992	0
Americas	1997	0
Americas	2002	0
Americas	2007	0
Asia	1952	12
Asia	1957	11
Asia	1962	5
Asia	1967	3
Asia	1972	3
Asia	1977	2
Asia	1982	1
Asia	1987	1
Asia	1992	1
Asia	1997	1
Asia	2002	1
Asia	2007	0
Europe	1952	0
Europe	1957	0
Europe	1962	0
Europe	1967	0
Europe	1972	0
Europe	1977	0
Europe	1982	0
Europe	1987	0
Europe	1992	0
Europe	1997	0
Europe	2002	0
Europe	2007	0
Oceania	1952	0
Oceania	1957	0
Oceania	1962	0
Oceania	1967	0
Oceania	1972	0
Oceania	1977	0
Oceania	1982	0
Oceania	1987	0
Oceania	1992	0
Oceania	1997	0
Oceania	2002	0
Oceania	2007	0

Compute the proportion of countries with low life expectancy over time by continent. Medium extension of another task.

The continents have very different numbers of countries. Maybe the proportion of "low life expectancy" countries is more relevant than the raw count. Or maybe the final table should report both? Work on that.

## note: bMark set in previous chunk
htmlPrint(ddply(gDat, ~ continent + year, function(x) {
  jCount = sum(x$lifeExp <= bMark)
  c(count = jCount, prop = jCount / nrow(x))
}), digits = c(0, 0, 0, 0, 2))

continent	year	count	prop
Africa	1952	43	0.83
Africa	1957	34	0.65
Africa	1962	28	0.54
Africa	1967	20	0.38
Africa	1972	13	0.25
Africa	1977	10	0.19
Africa	1982	7	0.13
Africa	1987	4	0.08
Africa	1992	5	0.10
Africa	1997	6	0.12
Africa	2002	4	0.08
Africa	2007	6	0.12
Americas	1952	5	0.20
Americas	1957	2	0.08
Americas	1962	0	0.00
Americas	1967	0	0.00
Americas	1972	0	0.00
Americas	1977	0	0.00
Americas	1982	0	0.00
Americas	1987	0	0.00
Americas	1992	0	0.00
Americas	1997	0	0.00
Americas	2002	0	0.00
Americas	2007	0	0.00
Asia	1952	12	0.36
Asia	1957	11	0.33
Asia	1962	5	0.15
Asia	1967	3	0.09
Asia	1972	3	0.09
Asia	1977	2	0.06
Asia	1982	1	0.03
Asia	1987	1	0.03
Asia	1992	1	0.03
Asia	1997	1	0.03
Asia	2002	1	0.03
Asia	2007	0	0.00
Europe	1952	0	0.00
Europe	1957	0	0.00
Europe	1962	0	0.00
Europe	1967	0	0.00
Europe	1972	0	0.00
Europe	1977	0	0.00
Europe	1982	0	0.00
Europe	1987	0	0.00
Europe	1992	0	0.00
Europe	1997	0	0.00
Europe	2002	0	0.00
Europe	2007	0	0.00
Oceania	1952	0	0.00
Oceania	1957	0	0.00
Oceania	1962	0	0.00
Oceania	1967	0	0.00
Oceania	1972	0	0.00
Oceania	1977	0	0.00
Oceania	1982	0	0.00
Oceania	1987	0	0.00
Oceania	1992	0	0.00
Oceania	1997	0	0.00
Oceania	2002	0	0.00
Oceania	2007	0	0.00

Report the absolute and/or relative abundance of countries with low life expectancy over time by continent. "Wide" format. Extension of another task of unknown difficulty / value.

Create a table with one row per year and one column -- at least conceptually -- per continent. In that "column" report the absolute and/or relative abundance of "low life expectancy" countries. Is there a natural way to do this in plyr without hard-wiring the continent-specific variables and without reshaping an intermediate result? Since I am new-ish to plyr, I don't know yet if this is possible and/or a totally stupid way to go about this. So this one is for people who are happy to play around a bit. Hint: I've made some progress here and advise you look into daply(). I ended up skipping straight to character output versus an object appropriate for further computation or graphing. Sometimes it's better to approach these ill-defined tasks from a different angle.

## note: bMark set in a previous chunk

## my first success ... but was still a 3-dim array :(
## that would be useful for plotting, at least after some reshaping, but is not
## so easy on the eyes
# daply(gDat, ~ year + continent, function(x) {
#   jCount <- sum(x$lifeExp < bMark)
#   c(count = jCount, prop = jCount / nrow(x))
# })

foo <- daply(gDat, ~ year + continent, function(x) {
  jCount <- sum(x$lifeExp <= bMark)
  jTotal <- nrow(x)
  jProp <- jCount / jTotal
  return(sprintf("%1.2f (%d/%d)", jProp, jCount, jTotal))
  })
htmlPrint(foo, include.rownames = TRUE)

	Africa	Americas	Asia
1952	0.83 (43/52)	0.20 (5/25)	0.36 (12/33)
1957	0.65 (34/52)	0.08 (2/25)	0.33 (11/33)
1962	0.54 (28/52)	0.00 (0/25)	0.15 (5/33)
1967	0.38 (20/52)	0.00 (0/25)	0.09 (3/33)
1972	0.25 (13/52)	0.00 (0/25)	0.09 (3/33)
1977	0.19 (10/52)	0.00 (0/25)	0.06 (2/33)
1982	0.13 (7/52)	0.00 (0/25)	0.03 (1/33)
1987	0.08 (4/52)	0.00 (0/25)	0.03 (1/33)
1992	0.10 (5/52)	0.00 (0/25)	0.03 (1/33)
1997	0.12 (6/52)	0.00 (0/25)	0.03 (1/33)
2002	0.08 (4/52)	0.00 (0/25)	0.03 (1/33)
2007	0.12 (6/52)	0.00 (0/25)	0.00 (0/33)

Find countries with interesting stories. Open-ended and, therefore, hard. Realistic!

It was easy (see above) to get the minimum life expectancy seen on each continent by year. Now try to report the year, the continent, the minimum (or max or whatever) life expectancy AND the country it pertains to. You could do something similar with GDP per capita or population (though result likely to be boring with population). Hint: read up on which.min() and which.max().

## this works but produces a frustrating long/tall result; not good for printing
# ddply(gDat, ~ year + continent, function(x) {
#   theMin <- which.min(x$lifeExp)
#   x[theMin, c("country", "year", "continent", "lifeExp")]
#   })

foo <- daply(gDat, ~ year + continent, function(x) {
  theMin <- which.min(x$lifeExp)
  return(sprintf("%1.0f %s", x$lifeExp[theMin], x$country[theMin]))
})
htmlPrint(foo, include.rownames = TRUE)

	Africa	Americas	Asia	Europe	Oceania
1952	30 Gambia	38 Haiti	29 Afghanistan	44 Turkey	69 Australia
1957	32 Sierra Leone	41 Haiti	30 Afghanistan	48 Turkey	70 New Zealand
1962	33 Sierra Leone	43 Bolivia	32 Afghanistan	52 Turkey	71 Australia
1967	34 Sierra Leone	45 Bolivia	34 Afghanistan	54 Turkey	71 Australia
1972	35 Sierra Leone	47 Bolivia	36 Afghanistan	57 Turkey	72 New Zealand
1977	37 Sierra Leone	50 Haiti	31 Cambodia	60 Turkey	72 New Zealand
1982	38 Sierra Leone	51 Haiti	40 Afghanistan	61 Turkey	74 New Zealand
1987	40 Angola	54 Haiti	41 Afghanistan	63 Turkey	74 New Zealand
1992	24 Rwanda	55 Haiti	42 Afghanistan	66 Turkey	76 New Zealand
1997	36 Rwanda	57 Haiti	42 Afghanistan	69 Turkey	78 New Zealand
2002	39 Zambia	58 Haiti	42 Afghanistan	71 Turkey	79 New Zealand
2007	40 Swaziland	61 Haiti	44 Afghanistan	72 Turkey	80 New Zealand

Consider the linear regression we fit in tutorial of life expectancy vs. time. Find the min (and/or max) of the slope (and/or the intercept) within each continent. Report these interesting countries and some info about them in a data.frame. This might work for other variables too.

yearMin <- min(gDat$year)
jFun <- function(x) {
  estCoefs <- coef(lm(lifeExp ~ I(year - yearMin), x))
  names(estCoefs) <- c("intercept", "slope")
  return(estCoefs)
  }
jCoefs <- ddply(gDat, ~ country + continent, jFun)
## giving out the Most Improved Award!
## going after maximum slope within continent
foo <- ddply(jCoefs, ~ continent, function(x) {
  theMax <- which.max(x$slope)
  x[theMax, c("continent", "country", "intercept", "slope")]
  })
htmlPrint(foo, digits = c(0, 0, 0, 0, 2))

continent	country	intercept	slope
Africa	Libya	42	0.63
Americas	Nicaragua	43	0.56
Asia	Oman	37	0.77
Europe	Turkey	46	0.50
Oceania	Australia	68	0.23

Sudden, substantial departures from the temporal trend is interesting. This goes for life expectancy, GDP per capita, or population. How might one operationalize this notion of "interesting"?

Fit a regression of the response vs. time. Consider the residuals. Determine if there are 1 or more freakishly large residuals, in an absolute sense or relative to some estimate of background variability. If there are, consider that country "interesting".
Fit a regression using ordinary least squares and a robust technique. Determine the difference in estimated parameters under the two approaches. If it is large, consider that country "interesting".

Make a data.frame of interesting countries, reporting their continent affiliation, and summary statistics on life expectancy (or GDP/capita or whatever you were looking at). Won't it be interesting to start to graphically explore them ... next week.

yearMin <- min(gDat$year)
jFun <- function(x) {
  jFit <- lm(lifeExp ~ I(year - yearMin), x)
  jCoef <- coef(jFit)
  names(jCoef) <- NULL
  return(c(intercept = jCoef[1],
           slope = jCoef[2],
           maxResid = max(abs(resid(jFit)))/summary(jFit)$sigma))
  }
maxResids <- ddply(gDat, ~ country + continent, jFun)
foo <- ddply(maxResids, ~ continent, function(x) {
  theMax <- which.max(x$maxResid)
  x[theMax, ]
  })
htmlPrint(foo, digits = c(0, 0, 0, 2, 2, 2))

country	continent	intercept	slope	maxResid
Rwanda	Africa	42.74	-0.05	2.64
Ecuador	Americas	49.07	0.50	2.25
Cambodia	Asia	37.02	0.40	2.79
Finland	Europe	66.45	0.24	2.72
Australia	Oceania	68.40	0.23	1.65

## I can't resist, I must make a plot
xyplot(lifeExp ~ year | country, gDat,
       subset = country %in% foo$country, type = c("p", "r"))

Discuss: Is Ecuador really interesting? Do you think it's the most interesting country in the Americas? I doubt it. What could we do better? Making plots is ABSOLUTELY ESSENTIAL to sanity checking your claims and beliefs based on numerical computation.

Student work

attali-dea source | report
baik-jon source | report
bolandnazar-moh source | report
brueckman-chr EDIT HERE
chu-jus source | report
Zachary Daly: source report
dinsdale-dan source | report
gao-wen source | report
Matthew Gingerich: source | report
hu-yum source | report
jewell-sea source | report
johnston-reb source | report
mahdiar khosravi source | report
Wooyong Lee: source | report
liao-wei: source | report
ma-hui source | report
meng-viv source | report
mohd abul basher-abd source | report
ni-jac source | report
Christian Okkels source | report
Greg Owens: source | report
Mina Park: source | report
spencer-nei source | report
wang-ton source | report
Leah Weber: source | report
woollard-geo source | report
xue-xinxin source | report
yuen-ama source | report
zhang-yim source| report
zhang-jon source | report
zhang-yif source | report
zhang-jin source | report
inskip-jes source | report
liu-yan source | report
haraty-mon EDIT HERE
yuen-mac source | report

Q and A

Student: consider the exercises where we return info on the country that exhibits, e.g., the minimum life expectancy for a certain continent and/or year. What if there are ties and that minimum is achieved by more than 1 country?

I consider this a special case or extension of something we did above: identifying "low life expectancy" countries. So I will mix that code with our min/max finding code. BTW I cannot find an instance where any countries tie for lowest or longest life expectancy, but my two stage solution reflects my attempt to look into that. I created a copy of gDat and altered two life expectancies to create a tie! No, people are not usually living past 110 in Norway and Sweden!

## I couldn't find any examples of two or more countries sharing the min or max
## life expectancy in a given year; so I tweaked the data!

testDat <- gDat
toReplace <- with(testDat, country %in% c("Norway", "Sweden") & year == 2007)
testDat[toReplace, "lifeExp"] <- 110

## for each life expectancy, compute its rank within all life expectancies
## recorded that year
ggDat <- ddply(testDat, ~ year, transform,
               leRank = rank(lifeExp, ties.method = "average"))
## look at the new leRank variable, if you wish
#table(ggDat$leRank)

## pull out the countries with extreme ranks ... with min and max being the most
## extreme of the extremes!
foo <- ddply(ggDat, ~ year, function(x) {
  minRank <- min(x$leRank) # you could change this to find the lowest, e.g. 2
  maxRank <- max(x$leRank) # and the highest 2
  whoMin <- which(x$leRank == minRank) # I allow this to have length > 1
  whoMax <- which(x$leRank == maxRank)
  data.frame(year = x$year[whoMin[1]],
             minLifeExp = x$lifeExp[whoMin[1]],
             minCountries = paste(x$country[whoMin], collapse = " "),
             minRank = minRank,
             maxLifeExp = x$lifeExp[whoMax[1]],
             maxCountries = paste(x$country[whoMax], collapse = " "),
             maxRank =  maxRank)
  })
htmlPrint(foo, digits = c(0, 0, 0, 0, 1, 0, 0, 1))

year	minLifeExp	minCountries	minRank	maxLifeExp	maxCountries	maxRank
1952	29	Afghanistan	1.0	73	Norway	142.0
1957	30	Afghanistan	1.0	73	Iceland	142.0
1962	32	Afghanistan	1.0	74	Iceland	142.0
1967	34	Afghanistan	1.0	74	Sweden	142.0
1972	35	Sierra Leone	1.0	75	Sweden	142.0
1977	31	Cambodia	1.0	76	Iceland	142.0
1982	38	Sierra Leone	1.0	77	Japan	142.0
1987	40	Angola	1.0	79	Japan	142.0
1992	24	Rwanda	1.0	79	Japan	142.0
1997	36	Rwanda	1.0	81	Japan	142.0
2002	39	Zambia	1.0	82	Japan	142.0
2007	40	Swaziland	1.0	110	Norway Sweden	141.5

Remember, the 2007 data for Norway and Sweden is fake; useful for testing this code.