Overview of the STAT545A lecture slides

Lecture slides from STAT545A Exploratory Data Analysis, as delivered Sept - Dec 2012.

Go here for the current 2013 run of the course, which is messier but more up to date. A cleaner index is also taking shape.

Examples referred to in the lectures, neatly packaged with code, data, and figures, can be found here:


cm01.pdf | Course introduction. Course goals, requirements, marks, good (e-)books, instructor bio and contact info, student depts & degrees.

cm02.pdf | Response to Assignment 1: R Gapminder Challenge Shock Therapy. View some student code and figures, need-to-know-now hints such as data import with read.table or read.delim, managing file and pathnames, writing to PDF file, subsetting the data.frame, strategies for tackling a difficult task, R-aware text editors.

cm03.pdf | Data checking and cleaning, esp. of categorical variables in Gapminder data. Simple view of simple R objects. Factors are special. Simple view of data.frames, lists, and arrays. Use names, data.frames, with() and subset().

cm04.pdf | Best practices for indexing or subsetting various R objects. Data aggregation (i.e. all the apply-type functions).

cm05.pdf | Exploring a quantitative variable, optionally with 1 or 2 associated categorical variables, e.g. life expectancy from Gapminder. stripplot, jittering, reorder factor levels, dropping factor levels, adding extra info like median, boxplot, violinplot, histogram, densityplot. Kernel density estimation. Preferred R assignment operator. R's formula syntax.

cm06.pdf | A Gapminder "solution" using base R graphics. Colors in R. RColorBrewer. Creating color schemes and mapping a factor into colors. HCL color model and the colorspace package.

cm07 | Pep rally for the final project. No lecture slides.

cm08.pdf | A Gapminder "solution" using lattice. High volume scatterplots.

cm09.pdf | Source is real. R coding style. How to organize an analytical project. Options for sharing an analytical project or for collaborative development. Version control. Sweave, knitr, Git, SVN, Mercurial, GitHub, *forge. Managing an R installation. Getting stuff out of R, safely tucked away in a file for later use or incorporated as a table in some other environment.

cm10.pdf | How to ask a question to elicit a timely, useful answer. Two-group tests. "Scaling up": different tests, many two-group comparisons. Using figures to convey bulk statistical results instead of big tables of numbers. Heatmap via levelplot(). Transforming a quantitative variable, e.g. probit transformation of p-value, while mapping it to colours. Rational ordering, e.g. ordering by a summary statistic or via hierarchical clustering.

cm11.pdf | A bootstrap approach to two-group testing, using the yeast growth data. Demo of robust regression using life expectancy over time for Rwanda. Data reshaping.

cm12.pdf | Smoothing (kernel smoothing and local polynomial regression, e.g. loess). Using cross-validation to select a tuning parameter that controls how much smoothing to do. Degenerate shingle trick to put same data into each panel or packet, with the goal of overlaying different fits.