STATS 32: Introduction to R for Undergraduates

Autumn 2019/2020

Course Description

This short course runs for weeks one through five of the quarter. It is recommended for undergraduate students who want to use R in the humanities or social sciences and for students who want to learn the basics of R programming. The goal of the short course is to familiarize students with R's tools for data analysis. Lectures will be interactive with a focus on learning by example, and assignments will be application-driven. No prior programming experience is needed. Topics covered include basic data structures, file I/O, data transformation and visualization, simple statistical tests, and some useful R packages. Prerequisite: undergraduate student. Priority given to non-engineering students. A laptop is required for use in class.

For the course syllabus, click here.


Classes

TTh 12:00 to 1:20pm, 200-203 (Weeks 1-5 only)

Instructor

Kenneth Tay, Office Hours Friday 10am-12pm, Sequoia Hall Rm 105 (Weeks 1-6 only)

Note: For Week 1 ONLY, office hours will be Friday 10:30am-12:30pm (same location).


Assignments

Final project & proposal (graded)

The only graded assignments for this class are the final project (80%) and the project proposal (20%). Click here for more details on these assignments.

Programming questions (not graded)

Programming is one of those things that you can't learn just by listening to lectures: you have to practice (and practice and practice)! However, since this is a 1-credit class, I don't want to have graded assignments on top of the project. Instead, after each session I will release a few questions to test your understanding of that session's material, and will release the answers a few days later. Responses to these questions will NOT be graded.


Piazza forum

The Piazza forum is a place for you to ask (and answer) questions about the course, covering both course content and course logistics. I won't be checking it very often, so it is best suited to questions that your classmates might be able to answer. Requests for assignment extensions will NOT be entertained on Piazza.

To access the forum, click on this link. You can get the access code from the first announcement on Canvas.


Course materials

There is no textbook for this class. Having said that, much of the material for this class was heavily inspired by "R for Data Science" by Garrett Grolemund and Hadley Wickham, which is available online for free here. It is very comprehensive and well-written, and I recommend it highly to anyone who wants to do data science in R!

Course materials will be added progressively to the list below. To save a file, right-click its link and choose "Save Link As..."

Session 1 (24 Sep): Introduction to R
Before class: Install R and RStudio on your laptop (Mac / Windows). Install relevant R packages (instructions here).
In-class material: Slides; Lab
After class (optional): Programming questions; Programming solutions; Further R4DS reading: Ch 4, 6

Session 2 (26 Sep): Basic R objects
In-class material: Slides; Lab
After class (optional): Programming questions; Programming solutions; Further R4DS reading: Ch 20

Session 3 (1 Oct): Data visualization with ggplot2
In-class material: Slides; Lab
After class (optional): Programming questions; Programming solutions; Further R4DS reading: Ch 3

Session 4 (3 Oct): Data visualization with ggplot2 (continued)
In-class material: Slides; Lab
After class (optional): Programming questions; Programming solutions; Further R4DS reading: Ch 28

Session 5 (8 Oct): Data transformation with dplyr
In-class material: Slides; Lab
After class (optional): Programming questions; Programming solutions; Further R4DS reading: Ch 5, 18

Session 6 (10 Oct): Functions and more data transformation
In-class material: Slides; Lab
After class (optional): Programming questions; Programming solutions; Further R4DS reading: Ch 6, 12

Session 7 (15 Oct): Importing your own data and factors
In-class material: Slides; Lab; NBA data
After class (optional): Programming questions; Programming solutions; Further R4DS reading: Ch 8, 11, 15

Project proposal DUE! 16 Oct 23:59:59

Session 8 (17 Oct): Publishing in R Markdown
In-class material: Slides; Lab; Airbnb data; Airbnb_analysis.R; Airbnb analysis (.Rmd file); Airbnb analysis (.html file)
After class (optional): Programming questions; Programming solutions (.Rmd file); Programming solutions (.html file); Further R4DS reading: Ch 27, 30

Session 9 (22 Oct): Data joining and maps
In-class material: Slides; Lab; Elections data; County map data; Elections analysis (.Rmd file); Elections analysis (.html file)
After class (optional): Programming questions; Programming solutions; Further R4DS reading: Ch 13

Session 10 (24 Oct): Statistical testing and linear regression
In-class material: Slides; Lab; Spotify data; Spotify starter R script; Spotify analysis (.Rmd file); Spotify analysis (.html file)
After class (optional): Programming questions; Programming solutions; Further R4DS reading: Ch 23

Project DUE! 2 Nov 23:59:59