Lab 8: Correlation, Variance of Sample Means
Welcome to Lab 8!
In today's lab, we will cover two largely unrelated concepts. First, we will
investigate the variance of sample means, covered in Section 14.5
(https://www.inferentialthinking.com/chapters/14/5/variability-of-the-sample-mean.html) of our textbook.
We will also get some hands-on practice with understanding the association
between two variables, which you can read more about in Section 15.1
(https://www.inferentialthinking.com/chapters/15/1/correlation.html).
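As a rough preview of the first concept, the sketch below simulates how the spread of sample means shrinks as the sample size grows. It is not part of the lab's own code: the population is synthetic, and the names `population` and `sd_of_sample_means` are made up for illustration. It uses only NumPy, and compares the simulated SD of the sample means with the population SD divided by the square root of the sample size.

```python
import numpy as np

# Hypothetical population: 100,000 synthetic values, NOT data from this lab.
rng = np.random.default_rng(0)
population = rng.exponential(scale=10, size=100_000)
pop_sd = np.std(population)

def sd_of_sample_means(sample_size, repetitions=2_000):
    """Empirical SD of the means of many random samples (drawn with replacement)."""
    means = [
        np.mean(rng.choice(population, size=sample_size))
        for _ in range(repetitions)
    ]
    return np.std(means)

# Compare the simulated SD with population SD / sqrt(sample size).
for n in (25, 100, 400):
    observed = sd_of_sample_means(n)
    predicted = pop_sd / np.sqrt(n)
    print(f"n={n:>3}: observed SD {observed:.3f}, pop SD / sqrt(n) {predicted:.3f}")
```

Under these assumptions, the two printed columns should roughly agree, and quadrupling the sample size should roughly halve the SD of the sample means.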
In [ ]: # Run this cell, but please don't change it.
# These lines import the Numpy and Datascience modules.
import numpy as np
from datascience import *
# These lines do some fancy plotting magic.
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
import warnings
warnings.simplefilter('ignore', FutureWarning)
# These lines load the tests.
from client.api.notebook import Notebook
ok = Notebook('lab08.ok')
_ = ok.auth(inline=True)
1. How Faithful is Old Faithful?
(Note: clever title comes from here (http://web.pdx.edu/~jfreder/M212/oldfaithful.pdf).)
Old Faithful is a geyser in Yellowstone National Park in the western United States. It's famous for erupting on a
fairly regular schedule. You can see a video below.
In [ ]: # For the curious: this is how to display a YouTube video in a
# Jupyter notebook. The argument to YouTubeVideo is the part
# of the URL (called a "query parameter") that identifies the
# video. For example, the full URL for this video is:
# https://www.youtube.com/watch?v=wE8NDuzt8eg
from IPython.display import YouTubeVideo
YouTubeVideo("wE8NDuzt8eg")
Some of Old Faithful's eruptions last longer than others. When it has a long eruption, there's generally a longer
wait until the next eruption.
If you visit Yellowstone, you might want to predict when the next eruption will happen, so that you can explore the
rest of the park and return to the geyser in time to see it erupt. Today, we will use a dataset on eruption durations
and waiting times to see if we can make such predictions accurately with linear regression.
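To make the idea of predicting with linear regression concrete, here is a minimal sketch of one way such a prediction could be computed. It is an illustration rather than the lab's solution: the arrays below are made-up values, not the Old Faithful data, and the helper names `standard_units`, `correlation`, and `predict_wait` are assumptions chosen for this example. The slope is the correlation times the ratio of the SDs, and the intercept makes the line pass through the point of averages.

```python
import numpy as np

# Hypothetical (duration, wait) pairs in minutes, made up for illustration;
# these are NOT values from the lab's dataset.
duration = np.array([1.8, 2.0, 3.3, 3.6, 4.1, 4.5, 4.7])
wait = np.array([54, 51, 74, 77, 80, 85, 88])

def standard_units(x):
    """Convert an array to standard units: (value - mean) / SD."""
    return (x - np.mean(x)) / np.std(x)

def correlation(x, y):
    """Correlation r: the mean of the products of the standard units."""
    return np.mean(standard_units(x) * standard_units(y))

# Regression line: slope = r * SD(y) / SD(x); the line passes through the means.
r = correlation(duration, wait)
slope = r * np.std(wait) / np.std(duration)
intercept = np.mean(wait) - slope * np.mean(duration)

def predict_wait(d):
    """Predicted wait (minutes) after an eruption lasting d minutes."""
    return slope * d + intercept

print(f"r = {r:.3f}")
print(f"predicted wait after a 4-minute eruption: {predict_wait(4.0):.1f} minutes")
```

The same slope and intercept come out of a least-squares fit such as `np.polyfit(duration, wait, 1)`, which offers a quick cross-check.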
The dataset has one row for each observed eruption. It includes the following columns:
duration: Eruption duration, in minutes
wait: Time between this eruption and the next, also in minutes
Run the next cell to load the dataset.