lab11
April 9, 2020
[1]: # Initialize OK
from client.api.notebook import Notebook
ok = Notebook('lab11.ok')
=====================================================================
Assignment: Quantifying Sampling Errors in Regression
OK, version v1.12.5
=====================================================================
1 Lab 11: Regression Inference
Welcome to Lab 11!
Today we will get some hands-on practice with regression inference. You can find more information about this topic in Section 16 of the textbook.
[3]: # Run this cell to set up the notebook, but please don't change it.
# These lines import the Numpy and Datascience modules.
import numpy as np
from datascience import *
# These lines do some fancy plotting magic.
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
# These lines load the tests.
from client.api.notebook import Notebook
ok = Notebook('lab11.ok')
_ = ok.submit()
Previously in this class, we’ve used confidence intervals to quantify uncertainty about estimates.
We can also run hypothesis tests using a confidence interval under the following procedure:
1. Define a null and alternative hypothesis (they must be of the form "The parameter is X" and "The parameter is not X").
2. Choose a p-value cutoff, and call it q.
3. Construct a (100-q)% confidence interval using bootstrap sampling, with q written as a percent (for example, if your p-value cutoff q is 0.01, or 1%, then construct a 99% confidence interval).
4. Using the confidence interval, determine if your data are more consistent with your null or alternative hypothesis:
• If the value X from the null hypothesis is in your confidence interval, the data are more consistent with the null hypothesis.
• If the value X from the null hypothesis is not in your confidence interval, the data are more consistent with the alternative hypothesis.
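The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lab's own code: it assumes the parameter of interest is a population mean, uses `np.percentile` instead of the `datascience` tools, and the function name `bootstrap_ci_test` and the sample data are hypothetical.

```python
import numpy as np

def bootstrap_ci_test(sample, null_value, cutoff_percent=1.0, reps=10_000, seed=0):
    """Hypothetical helper: test "the population mean is null_value" with a bootstrap CI.

    cutoff_percent is the p-value cutoff q written as a percent,
    so cutoff_percent=1 builds a 99% confidence interval.
    """
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    # Draw `reps` resamples (with replacement) of the same size as the original sample.
    resamples = rng.choice(sample, size=(reps, sample.size), replace=True)
    means = resamples.mean(axis=1)
    # Keep the middle (100 - q)% of the bootstrapped means.
    lower = np.percentile(means, cutoff_percent / 2)
    upper = np.percentile(means, 100 - cutoff_percent / 2)
    # If X lies inside the interval, the data are more consistent with the null.
    consistent_with_null = bool(lower <= null_value <= upper)
    return (lower, upper), consistent_with_null

# Hypothetical sample, for illustration only.
rng = np.random.default_rng(1)
sample = rng.normal(loc=5.0, scale=1.0, size=200)
interval, consistent = bootstrap_ci_test(sample, null_value=5.0, cutoff_percent=1)
print(interval, consistent)
```

The same recipe works for any statistic, such as a median or a regression slope: only the line that computes `means` from each resample changes.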
More recently we’ve discussed the use of linear regression to make predictions based on correlated
variables. For example, we can predict the height of children based on the heights of their parents.
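As a reminder of how such a prediction is made, here is a minimal NumPy sketch of the regression line built from the correlation in standard units (slope = r · SD of y / SD of x, and the line passes through the point of averages). The helper names and the height data are hypothetical and for illustration only.

```python
import numpy as np

def standard_units(values):
    """Convert an array of numbers to standard units (mean 0, SD 1)."""
    values = np.asarray(values, dtype=float)
    return (values - np.mean(values)) / np.std(values)

def correlation(x, y):
    """The correlation r is the mean of the products of x and y in standard units."""
    return np.mean(standard_units(x) * standard_units(y))

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares regression line of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = correlation(x, y)
    slope = r * np.std(y) / np.std(x)
    # The regression line passes through (mean of x, mean of y).
    intercept = np.mean(y) - slope * np.mean(x)
    return slope, intercept

# Hypothetical parent and child heights in inches, for illustration only.
parent = [64, 66, 68, 70, 72]
child = [65, 66, 68, 69, 71]
slope, intercept = fit_line(parent, child)
predicted_child = slope * 69 + intercept  # predicted child height for a parent of 69 inches
```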
We can combine these two topics to make powerful statements about our population by using the following techniques:
• Bootstrapped interval for the true slope
• Bootstrapped prediction interval for y (given a particular value of x)
This lab explores these two advanced methods.
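Both techniques follow the same pattern: resample the (x, y) pairs with replacement, refit the line, and record the slope (or the line's prediction at a chosen x). The sketch below is an assumption-laden illustration, not the lab's solution: it uses NumPy's `np.polyfit` in place of the `datascience` tools, and the function name `bootstrap_slope_and_prediction`, its parameters, and the synthetic data are hypothetical. The prediction interval here reflects only how much the fitted line varies from sample to sample.

```python
import numpy as np

def bootstrap_slope_and_prediction(x, y, x_new, reps=5000, level=95, seed=0):
    """Hypothetical helper: bootstrap intervals for the slope and for the
    regression line's prediction at x_new."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    slopes = np.empty(reps)
    predictions = np.empty(reps)
    for i in range(reps):
        # Resample row indices so each (x, y) pair stays together.
        idx = rng.integers(0, n, size=n)
        slope, intercept = np.polyfit(x[idx], y[idx], 1)
        slopes[i] = slope
        predictions[i] = slope * x_new + intercept
    # Keep the middle `level` percent of each bootstrap distribution.
    tail = (100 - level) / 2
    slope_ci = (np.percentile(slopes, tail), np.percentile(slopes, 100 - tail))
    pred_ci = (np.percentile(predictions, tail), np.percentile(predictions, 100 - tail))
    return slope_ci, pred_ci

# Synthetic data with a true slope of 3, for illustration only.
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=100)
y = 3 * x + 2 + rng.normal(0, 1, size=100)
slope_ci, pred_ci = bootstrap_slope_and_prediction(x, y, x_new=5.0, reps=2000)
print(slope_ci, pred_ci)
```

If 0 lies outside the slope interval, the data are more consistent with the alternative "the true slope is not 0", following the confidence-interval test described earlier.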
Recall the Old Faithful dataset from our correlation lab (Lab 10). The table contains two pieces of information for each eruption of the Old Faithful geyser in Yellowstone National Park:
1. duration: the duration of the eruption, in minutes.
2. wait: the time between this eruption and the next eruption (the "waiting time"), in minutes.
For the purposes of this lab, we’ll only look at eruptions that have a duration that is greater than
or equal to three minutes.
[4]: faithful = Table.read_table('faithful_inference.csv').where("duration", are.above_or_equal_to(3))
faithful