
University of California, Berkeley

lab11
April 9, 2020

[1]: # Initialize OK
     from client.api.notebook import Notebook
     ok = Notebook('lab11.ok')

=====================================================================
Assignment: Quantifying Sampling Errors in Regression
OK, version v1.12.5
=====================================================================

1 Lab 11: Regression Inference

Welcome to Lab 11! Today we will get some hands-on practice with regression inference. You can find more information about this topic in section 16.

[3]: # Run this cell to set up the notebook, but please don't change it.
     # These lines import the Numpy and Datascience modules.
     import numpy as np
     from datascience import *
     # These lines do some fancy plotting magic.
     import matplotlib
     %matplotlib inline
     import matplotlib.pyplot as plt
     plt.style.use('fivethirtyeight')
     # These lines load the tests.
     from client.api.notebook import Notebook
     ok = Notebook('lab11.ok')
     _ = ok.submit()

=====================================================================
Assignment: Quantifying Sampling Errors in Regression
OK, version v1.12.5
=====================================================================

Saving notebook… Saved 'lab11.ipynb'.
Submit… 100% complete
Submission successful for user: [email protected]
URL: https://okpy.org/cal/data8/fa19/lab11/submissions/28w0JP

Previously in this class, we’ve used confidence intervals to quantify uncertainty about estimates. We can also run hypothesis tests using a confidence interval, with the following procedure:

1. Define a null and an alternative hypothesis. They must be of the form “The parameter is X” and “The parameter is not X”.
2. Choose a p-value cutoff, expressed as a percentage, and call it q.
3. Construct a (100 - q)% confidence interval using bootstrap sampling. For example, if your p-value cutoff is 0.01 (q = 1%), construct a 99% confidence interval.
4. Using the confidence interval, determine whether your data are more consistent with the null or the alternative hypothesis:
• If the null hypothesis value X is in your confidence interval, the data are more consistent with the null hypothesis, so we do not reject it.
• If the null hypothesis value X is not in your confidence interval, the data are more consistent with the alternative hypothesis, so we reject the null hypothesis at the q% cutoff.

More recently we’ve discussed using linear regression to make predictions based on correlated variables. For example, we can predict the heights of children based on the heights of their parents. We can combine these two topics to make powerful statements about our population by using the following techniques:
- Bootstrapped interval for the true slope
- Bootstrapped prediction interval for y (given a particular value of x)

This lab explores these two advanced methods.

Recall the Old Faithful dataset from our correlation lab (Lab 10). The table contains two pieces of information for each eruption of the Old Faithful geyser in Yellowstone National Park:
1. duration: the duration of the eruption, in minutes.
2. wait: the time between this eruption and the next eruption (the “waiting time”), in minutes.

For the purposes of this lab, we’ll only look at eruptions that last at least three minutes.

[4]: faithful = Table.read_table('faithful_inference.csv').where("duration", are.above_or_equal_to(3))
     faithful
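The confidence-interval test and the bootstrapped slope interval described above fit together naturally: bootstrap the slope, then check whether the null value 0 ("the true slope is 0") falls inside the interval. The sketch below is a hedged, self-contained illustration, not the lab's solution. It assumes synthetic data drawn from a made-up line y = 2x + noise (not the Old Faithful table), uses the standard library's `random` module instead of the lab's `numpy`/`datascience` tools, and the helper names (`fitted_slope`, `percentile`, `bootstrap_slope_interval`) are hypothetical. In the lab itself you would resample rows of `faithful` instead, for example with `Table.sample`.

```python
import math
import random

def fitted_slope(xs, ys):
    """Least-squares slope, equal to r * SD(y) / SD(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def percentile(p, values):
    """Smallest value that is at least as large as p% of the values."""
    ordered = sorted(values)
    k = max(math.ceil(p * len(ordered) / 100), 1)
    return ordered[k - 1]

def bootstrap_slope_interval(xs, ys, level=95, repetitions=1000, seed=0):
    """Resample (x, y) pairs with replacement, refit, keep the middle level% of slopes."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(repetitions):
        idx = [rng.randrange(n) for _ in range(n)]  # one bootstrap sample of rows
        slopes.append(fitted_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    tail = (100 - level) / 2
    return percentile(tail, slopes), percentile(100 - tail, slopes)

# Synthetic data from a made-up line y = 2x + noise (NOT the Old Faithful table).
data_rng = random.Random(42)
xs = [data_rng.uniform(3, 5) for _ in range(60)]
ys = [2 * x + data_rng.gauss(0, 1) for x in xs]

# Null: "the true slope is 0". Alternative: "the true slope is not 0".
# A 1% p-value cutoff (q = 1) calls for a 99% confidence interval.
low, high = bootstrap_slope_interval(xs, ys, level=99)
reject_null = not (low <= 0 <= high)
print(f"99% interval for the true slope: ({low:.2f}, {high:.2f})")
print("Reject the null hypothesis:", reject_null)
```

Resampling whole (x, y) pairs, rather than x and y separately, keeps each point's relationship intact, which is what makes the spread of refitted slopes reflect sampling variability in the slope.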

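For a bootstrapped prediction interval for y at a given x, one approach is to resample the points, refit a line to each resample, and record each line's prediction at that x. The middle 95% of those predictions gives the interval. The following is a hedged sketch of that idea in plain Python, not the lab's code. It assumes synthetic data from a made-up line y = 10x + 30 plus noise (not the Old Faithful table), uses the standard library's `random` module in place of `numpy`/`datascience`, and its helper names (`fit_line`, `percentile`, `bootstrap_prediction_interval`) are hypothetical.

```python
import math
import random

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def percentile(p, values):
    """Smallest value that is at least as large as p% of the values."""
    ordered = sorted(values)
    k = max(math.ceil(p * len(ordered) / 100), 1)
    return ordered[k - 1]

def bootstrap_prediction_interval(xs, ys, new_x, level=95, repetitions=1000, seed=0):
    """Middle level% of the predictions at new_x from lines fitted to bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(repetitions):
        idx = [rng.randrange(n) for _ in range(n)]  # resample (x, y) pairs with replacement
        slope, intercept = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(slope * new_x + intercept)
    tail = (100 - level) / 2
    return percentile(tail, predictions), percentile(100 - tail, predictions)

# Synthetic data from a made-up line y = 10x + 30 + noise (NOT the Old Faithful table).
data_rng = random.Random(7)
xs = [data_rng.uniform(3, 5) for _ in range(60)]
ys = [10 * x + 30 + data_rng.gauss(0, 5) for x in xs]

low, high = bootstrap_prediction_interval(xs, ys, new_x=4)
print(f"95% interval for the predicted y at x = 4: ({low:.1f}, {high:.1f})")
```

This interval describes uncertainty about where the regression line sits at x (the average y for that x); individual new observations scatter more widely around the line than this interval suggests.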