Georgia Institute of Technology>Final Quiz - Summer 2018 - Verified Learners>ISYE 6501.
View the Proctoring System Requirements to ensure that your set-up will work. Note that proctoring is only supported on MacOS and Windows machines. We recommend 1GB of free space on your machine, and a functioning webcam is required. Your space should be clean, with no writing visible on walls or surfaces, and you should be alone in the room. More details about your test-taking environment will be provided within the GFA course materials.

Please make sure that you have verified your ID before you attempt to take the Final Quiz. You should have been asked to verify your ID when you upgraded to Verified.

The proctored exams are offered by a 3rd party vendor called Software Secure. If you get stuck during the proctoring process, you are welcome to contact Software Secure's chat, which is available 24/7, or call them at 1-844-224-9759.

180 Minute Time Limit

Instructions
Work alone. Do not collaborate with or copy from anyone else.
You may use any of the following resources:
Two sheets (both sides) of handwritten (not photocopied or scanned) notes
Blank scratch paper and pen/pencil
If any question seems ambiguous, use the most reasonable interpretation (i.e. don't be like Calvin):
Good Luck!

Question 1
3.5/4 points (graded)
Drag each of the solution methods to the type of problem it's designed for. Each model or method should be placed somewhere, so you may need to put more than one in the same answer box. There may be more than one correct answer for some models/methods.
k-means
Fractional factorial design
You have used 1 of 1 attempts.
FEEDBACK: Correctly placed 7 items. Misplaced 1 item. Good work! You have completed this drag and drop problem.
Final attempt was used, highest score is 3.5

Question 2
5/5 points (graded)
Drag each of the models to the type of data and type of question it's best suited for. Each model or method should be placed somewhere, so you may need to put more than one in the same answer box. There may be more than one correct answer for some models/methods.
You have used 1 of 1 attempts.
FEEDBACK: Correctly placed 8 items. Good work! You have completed this drag and drop problem.
Final attempt was used, highest score is 5.0

Question 3
2.0/4.0 points (graded)
Select all of the following that are examples of time-series data.
You have used 1 of 1 attempt

Question 4
1.25/5.0 points (graded)
Select all of the following reasons that data should not be scaled until point outliers are removed.
You have used 1 of 1 attempt

Question 5
4.0/4.0 points (graded)
Select all of the following situations in which using a variable selection approach like lasso or stepwise regression would be important.
You have used 1 of 1 attempt

Question 6
4/4 points (graded)
Drag each of these software packages to the model it's best suited for analyzing. Each model or method should be placed somewhere, so you may need to put more than one in the same answer box.
You have used 1 of 1 attempts.
FEEDBACK: Correctly placed 4 items. Good work! You have completed this drag and drop problem.
Final attempt was used, highest score is 4.0

Question 7
7/7 points (graded)
For each of the analytics tasks listed below, drag to it the R function(s) that do it. Not all functions might be used.
You have used 1 of 1 attempts.
FEEDBACK: Correctly placed 13 items. Good work! You have completed this drag and drop problem.
Final attempt was used, highest score is 7.0

Question 8
2.68/4.0 points (graded)
The following process was followed to predict sales of a product each month for the next three years:
1. Split past sales data randomly into three sets: training, validation, and test.
2. Build 20 different models using training data.
3. Test all 20 models on validation data.
4. Select the model that performed best on the validation data.
5. Test the selected model on a set of test data.
6. Use the selected model to predict monthly sales for the next three years based on real-time data, and observe its true performance.
Select all of the following that are true.
You have used 1 of 1 attempt

Question 9
1.0/4.0 points (graded)
A positive correlation has been observed between hours of sleep and self-reported happiness. Based on that observed correlation, select all of the following statements about the direction of causality between sleep and happiness that are true.
You have used 1 of 1 attempt

Question 10
3.0/4.0 points (graded)
Select all of the following situations where imputing missing data for a variable is probably better than including a "data missing" binary variable.
You have used 1 of 1 attempt

Question 11
4.0/4.0 points (graded)
Select all of the following situations where a linear regression model is more directly appropriate than a logistic regression model when analyzing a specific World Cup soccer match.
You have used 1 of 1 attempt

Question 12
3.0/3.0 points (graded)
Select all of the following situations where a supervised learning model (like classification) is more directly appropriate than an unsupervised learning model (like clustering).
You have used 1 of 1 attempt

Question 13
4.0/4.0 points (graded)
A hospital has collected data on how long hip replacement surgery patients have required before regaining nearly-full motion without pain, as well as attributes of each patient (age, height, weight, pre-surgery range of motion, other medical conditions, etc.). Now, the hospital wants to use that data to predict recovery time for a new patient.
Select all of the following situations where a linear regression model is more directly appropriate than a classification model.
You have used 1 of 1 attempt

Question 14
4.0/4.0 points (graded)
Select all of the following situations where a simulation model is more directly appropriate than an optimization model.
You have used 1 of 1 attempt

Questions 15a-f
15.6/18.0 points (graded)
A large trial law firm would like to increase the fraction of cases it wins by doing a better job of assigning cases to its lawyers. If you're an expert in the legal industry, please do not rely on your expertise to fill in all that extra complexity (you'll end up making the questions more complex than I intended).

Currently, cases are assigned based on workload. When a new case comes in, the firm assigns it to whichever lawyer has the lowest current workload among the subset of lawyers qualified to handle the case. The current approach sometimes leads to the law firm losing a case it could've won, because lawyers are sometimes assigned to cases that they're minimally qualified for but aren't in their primary area of expertise. This happens because the minimally-qualified lawyer has the lowest current workload, while a more-qualified lawyer who is more likely to win the case has a slightly higher workload (but would still be able to take on this case).

Instead, the law firm's managing director would like to start using analytics to determine which lawyer to assign to each new case.

a. Select all of the models/approaches the practice could use to predict the probability that a certain lawyer will win a specific case, based on characteristics of this case, characteristics of the lawyer's previous cases, and whether the lawyer won or lost each previous case.

Suppose the law firm begins assigning cases to whichever lawyer has the highest probability of winning, from among all lawyers whose case loads are not full.
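
As a rough illustration of the two assignment rules described in Question 15, the sketch below contrasts the current workload-based rule with the proposed probability-based rule. The `Lawyer` class, its field names, the capacity limit, and all numbers are hypothetical; in practice the win probabilities would come from whatever model is chosen in part a.

```python
from dataclasses import dataclass


@dataclass
class Lawyer:
    name: str
    workload: int     # current number of open cases (hypothetical unit)
    capacity: int     # assumed maximum case load
    qualified: bool   # at least minimally qualified for this case
    win_prob: float   # assumed model estimate of winning this specific case


def assign_by_workload(lawyers):
    """Current rule: lowest workload among the lawyers qualified for the case."""
    candidates = [lw for lw in lawyers if lw.qualified]
    return min(candidates, key=lambda lw: lw.workload) if candidates else None


def assign_by_win_probability(lawyers):
    """Proposed rule: highest win probability among lawyers whose case loads are not full."""
    candidates = [lw for lw in lawyers if lw.workload < lw.capacity]
    return max(candidates, key=lambda lw: lw.win_prob) if candidates else None


# Made-up example: A is minimally qualified but least busy, C is full.
lawyers = [
    Lawyer("A", workload=3, capacity=10, qualified=True, win_prob=0.40),
    Lawyer("B", workload=4, capacity=10, qualified=True, win_prob=0.75),
    Lawyer("C", workload=10, capacity=10, qualified=True, win_prob=0.90),
]
print(assign_by_workload(lawyers).name)         # A
print(assign_by_win_probability(lawyers).name)  # B
```

In this toy data the workload rule picks the least busy lawyer even though a slightly busier one has a much higher estimated win probability, which is the situation the question describes.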
Once the law firm starts using probability-based case assignment, they begin winning a higher fraction of cases. As a result, they believe they will start getting more business (more cases coming in).

b. Select all of the models/approaches the practice could use to determine whether or not there has been a change in the rate of cases coming in.

c. Select all of the models/approaches the practice could use to predict how many new cases will come to the firm each month, based on the number of cases that have come in during past months, in a way that can adapt over time as the situation changes.

Over time, lawyers' probabilities of winning cases could improve, as they get more experience. Suppose that two years after implementing probability-based case assignment, the law firm wants to determine whether the probabilities they estimated two years earlier (before probability-based assignment) are different two years later (after implementing probability-based assignment).

d. Select all of the models/approaches the practice could use to determine whether there has been a big-enough change in probabilities that they should re-fit the model in part a. on more-recent data.

Another approach to part d. would be to use the binomial distribution: treat cases before implementation as one distribution and cases after implementation as another, and see whether the observed win probability p is significantly different. The firm could use these significance tests to determine which lawyers were improving their skills and which were not.

e. Select all of the reasons that this would not be a good approach to use.

Suppose a new lawyer joins the firm right after graduating from law school, so there is no data on this lawyer's probability of winning different types of cases.

f. Select all of the models/approaches the practice could use to estimate win probabilities for this new lawyer until enough data is collected.
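
The binomial comparison described before part e. can be sketched as a two-proportion z-test, which is a normal approximation to comparing two binomial samples. This is only one possible way to set it up, not necessarily the test the question has in mind; the function name and the win/loss counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(wins_before, n_before, wins_after, n_after):
    """Two-sided z-test for a difference between two binomial proportions.

    Assumes both samples are large enough for the normal approximation and
    that the pooled proportion is strictly between 0 and 1.
    """
    p_before = wins_before / n_before
    p_after = wins_after / n_after
    # Pooled win probability under the null hypothesis of no change.
    p_pool = (wins_before + wins_after) / (n_before + n_after)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_before + 1 / n_after))
    z = (p_after - p_before) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical record for one lawyer: 30 of 60 wins before, 45 of 60 after.
z, p_value = two_proportion_z_test(wins_before=30, n_before=60,
                                   wins_after=45, n_after=60)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The normal approximation is only reasonable when each period contains a fairly large number of cases.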
You have used 1 of 1 attempt

Information for Question 16

Question 16
2.0/4.0 points (graded)
A support vector machine model has been created to predict whether a person is right-handed or left-handed, based on the person's genetic profile. The figure above shows a confusion matrix of the model's performance on a test data set that it was not trained on. Select all of the following statements that are true.
You have used 1 of 1 attempt

Questions 17a-c
4.26/6.0 points (graded)
A very large (thousands of rooms) hotel in Las Vegas is planning to remodel its parking garage. There are lots of considerations and complexities that go into doing that, so this question will look at just a small part of it, with several simplifications. If you're an expert in the construction or hotel industries, please do not rely on your expertise to fill in all that extra complexity (you'll end up making the questions more complicated than I intended).

When remodeling the parking garage, the hotel wants to make sure that it is unlikely to run out of parking spaces even when all rooms are occupied, but given that restriction it also wants to make the parking garage as small as possible to save costs and space. The hotel would like to use analytics (analyzing its past ten years of data) to help determine the right number of parking spaces to have. A complicating factor is that the hotel doesn't have complete data; for about 2% of the hotel guests, the person at the front desk did not record whether the guest had a car or not.

a. The hotel's director of facilities has come up with the following incorrect idea: GIVEN past attribute data of guests and whether they had a car, USE linear regression TO impute the missing data (whether guests had a car or not). Then, GIVEN the number of cars each day for the past ten years, USE exponential smoothing TO predict how many parking spaces will be needed each day for the next ten years. Finally, GIVEN the daily predictions of the number of parking spaces required, USE optimization TO determine the minimum number of parking spaces required so that 90% of the highest 10% of daily predictions will be less than or equal to the number of parking spaces. Select all of the statements below that show a reason why the director's idea is wrong.

b. The director has come up with another incorrect idea: GIVEN past attribute data of guests and whether they had a car, USE a support vector machine (SVM) model TO impute the missing data (whether guests had a car or not). Then, GIVEN the average and standard deviation of the number of cars on each day, USE a normal distribution TO determine how many parking spaces are needed to be 99% sure that there will be enough spaces for the average number of cars plus normally-distributed randomness. Select all of the statements below that show a reason why the director's idea is wrong.

c. Select all of the possible paths below that could lead to a good solution.
You have used 1 of 1 attempt

Question 17d
0.0/4.0 points (graded)
d. Select a set of models from the list below that the director can put together to determine how many parking spaces there should be.
You have used 0 of 1 attempt

Question 17e
0.0/2.0 points (graded)
e. Select all of the following complexities that are not accounted for in any of the models in part d.
You have used 0 of 1 attempt

Questions 18a-d
1.875/6.0 points (graded)
In the United States in 2016, the median annual earnings among working women were about 20% lower than the median annual earnings among working men.

a. What nonparametric test could most-directly be used to show that the difference between the median incomes of working men and working women is statistically significant?

In the United States in 2016, the fraction of working men who died on the job was about 11 times higher than the fraction of working women who died on the job.

b. Select all of the appropriate uses of the binomial distribution to test whether the difference in death rates between working men and working women is statistically significant. Let Nm be the number of working men, Nw be the number of working women, and Km and Kw be the number of men and women killed on the job. NOTE: In the answer choices below, a "yes" answer refers to what is usually called a "success" in the binomial distribution, but obviously a person being killed is not a "success" so I've used a different term.

One suggested explanation for both discrepancies is the difference between the jobs chosen (and for college students, the college majors chosen) by men and women. Other factors (age, marital status, child-rearing responsibilities, etc.) have also been suggested. To test whether discrepancies between men's and women's outcomes persist after all of these factors are accounted for, a researcher has suggested the following: for each job classification, find 100 pairs of one man and one woman who have the same age, marital status, number of children, county of residence, college major and college selectivity (if applicable), and then run a nonparametric statistical test to see if women's and men's income (and/or death rate) are still significantly different.

c. Which of the nonparametric tests would be valid to use in such a study? [Do not spend time worrying about whether the study's setup provides a valid comparison. For the sake of this question, assume it does and think about what nonparametric test could be used in that case.]

Finding pairs of people like the ones suggested in part c. could be very difficult. Instead, the researcher might want to use a different type of model.

d. Select all of the approaches below that might help determine how much of the differences remain after accounting for all of the factors listed above in part c.

Data for this question was taken from the United States Bureau of Labor Statistics (BLS).
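
Using the notation from part b. of Question 18 (Nm, Nw, Km, Kw), one way a binomial tail probability might be computed is sketched below. It takes the women's observed rate Kw/Nw as the null probability of a "yes" and asks how likely Km or more "yes" outcomes would be among Nm men. This setup is an illustrative assumption rather than a statement of which answer choices are correct, and the counts are invented, not BLS figures.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_comb + k * log(p) + (n - k) * log(1 - p))


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); requires 0 < p < 1."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


# Hypothetical counts, chosen only to keep the example small.
Nm, Km = 5000, 22   # men: number working, number of "yes" outcomes
Nw, Kw = 5000, 2    # women: number working, number of "yes" outcomes

p_null = Kw / Nw                      # women's rate used as the null probability
p_value = upper_tail(Km, Nm, p_null)  # chance of Km or more "yes" outcomes among men
print(f"P(X >= {Km}) = {p_value:.3g}")
```

A very small tail probability would suggest that the men's count is hard to explain by chance if both groups shared the women's rate.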
These sorts of issues are important; if you have any thoughts about how we could use analytics to address them, please let me know!
You have used 1 of 1 attempt

Question 19
0 points possible (ungraded)
Do you think that you or any of your fellow students in this course would be good TAs for the course in the future? If so, please enter name(s) or username(s) below.