
CS ISYE6501X: Midterm 1 Solutions | Midterm 1: Topics 1-5 | SP21: Computing for Data Analysis | edX

Midterm 1: Solutions

Midterm 1, Spring 2021: Music recommender

Version 1.0

This problem builds on your knowledge of basic Python data structures and string processing. It has seven (7) exercises, numbered 0 to 6. There are eleven (11) available points. However, to earn 100%, the threshold is just 10 points. (Therefore, once you hit 10 points, you can stop. There is no extra credit for exceeding this threshold.)

Each exercise builds logically on the previous one, but you may solve them in any order. That is, if you can't solve an exercise, you can still move on and try the next one. However, if you see a code cell introduced by the phrase, "Sample result(s) for ...", please run it. Some demo cells in the notebook may depend on these precomputed results.

The point values of individual exercises are as follows:

Exercise 0: 1 point
Exercise 1: 1 point
Exercise 2: 2 points
Exercise 3: 2 points
Exercise 4: 2 points
Exercise 5: 1 point
Exercise 6: 2 points

Pro-tips.

Many or all test cells use randomly generated inputs. Therefore, try your best to write solutions that do not assume too much. To help you debug, when a test cell does fail, it will often tell you exactly what inputs it was using and what output it expected, compared to yours.

If you need a complex SQL query, remember that you can define one using a triple-quoted (multiline) string (https://docs.python.org/3.7/tutorial/introduction.html#strings).

If your program's behavior seems strange, try resetting the kernel and rerunning everything.

If you mess up this notebook or just want to start from scratch, save copies of all your partial responses and use Actions → Reset Assignment to get a fresh, original copy of this notebook. (Resetting will wipe out any answers you've written so far, so be sure to stash those somewhere safe if you intend to keep or reuse them!)
If you generate excessive output that causes the notebook to load slowly or not at all (e.g., from an ill-placed print statement), use Clear Notebook Output to get a clean copy. The clean copy will retain your code but remove any generated output. However, it will also rename the notebook to clean.xxx.ipynb. Since the autograder expects a notebook file with the original name, you'll need to rename the clean notebook accordingly. Be forewarned: we won't manually grade "cleaned" notebooks if you forget!

Good luck!

Background and overview: Spotify playlist data

Suppose you are running a music service and would like to help your users discover artists based on artists they already like. In this problem, you'll prototype a simple recommender by mining a dataset of user-generated playlists from Spotify, circa 2015.

Your overall workflow will be as follows:

1. Manually inspect the data and how it is stored
2. Gather some preliminary statistics to get a "feel" for the data
3. Clean the data a bit, namely by "normalizing" artist names
4. Use ideas from Notebook 2 to analyze artist co-occurrences in playlists

With that in mind, let's start!

Modules and data. Run the following two code cells, which load some modules this notebook needs as well as the data itself. The data for this problem are several hundred megabytes in size and so may take a minute to load.

In [1]:
    ### BEGIN HIDDEN TESTS
    %load_ext autoreload
    %autoreload 2
    ### END HIDDEN TESTS

    from pprint import pprint
    from testing_tools import load_pickle

    print("Ready!")

In [2]:
    !date
    spotify_users = load_pickle('user_playlists.pickle')
    print("==> Finished loading the data.")
    !date

Familiarize yourself with these data

The variable spotify_users holds the data you'll need.
It consists of a list of about 15,000 users:

In [3]:
    print(f"`spotify_users`: type == {type(spotify_users)}, number of elements == {len(spotify_users):,}.")

Each element of this list corresponds to a distinct user. Have a look at the user at position 2526 of this list:

In [4]:
    pprint(spotify_users[2526])

Output of the loading cells, In [1] and In [2]:

    Opening pickle from './resource/asnlib/publicdata/user_ids.pickle' ...
    Opening pickle from './resource/asnlib/publicdata/artist_names.pickle' ...
    Opening pickle from './resource/asnlib/publicdata/playlist_names.pickle' ...
    Opening pickle from './resource/asnlib/publicdata/track_titles.pickle' ...
    Opening pickle from './resource/asnlib/publicdata/artist_translation_table.pickle' ...
    Ready!
    Tue 09 Mar 2021 06:13:29 PM PST
    Opening pickle from './resource/asnlib/publicdata/user_playlists.pickle' ...
    ==> Finished loading the data.
    Tue 09 Mar 2021 06:13:40 PM PST
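
Steps 3 and 4 of the workflow in the overview (normalizing artist names, then counting artist co-occurrences in playlists) can be roughly illustrated on toy data. This is only a sketch under stated assumptions, not the notebook's reference solution. The playlist layout, the names `toy_playlists`, `normalize_artist`, and `count_cooccurrences`, and the normalization rule itself are all hypothetical. The real structure of `spotify_users` is whatever `pprint(spotify_users[2526])` displays.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each playlist is just a list of artist-name strings.
# The real `spotify_users` records may be laid out differently.
toy_playlists = [
    ["Radiohead", "radiohead ", "Air"],
    ["Air", "Portishead", "RADIOHEAD"],
    ["Portishead", "Massive  Attack"],
]

def normalize_artist(name):
    """Illustrative rule only: trim, collapse inner whitespace, lowercase."""
    return " ".join(name.split()).lower()

def count_cooccurrences(playlists):
    """For each unordered pair of artists, count the playlists containing both."""
    counts = defaultdict(int)
    for playlist in playlists:
        # A set removes duplicates within a playlist; sorting makes each
        # pair appear in a single canonical order, e.g. ("air", "radiohead").
        artists = sorted({normalize_artist(a) for a in playlist})
        for a, b in combinations(artists, 2):
            counts[(a, b)] += 1
    return dict(counts)

pair_counts = count_cooccurrences(toy_playlists)
for pair, n in sorted(pair_counts.items()):
    print(pair, n)
```

Storing each pair once in sorted order keeps the table small. A recommender would typically look up all pairs involving a given artist and rank the partners by count.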
