Midterm 1: Solutions
Midterm 1, Spring 2021: Music recommender
Version 1.0
This problem builds on your knowledge of basic Python data structures and string processing. It has seven (7) exercises, numbered 0 to 6. There are eleven (11) available points. However, to earn 100%, the threshold is just 10 points. (Therefore, once you hit 10 points, you can stop. There is no extra credit for exceeding this threshold.)
Each exercise builds logically on the previous one, but you may solve them in any order. That is, if you can't solve an exercise, you can still move on and try the next one. However, if you see a code cell introduced by the phrase "Sample result(s) for ...", please run it. Some demo cells in the notebook may depend on these precomputed results.
The point values of individual exercises are as follows:
Exercise 0: 1 point
Exercise 1: 1 point
Exercise 2: 2 points
Exercise 3: 2 points
Exercise 4: 2 points
Exercise 5: 1 point
Exercise 6: 2 points
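As a quick sanity check on the numbers above, the following sketch totals the per-exercise points and applies the 10-point cap. The capped-percentage formula is an assumption based on the statement that there is no extra credit above the threshold; it is not the autograder's actual code.

```python
# Point values per exercise, as listed above.
points = {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5: 1, 6: 2}
THRESHOLD = 10  # points needed for 100%

total_available = sum(points.values())
print(total_available)  # 11

def percent_score(earned: int, threshold: int = THRESHOLD) -> float:
    """Assumed scoring rule: percentage of the threshold, capped at 100%."""
    return 100.0 * min(earned, threshold) / threshold

print(percent_score(9))   # 90.0
print(percent_score(11))  # 100.0
```

Under this reading, you can skip any single 1-point exercise (Exercise 0, 1, or 5) and still reach 10 points, whereas missing a 2-point exercise leaves you at 9.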
Pro-tips.
Many or all test cells use randomly generated inputs. Therefore, try your best to write solutions that do not assume too much. To help you debug, when a test cell does fail, it will often tell you exactly what inputs it was using and what output it expected, compared to yours.
If you need a complex SQL query, remember that you can define one using a triple-quoted (multiline) string (https://docs.python.org/3.7/tutorial/introduction.html#strings).
If your program's behavior seems strange, try resetting the kernel and rerunning everything.
If you mess up this notebook or just want to start from scratch, save copies of all your partial responses and use Actions → Reset Assignment to get a fresh, original copy of this notebook. (Resetting will wipe out any answers you've written so far, so be sure to stash those somewhere safe if you intend to keep or reuse them!)
If you generate excessive output that causes the notebook to load slowly or not at all (e.g., from an ill-placed print statement), use Clear Notebook Output to get a clean copy. The clean copy will retain your code but remove any generated output. However, it will also rename the notebook to clean.xxx.ipynb. Since the autograder expects a notebook file with the original name, you'll need to rename the clean notebook accordingly. Be forewarned: we won't manually grade "cleaned" notebooks if you forget!
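To illustrate the tip about triple-quoted strings, here is a minimal sketch that defines a multiline SQL query and runs it against a throwaway in-memory SQLite database. The `tracks` table, its columns, and its rows are hypothetical and exist only for this example; this problem's data are not stored in SQL.

```python
import sqlite3

# A triple-quoted string can span several lines, which keeps longer SQL readable.
# The table and column names below are hypothetical.
query = """
    SELECT artist, COUNT(*) AS num_tracks
    FROM tracks
    GROUP BY artist
    ORDER BY num_tracks DESC
"""

conn = sqlite3.connect(":memory:")  # temporary database that lives only in memory
conn.execute("CREATE TABLE tracks (artist TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?)",
    [("A", "x"), ("B", "y"), ("A", "z")],
)
rows = conn.execute(query).fetchall()
print(rows)  # [('A', 2), ('B', 1)]
conn.close()
```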
Good luck!
Background and overview: Spotify playlist data
Suppose you are running a music service and would like to help your users discover artists based on artists they already like. In this problem, you'll prototype a simple recommender by mining a dataset of user-generated playlists from Spotify, circa 2015.
Your overall workflow will be as follows:
1. Manually inspect the data and how it is stored
2. Gather some preliminary statistics to get a "feel" for the data
3. Clean the data a bit, namely by "normalizing" artist names
4. Use ideas from Notebook 2 to analyze artist co-occurrences in playlists
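Steps 3 and 4 can be pictured with a small pair-counting sketch. It assumes each playlist is simply a list of artist-name strings and uses a hypothetical normalization rule (trim whitespace, lowercase); the real data layout and the cleaning rules used in the exercises may differ.

```python
from collections import Counter
from itertools import combinations

def normalize_artist(name: str) -> str:
    """Hypothetical normalization rule: trim whitespace and lowercase."""
    return name.strip().lower()

def count_cooccurrences(playlists):
    """Count how many playlists contain each unordered pair of distinct artists.

    `playlists` is assumed to be an iterable of lists of artist names.
    """
    pair_counts = Counter()
    for artists in playlists:
        # A set removes duplicates, so each pair counts at most once per playlist;
        # sorting gives every pair a canonical (a, b) order with a < b.
        unique = sorted({normalize_artist(a) for a in artists})
        pair_counts.update(combinations(unique, 2))
    return pair_counts

# Hypothetical toy playlists; note the messy spelling of "Artist A" in the first one.
playlists = [
    ["Artist A", "Artist B", " artist a"],
    ["Artist B", "Artist A"],
    ["Artist B", "Artist C"],
]
counts = count_cooccurrences(playlists)
print(counts.most_common(1))  # [(('artist a', 'artist b'), 2)]
```

Normalizing before counting matters: without it, "Artist A" and " artist a" would be treated as two different artists.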
With that in mind, let's start!
Modules and data. Run the following two code cells, which load some modules this notebook needs as well as the data itself.
The data for this problem are several hundred megabytes in size and so may take a minute to load.
In [1]: ### BEGIN HIDDEN TESTS
%load_ext autoreload
%autoreload 2
### END HIDDEN TESTS
from pprint import pprint
from testing_tools import load_pickle
print("Ready!")
In [2]: !date
spotify_users = load_pickle('user_playlists.pickle')
print("==> Finished loading the data.")
!date
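The source of `testing_tools.load_pickle` is not shown here. The sketch below is a hypothetical stand-in, `load_pickle_sketch`, that produces the same kind of "Opening pickle from ..." log line seen in the captured output. The default directory is copied from that output; everything else is an assumption about how such a helper could work.

```python
import os
import pickle
import tempfile

def load_pickle_sketch(filename, dirname="./resource/asnlib/publicdata/"):
    """Hypothetical stand-in for testing_tools.load_pickle; the real helper may differ."""
    path = os.path.join(dirname, filename)
    print(f"Opening pickle from '{path}' ...")
    with open(path, "rb") as fp:  # pickles must be opened in binary mode
        return pickle.load(fp)

# Usage demo: write a tiny pickle to a temporary directory, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo.pickle"), "wb") as fp:
        pickle.dump([{"user": 1}], fp)
    data = load_pickle_sketch("demo.pickle", dirname=tmp)
print(data)  # [{'user': 1}]
```

Only unpickle files you trust, because loading a pickle can execute arbitrary code.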
Familiarize yourself with these data
The variable spotify_users holds the data you'll need. It consists of a list of about 15,000 users:
In [3]: print(f"`spotify_users`: type == {type(spotify_users)}, number of elements == {len(spotify_users):,}.")
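The `:,` inside the f-string above is Python's format specifier for a comma thousands separator. Here is a minimal illustration, using 15000 as a stand-in value rather than the actual length of `spotify_users`:

```python
n = 15000  # stand-in value, not the actual length of spotify_users
msg = f"number of elements == {n:,}."
print(msg)  # number of elements == 15,000.

# type() inside an f-string renders as the class repr.
kind = f"type == {type([])}"
print(kind)  # type == <class 'list'>
```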
Each element of this list corresponds to a distinct user. Have a look at the user at position 2526 of this list:
In [4]: pprint(spotify_users[2526])
Output of In [1]:
Opening pickle from './resource/asnlib/publicdata/user_ids.pickle' ...
Opening pickle from './resource/asnlib/publicdata/artist_names.pickle' ...
Opening pickle from './resource/asnlib/publicdata/playlist_names.pickle' ...
Opening pickle from './resource/asnlib/publicdata/track_titles.pickle' ...
Opening pickle from './resource/asnlib/publicdata/artist_translation_table.pickle' ...
Ready!
Output of In [2]:
Tue 09 Mar 2021 06:13:29 PM PST
Opening pickle from './resource/asnlib/publicdata/user_playlists.pickle' ...
==> Finished loading the data.
Tue 09 Mar 2021 06:13:40 PM PST
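The printed contents of `spotify_users[2526]` are not included in this copy, so the layout of a single user record isn't shown here. If a `pprint` dump is too long to scan, a small helper like the hypothetical `describe` below can summarize the nesting (types and lengths) without assuming any particular layout. The `sample` record is invented for illustration only and is not drawn from the dataset.

```python
def describe(obj, indent=0, max_items=3):
    """Return lines summarizing the types and sizes in a nested structure.

    Only the first `max_items` keys or elements of each container are expanded.
    """
    pad = "  " * indent
    if isinstance(obj, dict):
        lines = [f"{pad}dict (len {len(obj)})"]
        for key in list(obj)[:max_items]:
            lines.append(f"{pad}  {key!r} ->")
            lines.extend(describe(obj[key], indent + 2, max_items))
        return lines
    if isinstance(obj, (list, tuple)):
        lines = [f"{pad}{type(obj).__name__} (len {len(obj)})"]
        for item in obj[:max_items]:
            lines.extend(describe(item, indent + 1, max_items))
        return lines
    # Anything else (str, int, ...) is treated as a leaf and shown with its repr.
    return [f"{pad}{type(obj).__name__}: {obj!r}"]

# Hypothetical record, NOT the real layout of spotify_users[2526].
sample = {"user_id": "u1", "playlists": [{"name": "mix", "tracks": [("Artist A", "Song 1")]}]}
lines = describe(sample)
print("\n".join(lines))
```

In the notebook, you would call `print("\n".join(describe(spotify_users[2526])))` to get a compact outline of one real user.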