Project 3 - Classification
Welcome to the third project of Data 8! You will build a classifier that guesses whether a movie is romance or
action, using only the number of times each word appears in the movie's screenplay. By the end of the project,
you should know how to:
1. Build a k-nearest-neighbors classifier.
2. Test a classifier on data.
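To make these two goals concrete, here is a minimal sketch of the idea, not the project's solution. It assumes plain Python lists instead of the course's tables, and the word frequencies, the feature choice ("love" and "gun"), and the function names distance, classify, and accuracy are all made up for illustration.

```python
import math
from collections import Counter

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(train, point, k):
    """Return the majority label among the k training rows nearest to point.

    train is a list of (features, label) pairs.
    """
    nearest = sorted(train, key=lambda row: distance(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, test, k):
    """Proportion of held-out rows whose predicted label matches the true label."""
    correct = sum(classify(train, features, k) == label for features, label in test)
    return correct / len(test)

# Invented word proportions: (frequency of "love", frequency of "gun").
train = [
    ((0.010, 0.001), "romance"),
    ((0.008, 0.000), "romance"),
    ((0.009, 0.002), "romance"),
    ((0.001, 0.007), "action"),
    ((0.002, 0.009), "action"),
    ((0.000, 0.008), "action"),
]
# Held-out rows that the classifier never sees during training.
test = [((0.007, 0.001), "romance"), ((0.001, 0.006), "action")]

prediction = classify(train, (0.007, 0.001), k=3)
test_accuracy = accuracy(train, test, k=3)
print(prediction, test_accuracy)
```

Using an odd k with two classes avoids tied votes, and measuring accuracy only on rows held out of the training set is what "testing a classifier on data" means here.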
Logistics
Deadline. This project is due at 11:59pm on Friday 11/30. You can earn an early submission bonus point by
submitting your completed project by Thursday 11/29. It's much better to be early than late, so start
working now.
Checkpoint. For full credit, you must also complete Part 1 of the project (out of 4) and submit it by
11:59pm on Friday 11/16. You will have some lab time to work on these questions, but we recommend that
you start the project before lab and leave time to finish the checkpoint afterward.
Partners. You may work with one partner, who must be enrolled in the same lab section as you.
Only one of you is required to submit the project. On okpy.org (http://okpy.org), the person who submits
should also designate their partner so that both of you receive credit.
Rules. Don't share your code with anybody but your partner. You are welcome to discuss questions with
other students, but don't share the answers. The experience of solving the problems in this project will
prepare you for exams (and life). If someone asks you for the answer, resist! Instead, you can demonstrate
how you would solve a similar problem.
Support. You are not alone! Come to office hours, post on Piazza, and talk to your classmates. If you want
to ask about the details of your solution to a problem, make a private Piazza post and the staff will respond.
If you're ever feeling overwhelmed or don't know how to make progress, email your TA or tutor for help. You
can find contact information for the staff on the course website (http://data8.org/fa18/staff.html).
Tests. Passing the tests for a question does not mean that you answered the question correctly. Tests
usually only check that your table has the correct column labels. However, more tests will be applied to
verify the correctness of your submission in order to assign your final score, so be careful and check your
work!
Advice. Develop your answers incrementally. To perform a complicated table manipulation, break it up into
steps, perform each step on a different line, give a new name to each result, and check that each
intermediate result is what you expect. You can add any additional names or functions you want to the
provided cells. Also, please do not reassign variables anywhere in the notebook! For example, if you
use max_temperature in your answer to one question, do not reassign it later on.
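As an illustration of this advice, here is a small sketch that breaks one computation into named steps. It assumes plain Python lists rather than the course's tables, and the temperature readings and every name except max_temperature are hypothetical.

```python
# Hypothetical daily high temperatures in degrees Fahrenheit, invented for illustration.
temperatures = [61, 58, 72, 69, 75, 64]

# Step 1: keep only the warm days, under a new name.
warm_days = [t for t in temperatures if t >= 65]

# Step 2: convert to Celsius, again under a new name, so that warm_days
# is still available for inspection.
warm_days_celsius = [(t - 32) * 5 / 9 for t in warm_days]

# Step 3: summarize. max_temperature is assigned once and never reassigned.
max_temperature = max(warm_days_celsius)

# Check each intermediate result before moving on.
print(warm_days)
print(round(max_temperature, 1))
```

Because every step has its own name, you can inspect any intermediate result later, and no earlier answer changes when you work on a later question.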
To get started, load datascience, numpy, plots, and ok.