Getting Started: You should complete the assignment using your own installation of Python 3.6. Download the assignment archive from Moodle and unzip the file. This will create the directory structure shown below. You will write your code under the Submission/Code directory. Make sure to put the deliverables (explained below) into their respective directories.

HW01
--- Data
    |-- Credit Card Transaction
--- Submission
    |-- Code
    |-- Figures
    |-- Predictions

If you are stuck on a question, consider attending the office hours of the TA listed for that question.

Data Sets: It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase. In this assignment, you will experiment with different classifiers on the binary classification problem of anomaly detection in credit card transactions. The dataset described below contains transactions that occurred in a two-day period. Due to confidentiality issues, background information about the features is not provided. You only know that the first attribute is the dollar amount of the transaction and that the class output is either 0 for normal or 1 for fraud.

Dataset                  Training Cases  Test Cases  Dimensionality  Number of Classes
Credit Card Transaction  200000          50000       29              2

Deliverables: This assignment has three types of deliverables: a report, code files, and Kaggle submissions.

• Report: The solution report will give your answers to the homework questions (listed below). The maximum length of the report is 5 pages in 11 point font, including all figures and tables. You can use any software to create your report, but your report must be submitted in PDF format.
• Code: The second deliverable is the code that you wrote to answer the questions, which will involve training classifiers and making predictions on held-out test data. Your code must be Python 3.6 (no iPython notebooks, other formats, or code from other versions). You may create any additional source files to perform data analysis.
However, you should aim to write your code so that all of your experimental results can be reproduced exactly by running python run_me.py from the Submission/Code directory. Remember to comment your code. Points will be deducted from your assignment grade if your results are difficult to reproduce!
• Kaggle Submissions: We will use Kaggle, a machine learning competition service, to evaluate the classifiers you create. You will need to register on Kaggle using a umass.edu email address in order to submit, but you can choose any user name you like. You will generate test prediction files, save them in Kaggle format (see the example.csv file in the HW01/Submission/Predictions/ directory for the format of a Kaggle-compliant prediction file), and upload them to Kaggle for scoring. For your convenience, we have provided a kaggle.py script that takes your predictions and converts them into Kaggle format. This script is located in the Submission/Code directory. Your scores will be shown on the Kaggle leaderboard, and 10% of your assignment grade will be based on how well you do in these competitions. You have a limit of 3 submissions per day. The Kaggle link is given below
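
The reproducibility requirement for the Code deliverable can be illustrated with a small sketch of a single entry-point script. Everything in it is an assumption made for illustration: the synthetic data stands in for the real transaction files, the amount-threshold rule stands in for a real classifier, and the output file name and the "Id"/"Prediction" column names are hypothetical. The actual Kaggle format is whatever example.csv and the provided kaggle.py define. The point is only that seeding one local random generator and running every step from one main function makes repeated runs produce identical results.

```python
import csv
import os
import random
import tempfile

SEED = 0  # fixed seed so every run produces the same results


def make_synthetic_data(n_cases, rng):
    """Stand-in for a real data loader: (amount, label) pairs, about 2% fraud."""
    data = []
    for _ in range(n_cases):
        label = 1 if rng.random() < 0.02 else 0
        # Assumption for illustration only: fraud cases get larger amounts.
        amount = rng.uniform(500, 1000) if label else rng.uniform(1, 600)
        data.append((amount, label))
    return data


def fit_threshold(train):
    """Pick the amount threshold with the highest training accuracy."""
    best_threshold, best_correct = 0.0, -1
    for threshold in range(0, 1001, 50):
        correct = sum((amount >= threshold) == bool(label)
                      for amount, label in train)
        if correct > best_correct:
            best_threshold, best_correct = float(threshold), correct
    return best_threshold


def write_predictions(path, predictions):
    """Write one row per test case; the column names are hypothetical."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Id", "Prediction"])
        for index, prediction in enumerate(predictions):
            writer.writerow([index, prediction])


def main(output_dir):
    rng = random.Random(SEED)  # local generator, no hidden global state
    train = make_synthetic_data(2000, rng)
    test = make_synthetic_data(500, rng)
    threshold = fit_threshold(train)
    predictions = [int(amount >= threshold) for amount, _ in test]
    write_predictions(os.path.join(output_dir, "predictions.csv"), predictions)
    return threshold, predictions


if __name__ == "__main__":
    main(tempfile.mkdtemp())
```

If you use libraries that draw random numbers themselves, they typically need their own seed or random-state argument set as well; check each library's documentation rather than relying on this sketch.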