
University of California, Berkeley, COMPSCI 188: CS_188_Spring_2020_Written_Homework_2-4-3 (4)

CS 188 Spring 2020 Introduction to Artificial Intelligence Written HW 2

Due: Wednesday 03/11/2020 at 11:59pm (submit via Gradescope).

Policy: Can be solved in groups (acknowledge collaborators) but must be written up individually.

Submission: Your submission should be a PDF that matches this template. Each page of the PDF should align with the corresponding page of the template (page 1 has name/collaborators, question 1 begins on page 2, etc.). Do not reorder, split, combine, or add extra pages. The intention is that you print out the template, write on the page in pen/pencil, and then scan or take pictures of the pages to make your submission. You may also fill out this template digitally (e.g. using a tablet).

First name: Devesh
Last name: Agarwal
SID: 3033055014
Collaborators: qurmeharKaur DhruvKrishnaswamy ayasuMonta ArikBarrayoungLiHong Jasoncan

For staff use only:
Q1. MDP: Eating Chocolate /40
Q2. MDPs and RL /30
Q3. Probability: Flowers /30
Total /100

Q1. [40 pts] MDP: Eating Chocolate

We have a chocolate bar of dimensions 1 × 8, which contains 8 squares. Most of these squares are delicious chocolate squares, but some of them are poison! Although the chocolate and poison squares are visually indistinguishable, someone has told us which ones are which. The layout for our chocolate bar is shown below with P indicating poison squares.

[Figure: the 1 × 8 bar, with three squares marked P]

Eating a chocolate square immediately gives a reward of 1 and eating a poison square immediately gives a reward of −2. Starting from the right end of the bar, there are 3 possible actions that we can take (at each step), and all of these actions cause non-deterministic transitions as follows:

• Action b1: Try to bite 1 square, which will result in you actually eating 0, 1, or 2 squares with equal probability.
• Action b2: Try to bite 2 squares, which will result in you actually eating 1, 2, or 3 squares with equal probability.
• Action Stop: Stop biting, which will end the game definitively and result in no reward.
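The three actions above can be sketched as a one-step transition function. This is a minimal illustration, not the official solution: it assumes the state is the number of squares remaining, uses a hypothetical terminal label "done" for the state reached by Stop, and assumes that a bite larger than what is left simply finishes the bar. None of these conventions are given in the problem text. Rewards are left out on purpose.

```python
from fractions import Fraction

# Squares actually eaten for each biting action; every outcome is equally
# likely (taken from the action descriptions in the problem statement).
EATEN = {"b1": (0, 1, 2), "b2": (1, 2, 3)}


def transition(remaining, action):
    """Return {next_state: probability} for one step.

    Assumptions (not from the problem text): the state is the number of
    squares remaining, "done" is a hypothetical terminal state reached by
    Stop, and biting past the end of the bar leaves 0 squares.
    """
    if action == "Stop":
        return {"done": Fraction(1)}
    dist = {}
    for eaten in EATEN[action]:
        nxt = max(remaining - eaten, 0)  # assumed clipping at an empty bar
        dist[nxt] = dist.get(nxt, Fraction(0)) + Fraction(1, 3)
    return dist


if __name__ == "__main__":
    for action in ("b1", "b2", "Stop"):
        print(action, transition(8, action))
```

Exact fractions are used so that each distribution can be checked to sum to exactly 1 without floating-point rounding.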
(a) [10 pts] Formulate this problem as an MDP.

States: Decide (and explain) how you want to represent a “state.” Using that representation, list out all of your possible states in this MDP.

Number of squares remaining (left): 8, 7, 6, 5, 4, 3, 2, 1, 0

Actions:

Transitions:

Rewards: Skip the explicit formulation of the rewards for this problem.
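Taking the state s to be the number of squares remaining, as the written answer suggests, the transition function for the two biting actions could be written as follows. This is a sketch covering only states where every outcome stays within the bar; the boundary states (s < 2 for b1, s < 3 for b2) need a convention that the problem text shown here does not give, so they are left out.

```latex
\[
T(s, b_1, s') =
\begin{cases}
  \tfrac{1}{3} & \text{if } s' \in \{s,\ s-1,\ s-2\} \\
  0 & \text{otherwise}
\end{cases}
\qquad (s \ge 2)
\]
\[
T(s, b_2, s') =
\begin{cases}
  \tfrac{1}{3} & \text{if } s' \in \{s-1,\ s-2,\ s-3\} \\
  0 & \text{otherwise}
\end{cases}
\qquad (s \ge 3)
\]
```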
