CS 188
Spring 2020
Introduction to Artificial Intelligence
Written HW 2
Due: Wednesday 03/11/2020 at 11:59pm (submit via Gradescope).
Policy: Can be solved in groups (acknowledge collaborators) but must be written up individually.
Submission: Your submission should be a PDF that matches this template. Each page of the PDF should
align with the corresponding page of the template (page 1 has name/collaborators, question 1 begins on page
2, etc.). Do not reorder, split, combine, or add extra pages. The intention is that you print out the
template, write on the page in pen/pencil, and then scan or take pictures of the pages to make your submission.
You may also fill out this template digitally (e.g., using a tablet).
First name
Last name
SID
Collaborators
For staff use only:
Q1. MDP: Eating Chocolate /40
Q2. MDPs and RL /30
Q3. Probability: Flowers /30
Total /100
Devesh
Agarwal
3033055014
qurmeharKaur DhruvKrishnaswamy ayasuMonta
ArikBarrayoungLiHong Jasoncan
Q1. [40 pts] MDP: Eating Chocolate
We have a chocolate bar of dimensions 1 × 8, which contains 8 squares. Most of these squares are delicious chocolate
squares, but some of them are poison! Although the chocolate and poison squares are visually indistinguishable,
someone has told us which ones are which. The layout for our chocolate bar is shown below with P indicating poison
squares.
P P P
Eating a chocolate square immediately gives a reward of 1 and eating a poison square immediately gives a reward of
-2. Starting from the right end of the bar, there are 3 possible actions that we can take (at each step), and
all of these actions cause non-deterministic transitions as follows:
• Action b1: Try to bite 1 square, which will result in you actually eating 0, 1, or 2 squares with equal probability.
• Action b2: Try to bite 2 squares, which will result in you actually eating 1, 2, or 3 squares with equal probability.
• Action Stop: Stop biting, which will end the game definitively and result in no reward.
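The three actions above can be sketched as a transition function. This is only an illustrative sketch, not the official solution. It assumes a state is the number of squares remaining (0 through 8), that a hypothetical terminal marker DONE is reached by Stop, and that a bite larger than the remaining bar simply eats whatever is left. The problem statement does not spell out that last case. The names DONE, BITE_OUTCOMES, and transition are made up for this example.

```python
from fractions import Fraction

# Assumption: a state is the number of squares remaining (0..8), plus a
# hypothetical terminal marker DONE that is reached by the Stop action.
DONE = "Done"

# Squares actually eaten for each biting action; each outcome is equally likely.
BITE_OUTCOMES = {"b1": (0, 1, 2), "b2": (1, 2, 3)}


def transition(state, action):
    """Return {next_state: probability} for one step of the sketch MDP."""
    if state == DONE or action == "Stop":
        # Stop ends the game; DONE is absorbing.
        return {DONE: Fraction(1)}
    dist = {}
    for eaten in BITE_OUTCOMES[action]:
        # Assumption: we cannot eat more squares than remain, so clip at 0.
        nxt = max(state - eaten, 0)
        dist[nxt] = dist.get(nxt, Fraction(0)) + Fraction(1, 3)
    return dist
```

For example, transition(8, "b1") puts probability 1/3 on each of the states 8, 7, and 6, while transition(2, "b2") puts 1/3 on state 1 and 2/3 on state 0 under the clipping assumption.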
(a) [10 pts] Formulate this problem as an MDP.
States: Decide (and explain) how you want to represent a “state.” Using that representation, list out all of
your possible states in this MDP.
Actions:
Transitions:
Rewards: Skip the explicit formulation of the rewards for this problem.
Number of squares remaining on the left: 8, 7, 6, 5, 4, 3, 2, 1, 0