Figure 4: State transition for reward and penalty. (a) Positive feedback. (b) Inappropriate feedback.