| Starting simulation, Robot 1. Starting state: F. (1) Use (9). (2) Check the Q-value for the state (use (11)); selected action = inspect. (3) Use (10). End of value iteration. | Current state: E, Robot 1. (1) Use (9). (2) Check the Q-value for the state (use (11)); selected action = inspect. (3) Use (10). End of value iteration. | Current state: C, Robot 2. (1) Use (9). (2) Check the Q-value for the state (use (11)); selected action = inspect. (3) Use (10). End of value iteration. |
| Current state: B, Robot 2. (1) Use (9). (2) Check the Q-value for the state (use (11)); selected action = inspect. (3) Use (10). End of value iteration. | Current state: D, Robot 1. (1) Use (9). (2) Broadcast (use (11)); selected action = inspect. (3) Use (10). End of value iteration. | Current state: A, Robot 2. (1) Use (9). (2) Broadcast (use (11)); selected action = inspect. (3) Use (10). End of value iteration. |
| Current state: A, Robot 1. (1) Use (9). (2) Broadcast (use (11)), that is, selected action = ignore. (3) Use (10). End of value iteration. | Current state: D, Robot 2. (1) Use (9). (2) Broadcast (use (11)), that is, selected action = ignore. (3) Use (10). End of value iteration. | Current state: B, Robot 1. (1) Use (9). (2) Broadcast (use (11)), that is, selected action = ignore. (3) Use (10). End of value iteration. |
| Current state: F, Robot 2. (1) Use (9). (2) Broadcast (use (11)), that is, selected action = ignore. (3) Use (10). End of value iteration. | Current state: C, Robot 1. (1) Use (9). (2) Broadcast (use (11)), that is, selected action = ignore. (3) Use (10). End of value iteration. | Current state: E, Robot 2. (1) Use (9). (2) Broadcast (use (11)), that is, selected action = ignore. (3) Use (10). End of value iteration. |
| Current state: G, Robot 1. (1) Use (9). (2) Broadcast (use (11)); selected action = inspect. (3) Use (10). End of value iteration. |
| Current state: G, Robot 2. (1) Use (9). (2) Broadcast (use (11)), that is, selected action = ignore. (3) Use (10). End of value iteration. | Current state: C, Robot 1. (1) Use (9). (2) Broadcast (use (11)); selected action = ignore. (3) Use (10). End of value iteration. |
| Current state: C, Robot 2. (1) Use (9). (2) Broadcast (use (11)), that is, selected action = ignore. (3) Use (10). End of value iteration. | All QLACS 1 states exhausted. Goal state: H, Robot 1. (1) Use (9). (2) Broadcast (use (11)); selected action = shutdown. (3) Use (10). End of value iteration. | All QLACS 2 states exhausted. Goal state: H, Robot 2. (4) Use (9). (5) Broadcast (use (11)); selected action = shutdown. (6) Use (10). End of value iteration. |
| End of policy iteration |
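As a quick consistency check on the trace above, the following is a minimal Python sketch, not part of the QLACS implementation. It transcribes each table cell as a `(robot, state, action)` tuple and tallies which robot inspected which state. The tuple encoding, the names `TRACE` and `inspected_by`, and the assumption that the cells are read left to right, row by row, are all hypothetical choices made for illustration; equations (9) to (11) are not modelled here.

```python
from collections import defaultdict

# (robot, state, action) tuples transcribed from the trace table.
# Assumption: cells are read left to right, then top to bottom.
TRACE = [
    (1, "F", "inspect"), (1, "E", "inspect"), (2, "C", "inspect"),
    (2, "B", "inspect"), (1, "D", "inspect"), (2, "A", "inspect"),
    (1, "A", "ignore"), (2, "D", "ignore"), (1, "B", "ignore"),
    (2, "F", "ignore"), (1, "C", "ignore"), (2, "E", "ignore"),
    (1, "G", "inspect"),
    (2, "G", "ignore"), (1, "C", "ignore"),
    (2, "C", "ignore"), (1, "H", "shutdown"), (2, "H", "shutdown"),
]


def inspected_by(trace):
    """Map each state to the list of robots that selected 'inspect' there."""
    owners = defaultdict(list)
    for robot, state, action in trace:
        if action == "inspect":
            owners[state].append(robot)
    return dict(owners)


owners = inspected_by(TRACE)

# States inspected by more than one robot (expected to be empty for this trace).
duplicates = {state: robots for state, robots in owners.items() if len(robots) > 1}

for state in sorted(owners):
    print(f"state {state}: inspected by robot(s) {owners[state]}")
print("duplicate inspections:", duplicates)
```

Under this reading of the table, every state from A to G is inspected by exactly one robot, and both robots select shutdown only at the goal state H.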