International Journal of Computer Games Technology

Volume 2017, Article ID 4939261, 10 pages

https://doi.org/10.1155/2017/4939261

## Narrow Artificial Intelligence with Machine Learning for Real-Time Estimation of a Mobile Agent’s Location Using Hidden Markov Models

Département de mathématiques, Université du Québec á Montréal, 201 avenue du Président-Kennedy, Montréal, QC, Canada

Correspondence should be addressed to Cédric Beaulac; beaulac.cedric@gmail.com

Received 18 October 2016; Revised 13 January 2017; Accepted 19 January 2017; Published 14 February 2017

Academic Editor: Michael J. Katchabaw

Copyright © 2017 Cédric Beaulac and Fabrice Larribe. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

We propose to use a supervised machine learning technique to track the location of a mobile agent in real time. Hidden Markov Models are used to build artificial intelligence that estimates the unknown position of a mobile target moving in a defined environment. This narrow artificial intelligence performs two distinct tasks. First, it provides real-time estimation of the mobile agent’s position using the forward algorithm. Second, it uses the Baum–Welch algorithm as a statistical learning tool to gain knowledge of the mobile target. Finally, an experimental environment is proposed, namely, a video game that we use to test our artificial intelligence. We present statistical and graphical results to illustrate the efficiency of our method.

#### 1. Motivation

In this paper we address the problem of locating or tracking down a mobile target; specifically, we are interested in estimating the unknown position of a mobile agent. To do so, we rely on two main sources of information: our knowledge of the agent’s usual behavior and the information obtained from the environment (e.g., through vision, hearing). We build a proper solution to this problem by creating a mathematical model, an associated algorithm, and finally programming this algorithm; the result is autonomous artificial intelligence (AI).

A common problem in video game environments when multiple games are played in succession is that a human player will improve with each game played against the same opponent because he or she learns the opponent’s strategies, while the typical video game’s AI would not adapt to its opponent. Yannakakis and Togelius [1] identify core research areas within the field of AI in games. According to the list of areas they identify, our problem is both to create a *believable agent* and to provide it with *behavior learning* abilities.

We were inspired by Hladky and Bulitko [2] to use a Hidden Markov Model (HMM) to achieve this goal. In that context, the hidden state is the unknown position of the mobile agent, and our problem is one of state estimation. Our main idea is to use the Baum–Welch algorithm [3] as a machine learning tool that uses the AI’s observations to build up knowledge on a mobile agent over time. This knowledge is represented by the transition matrix that is used by our AI to estimate the hidden state at any time. This matrix contains the various probabilities that the mobile agent moves from one position to another, that is, from one state to another. By construction, this matrix is central in state estimation; thus, the better our AI estimates the transition matrix, the more accurately it can predict the mobile target’s movements. Our goal is to demonstrate that this learning algorithm can be efficient in that precise context. By using this methodology our AI should adapt to and learn from various situations, which makes it suitable for various competitive video games.

#### 2. Related Work

Particle models were first used for state estimation problems in video game environments by Bererton [4]. His article describes the state of AI in video games at the time, precisely defines the major problems, and demonstrates that particle filters are useful for solving them. This paper inspired a number of other papers that resort to particle filters for addressing similar problems. For example, Weber et al. [5] used a particle model with multiple layers to represent the state of a complex game (e.g., an RTS game) and proved the feasibility of estimating this state. Southey et al. [6] were among the first to use Hidden Markov Models in a video game AI context. They proved that these models could be useful in tracking the movement of various units in an RTS. Finally, Hladky and Bulitko [2] compared the efficiency of particle models with Hidden Markov Models for estimating the position of a moving agent and noticed that Markov Models might produce more precise estimates in certain situations. A limitation of Hladky and Bulitko’s model was that it had no behavior learning abilities.

More recently, Stanescu and Čertický [7] used *Answer Set Programming* to predict an opponent’s production in an RTS video game. Their problem shares many similarities with ours; for example, they only obtain incomplete knowledge of the current situation, and they are also trying to predict some elements of the current game state. In order to predict unit production, they generate every valid unit combination and pick the most probable one as their estimate. As we will see in Section 4.2, the recursive structure of the *forward* algorithm naturally performs all of these steps more efficiently in order to predict the opponent’s location.

Reinforcement learning (RL) is also a very popular approach to building strong video game AI. Recently, Wang and Tan [8] and Emigh et al. [9] approached the wider problem of building artificial intelligence that can *win* a video game based solely on RL. Even though both of these articles and many others demonstrate the incredible potential of AI using RL, these models also have their fair share of weaknesses. When it comes to building AI for a rather complex video game, both the extremely large state space and the high dimensionality cause the learning process to be very slow. Using RL also results in AI that takes actions that maximize the expected reward without really understanding why: in a certain way, this AI never truly understands the goal of the game nor the multiple distinct components that can cause a player to win or lose a game. Wang and Tan [8] briefly tackle these problems by creating two distinct learning processes: learning the appropriate behavior and focusing on optimal weapon selection.

Our approach is to further segment the AI’s learning process in order to create a *believable agent* and to reduce the potential dimensionality problems. For instance, for an FPS video game, we would like our AI to understand that it must aim well, establish a good strategy, and estimate its opponent’s location as accurately as possible, and that these subtasks are independent of one another. Motivated by the results of [2], we decided to approach the estimation of an opponent’s location with Hidden Markov Models. The Markov assumption is very natural when it comes to modelling movement; therefore, a Markov Model is well suited to this precise problem, in contrast with the articles above, which address larger problems. To further improve these models, we use an Expectation-Maximization algorithm to give the AI the ability to build up its own knowledge based on its experience. This article contains the key results, which are explained in full detail in the associated thesis [10].

#### 3. Hidden Markov Model Techniques and Notations

In the following section, we define a Hidden Markov Model, the notation we use, and the algorithms related to that model. This model consists of a Markov process and an observation function. First, let us define $\{X_t\}_{t \geq 1}$ as the Markov process with state space $S = \{s_1, s_2, \dots, s_n\}$. Because we will work with discrete-time Markov processes, it is natural to define a transition function $a_{ij} = P(X_{t+1} = s_j \mid X_t = s_i)$ and an associated transition matrix $\mathbf{A} = (a_{ij})$. Finally, the Markov process also consists of an initial distribution function $\pi_i = P(X_1 = s_i)$, which is frequently represented as a vector $\boldsymbol{\pi}$.

In a Hidden Markov Model, the realizations of the Markov process are hidden; that is, they are not observed. What we obtain instead, the observation $Y_t$, is a random variable dependent on the hidden state $X_t$. To complete the definition of an HMM, we need an observation function: $b_i(y) = P(Y_t = y \mid X_t = s_i)$.

The challenge is to use a sample of observations $Y_1, Y_2, \dots, Y_T$ to make a statistical inference on the hidden Markov process $\{X_t\}$. For example, it would be useful to be able to evaluate $P(X_t = s_i \mid Y_1, \dots, Y_t)$. To proceed, we use forward and backward values as defined by Rabiner [11]. These values will be used in inference and real-time estimation of the state. We adopt the classic definition of the forward values; that is, $\alpha_t(i) = P(Y_1, \dots, Y_t, X_t = s_i)$. The forward values can be calculated recursively:

$$\alpha_{t+1}(j) = b_j(Y_{t+1}) \sum_{i=1}^{n} \alpha_t(i)\, a_{ij}.$$

The initial values are $\alpha_1(i) = \pi_i b_i(Y_1)$. Backward values are denoted as usual, $\beta_t(i) = P(Y_{t+1}, \dots, Y_T \mid X_t = s_i)$, and they can also be calculated recursively in a similar manner:

$$\beta_t(i) = \sum_{j=1}^{n} a_{ij}\, b_j(Y_{t+1})\, \beta_{t+1}(j),$$

using the initial values $\beta_T(i) = 1$.
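The two recursions above can be sketched in a few lines of vectorized code. This is a minimal illustration, not the authors' implementation; the array `b`, holding the observation likelihoods $b_i(Y_t)$ for each time step, is a hypothetical input.

```python
import numpy as np

def forward(pi, A, b):
    """alpha[t, i] = P(Y_1..Y_t, X_t = s_i); b[t, i] = b_i(Y_t)."""
    T, n = b.shape
    alpha = np.zeros((T, n))
    alpha[0] = pi * b[0]                      # alpha_1(i) = pi_i * b_i(Y_1)
    for t in range(1, T):
        alpha[t] = b[t] * (alpha[t - 1] @ A)  # sum over predecessor states
    return alpha

def backward(A, b):
    """beta[t, i] = P(Y_{t+1}..Y_T | X_t = s_i)."""
    T, n = b.shape
    beta = np.ones((T, n))                    # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (b[t + 1] * beta[t + 1])
    return beta
```

A useful sanity check is that $\sum_i \alpha_t(i)\beta_t(i)$ equals the likelihood of the observations for every $t$.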

Finally, the *Baum–Welch* algorithm is needed for parameter estimation. This EM (Expectation-Maximization) algorithm, adapted for Hidden Markov Model estimation, is the best known and most widely used algorithm related to Hidden Markov Model inference. With this tool, every parameter of our model can be estimated. Note that in our context the initial distribution and the observation function will be known and, thus, need not be estimated.

Forward and backward values are calculated as described earlier using the initial values submitted to the *Baum–Welch* algorithm. We can use these values to compute various expected values, two of which are of particular interest. First:

$$\gamma_t(i) = P(X_t = s_i \mid Y_1, \dots, Y_T) = \frac{\alpha_t(i)\, \beta_t(i)}{P(Y_1, \dots, Y_T)},$$

where $P(Y_1, \dots, Y_T)$ is the likelihood of the observations. Using the forward and backward values we can also compute the following expected value:

$$\xi_t(i,j) = P(X_t = s_i, X_{t+1} = s_j \mid Y_1, \dots, Y_T) = \frac{\alpha_t(i)\, a_{ij}\, b_j(Y_{t+1})\, \beta_{t+1}(j)}{P(Y_1, \dots, Y_T)}.$$

Finally, after computing $\gamma_t(i)$ and $\xi_t(i,j)$ for all $t$, $i$, and $j$, we can perform the maximization step. As mentioned, only the transition function has to be estimated. We have to find the transition matrix that maximizes the likelihood of our observations. We thus obtain an estimator very similar to the classic Markov Model maximum-likelihood estimator:

$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}.$$

To conclude this section, we present the details of how the algorithm works. The first step is to submit the initial values of the Hidden Markov Model’s parameters to the algorithm. These consist of the initial distribution function, the transition function, and the observation function. The second step is to calculate forward and backward values using the parameters we currently have. The third step involves computing the expected values $\gamma_t(i)$ and $\xi_t(i,j)$, as described above, using the $\alpha$’s and $\beta$’s we computed in the previous step. Finally, the maximization step consists of using both of these expected values to estimate the parameters of the Hidden Markov Model. We then repeat this process starting from the second step until the parameter estimates are stable.

To summarize, here are the steps of the algorithm.

1. Submit the initial values of the parameters $\mathbf{A}$, $\boldsymbol{\pi}$, and $b$.
2. Compute $\alpha_t(i)$ and $\beta_t(i)$, for all $t$ and $i$, using the forward and backward algorithms.
3. Compute the expected values $\gamma_t(i)$ and $\xi_t(i,j)$, for all $t$, $i$, and $j$, using the forward and backward values.
4. Estimate the parameters $\mathbf{A}$, $\boldsymbol{\pi}$, and $b$ using the expected values.
5. Return to step 2 and repeat until a desired level of convergence is attained.
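The steps above can be condensed into a single E-step/M-step iteration. The sketch below is an illustrative implementation under our setting where only $\mathbf{A}$ is re-estimated ($\boldsymbol{\pi}$ and $b$ are known); the function name and inputs are hypothetical.

```python
import numpy as np

def baum_welch_step(pi, A, b):
    """One Baum-Welch iteration; b[t, i] = b_i(Y_t). Returns (A_new, likelihood)."""
    T, n = b.shape
    # Step 2: forward and backward values.
    alpha = np.zeros((T, n))
    alpha[0] = pi * b[0]
    for t in range(1, T):
        alpha[t] = b[t] * (alpha[t - 1] @ A)
    beta = np.ones((T, n))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (b[t + 1] * beta[t + 1])
    lik = alpha[-1].sum()
    # Step 3 (E-step): gamma[t, i] and xi[t, i, j].
    gamma = alpha * beta / lik
    xi = alpha[:-1, :, None] * A[None] * (b[1:] * beta[1:])[:, None, :] / lik
    # Step 4 (M-step): re-estimate only the transition matrix A.
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    return A_new, lik
```

Each row of the re-estimated matrix sums to one, and iterating the step never decreases the likelihood, which is the usual EM convergence guarantee.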

#### 4. Main Idea of Our Approach

##### 4.1. Mathematical Modelling

First, we define what the Markov process represents. Recall that our main goal is to estimate the unknown position of a mobile agent in a restricted area; in our experimental environment, which is a competitive video game, the mobile agent we are tracking is our opponent. Because its position is unknown, we suppose that its movements follow a Markov chain; that is, the location where the agent will be at time $t+1$ depends only on its location at time $t$. More formally, we define our Markov process, $X_t$, as the position of the mobile agent at time $t$. Because this position is unknown we never observe the realizations of this process, but we obtain observations that are dependent on these hidden realizations. These observations are the realizations of a Hidden Markov Model.

Next, we define the numerous parameters we need to work with an HMM. First, we need to define the state space of the Markov process. To this end we must introduce our defined environment, since we are trying to estimate the unknown position of a mobile agent in a known restricted area. In the context of a video game, we call this the map, which is the equivalent of a sports field; it is where the action takes place. For example, in the context of smart vehicle technologies, the map could be a certain city. Our goal is to create a methodology that efficiently estimates a mobile agent’s location on this map. To begin, we grid this map, creating $n$ possible positions, and the set of all these positions forms our finite state space $S$.
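As a concrete sketch of this construction, the snippet below grids a small hypothetical map into cells, indexes them as the state space $S$, and builds a simple random-walk transition matrix over the 4-neighbourhood as an initial guess for $\mathbf{A}$ (the grid size and the uniform prior are illustrative assumptions, not the paper's actual setup).

```python
import numpy as np

# Hypothetical 4x4 grid map: each cell is a state s_i of the finite state space S.
rows, cols = 4, 4
states = [(r, c) for r in range(rows) for c in range(cols)]
idx = {s: i for i, s in enumerate(states)}  # state -> row/column index in A
n = len(states)

# Initial transition matrix: the agent stays put or steps to a valid
# neighbouring cell, each option equally likely.
A0 = np.zeros((n, n))
for (r, c), i in idx.items():
    moves = [(r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    moves = [m for m in moves if m in idx]  # drop moves that leave the map
    for m in moves:
        A0[i, idx[m]] = 1.0 / len(moves)
```

Baum–Welch then refines this uninformative prior into a matrix reflecting the opponent's actual movement habits.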

Now we describe how we define our initial distribution, $\boldsymbol{\pi}$. In our video game application, we will assume that the initial distribution is directly implemented in the game and known by all players. As most competitive video games have fixed starting locations for each player, a comparison can be made to a game of chess where each pawn starts at a precise location. Some video games have a set of possible starting locations, one of which is chosen randomly. Generally, these possible locations are near one another and are known by every experienced player. Thus, the initial distribution is considered to be known by our AI in our experimental setup.

As mentioned earlier, Hidden Markov Models involve an unobservable realization of the underlying Markov process $X_t$, and an observation $Y_t$ which is a random variable dependent on the hidden realization. In other words, the observations represent the stream of information received from the environment. In our context, the observations are the set of positions we can eliminate at each unit of time and, thus, are random and dependent on the hidden state (the actual position of the mobile agent). The problem consists of using these observations, via a well-constructed observation function, in a manner that will help us track down the agent. In our experimental environment, our methodology is used by an AI battling against a human player. In that context, the observation is the set of positions observed by the AI’s avatar inside the video game at each time frame.
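One way to sketch this observation function in code: the likelihood of a state is zero for every cell the avatar currently sees without spotting the opponent, and concentrates entirely on one cell when the opponent is seen. The function name and arguments are hypothetical illustrations of the idea, not the paper's implementation.

```python
import numpy as np

def observation_likelihood(states, visible, seen_at=None):
    """b_i(y): likelihood of the current observation under each hidden state.

    states  -- list of grid cells forming the state space S
    visible -- set of cells the AI's avatar currently sees
    seen_at -- the opponent's cell if directly observed, else None
    """
    b = np.ones(len(states))
    if seen_at is not None:
        b[:] = 0.0                     # opponent seen: position known exactly
        b[states.index(seen_at)] = 1.0
    else:
        for i, s in enumerate(states):
            if s in visible:           # seen and empty: opponent cannot be there
                b[i] = 0.0
    return b
```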

Figure 1 depicts the two situations faced when tracking down a mobile agent: we either directly observe the agent or we do not. If we see the mobile target, as in Figure 1(b), the estimate of the current hidden state is quite simple: it is not an estimate, it is the agent’s actual position. The model gets interesting when we actually have to estimate our opponent’s position, as in Figure 1(a).
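In the interesting case where the opponent is not seen, the real-time estimate is the posterior over grid cells obtained by one step of the forward recursion, renormalized. The following minimal sketch (toy numbers, hypothetical names) shows a single filtering step: predict through the transition matrix, rule out the cells currently seen, and renormalize.

```python
import numpy as np

def filter_step(posterior, A, b_obs):
    """Update P(X_{t-1} | Y_1..Y_{t-1}) into P(X_t | Y_1..Y_t)."""
    predicted = posterior @ A        # one-step Markov prediction
    updated = predicted * b_obs      # weight by the observation likelihood
    return updated / updated.sum()   # renormalize to a distribution

posterior = np.full(4, 0.25)             # toy prior over 4 cells
A = np.full((4, 4), 0.25)                # toy transition matrix
b_obs = np.array([0.0, 1.0, 1.0, 1.0])   # cell 0 is seen and empty
posterior = filter_step(posterior, A, b_obs)
```

The mass on the observed-empty cell drops to zero and redistributes over the remaining candidates, which is exactly the elimination behavior described above.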