Abstract
Intention recognition is significant in many applications. In this paper, we focus on team intention recognition, which identifies the intention of each team member and the team working mode. To model the team intention as well as the world state and observation, we propose a Logical Hierarchical Hidden Semi-Markov Model (LHHSMM), which supports statistical relational learning and can represent a complex mission hierarchically. Additionally, the LHHSMM explicitly models the duration of the team working mode, the intention termination, and the relations between the world state and observation. A Logical Particle Filter (LPF) algorithm is also designed to infer team intentions modeled by the LHHSMM. In experiments, we simulate agents’ movements in a combat field and use the agents’ traces to evaluate the performance of the LHHSMM and LPF. The results indicate that the team working mode and the target of each agent can be effectively recognized by our methods. When intentions are interrupted with a high probability, the LHHSMM outperforms a modified logical hierarchical hidden Markov model in terms of precision, recall, and F-measure. By comparing the performance of LHHSMMs with different duration distributions, we show that explicit duration modeling of the working mode is effective in team intention recognition.
1. Introduction
Intention recognition (IR) is the task of identifying the specific goals that one or more agents are attempting to achieve [1]. Since these goals are hidden in the agents’ minds, they can only be inferred by analyzing the agents’ observed actions and/or the changes in the environment resulting from those actions.
Recognizing intentions is valuable in both real and virtual worlds. For example, in real-time strategy games, AI players can choose more efficient policies if their enemies’ actions are known [2]; in the field of public security, we want to identify persons who may behave abnormally and pay more attention to them through monitoring devices [3]. Since the IR problem is common, it has attracted much attention in recent decades.
As a well-known branch of probabilistic graphical models (PGMs), the hidden Markov model (HMM) is very popular for analyzing sequential data in many applications [4]. In the IR domain, actions are discrete and observations are obtained sequentially. Thus, the HMM can also be used to solve IR problems. The main idea is as follows: the hidden states and the observation sequence correspond to actions and changes of the world state, respectively, and the intention is represented by the transitions between actions. The intention is then inferred by maximum likelihood estimation: find the intention that can reproduce the given observation sequence with the largest probability.
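The maximum likelihood idea described above can be sketched in a few lines. This is a minimal illustration, not the method evaluated in this paper: each candidate intention is assumed to have its own discrete HMM, the forward algorithm scores the observation sequence under each one, and the best-scoring intention is returned. The intention names, state meanings, and all probabilities are hypothetical.

```python
def sequence_likelihood(init, trans, emit, observations):
    """Forward algorithm: P(observations | model) for a discrete HMM."""
    n = len(init)
    # alpha[s] = P(observations up to now, current hidden state == s)
    alpha = [init[s] * emit[s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs]
            for s in range(n)
        ]
    return sum(alpha)


def most_likely_intention(models, observations):
    """Return the intention whose HMM best explains the observations."""
    scores = {
        name: sequence_likelihood(init, trans, emit, observations)
        for name, (init, trans, emit) in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy models. Hidden states are actions (0 = approach,
# 1 = retreat); observation symbols are 0 = "moved closer", 1 = "moved away".
# The two intentions differ only in their action transition matrices.
INIT = [0.5, 0.5]
EMIT = [[0.9, 0.1], [0.2, 0.8]]
MODELS = {
    "attack": (INIT, [[0.9, 0.1], [0.6, 0.4]], EMIT),
    "avoid": (INIT, [[0.4, 0.6], [0.1, 0.9]], EMIT),
}

best, scores = most_likely_intention(MODELS, [0, 0, 0])
print(best, scores)
```

Because the whole sequence is scored against one fixed model per intention, this scheme cannot represent an intention that changes midway through the observations.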
Even though the HMM looks suitable for modeling intentional behaviors and observations, some problems remain when it is applied in the IR domain:
(a) The HMM has a strong Markov assumption. In the HMM, the duration of a state implicitly follows a geometric distribution whose parameter is the state self-transition probability. However, in many applications, the duration of hidden states does not follow the geometric distribution.
(b) The HMM does not have a hierarchical structure. However, when a mission is complex, we need to decompose it into subtasks repeatedly until the mission consists only of primitive actions, and a hierarchical structure is necessary to represent the task decomposition and allocation.
(c) The HMM is actually propositional. It has no concept of class or object, which limits its representational power. Additionally, a propositional model is not suited to handling relations, which are important in IR.
(d) Maximum likelihood inference assumes that the intention is static. However, the team intention may be interrupted and changed because of new situations or other reasons.
To solve these problems, researchers have modified the HMM in different ways. For example, the Hidden Semi-Markov Model (HSMM) uses an arbitrary distribution such as the Poisson or Gamma distribution to model the state duration explicitly and sets the self-transition probability to zero [5]. In this way, the HSMM models the state duration more precisely, and it generally outperforms the HMM when the type of the state duration distribution is known. The Hierarchical Hidden Markov Model (HHMM) and the Abstract Hidden Markov Model (AHMM) are two extensions of the HMM [6, 7]. They represent task hierarchies based on the theories of the Hierarchical Finite State Machine and the Abstract Markov Decision Process, respectively. The Coxian Hidden Semi-Markov Model (CxHSMM) further combines the ideas of the HSMM and the HHMM [8]. Two important contributions of the CxHSMM are as follows: (a) proposing a novel Dynamic Bayesian Network (DBN) structure to model human daily activities; (b) introducing the discrete Coxian distribution into the behavior modeling domain.
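The difference between the implicit duration of the HMM and an explicit HSMM-style duration can be made concrete with a short sketch. A self-transition probability a implies P(d) = a^(d-1)(1 - a), which always peaks at d = 1, whereas an explicit duration distribution can place its peak anywhere. The shifted Poisson used here and both parameter values are illustrative assumptions, not settings from this paper.

```python
import math


def geometric_duration_pmf(self_transition, d):
    """P(state lasts exactly d steps) implied by an HMM self-transition."""
    return self_transition ** (d - 1) * (1.0 - self_transition)


def shifted_poisson_duration_pmf(rate, d):
    """An explicit HSMM-style duration: d - 1 ~ Poisson(rate), so d >= 1."""
    k = d - 1
    return math.exp(-rate) * rate ** k / math.factorial(k)


durations = range(1, 31)
geometric = [geometric_duration_pmf(0.8, d) for d in durations]
poisson = [shifted_poisson_duration_pmf(4.5, d) for d in durations]

# The geometric pmf is strictly decreasing in d, so its mode is d = 1.
geometric_mode = max(durations, key=lambda d: geometric[d - 1])
# The explicit pmf peaks away from 1 when the rate is large enough.
poisson_mode = max(durations, key=lambda d: poisson[d - 1])
print(geometric_mode, poisson_mode)
```

A state that typically lasts several steps is therefore poorly described by the geometric form, which is the motivation for modeling duration explicitly.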
Another type of extension deals with relational data. These models belong to Statistical Relational Learning (SRL), which integrates relational or logical representations and probabilistic reasoning mechanisms with machine learning [9]. For example, Kersting et al. proposed a logical hidden Markov model (LHMM), which combines first-order logic with the HMM [10]. Compared with the HMM, the LHMM can infer complex relations and has fewer parameters. However, it does not relax the Markov assumption, which leads to a performance decline when long-term dependencies between hidden states exist. To solve this problem, Zha et al. proposed a Logical Hidden Semi-Markov Model (LHSMM), which modifies the LHMM in the same way the HSMM modifies the HMM [11]. Natarajan et al. proposed a Logical Hierarchical Hidden Markov Model (LHHMM) and applied it to recognizing the intentions of an agent in a virtual game world [12]. The LHHMM has a hierarchical architecture and divides the state space into two types: (a) the unobserved user state and (b) the completely observed world state. In addition, the abstract state transitions are conditioned on the world state, which reflects how the world state affects the state of the user.
The extensions of the HMM above focus on recognizing the intentions of one agent. However, teamwork is quite common in many scenarios. In this paper, we focus on the problem of team intention recognition. Recognizing the intention of a team is more complex than recognizing that of a single agent, because (a) we need to recognize the goal of each agent as well as the team working mode, (b) task decomposition and allocation, which depend on the team working mode, need to be represented hierarchically, (c) the team intention may be interrupted for unknown reasons, and (d) observations are noisy and partially missing.
Research on the LHSMM has shown that (a) the logic predicates and instantiation process can represent the working mode and the composition of the team well and (b) modeling the duration of abstract states yields a higher precision and a smoother recognition curve [13]. However, the LHSMM does not provide a hierarchical architecture. The LHHMM is also promising, but Natarajan et al. only consider single-agent scenarios, and the LHHMM may run into problems when it is used in team intention recognition: (a) tasks modeled by the LHHMM are executed from the bottom up, which means the top goal cannot be interrupted while the subtasks are not completed; (b) the duration of the working mode is not modeled; (c) observations cannot be noisy or partially missing; (d) the chain structure is not appropriate for representing primitive actions [12]. Unlike higher-level tasks, primitive actions usually depend on the world state and the goal, as defined in a Markov decision process (MDP).
To solve the problems of applying existing models in team intention recognition, we propose a framework named the Logical Hierarchical Hidden Semi-Markov Model (LHHSMM). The LHHSMM borrows the ideas of the LHHMM and combines them with the LHSMM as well as the MDP. Our LHHSMM has the following advantages:
(a) Compared with PGMs such as the HMM and its extensions, the LHHSMM has the advantages of SRL methods. By introducing first-order logic, the LHHSMM can infer complex relations and use logical inference to replace some probability computations. Additionally, the predicates and instantiation process are well suited to representing the team working mode and changes of team members.
(b) A novel structure named the logical hierarchical Markov chain (LHMC) is proposed to represent the logical transitions and decomposition of the intention and its subtasks (we call the intention and its subtasks policies). With this structure and the transition conditions, our model inherits the compactness of representing policies hierarchically in the LHHMM and adds a mechanism that forcibly terminates the executing subtasks when intentions change.
(c) Considering that the team intention may be interrupted for unknown reasons, we use a lognormal distribution to model the duration for which the team working mode is not interrupted, that is, the time remaining before an interruption event happens. With this explicit duration modeling, the current team intention depends on more than the team intention and world state at the previous step, which is another difference between the LHHMM and the LHHSMM and the reason that our model is called semi-Markov.
(d) Primitive actions are selected based on the current policies and the previous world state. This MDP-like process lets the agents choose any primitive action without being restricted by the action executed at the previous time.
(e) Observation functions are used to represent the probabilistic relations between the world state and observation. This enables our model to handle noisy and partially missing observations.
To infer the team intentions modeled by the LHHSMM approximately, we provide a Logical Particle Filter (LPF) algorithm based on the logical definitions and dependencies of the LHHSMM. In the LPF, we use the simplest importance distribution and a forward sampling method to sample the particles, and the logical transitions and instantiation functions of the LHHSMM are incorporated into this process.
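For readers unfamiliar with particle filtering, one step of a generic bootstrap filter is sketched below. It uses the transition prior as the importance distribution, which is commonly what "simplest" refers to. It is not the LPF itself: the logical transitions and instantiation functions are omitted, and the one-dimensional motion model, sensor model, and all probabilities are hypothetical.

```python
import random
from collections import Counter


def bootstrap_filter_step(particles, observation, sample_transition, likelihood, rng):
    """One predict-weight-resample step of a bootstrap particle filter."""
    # 1. Forward sampling: propose each particle from the transition prior.
    proposed = [sample_transition(p, rng) for p in particles]
    # 2. Weight each proposal by the observation likelihood, then normalize.
    weights = [likelihood(observation, p) for p in proposed]
    total = sum(weights)
    if total > 0.0:
        weights = [w / total for w in weights]
    else:
        weights = [1.0 / len(proposed)] * len(proposed)
    # 3. Resample with replacement in proportion to the weights.
    resampled = rng.choices(proposed, weights=weights, k=len(proposed))
    return resampled, weights


# Hypothetical toy model: an agent on cells 0..9 that usually moves right,
# observed through a sensor that reports the true cell 90% of the time.
def move_right(position, rng):
    return min(position + 1, 9) if rng.random() < 0.8 else position


def noisy_sensor(observed, position):
    return 0.9 if observed == position else 0.1 / 9


rng = random.Random(0)
particles = [0] * 500
particles, weights = bootstrap_filter_step(particles, 1, move_right, noisy_sensor, rng)
estimate = Counter(particles).most_common(1)[0][0]
print(estimate)
```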
We design a combat scenario to validate the LHHSMM and LPF: two agents move around and attack targets on a grid map. Our methods are used to infer the team working mode and the targets of the agents online from the observed agents’ traces. Based on this scenario, we design a decision model for the agents and generate a dataset consisting of 100 traces. We use three traces in the dataset (one changes intentions; the others do not) to evaluate the LHHSMM and the LPF. Then, three metrics, namely precision, recall, and F-measure, for recognizing the team working mode and the targets of the agents are computed for the LHHSMM and a modified LHHMM, respectively. Last, we compare the performance of LHHSMMs with three different duration distributions to evaluate the effect of explicit duration modeling.
The rest of the paper is organized as follows: Section 2 introduces related work, including extensions of the HMM applied in IR and other research on team intention recognition using PGMs. Section 3 gives the formal definition of the LHHSMM, the dependency among variables, and a discussion of how to use an LHMC to represent policies. Section 4 introduces the standard PF briefly and gives the process of inferring intentions approximately with the LPF. Section 5 presents the background, settings, and results of our experiments. Finally, we conclude and discuss future work in Section 6.
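The three evaluation metrics can be computed from per-step recognition results as sketched below. This assumes the third metric is the usual balanced F-measure (the harmonic mean of precision and recall); the function name and the label sequences are illustrative and are not taken from the experiments.

```python
def precision_recall_f(predicted, actual, positive):
    """Precision, recall, and F-measure for one label treated as positive."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# Hypothetical recognized vs. true targets over four time steps.
predicted = ["A", "A", "B", "A"]
actual = ["A", "B", "B", "A"]
precision, recall, f_measure = precision_recall_f(predicted, actual, "A")
print(precision, recall, f_measure)
```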
2. Related Works
At the intersection of psychology and artificial intelligence [14], a great number of IR methods have been proposed; these methods can be broadly divided into four categories: consistency matching, probabilistic methods, hybrid methods, and statistical relational learning methods [15].
Logical reasoning based on event hierarchies [16] and fast and complete symbolic methods [17] are two representative consistency matching approaches. They solve the IR problem by determining which intention is consistent with the observed actions, that is, whether the observed actions match at least one plan achieving the intention. Their main drawback is that they may fail when two or more hypotheses give contradictory explanations of the same observations. In contrast, probabilistic methods provide a probability for every possible intention, and they can make full use of prior knowledge through Bayesian inference [18]. Because of these advantages, several kinds of probabilistic methods have been applied in the IR domain, such as probabilistic grammars [19], PGMs including the HMM [20], and the Conditional Random Field (CRF) [21]. To improve computational efficiency, some scholars proposed hybrid models, such as the hybrid symbolic-probabilistic plan recognizer [22] and the probabilistic hostile agent task tracker [23]. Their primary idea is to compute the likelihood of possible intentions using Bayesian reasoning after scaling down the hypothesis space with logical algorithms. However, these hybrid methods inherit the drawbacks of the underlying consistency matching and probabilistic methods. One key problem of PGMs and their corresponding hybrid methods is that they are actually propositional, which means they handle only sequences of unstructured symbols. Thus, SRL was proposed and has developed rapidly in recent years. SRL represents and infers relations, which are quite important for recognizing intentions in the real world. Therefore, many researchers have made great efforts to apply SRL methods to the IR problem, such as the LHMM [10] and the Markov Logic Network (MLN) [24, 25].
In this section, we first review research on applying PGMs in team intention recognition. Then, we analyze some extensions of the HMM related to our model.
2.1. Team Intention Recognition Using PGMs
A probabilistic graphical model encodes a complex distribution over a high-dimensional space by using a graph-based representation. The graph usually consists of nodes and edges, which correspond to the variables in the domain and the direct probabilistic interactions between variables, respectively [26]. Two types of PGMs are widely applied in team intention recognition: directed ones such as the DBN and HMM and undirected ones such as the relational Markov network and the CRF.
Masato et al. [27] introduced the CRF to automatically recognize the composition of teams and team activities in relation to a plan. Mao et al. [28] viewed group plan recognition as inferring the decision-making strategy of observed agents. They assumed that agents were rational and performed probabilistic reasoning based on the maximum expected utility principle. Their plan representation and inference were actually done on a Bayesian network. Pfeffer et al. [29] studied the problem of monitoring the goals, team structure, and state of agents in dynamic systems where teams and goals change over time. Using a DBN, they explicitly modeled the coordination and communication of attackers in an asymmetric urban warfare environment. Saria and Mahadevan [30] presented a theoretical framework for online probabilistic IR in cooperative multiagent systems. Their model extended the abstract hidden Markov model (AHMM) and consisted of a hierarchical dynamic Bayesian network that allowed reasoning about the interaction among multiple cooperating agents. Gaitanis [31] modified Saria’s model by relaxing the assumption that there is only one team grouping all the agents at only one level of coordination. Although these models solve multiagent IR problems successfully to some extent, they suffer from the drawback of traditional PGMs: they are propositional and cannot make use of relations.
Some SRL methods such as the MLN and LHMM can also be regarded as special cases of PGMs. Sadilek and Kautz [32] modeled a “capture the flag” domain using the MLN and learned a theory that jointly denoised the data and inferred occurrences of high-level activities. Their research showed that the MLN has considerable potential for solving multiagent IR problems. To compare the performance of the CRF, HMM, and MLN, Auslander et al. [33] used these methods to help commanders detect maritime threats. Their evaluation corpus was from the 2010 Trident Warrior exercise, and they showed that the MLN was better at representing domain knowledge and learning weight settings from a few training instances. As far as we know, there have been few attempts to apply extensions of the LHMM in team intention recognition, except for our previous research in [13].
2.2. Extensions of the HMM Applied in the IR Domain
Even though the HMM has some advantages in behavior modeling, the strong Markov assumption limits its application in many areas. Thus, some research has been done to extend the HMM in the IR domain.
A hidden semi-Markov model (HSMM) is the same as the HMM, except that it models the duration of hidden states explicitly. Because the HSMM models state duration precisely, it has been used to solve IR problems in digital games. For example, Hladky and Bulitko applied the HSMM to predict the position of a player in a first-person shooter game [34]. The database consisted of 190 game logs collected in Counter Strike competitions. With these data, they trained the HSMM and evaluated predictor performance by both prediction accuracy error and human similarity error. Southey et al. also used the HSMM to recognize the destination and start point of an agent on a grid map of WarCraft 3 [35]. Their main contribution is that the observed trajectory can be partially missing.
van Kasteren et al. compared the performance of activity recognition using the HMM, HSMM, CRF, and semi-Markov CRF; their activity data was recorded by real sensors in a smart home [36]. The experiments showed that modeling duration explicitly always improved the recognition performance.
The AHMM is a stochastic model for representing the execution of a hierarchy of contingent plans [7]. The core concepts of the AHMM are the abstract policy and the termination variable. The abstract policy is defined as the selection of a lower-level abstract policy or primitive action, given the current state and the higher-level policy (the top-level policy depends only on the state). The termination variable indicates whether the corresponding abstract policy terminates in each time slice. When a policy does not terminate, its higher-level policies cannot terminate either. When the AHMM is used to recognize policies, there is also an observation layer, which depends on the states. Bui et al. applied the AHMM to track an object and predict its future trajectory in a wide-area environment [37].
Another well-known extension of the HMM is the Coxian hidden semi-Markov model (CxHSMM) [8]. This model modifies the HMM in two respects: on the one hand, it is a special DBN representation of a two-layer HMM, and it also has termination variables; on the other hand, it uses the Coxian distribution to model the duration of primitive actions explicitly. This model was applied to recognizing human activities of daily living (ADLs), and Duong et al. showed that the Coxian duration model had advantages over existing duration parameterizations using multinomial or exponential family distributions.
3. Logical Hierarchical Hidden SemiMarkov Model
The LHHSMM is a fusion of the logical hierarchical hidden Markov model, the logical hidden semi-Markov model, and the Markov decision process. It is used to model the team to be recognized as well as the world state and observation. In this section, we give a formal definition of the LHHSMM and describe the dependency by a DBN representation. Then, we explain how to use a logical hierarchical Markov chain to represent the logical transition and decomposition of policies.
3.1. Definition
LHHSMM in one time slice is a tuple , where is a belief network with a time label, is the set of transitions from to , and defines the initial distribution in .
represents the set of the policies to be recognized from level1 to level ( is the top level), the primitive action, the world state, and observation at time . The level policy is called intention and the primitive action is in level0. Each is associated with a logical alphabet in firstorder logic, where belonging to level (), is a set of relation symbols with arity and a set of function symbols with arity . When , becomes a proposition, and is a constant when . An atom is a relation symbol followed by a bracketed tuple of terms . A term is a variable or a function symbol immediately followed by a bracketed tuple of term . A term is called ground when it contains no variables. The Herbrand base of , denoted as , is the set of all ground atoms constructed with the predicate and function symbols in . The set of an atom consists of all ground atoms that belongs to [10].
Definition 1 (policy and primitive action). A level abstract policy is an atom with items in , and the instantiated is called ground policy , which belongs to . In our paper, the top level ground policy is also regarded as an intention. An abstract primitive action is an atom in , and is the corresponding ground primitive action. Each is determined by a function , except for which is predefined. Thus, the abstract policy determines the logical alphabet in its lower level.
Example 2. , which means two kinds of intentions: attacking the target or avoiding the target, but there will be four intentions after the instantiation process, because the target can either be or . Then, is an abstract policy and is policy . determines the level0 logical alphabet , which cannot be decomposed further.
Definition 3 (the world state and observation). The world state and observation are both sets of variables. depicts the status of the agents and the environment; the variable symbols in are not predefined but are generated by instantiation functions and logical transitions, which will be introduced later. The observation is a function of ; it provides us some inaccurate and partial information about .
Example 4. , which means the world state consists of a variable, the position of agent A, and there are four possible positions totally. is the observation of , the observed value of may reflect the real position of agent A or not, and the relations of and are presented by conditional probability .
Definition 5 (instantiation function). An instantiation function in level is defined by policy in the higher level, except that is predefined. is a mapping , which can also be presented as a conditional probability , where is an atom relation symbol in , is the world state in previous time, and is an instantiated object of . It should be noted that when a variable in has been instantiated in the higher level policy, the variable should be substituted by the former instance directly and cannot be selected by again.
Example 6. For the in Example 2, we can set , , , and . If is instantiated as in level1. Then, we can set , (there are only two directions), but can only be 1.
Definition 7 (policy termination [7]). Policy termination variable indicates whether the level policy will be terminated () or not () at the current time. When , will continue at the next time; in the other case, it will be changed according to or .
Definition 8 (intention duration [8]). Intention duration variable is a counter to model the time remained before an interruption event happens. Since the reason of the interruption is always unknown, we initialize the value of using a lognormal distribution when an intention starts. The value of will reduce 1 in each step, and the intention will be interrupted when . Then, will be initialized again.
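The countdown behavior in Definition 8 can be sketched as follows. The definition does not say how a continuous lognormal sample becomes an integer counter, so rounding up with a minimum of one step is an assumption made here, as are the function names and the parameter values.

```python
import math
import random


def sample_duration(mu, sigma, rng):
    """Draw a new integer counter from a lognormal distribution.

    Assumption: the continuous sample is rounded up, with a minimum of 1.
    """
    return max(1, math.ceil(rng.lognormvariate(mu, sigma)))


def step_duration(remaining, mu, sigma, rng):
    """Decrease the counter by 1; at zero, interrupt and reinitialize.

    Returns (new_counter, interrupted).
    """
    remaining -= 1
    if remaining == 0:
        return sample_duration(mu, sigma, rng), True
    return remaining, False


rng = random.Random(0)
initial = sample_duration(1.0, 0.5, rng)
counter = initial
interruptions = []
for _ in range(initial):
    counter, interrupted = step_duration(counter, 1.0, 0.5, rng)
    interruptions.append(interrupted)
print(initial, interruptions)
```

By construction, the first interruption occurs exactly `initial` steps after the intention starts, and the counter is then redrawn for the next intention.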
is the set of horizontal transitions, which determines how transits into . There are two types of transitions in : ones are logical transitions which can be further classified into conditional logical probabilistic transitions, specific logical transitions, and unified logical transitions and the other ones are standard probabilistic transitions (we just call them probabilistic transitions) as those in PGMs. The logical transitions only exist between abstract policies in the same level, and transitions in each level are generated by abstract policies in the higher level, except that level logical transitions are predefined. The probabilistic transitions will be analyzed when we introduce the DBN representation of our model.
Definition 9 (conditional probabilistic logical transition [12]). A conditional probabilistic logical transition has a form , which means that the abstract policy will transit to from with probability , where is the current abstract policy and is the next abstract policy, they are both atoms in the alphabet of the current level, is a conditional probability which can be presented as , where represents conditions associated with , which is a set of logical sentences whose variables are included in the world state. Thus, we get the true value of if we know the current world state. can also be an empty set, and the value of only depends on the higher level and policy and in this case. We need to emphasize that if the variables of have been instantiated in , they will be substituted by the constants directly.
Definition 10 (specific logical transition). A specific logical transition has a form , which means that we replace the current policy with , where is the more specific case of . As a kind of default reasoning, this kind of transition is often used to model exceptions, and it will be followed when the instance in is consistent with that in .
Definition 11 (unified logical transition). A unified logical transition has a form , which means that we replace the current policy with , where relation symbols of and are the same, but the orders of their variable symbols are different. When we use unified logical transition, the grounds and will be unified. This kind of transition is forced to follow if the current policy is .
These three kinds of logical transitions can be represented by solid, dashed, and dotted edges, respectively, in a finite state machine (FSM), as in the LHMM [10]. Note that since dotted and dashed transitions do not take real time, they can only be shown in an FSM and cannot be reflected in the DBN representation. Figure 1 shows three examples to explain them.
Figure 1: (a) Solid edges. (b) Dashed edges. (c) Dotted edges.
, , and are relation symbols which represent abstract policies, , , and are variables, and is a constant. Suppose that we have computed the values of all conditional probabilistic logical transition according to the world state, and they are denoted on the edges. In Figure 1(a), the abstract policy can transit to or from with probabilities 0.4 and 0.6, respectively. If the abstract policy reaches , the instantiation results of in must be the same as in . In Figure 1(b), if the instantiated result of in is not or we have no idea about it, the abstract policy transits to or with probabilities 0.4 and 0.6, respectively. However, if we know the instance of is , the probabilities above will change to 0.6 and 0.4. In Figure 1(c), if the current abstract policy is , we will follow the dotted edge automatically and replace the abstract policy with , and the instances of and will be the same as instances of and in . In this way, changes its instances consuming one time.
defines the initial distributions of abstract policies in , returns the initial distribution of abstract policy in level (), which is generated by the current abstract policy , and is predefined. is only used to sample when the level policy in the previous time has been terminated. For the in Example 2, can be represented as .
3.2. Dependency in the LHHSMM
In this section, we use a DBN representation to describe the dependency among variables in the LHHSMM. However, the standard DBN cannot represent the logical transitions and instantiation process in our model, since it is actually propositional. Thus, we can only show the full DBN after substituting all variables. To explain the logical dependency under the standard probabilistic transitions, we first analyze the factors on which each policy, primitive action, termination, duration, and state depends, and then discuss the details of the logical transitions and instantiation process. Figure 2 shows the subnetwork for a level policy.
Figure 2: (a) Full dependency. (b) Dependency when the policy has been terminated. (c) Dependency when the policy has not been terminated.
When , depends on two ground relation symbols as well as and two variables and , as is shown in the full dependency. Figure 2(b) means that when has been terminated at time , the transition and instantiation of will depend on , and . The transition process should be considered in two cases: when is not terminated at time , defines the set of possible and determines the probability of transiting from to together with whose true value depends on ; when is terminated at time , determines the initializing probability of . For the instantiation process, the instances of the same variables in and must be consistent, and the instances of should also satisfy the logical transitions between and . Figure 2(c) means that inherits totally when is not terminated at time . When , only depends on , , and , and the selection and instantiation of the intention are similar as other level policies, except that there is no influence from higher level policies. Figure 3 shows subnetwork for duration of the intention.
Figure 3: (a) Full dependency. (b) Dependency when the intention has been terminated. (c) Dependency when the intention has not been terminated.
Duration depends on a relation symbol and two variables and , as is shown in Figures 3(a), 3(b), and 3(c), explain two cases, respectively: when is terminated at time , will be initialized and have a new value according to ; when is not terminated at time , we make . Figure 4 shows subnetwork for level () policy termination.
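Figure 4: (a) Full dependency. (b) Dependency when the higher-level policy has been terminated. (c) Dependency when the higher-level policy has not been terminated.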
Figure 4(a) shows that depends on a ground relation symbol and two variables and . In Figure 4(b), when , will be forced to 1, which means a policy will be terminated forcedly when its higher level policy ends. In Figure 4(c), , and the value of depends on a conditional distribution . Particularly, we set , for all the policies which are not any head of logical transition in . In other words, this kind of policies can only be terminated when . Figure 5 shows subnetwork for intention termination.
Figure 5: (a) Full dependency. (b) Dependency when the intention duration has run out. (c) Dependency when the intention duration has not run out.
Figure 5(a) shows that depends on a ground relation symbol and two variables and . In Figure 5(b), when , which means the intention has been interrupted because of unknown reasons. Figure 5(c) shows the case that when , the value of is determined by , just like policies in other levels.
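The termination rules of Figures 4 and 5 can be combined into one top-down pass, sketched below under simplifying assumptions: each level has a single scalar probability of terminating naturally (in the model this is a conditional distribution that depends on the policy and the world state), and levels are listed from the intention downward. The function and parameter names are hypothetical.

```python
import random


def sample_terminations(end_probs, duration_expired, rng):
    """Sample termination flags from the intention (index 0) downward.

    end_probs[i] is an assumed scalar probability that the policy at depth i
    terminates by itself at the current time.
    """
    flags = []
    for depth, p_end in enumerate(end_probs):
        if depth == 0:
            # The intention ends when its duration runs out; otherwise its
            # termination is sampled like that of any other policy.
            terminated = duration_expired or rng.random() < p_end
        elif flags[-1]:
            # A policy is forced to terminate when its parent terminates.
            terminated = True
        else:
            terminated = rng.random() < p_end
        flags.append(terminated)
    return flags


rng = random.Random(0)
print(sample_terminations([0.0, 1.0, 0.0], duration_expired=False, rng=rng))
print(sample_terminations([0.0, 0.0, 0.0], duration_expired=True, rng=rng))
```

Sampling from the top down guarantees that a terminated policy never has an active descendant, while a lower-level policy can still end on its own without affecting its parent.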
Relationships between world state, observation, and primitive action are similar as Markov decision process. First, given the level1 policy and the world state , an abstract primitive action in will be selected by a probability . Then, will be instantiated into by , and determine how the current world state transits to , and the observation is a function of . The primitive action does not have a termination variable; it needs to be selected according to the process above in each time slice. The variables and dependency are presented in a full DBN representation, which is shown in Figure 6.
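The MDP-like process at the bottom of the model can be sketched on a toy one-dimensional map: an action is chosen from the current policy and the previous state, the state is updated, and a possibly noisy or missing observation is generated. The policy here is reduced to "move toward a target cell", and all function names and probability parameters are hypothetical; in the model, the action is an atom that is first selected and then instantiated.

```python
import random


def select_action(target, position, greedy_prob, rng):
    """Choose a primitive action from the policy and the previous state."""
    toward = 0 if target == position else (1 if target > position else -1)
    if rng.random() < greedy_prob:
        return toward
    return rng.choice([-1, 0, 1])


def transition(position, action, size):
    """The previous state and the action determine the next state."""
    return min(max(position + action, 0), size - 1)


def observe(position, size, miss_prob, noise_prob, rng):
    """The observation depends on the state; it may be missing or wrong."""
    if rng.random() < miss_prob:
        return None  # partially missing observation
    if rng.random() < noise_prob:
        return rng.randrange(size)  # noisy reading
    return position


def step(target, position, size, greedy_prob, miss_prob, noise_prob, rng):
    action = select_action(target, position, greedy_prob, rng)
    new_position = transition(position, action, size)
    observation = observe(new_position, size, miss_prob, noise_prob, rng)
    return action, new_position, observation


rng = random.Random(1)
print(step(5, 2, 10, 0.9, 0.1, 0.1, rng))
```

Because the action is re-selected at every time slice from the policy and the state, it is not constrained by the action executed at the previous time.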
The DBN representation depicts the dependency in and transitions between and . We need to note that the vertical dashed lines in Figure 6 are not logical transitions but are brief representations for the nodes and edges from level3 to level 1. As shown in the DBN representation, our model actually consists of three parts: sensors, which are modeled by the world state and observation levels; a Markov decision process, depicted by the structure below level1, if we regard the level1 policy as a special case of state; and a hierarchical chain structure above level0, which depicts the team policies. However, the DBN cannot represent the logical transitions and instantiation process in the third part. Thus, we use a logical hierarchical Markov chain (LHMC) to describe the decomposition and transition process of team policies in our model.
3.3. Team Policies Presented by Hierarchical Logical Markov Chain
Unlike the DBN structure, a logical hierarchical Markov chain (LHMC) ignores the variables of duration, conditions of logical transitions, primitive actions, world states, and observations and depicts only the decomposition of policies and the logical transitions among them. A LHMC is a tree in which each node is a logical Markov chain (LMC); the LMC can be regarded as a LHMM without any observation, just as the state transition process in a standard HMM is a Markov chain. In this paper, an abstract state in the LMC represents an abstract policy, and its child is a LMC in the next lower level, except for the policies in leaf nodes. Figure 7 shows an example of team policies presented by a hierarchical logical Markov chain.
The team policies in Figure 7 consist of two layers: level1 and level (). The LMC in level2 presents the possible intentions and the transitions between them. In team intention recognition problems, the relation symbols in the highest level usually represent different team working modes. For example, indicates a team working mode, and and are specific constants. The node and node in the level LMC are used to control the simulation process. If the intention reaches the node, the simulation ends. The outgoing edges from the node are associated with an initial distribution of abstract intention. When agents choose a level policy (), they need to execute policies in its associated LMC, as shown in the ellipse (Figure 7 omits the mappings of other policies for simplicity). For example, to finish , the execution in level1 begins according to the initial distribution defined by , and the first executed level1 abstract policy may be or ; when the level1 policy is terminated but is not, the level1 policy transitions within the LMC. But when the world state satisfies the terminating conditions of or it is interrupted, we return to the abstract policy in level2 and immediately transit to another abstract intention in the level LMC.
4. Approximate Inference
In this section, we discuss how to infer team intentions represented by the LHHSMM. Online intention recognition is essentially a filtering problem. The policies, primitive actions, duration, and termination can be regarded as states of a dynamic system. They cannot be observed directly but produce observations sequentially. Our goal is to infer the real state at each time from this observation series. Since observations are noisy and some data are missing, we use a particle filter (PF) to solve the approximate inference problem. However, the standard PF does not define any logical transition or instantiation process. Thus, we propose a new logical particle filter by introducing the logical definitions and dependency of the LHHSMM.
4.1. Standard PF with Simplest Sampling
Suppose that we have a dynamic system whose real state at time is , and is an observation of . is the prior initial probability. is a particle set which describes the posterior distribution , where and . is the number of particles, is the normalized weight of , and . Then, the posterior distribution of the state can be computed by where is the Dirac function and is obtained by In the PF, is known as the importance distribution, which is used to sample , and it can be factorized by And the posterior distribution can be represented by After substituting formula (3) and formula (4) into formula (2), we can update the importance weights by When is a Markov system, , and formula (4) can be simplified into
When we choose the optimal importance distribution , ; however, it is very hard to compute in our models. Thus, we use the simplest importance distribution , and in this case. After normalizing , we need to resample the particles to mitigate particle degeneracy. This process discards particles with small weights and duplicates particles with large weights. The weights of the new particles after resampling are set equal, and we can compute the posterior probability of the system state by using formula (1). Other details about this process can be found in [38].
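As a minimal sketch of the filtering step described above, the following Python function assumes that the state transition prior is used as the importance distribution, so that the weight update reduces to multiplying each weight by the observation likelihood, followed by normalization and multinomial resampling. The names `bootstrap_pf_step`, `transition`, and `likelihood` are hypothetical and are not taken from the paper; the resampling scheme is also an assumption, since the text does not say which one is used.

```python
import random


def bootstrap_pf_step(particles, weights, transition, likelihood, observation, rng):
    """One step of a particle filter that proposes from the transition prior.

    transition(x, rng) samples a successor state (hypothetical callback).
    likelihood(y, x) returns p(y | x) (hypothetical callback).
    """
    # Propagate every particle through the transition sampler.
    proposed = [transition(x, rng) for x in particles]
    # With the transition prior as proposal, the new weight is the old
    # weight times the observation likelihood.
    unnormalized = [w * likelihood(observation, x)
                    for w, x in zip(weights, proposed)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("all particle weights are zero")
    normalized = [w / total for w in unnormalized]
    # Multinomial resampling (an assumed choice): low-weight particles tend
    # to be dropped and high-weight particles tend to be duplicated.
    n = len(proposed)
    resampled = rng.choices(proposed, weights=normalized, k=n)
    # After resampling, all weights are set equal.
    return resampled, [1.0 / n] * n
```

Passing an explicit `random.Random` instance keeps runs reproducible, which is convenient when comparing recognition curves across settings.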
4.2. Logical Particle Filtering in the LHHSMM
The standard PF cannot be applied to recognize intentions in our LHHSMM, since it does not support logical transitions or the instantiation process. In this section, we present a logical particle filter based on the logical definitions and dependency of our model. A particle in a LHHSMM is a tuple , where is an intention, and observation of a is . Since only depends on , . To sample from , we use the simplest importance distribution and forward sampling. The steps of the LPF are as follows.
Step 1 (initialization). Set time , , , and ; sample from a prior distribution for each particle.
Step 2 (forward sampling). Set , sample elements in each particle . Based on dependency in the DBN structure, the forward sampling process can be further decomposed into five steps: sampling intentions and duration, policies, primitive actions, the world state, and termination variables. The pseudocode is shown in Pseudocode 1.

Two points should be noted. First, when an abstract policy is selected and executed for the first time, the alphabets, instantiation functions, and logical transitions in its corresponding lower level LMC should be updated; while a policy has not terminated, the elements in its LMC do not need to be changed, and those in the top level LMC remain fixed. Second, when sampling an abstract policy from logical transitions, we must follow the specific logical transition and the unified logical transition if their conditions are satisfied and then immediately obtain the new abstract policy by the conditional probabilistic logical transition.
Step 3 (updating weights and resampling). Update the weight of each particle by , and get new set of particles by resampling and set .
Step 4 (computing posterior distribution of intentions). In this paper, we only care about the abstract intention and the constants in the instantiated intention. Since they are discrete, we can compute them by and , where is an indicator function and is a possible constant in . If the simulation does not end, go to Step 2.
With the four steps above, we can compute the probabilities of the team working mode and goals of agents at each time.
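Step 4 can be illustrated with a short sketch: because the quantities of interest are discrete, their posterior is a weighted count of the particles carrying each value. The function name `label_posterior` and the dictionary-based particle layout used in the usage note are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def label_posterior(particles, weights, label_of):
    """Posterior over a discrete label as an indicator-weighted sum.

    label_of(particle) extracts the label of interest, for example the
    working mode or the target constant of one agent (hypothetical callback).
    """
    totals = defaultdict(float)
    for particle, weight in zip(particles, weights):
        # Indicator function: the weight counts only for the particle's label.
        totals[label_of(particle)] += weight
    norm = sum(totals.values())
    return {label: value / norm for label, value in totals.items()}
```

For instance, with particles stored as dictionaries, `label_posterior(particles, weights, lambda p: p["mode"])` would give the working-mode probabilities at the current time step.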
5. Experiments
5.1. Background
To evaluate the performance of applying the LHHSMM and LPF in team intention recognition, we design a battle scenario. In this scenario, two agents execute an attacking mission individually or cooperatively in a known environment. Their team intention consists of two parts: the team working mode and the specific target of each agent. The recognition task is to compute the probabilities of the team intention sequentially from the continuous observation of agent traces.
This scenario has several characteristics: first, the agents can act individually or form a team; second, the attacking mission can be decomposed into subtasks and primitive actions; third, the team intentions can be interrupted because of new orders or other unknown reasons; last, the observed traces are noisy, and at some times no position record may be available at all. Furthermore, the observed data are obtained sequentially, and we need to update our recognition results when new evidence arrives. The initial situation of the battlefield is shown in Figure 8.
The battlefield map consists of grids; the blue and red diamond points are the initial positions of agent A and agent B, respectively; the black points indicate buildings which the agents cannot pass through; the two green grids are assembling positions; the three yellow grids are targets which may be attacked by agents; the white grids make up passable ways for motion. In each time step, an agent can move to an adjacent blank grid or stay in its current position. Before inferring intentions, we give a decomposed LHMC representation of the team policies in this scenario, as shown in Figure 9.
(a) The LMC in level ()
(b) The LMC in level1 under
(c) The LMC in level1 under
Figure 9(a) depicts the logical Markov chain in the top level (); there are two relation symbols: (a) means the two agents will attack cooperatively, and their common target is ; (b) means they will execute their missions individually; that is, agent A attacks and agent B attacks . Variables and represent the targets, and the possible value could be , , or . We also depict the initial distribution of level LMC in Figure 9(a). Figures 9(b) and 9(c) depict the LMCs under and , respectively. In level1, there are two relation symbols; they are (a) , which means agent A and agent B are trying to get together at the assembling position and (b) which means agent A is going to destroy and agent B is going to destroy . is a variable, and its possible value could be or . and have been instantiated in the top level. According to transitions in Figures 9(b) and 9(c), the agents will first assemble at a point and then go to destroy the target together, when they attack a target cooperatively. But when is executed, agents have different targets and do not affect each other.
The team has only one abstract primitive action , where and indicate the moving directions of agents A and B, respectively, and there are 5 possible directions: , , , , and . If the direction is instantiated as , the agent will stay at the current position. Selection and instantiation of abstract primitive action depend on the level1 policy and the previous state.
In this scenario, the selection of policies and actions is only related to the positions of agents. Thus, the world state is defined as the set of and , where returns the current grid of . The observations are functions of them, which will be introduced when we explain the observation model.
5.2. Settings
5.2.1. Transition Probabilities in LMCs
There is only one possible logical transition in level1 and the transition probability is always 1, and the conditional transition probabilities of level LMC are shown in
B, C, D, E, and F represent the policies shown in Figure 9(a); means that the target is occupied, and this proposition is true when any agent reaches the grid of . is the probability of transiting from abstract policy C to abstract policy B. is the probability of terminating the simulation after executing abstract policy C, and the other transition symbols have similar meanings. The level LMC shows that the team working mode may alternate between cooperation and independence, and the simulation can only end when one of the targets is destroyed.
5.2.2. Policy Termination
In our LHHSMM, termination of policy depends on the probability . When is , we set for all which satisfy that the positions of the two agents and target are all the same, and for other . When is , we set for all which satisfy that agent A reaches target or agent B reaches target , and for other .
When is , we set for which satisfy that the positions of the two agents and assembling point are all the same, and for other . When is or , we set for all , because these two policies do not have outgoing transitions in their LMCs.
5.2.3. Selection and Instantiation of Primitive Actions
In this scenario, selection of primitive actions can be neglected since there is only one abstract primitive action. The instantiation function defined by is set as follows:
(1) When is or , agent A moves in the direction that keeps it on a shortest path to its instantiated target; if two directions satisfy this condition, agent A chooses each with probability 0.5. Agent B follows the same rule.
(2) When is and shows that neither agent A nor agent B has reached the assembling position, agent A moves in the direction that keeps it on a shortest path to ; if two directions satisfy this condition, agent A chooses each with probability 0.5. Agent B follows the same rule.
(3) When is and shows that one agent has reached the assembling position but the other has not, the agent still on its way moves along a shortest path to ; if two directions satisfy this condition, this agent chooses each with probability 0.5, and the other agent chooses the direction .
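The movement rules above can be sketched as a breadth-first search on the grid followed by a uniform choice among the directions that lie on a shortest path. This is only an illustration under stated assumptions: the grid is represented as a set of passable `(row, column)` cells, the direction names in `MOVES` and the label `"stay"` are hypothetical placeholders for the five directions, and returning `"stay"` for an unreachable goal is an assumed fallback that the paper does not discuss.

```python
from collections import deque
import random

# Hypothetical direction names mapped to (row, column) offsets.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def distances_to(goal, passable):
    """Breadth-first distances from every reachable passable cell to goal."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        row, col = queue.popleft()
        for d_row, d_col in MOVES.values():
            nxt = (row + d_row, col + d_col)
            if nxt in passable and nxt not in dist:
                dist[nxt] = dist[(row, col)] + 1
                queue.append(nxt)
    return dist


def shortest_path_moves(position, goal, passable):
    """Directions that keep the agent on some shortest path to goal."""
    if position == goal:
        return ["stay"]
    dist = distances_to(goal, passable)
    if position not in dist:
        return ["stay"]  # assumed fallback when the goal is unreachable
    return [name for name, (d_row, d_col) in MOVES.items()
            if dist.get((position[0] + d_row, position[1] + d_col))
            == dist[position] - 1]


def sample_move(position, goal, passable, rng):
    """Pick uniformly among shortest-path directions (0.5 each if two)."""
    return rng.choice(shortest_path_moves(position, goal, passable))
```

The goal argument can be either a target grid or an assembling position, so the same helper covers all three rules.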
5.2.4. The Duration Model
After the abstract intention or has lasted for time , an interruption event occurs with a probability , , is the time necessary to realize the intention, which depends on the state when the intention begins, is the probability density function of the lognormal distribution , and We set , , and .
5.2.5. Observation Model
Observation at one time is a set , where is a function which returns the true grid with a probability 0.6, returns with a probability 0.2, and returns a false grid with a probability ; is computed by where is the set of grids containing buildings and is the set of all grids except and grids in . and represent the horizontal and vertical coordinates of grid , respectively. When returns , we have no information about the observed agent. The observations of the two agents are independent.
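A rough sketch of such a sensor is given below. The probabilities 0.6 (true grid) and 0.2 (no reading) come from the text; how the remaining mass is spread over false grids is governed by the formula above, so the sketch treats it as a pluggable `false_weights` argument and falls back to a uniform choice, which is purely an assumption. The function name `observe` and the use of `None` for a missing reading are also hypothetical.

```python
import random


def observe(true_grid, false_candidates, rng,
            p_true=0.6, p_missing=0.2, false_weights=None):
    """Sample one noisy position reading for a single agent.

    Returns the true grid, None for a missing reading, or one of
    false_candidates (passable grids other than the true one).
    """
    u = rng.random()
    if u < p_true:
        return true_grid
    if u < p_true + p_missing:
        return None  # no information about the agent at this time
    if false_weights is None:
        # Assumption: uniform over false grids when no weights are supplied.
        return rng.choice(false_candidates)
    # Otherwise weight the false grids, e.g. by the paper's formula.
    return rng.choices(false_candidates, weights=false_weights, k=1)[0]
```

Because the two agents are observed independently, the function would simply be called once per agent at every time step.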
5.2.6. Other Settings
To simplify the instantiation functions of intentions, the specific targets are selected according to their tactical values. We set the normalized values of , , and as 0.3, 0.4, and 0.3, respectively. The agents instantiate the target variables in the abstract intention one by one. The probability of choosing a target is proportional to its tactical value. For example, when we instantiate a variable whose possible constants are and (the value of is already known to be ), the probability of instantiating as will be . For the instantiation function in level1, there is only one variable ; the probability of its value depends on the value of , as shown in Table 1.
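The proportional rule can be sketched as a renormalization of the tactical values over the targets that are still available. The target names `T1`, `T2`, and `T3` in the usage note are hypothetical placeholders, and excluding a target already assigned to another variable is an assumption read from the example above rather than a stated rule.

```python
def instantiation_probabilities(tactical_values, excluded=()):
    """Probability of each remaining target, proportional to its value.

    tactical_values maps target name to tactical value; excluded lists
    targets that are no longer candidates (assumed behaviour).
    """
    remaining = {target: value for target, value in tactical_values.items()
                 if target not in excluded}
    total = sum(remaining.values())
    return {target: value / total for target, value in remaining.items()}
```

With values 0.3, 0.4, and 0.3 for `T1`, `T2`, and `T3`, excluding `T1` leaves `T2` with probability 0.4 / 0.7 and `T3` with probability 0.3 / 0.7.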
In the approximate inference, we set the number of particles .
5.3. Results and Discussion
We run the scenario repeatedly and produce a test dataset consisting of 100 traces. With this data set, we compute the recognition results of specific traces to validate the LHHSMM and the LPF and compare the performances when intention duration distributions are different.
5.3.1. The Recognition Results of Specific Traces
Three traces in test data set are selected to compute the probabilities of team intentions by LHHSMM. The details of these three traces are shown in Table 2.
As shown in Table 2, agents in trace number 5 execute the attacking mission individually: the target of agent A is and the target of agent B is . For trace number 5 there is no interruption and the mission is completed at . In trace number 17, the agents choose the cooperation working mode and do not change their intention during the mission either. They occupy the target successfully at . Trace number 57 is a bit more complicated. The agents decide to attack and individually first, but they change the intention and attack together at . They successfully assemble and go to together. However, before they reach , their intention is interrupted again and agent A attacks alone. This intention is terminated at because agent B occupies . We use the LHHSMM to recognize both team working modes and intentions in these three cases, and the results are shown in Figures 10, 11, and 12.
Figure 10: (a) The probabilities of working modes, (b) the probabilities of targets of agent A, and (c) the probabilities of targets of agent B in trace number 5.
Figure 11: (a) The probabilities of working modes, (b) the probabilities of targets of agent A, and (c) the probabilities of targets of agent B in trace number 17.
Figure 12: (a) The probabilities of working modes, (b) the probabilities of targets of agent A, and (c) the probabilities of targets of agent B in trace number 57.
Figure 10(a) shows the probabilities of working modes in trace number 5. Even though the prior probability of the real working mode, that is, , is lower in the initial phase, it increases very fast, and the value exceeds 0.8 at . The fluctuation at arises because agent B is turning left at a crossing while the observation of agent B at this time is noisy. Figures 10(b) and 10(c) show the probabilities of targets of agents in trace number 5. In Figure 10(b), our models recognize the target of agent A very well, and the probability of target increases very fast. In Figure 10(c), the probability of is larger than that of the real target from to , because agent B chooses a path which is indispensable for reaching in this period. When agent B leaves this path at , the probability of the real target becomes the highest again.
Figure 11(a) shows the probabilities of working modes in trace number 17. Except for the fluctuation near , the probability of the real working mode is always high. The failure occurs because some observations are missing in that period. At the same time, the observed noisy data support the hypothesis that the working mode has been interrupted. Fortunately, the probability of recovers quickly when new evidence arrives. Figures 11(b) and 11(c) show the probabilities of targets of agents in trace number 17. Since agent A and agent B have the same target, the curves in Figures 11(b) and 11(c) are very similar. The confusion before arises because the agents are assembling and their traces cannot provide new information about their targets; we can see that, even though the prior probability of is considerably higher, the red curve increases quickly.
Figure 13: (a) Precision, (b) recall, and (c) F-measure of recognizing team working modes; (d) precision, (e) recall, and (f) F-measure of recognizing targets of agent A.
Figure 12(a) shows the probabilities of working modes in trace number 57. Generally, our models can recognize the working modes well even though the intention is interrupted at and . The two fluctuations near and arise because agent A is near the crossing while its observation is not accurate. Figures 12(b) and 12(c) show the probabilities of targets of agents in trace number 57. We can see that the results after reflect the real targets well, apart from some fluctuation, even though the target of agent A changes twice. However, the real targets do not have the highest probability in most cases before . In Figure 12(b), the probability of is generally higher before , because agent A and agent B are on the indispensable paths to and , respectively. After they leave the indispensable paths, the probability of , which is the real target of agent A, increases considerably, which makes the target of agent B appear to be , since their working mode is attacking individually. We can also see that although the probabilities of the real targets are not the highest during this period, they are at least not the lowest.
5.3.2. Comparison of the LHHSMM and a Modified LHHMM
To show the advantages of the LHHSMM compared with the LHHMM, we add our observation model to the standard LHHMM (we still call it the LHHMM later) and also use a particle filter to infer intentions. However, since the LHHMM does not model intention interruption, the weights of all particles may be 0 after an interruption happens. In this situation, we reset the particle weights to . Then, inference can continue to update the probabilities of intentions until the end of the simulation.
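The weight reset can be sketched as follows. The reset value is not legible in this copy of the text; the sketch assumes uniform weights (one over the number of particles), which is the usual choice but remains an assumption here. The function name `normalize_or_reset` is hypothetical.

```python
def normalize_or_reset(weights):
    """Normalize particle weights, or reset them if they have all collapsed.

    Assumption: when every weight is zero, the weights are reset to a
    uniform value so that filtering can continue.
    """
    total = sum(weights)
    n = len(weights)
    if total > 0.0:
        return [w / total for w in weights]
    return [1.0 / n] * n
```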
The performances of our model and the LHHMM are compared statistically using three metrics: precision, recall, and F-measure. These metrics are computed by where is the number of possible classes, and , , and are the true positives, total of true labels, and total of inferred labels for class , respectively. Formulas (10) show that precision measures the reliability of the recognized results, recall measures the efficiency of the algorithm applied to the test dataset, and F-measure integrates precision and recall. All three metrics lie between 0 and 1, and a higher value means better performance.
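A small sketch of these metrics is shown below. It assumes that precision and recall are averaged over classes (per-class true positives divided by the number of inferred labels and by the number of true labels, respectively) and that F-measure is the harmonic mean of the two averages; the exact form of the paper's formulas may differ. The function name `macro_metrics` is hypothetical. Inferred labels outside `classes`, such as an "unknown" result, count toward no class.

```python
def macro_metrics(true_labels, inferred_labels, classes):
    """Class-averaged precision and recall, and their harmonic mean."""
    precisions, recalls = [], []
    for cls in classes:
        # True positives: inferred as cls and actually cls.
        tp = sum(1 for t, p in zip(true_labels, inferred_labels)
                 if t == cls and p == cls)
        n_true = sum(1 for t in true_labels if t == cls)
        n_inferred = sum(1 for p in inferred_labels if p == cls)
        precisions.append(tp / n_inferred if n_inferred else 0.0)
        recalls.append(tp / n_true if n_true else 0.0)
    precision = sum(precisions) / len(classes)
    recall = sum(recalls) / len(classes)
    denom = precision + recall
    f_measure = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure
```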
Since the traces in the dataset have different time lengths, we define a variable which is a positive integer. , and the metrics at means that their recognizing object set is , where is the index of the trace and is the length of the trace number . Thus, metrics with different show the performances of algorithms in different phases of simulation. To give the final recognition results, we further define two thresholds and . When , we can regard the recognition result as , where and are the two working modes; otherwise, the result is unknown. When and , we can regard the recognition result as , where , , and are the three targets; otherwise, the result is unknown. The three metrics of recognizing working modes and targets of agent A computed by the LHHSMM and the LHHMM are shown in Figure 13.
The red solid curves and the blue dash-dotted curves are computed by the LHHSMM and the LHHMM, respectively. The LHHSMM outperforms the LHHMM in all three metrics, especially in the second half of the simulation. In the starting phase, the intention has lasted only a short time and the probability of changing intentions is not large. Thus, the LHHSMM and the LHHMM have similar performances. However, as more and more interruptions of intention occur, the LHHSMM shows its advantage: its recognition performance generally improves as more observations arrive.
5.3.3. The Effects of Duration Modeling
Our LHHSMM is semi-Markov because it uses a specific distribution to explicitly model the duration for which the team intention remains uninterrupted. To evaluate the effects of duration modeling, we compare recognition metrics computed by LHHSMMs with three different duration distributions: is the real distribution of the working mode introduced in Section 5.2; is a geometric distribution whose expectation is equal to that of in ; is the same as except that . The test dataset is a subset of the 100 traces, consisting of all 51 traces in which the working mode changes at least once. Besides, we only evaluate the recognition results of the second half of these traces, because, in the beginning phases of the traces, the lack of evidence may affect the recognition performance. The recognition results are still evaluated by the metrics and . Because we do not need to show the metrics at different phases, the recognizing object set does not depend on the parameter , and we recognize the working modes (WM), targets of agent A (TA), and targets of agent B (TB) at all steps of the second half of the 51 traces. Recognition metrics with different duration distributions are shown in Figure 14.
The blue, green, and red bars indicate the performances of our LHHSMMs with , , and , respectively. The comparison results for all six terms agree: (a) performs the best, which shows that the type of duration distribution is important for recognizing the working mode and the target of each agent; (b) performs better than , which shows that the expectation of the duration distribution has a large impact on the recognition results. These comparisons indicate that precise duration modeling of the working mode is necessary and that our semi-Markov framework is effective in team intention recognition.
6. Conclusions and Future Works
In this paper, we proposed a LHHSMM to recognize team intentions. As a fusion of the LHHMM, LHSMM, and MDP, the LHHSMM possesses advantages to solve the team intention recognition problems in a complex environment: first, it uses a logical predicate to represent the team working mode, which can be recognized together with each goal of agents; second, it has a hierarchical structure and can make use of domain knowledge to present complex tasks; third, due to the modeling of intention duration, LHHSMM can update the probabilities of results correctly even if the intention is interrupted and changed; last, LHHSMM can deal with noisy and partially missing observations.
To solve inference problems of the LHHSMM, a LPF is proposed based on the logical definitions and the dependency of the LHHSMM. We also design a combat scenario to evaluate the LHHSMM and LPF; the results show the following: first, whether or not intentions change, our methods can effectively recognize the team working mode and the target of each agent; second, the LHHSMM outperforms the LHHMM in precision, recall, and F-measure when intentions are interrupted with a high probability; last, explicit duration modeling of the team working mode is effective in team intention recognition.
In the future, we would like to continue our research on two aspects: (a) learning parameters in the LHHSMM and (b) investigating how to obtain the optimal importance distribution in the LPF. Moreover, applying our model to recognize intentions in a real scenario is also of interest.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The work described in this paper is sponsored by the National Natural Science Foundation of China under Grants no. 61473300 and no. 61573369.