Abstract

Cognitive searching optimization is a subconscious mental phenomenon in decision making. When accessible human actions are exploited, alleviating inefficient decisions and shrinking the searching space remain challenges for optimizing the solution space. Multiple decision estimation and the jumpy decision transition interval are two cross-impact factors that cause variation of decision paths. To optimize the searching process of the decision solution space, we propose a semi-Markov jump cognitive decision method in which a searching contraction index bridges the correlation between the time dimension and the depth dimension. With the changing state and transition interval, the semi-Markov property obtains the action by limiting the decision solution to a specified range. Along the decision depth, bootstrap re-sampling uses mental rehearsal iterations to update the transition probability. In addition, a dynamical decision boundary shaped by the interaction process limits the admissible decisions. Through a flight simulation, we show that the proposed index and reward vary with the transition decision steps and mental rehearsal frequencies. In conclusion, this decision-making method integrates multistep transition and mental rehearsal in a semi-Markov jump decision process, opening a route to multidimensional optimization of cognitive interaction.

1. Introduction

The human-computer cognitive interaction (I) process provides a setting for analyzing human factors, interactive performance, and decision uncertainty. In terms of decision making, the decision solution space, constructed by estimating and searching the solution path, is affected by the uncertainty of human decisions [1]. In the I process, the chronologically ordered decision path based on human experience is composed of decision action steps, each of which is uniquely determined under the estimation of the future decision path. From a prior perspective, owing to jumpy decision intervals and multiple estimations of decision paths, there are infinitely many possible decision paths in the solution space. It is therefore necessary to reduce the impact of exploring the solution space on the efficiency of decision making. In the optimization of cognitive searching, the human high-level control hierarchy prepares upcoming decisions before people become aware of them [2]. When exploring the decision solution space, searching contraction optimization is used to show that people have subconsciously eliminated some decision paths that would never actually be taken.

In order to analyze decision behaviors, human performance modeling (HPM) has been researched over the last few decades [3]. HPM demonstrates interactive relations by designing different inner structures. It has evolved from broad symbolic cybernetic approaches [4] to the new stage of computationally rational modeling [5], involving human cognitive behavior at various decision hierarchies. Similar to a non-homogeneous sequential model, decisions are continuously generated in chronological order. By depicting the potential distributions caused by differentiated structures, HPM essentially contracts and prunes the immense set of decision sequences formed by rehearsal.

Similar to human-like behavior, multistep decisions influence one another over periods rather than at isolated instants. The Markov property, which strengthens the correlation within a decision path, constrains the selection of action elements to depend only on the decision adopted at the previous moment. Based on the Markov decision principle, the policy iteration method calculates the one-step reward value by introducing the state of the decision object into the computation [6]. However, physical obstructions make humans unable to access state parameters without sensor measurement in the interaction environment, which causes unavoidable deviation. To address this shortcoming, the partially observable Markov decision method uses the interactive object state as an uncertain estimation of the observable state set, which is also the main difference from the observable-state Markov decision method [7]. Another completely unobservable method, the hidden Markov decision method, analyzes human cognitive behaviors through the state of the interactive object (such as a machine). The aforementioned methods share similar deficiencies in depicting the transition state interval and in calculating the reward value locally. Therefore, variable intervals and a reward value combining historical decision steps and future decision steps are crucial.

On the other hand, the lack of mutual influence between adjacent actions is a shortcoming of the Markov property [8]; it leaves human-like decisions without tight interdependence. Different from single-state inference in the Markov process, semi-Markov decisions focus on the cross-correlation of transition intervals, even though the transition states are not ergodic and are innumerable. The state is jumpy and changes along with the decision process.

In this paper, a semi-Markov jump decision method is proposed to optimize the human cognitive searching decision path through a multistep transition part and a mental rehearsal part in a specified airplane pilot interaction scenario. We define a searching contraction index to represent the coverage degree, which refers to the ratio between decision behaviors chosen subconsciously and all accessible decision behaviors. For a more general situation, the semi-Markov decision process overcomes the usual restriction by adopting a time-varying transition rate, so the sojourn time between modes can follow any non-exponential distribution. Besides, a human making decisions differs from a fixed-step decision controller: the time interval in sequential decisions is not constant but arbitrary, and it cannot be modeled by noise such as an exponential distribution obeying the Markov transition law. We combine the semi-Markov process with human-centered reinforcement Q-learning to realize the estimated decision solution. Depending on states with inconsistent transition intervals, composite decision steps accomplish decision making. As the core of decision making, the dynamical transition probability drives the state transition and the action adopted. The bootstrap re-sampling frequency abstracts the mental rehearsal process by re-screening the transition probability. Finally, the decision boundary influenced by the interactive object constrains the final human decision. Figure 1 briefly shows the above compound relations. To summarize, this paper puts forward the following four contributions:
(i) A semi-Markov jump cognitive decision method is proposed to evaluate the dynamical cognitive interaction process. Our method integrates the semi-Markov decision transition interval, the multiple decision path estimation, and the changeable decision solution space of the jump state.
(ii) The transition interval and sojourn time, which are characteristics of vital importance, are explicitly reflected in our method. By adding the mental rehearsal property, our method addresses the reduction of the infinite-dimensional decision solution space and advances the dimension reduction to a smaller range.
(iii) An index named searching contraction is introduced that efficiently reflects the cognitive computation ability of humans while exploring the decision solution space.
(iv) Our method incorporates the relation between decision time and depth, conforming to the human logic of deciding and to the jumpy property of the transition state.

The rest of the paper is organized as follows. Section 2 briefly describes the related work on multistep transition, mental rehearsal, and the dynamical searching dimension in decisions. Section 3 and Section 4 state the specific problem and illustrate how our decision method is built. Section 5 and Section 6 detail the experiments and an integrated analysis of the model, including its shortcomings and future directions in this area. Section 7 summarizes the paper.

2. Related Work

2.1. Multistep Transition

Multistep transition in decision making has been developed with many methods [9], such as reinforcement learning [10], utility selection theory [11], and networked control systems [12]. Compared with a continuously accumulated and improved process, the common decision framework obtains an optimal decision strategy via feedback effects [13] and evaluates the potential outcome values caused by events when the decision is formulated through a feedback loop design [14]. Focusing on cognitive analysis, Yanco and Drury [15] modified a taxonomy of multiagent systems and treated human-computer interaction as a process of two heterogeneous agents interacting. Moratz et al. [16] ran a comparison test between human-robot and human-human interaction to illustrate differences in spatial features; the comparative trials implicitly revealed that the complexity of cognitive space plays a prominent role in the interaction process. From the view of multiple timescales, such as cognition and decision, Purcell and Kiani [17] designed a hierarchy of multistep transition decision processes to disambiguate detrimental factors such as flawed information. The above works mainly focused on the differences between the human and the robot as an autonomous agent; they treated the human as an uncertain, non-monotonic agent by adding a stochastic dynamic transition interval that follows an exponential distribution. Wu et al. [18] added a transition time restriction to the semi-Markov model, although its finite system state was limited to the noninteractive mode. To the best of our knowledge, non-exponential distributions of the transition interval applicable to cognitive decision making have received limited attention so far. The Markov jump decision interval provides a more general way to describe multistep transitions for cognitive decisions.

2.2. Mental Rehearsal

Mental rehearsal, also known as mental simulation, is one of the cognitive strategies [19]. This strategy practices future actions without outwardly observable physical performance. In typical tasks, it is regarded as an efficient method to improve decision performance in psychomotor and sport settings. For example, Miranda et al. [20] used mental rehearsal to decrease depressive predictive certainty, showing gains in making optimistic predictions. Ignacio et al. [21] proposed that different health disciplines can adopt mental rehearsal as part of clinical training. Su et al. [22] designed an incremental deep convolutional neural network to demonstrate human-like learning behavior. Moreover, researchers have analyzed its different effects on the user's learning decisions within the theory of working memory [23]. As a computational model, Oberauer and Lewandowsky [24] designed a time-based resource-sharing theory to derive unambiguous predictions about the effect of rehearsal on memory, which helps differentiate varying forms of mental practice. Besides, mental rehearsal can be analyzed in parameterized form. To demonstrate the advantage of rehearsal, Mazher et al. [25] found that rehearsal was beneficial for memorized long-term learning by discriminating learning decision states using electroencephalography.

2.3. Dynamical Decision Dimension

Exploration-exploitation, which relates to the dynamical decision dimension, is a crucial aspect, especially for searching the feasible solutions [26]. The dimensional optimization method connects with decision-making properties such as the non-Markovian property [27], which describes the cross-influence between different decision states. To reduce the dimension of decision searching, Engel et al. [28] considered the stochastic jumpy interval in human cognitive decision behaviors and handled it with linearly weighted logic according to monotonically increasing time [29]. A brute-force calculation can obtain the globally optimal solution, but it easily fails to search the space in polynomial time, especially for non-convex problems. Optimization algorithms such as best proximity points [30] and particle swarm optimization [31] have been applied to evade non-convex difficulties. However, there still exists an enormous gap between human physical simulation and computational simulation of aspects such as emotions [32]. The state caused by human action jumps discretely rather than following inflexible inference from one fixed step to another. In addition, uncertain human factors enlarge the difficulty of covering the decision solution space of all accessible scenes [33].

3. Problem Formulation

The I process allows people to gain insight into and observe state information from the interactive machine object. According to the state obtained by observation and the state of the historical decision path, people make new decisions within a limited period. Although related work provides the state of the art in terms of multistep decisions, mental rehearsal, and the dynamical decision dimension, existing methods cannot optimize cognitive decision searching along the dimensions of time and depth. Current methods implicitly contain the flaw that the process of searching the decision solution space depends only on partial history information. Limited to single-sample estimation, the evaluation of decisions also loses unbiasedness. Additionally, cognitive searching optimization involves human cognitive properties that have not been fully used in previous research, such as the jumpy decision interval and multiple decision path estimation. Therefore, the problem in our work is to optimize the decision reward, generate an efficient decision path, increase the searching contraction ratio, and stabilize the scope of the decision solution space under these two human cognitive characteristics. We design a semi-Markov jump decision method using hierarchical transition probability optimization over different step lengths of the multistep transition and the mental rehearsal frequency. First, we combine the semi-Markov process and reinforcement Q-learning to form the multistep transition on the basis of history fragment strategies and subjective estimation of the expected future feedback. Then, we build the mental rehearsal optimization based on bootstrap re-sampling, which plays an essential role in human subconscious simulation and optimizes the transition probability. Besides, a decision space boundary is developed to ensure that decisions are admissible.

4. Semi-Markov Jump Decision Method

In this section, we present the semi-Markov jump decision method for cognitive searching optimization. As shown in Figure 2, the interior of the method can be divided into two hierarchies. The first hierarchy is the targeted decision inference block. During the decision process, this block limits the interaction target domain and defines the maximum-minimum reward for the decision process. The second hierarchy forms the searching contraction block by estimating the transition distribution. Considering the non-Markov property, human decision memory is not erased instantly once an action is completed [34], which implicitly indicates that decision making relies not on a point but on a fragment. Here we use a block combining the semi-Markov process and a human-centered reinforcement Q-learning decision maker to estimate the decision state on a fragment of limited length, which accommodates jumpy transition intervals belonging to non-exponential distributions. The bootstrap re-sampling controller block is designed to explore the optimal transition probability for human mental rehearsal. Depending on the flight phase, the decision space boundary block limits decisions to the admissible scope for decision inference and flight dynamics. Besides, the airplane flight simulating block receives the observable airplane state information, whereas it handles executable action parameters from the targeted decision inference block.

4.1. Targeted Decision Inference on Receding Horizon

The whole cognitive interaction is defined in the I space. The interior of the decision method is defined in the decision solution space. We also define the inner bootstrap space as a re-sampling-with-replacement space, which is a subset of the decision solution space. Similar to a sliding surface forcing the system state in a semi-Markov jump system [35], the targeted decision inference is addressed on a receding horizon. Aimless interaction decisions are excluded from the scope of this paper. We assume that the I process has a preassigned target set whose elements relate to the machine (computer) state at each decision. The constrained loss function, which resembles a filter using a comprehensive energy index, yields the optimal decision trail under the specified target. Therein, the synthesis reward function decided by the decision process minimizes the energy consumed by the cognitive interaction and produces a sequence maximizing the computer performance, while the decision follows the distribution of a trajectory density function parameterized by both the transition probability and a stochastic error factor. One term states that the process carries the semi-Markov factor; both the temporal fragment factor and the jumpy factor contribute to and intervene in the process. Another term states that the mental rehearsal factor contributes to the decision. We need to obtain the optimal policy under the minimum condition to ascertain the maximum reward, and the remaining term relates to the minimum of the smaller worst-case cost function. At each time step, the agent is in a state and must choose an action, transitioning it to a new state and yielding a reward. A policy is defined as a probability distribution over state-action pairs, representing the density of selecting an action in a given state. Upon consequent interactions with the environment, the agent collects a trajectory of state-action pairs. The goal is to determine an optimal policy through this loss function.
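As a minimal illustration of this agent-environment loop (not the paper's flight scenario), the following Python sketch draws actions from a stochastic policy, collects a trajectory of state-action pairs over a finite horizon, and accumulates an energy-like reward. The toy environment, one-dimensional state, and reward shape are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS = 3

def policy(state):
    """A stochastic policy: a probability distribution over actions given the state."""
    logits = -np.abs(state - np.arange(N_ACTIONS))    # placeholder preference scores
    probs = np.exp(logits) / np.exp(logits).sum()
    return rng.choice(N_ACTIONS, p=probs)

def step(state, action):
    """Toy transition and reward; stands in for the interactive machine object."""
    next_state = state + action - 1                   # actions shift the state by -1, 0, +1
    reward = -abs(next_state)                         # energy-like cost: stay near the target 0
    return next_state, reward

trajectory, total_reward, state = [], 0.0, 5
for t in range(20):                                   # finite horizon: the interaction terminates
    action = policy(state)
    next_state, reward = step(state, action)
    trajectory.append((state, action))                # collect the state-action pair
    total_reward += reward
    state = next_state
print(trajectory[:5], round(total_reward, 2))
```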

Besides, the constrained loss function satisfies two implicitly postulated conditions. The first condition indicates that the number of decision steps is limited: the cognitive interaction reaches a terminal state, and the solution space is bounded by the environment tasks. The second condition assumes that the computer or machine state pattern is similar throughout the trail process, which assures that the process can be properly classified into several stages.

4.2. Semi-Markov Process and Human-Centered Q-Learning Decision Maker for Multistep Transition

The decision maker is composed of a hybrid semi-Markov process with forward human-centered Q-learning estimation. To determine the corresponding parameter in equation (19), we consider a composite decision maker formed by the semi-Markov process and human-centered Q-learning. The former takes the jumpy property and sojourn time of the decision interval into consideration, while the latter calculates the future predictive estimation. Figure 3 shows a sketch of this part.

The sampled discrete state trail of state-action pairs is indexed by a subscript giving the number of state elements in the trail; it is the hidden left part of the process from the start of decision making. The action is chosen from the action set, whose size is the number of admissible control elements in the decision. The observable state variable comes from the machine process, and its subscript is the corresponding state dimension. Based on the Markov property and the jumpy transition probability, the following functions state the semi-Markov relation between adjacent states. Here, the stochastic process jumps with a transfer rate from one state to an adjacent state, the decision interval follows a non-exponential distribution, and a high-order stochastic variable is small.
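To make the jumpy, non-exponential transition interval concrete, the following sketch simulates a small semi-Markov chain. The embedded jump probabilities and the Weibull sojourn-time law below are illustrative assumptions, not parameters taken from the paper; any non-exponential sojourn distribution would serve.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative embedded jump probabilities between three discrete decision states
# (rows sum to 1, and the chain never stays in place at a jump instant).
P_jump = np.array([[0.0, 0.7, 0.3],
                   [0.5, 0.0, 0.5],
                   [0.6, 0.4, 0.0]])

def sojourn_time(state):
    """Non-exponential sojourn time; a state-dependent Weibull draw is one possible choice."""
    shape = 1.5 + 0.5 * state                    # purely illustrative parameterization
    return rng.weibull(shape)

def simulate_semi_markov(s0, horizon=10.0):
    """Return the jump times and visited states of one semi-Markov decision trail."""
    t, s, trail = 0.0, s0, [(0.0, s0)]
    while t < horizon:
        t += sojourn_time(s)                     # jumpy, non-exponential transition interval
        s = int(rng.choice(3, p=P_jump[s]))      # next state drawn from the embedded chain
        trail.append((round(t, 2), s))
    return trail

print(simulate_semi_markov(0))
```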

In the decision space, the quadruple is the composite of elements of the observable semi-Markov discrete process and is discretely Lebesgue additive on the measure. The state transition is denoted as an operation; the observable state comes from a rigorous time-homogeneous continuous Markov process written in tuple form. Accordingly, the dimension of the action set is dynamically changeable under the updating transition probability. For the machine process, the elements of the action set are the same, and the process receives the decision from the human side. When a new decision is determined, it is transmitted to the machine process after a correction whose error follows a normal distribution. Here we assume that the process does not exhibit parameter drift such as a time delay factor. The actual action set form is written as follows:

The accumulated state and the indicator state are calculated from the states of the human and machine processes. Therein, the semi-Markov process follows a non-exponential sojourn time distribution, whereas the machine process is continuous without sojourn time; we estimate it by the interval information entropy. The weight coefficient of the action and the observable state dimension enter this calculation.

Also, the semi-Markov process within the human and the regular Markov process within the machine (computer) evolve synchronously, while the former is discrete and the latter is continuous. Therefore, the intervention relations between the two processes can be represented as follows, where the error variable follows a normal distribution and the continuous and discrete interval periods come from the different processes.

For the regular Markov process, the expectation performance is obtained at each decision. According to the Doeblin lemma [36], for a transition probability matrix satisfying the Doeblin condition, there exists a unique stationary probability vector, and every initial distribution converges to it under repeated transitions. This lemma shows that the process is amnesic with respect to its initial distribution. On the other hand, the expectation performance is an accumulated reward over the state. It is continuously additive from the initial history consideration step to the current decision time, including the jumpy transition interval and the decision action time that follows a normal distribution. Counting the number of jumps of the process up to a given time, the probability kernel function of the semi-Markov process is written in terms of an intensity measure of the random point field, and the expected number of discontinuity pairs belonging to a given set follows from this measure; the history decisions determine this expectation. In the multistep decision process, humans form a fuzzy assessment before making the decision [37]. Next, we use human-centered reinforcement Q-learning to estimate the accumulated feedback and the maximized performance as the transition probability over a short future period.
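As a small numeric check of this amnesic property (the matrix is arbitrary and illustrative, not taken from the paper), iterating two different initial distributions through an ergodic transition matrix drives both to the same stationary vector:

```python
import numpy as np

# Illustrative ergodic transition probability matrix for the machine-side Markov process.
P = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.70, 0.20],
              [0.25, 0.25, 0.50]])

mu1 = np.array([1.0, 0.0, 0.0])      # two different initial distributions
mu2 = np.array([0.2, 0.3, 0.5])
for _ in range(200):                  # repeated transitions: mu <- mu P
    mu1 = mu1 @ P
    mu2 = mu2 @ P

# Both converge to the same stationary vector, i.e., the process forgets its initial distribution.
print(np.round(mu1, 4), np.round(mu2, 4))
```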

A limited optimization trail is defined for future prediction decision sequences, with a subscript denoting the predicted future decision step length. Given a future estimation process, which is the right-continuous part of the decision process, we write the corresponding performance index. Bellman optimality theory [38] states that an optimal decision sequence can be divided into several blocks that stay in the optimal state space, which ensures the sufficiency of this division. According to Bellman optimality theory, we derive the recursion in [39], where the exploration ratio and the reward enter the update. To calculate the value function, we have the following derivations. First, we consider

Furthermore, we introduce shorthand notation for the following quantities:

Through the law of total probability expansion, the operation is substituted by the following expression:

Then, we obtain the functional equality in equation (9), in which an indicator function is used.

For the prediction process, we assume that the discrete time sequence is equidistant. According to the Bellman equation and the formula of total probability, we obtain the recurrent accumulated feedback. This predicted reward will be used to determine the optimal transition probability in the bootstrap space in the next section.
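Since the intermediate derivations cannot be reproduced here, the following sketch only illustrates the basic ingredient they build on: a tabular Q-learning update with an exploration ratio (epsilon-greedy) and a discounted Bellman backup over a short future horizon. The states, actions, rewards, and hyperparameters are toy placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2            # learning rate, discount, exploration ratio

def env_step(s, a):
    """Toy environment: action 1 moves toward the last state, which pays a reward."""
    s_next = min(s + a, n_states - 1)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r

for episode in range(500):
    s = 0
    for t in range(10):                          # short future estimation horizon
        # Epsilon-greedy selection driven by the exploration ratio.
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next, r = env_step(s, a)
        # Bellman backup: move the estimate toward r + gamma * max_a' Q(s', a').
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.round(Q, 2))
```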

4.3. Bootstrap Re-Sampling Controller for Mental Rehearsal

Bootstrapping was introduced as a flexible method to estimate the sampling distribution of a function of independent observations [40]. It takes the distribution of the sample data as a substitute for the global data and is useful for estimating uncertainty in subspace identification. Figure 4 shows a segment describing bootstrap re-sampling training used to search for the optimal transition probability, where the bootstrap re-sampling controller determines the transition probability.

Samples are indexed by a subscript distinguishing the different bootstrap draws. To explore the theoretically infinite decision solution space, we assume that its probability distribution function agrees with the sample distribution. Each sample stands for a brief decision sequence according to the Q-learning estimation introduced in the previous section. We use the limited re-sampling characteristic to represent the global sampling range.

Here the state comes from the observed aspect and the decision within the decision space. The subscript is the discrete step index in the decision sequence, and the superscript stands for different observed states, such as the human state and the machine state.

We assume that a discrete transition probability distribution dominates the pair transfers in the decision sequence. To calculate it, we first set an initial probability. The capital subscript stands for the step length of the history consideration decision, and the superscript is the count of decision categories, in which different kinds of decisions are independent and identically distributed. Next, each probability element is calculated by

Here, an indicator function is used to count the transferred pairs.
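A minimal sketch of this indicator-count estimation and the bootstrap re-sampling described above: transition probabilities are first counted from an observed decision sequence, then re-estimated on re-sampled (with replacement) copies of its adjacent pairs, mimicking repeated mental rehearsal. The synthetic sequence and the pair-level resampling scheme are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N_RULES = 3
# Synthetic observed decision sequence (indices of control rules).
seq = rng.integers(N_RULES, size=50)

def transition_probs(pairs, n):
    """Indicator-based counts of adjacent decision pairs, normalized row-wise."""
    counts = np.zeros((n, n))
    for a, b in pairs:
        counts[a, b] += 1.0                       # indicator of the observed pair (a, b)
    rows = counts.sum(axis=1, keepdims=True)
    rows[rows == 0] = 1.0                         # avoid division by zero for unvisited rows
    return counts / rows

pairs = np.array(list(zip(seq[:-1], seq[1:])))
point_estimate = transition_probs(pairs, N_RULES)

# Bootstrap re-sampling: re-draw the pairs with replacement and re-estimate the matrix;
# the number of bootstrap draws plays the role of the mental rehearsal frequency.
boot = [transition_probs(pairs[rng.integers(len(pairs), size=len(pairs))], N_RULES)
        for _ in range(20)]
print(np.round(point_estimate, 2))
print(np.round(np.std(boot, axis=0), 2))          # spread indicates uncertainty of the estimate
```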

Furthermore, the searched optimal transition probability group is determined for the searching contraction. In each bootstrapped sample for future estimation, the reward function, which is assumed to be monotonically increasing as performance improves, is used to compare the accumulated reward; a damping factor and the Q-learning estimation from the current time enter this comparison. After obtaining the bootstrap sample estimation values, we derive the transition probability distribution and take the decision from the optimal decision trail for the next decision step. Algorithm 1 shows the comprehensive block of the searching contraction.

(1)Initialize , and
(2)Initialize
(3)repeat
(4) Calculate history reward
(5) Initialize the bootstrap frequency
(6)repeat
(7)  Calculate predicted reward
(8)until and
(9) Calculate
(10) Calculate
(11)until Temporal fragment decision completed at
(12)Calculate
(13)Obtain the possible decision

4.4. Decision Space Boundary for Admissible Decision

The decision space boundary block in Figure 2 limits the accessible decision action set and relies on the airplane flight dynamic state variables. Based on the observed state, an admissible decision action is a subset of the action control scope. Meanwhile, the allowable state space is a hypercube field, which inversely limits the decision actions generated by the searching contraction method. This self-triggered policy, induced by the interaction object, contributes to jumpy state updating and action execution by relying on the latest sampled state information [41]. In the sequel, we consider the flight longitudinal dynamic system model as follows:

The state variables are the rates of airspeed, vertical speed, angle of attack, and pitch angular velocity; the model parameters include the rotational inertia, the engine power, and the engine mounting angle. We assume that the airspeed is approximately equal to the tangential velocity. To obtain the boundary block, the Hamilton–Jacobi function needs to be solved as follows:

Therein, the terminal boundary condition is set accordingly. Then, we can obtain the effective actions that stay within the scope of the decision space boundary.
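Because the longitudinal dynamics and the Hamilton–Jacobi function cannot be reproduced from the text, the sketch below uses a much simpler stand-in for the same idea: forward-simulate a toy vertical-motion model one step ahead and keep only actions whose predicted state stays inside a hypercube boundary. The dynamics, boundary values, and candidate actions are all hypothetical.

```python
import numpy as np

# Hypothetical hypercube boundary on (altitude [ft], vertical speed [ft/min]).
STATE_LOW  = np.array([2000.0, -2000.0])
STATE_HIGH = np.array([11000.0,  2000.0])

def predict_next(state, action, dt=1.0):
    """Toy one-step prediction; a stand-in for the longitudinal flight dynamics."""
    altitude, v_speed = state
    v_speed_next = v_speed + action * dt           # action: commanded vertical-speed change proxy
    altitude_next = altitude + v_speed_next * dt / 60.0
    return np.array([altitude_next, v_speed_next])

def admissible_actions(state, candidates):
    """Keep only actions whose predicted next state stays inside the boundary hypercube."""
    return [a for a in candidates
            if np.all(predict_next(state, a) >= STATE_LOW)
            and np.all(predict_next(state, a) <= STATE_HIGH)]

state = np.array([2100.0, -1900.0])                # close to the lower altitude/speed boundary
print(admissible_actions(state, [-500.0, 0.0, 500.0]))   # -> [0.0, 500.0]
```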

At each updated decision time, the bounded observable state is generated and compared with the boundary. Only when the decision is admissible can it be transmitted to the airplane simulating block, completing the interaction. Below, we provide Algorithm 2 for the whole decision inference on the receding horizon.

(1)Initial target set
(2)repeat
(3) Calculate the allowable state space
(4)repeat
(5)  Run the searching contraction block (Algorithm 1)
(6)  if the decision is admissible then
(7)   output decision
(8)  end if
(9)until
(10)until target set is completed

5. Experiment and Results

In this section, we present an experimental case and its results for an airplane manipulation scenario. A typical task is for the pilot to manually control the aircraft to descend in altitude. Extra tasks are set at designated altitudes; these particular subtasks require the pilot to execute special operations. We apply our method to this experimental case. The results show that our method reflects cognitive searching optimization through the searching contraction index.

5.1. Experiment Setting

Table 1 shows the flight altitude descending stages from 11000 ft to 2000 ft in the experiment. Stages 2, 4, and 6 require the pilot to complete specific tasks within specified altitude scopes, and stages 1, 3, and 5 are normal descending procedures involving basic flight joystick and throttle control [42]. Table 2 lists the multiresource channels involved in the experiment; we calculate the situation channels occupied by equipment to assess the workload taken by an action. Table 3 lists the correlation between control rules and related equipment. Table 4 gives the two types of delay in the decision process: the state interval time delay and the action time delay. We use the Poisson distribution and the normal distribution to represent these two types of delay, respectively; the Poisson distribution is a non-exponential distribution satisfying the semi-Markov property in the continuous time domain, and the corresponding mean and variance values are listed [44, 45].

In the experiments, we use control rules in place of the actions in the traditional decision action set. Each control rule corresponds to specified control equipment, which is chosen depending on the interactive process. The number of actions can be one or more depending on the observed states. The mean parameter of the normal distribution of the action time delay depends on the equipment; here we adopt parameters from the NASA timeline analysis report [46].
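A small sketch of how the two delay types could be sampled is given below. The concrete mean and variance values are placeholders, since the numerical settings from Table 4 and the NASA timeline report are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)

# Placeholder parameters; the actual values come from Table 4 and the timeline report [46].
STATE_DELAY_MEAN = 2.0         # mean of the Poisson state interval delay (s)
ACTION_DELAY_MEAN = 1.2        # mean of the normal action time delay (s), equipment dependent
ACTION_DELAY_STD = 0.3

def sample_state_interval_delay():
    """State interval delay: Poisson, i.e., a non-exponential semi-Markov sojourn choice."""
    return int(rng.poisson(STATE_DELAY_MEAN))

def sample_action_delay():
    """Action execution delay: normal distribution, truncated at zero."""
    return max(0.0, rng.normal(ACTION_DELAY_MEAN, ACTION_DELAY_STD))

print([sample_state_interval_delay() for _ in range(5)],
      [round(sample_action_delay(), 2) for _ in range(5)])
```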

5.2. Simulation Results

To assess the pilot's cognitive decision ability, flight performance, accumulated human workload, and the number of manipulations are the three indexes measuring our multistep decision method. Trends of the dynamical dimension show the searching contraction results going from infinite to limited. We use the inner batch frequency to simulate mental rehearsal, in which the pilot chooses the suitable action by anticipating; here the inner batch frequency refers to the number of rehearsals. By varying the SMDP-step (semi-Markov decision property) and Q-step tuple, we simulate the time scale of the history state influence and the future prediction impact. Table 5 lists the main experimental results and shows detailed contrast data under different combinations of parameters. Figure 5 shows the flight height performance. Figure 6 depicts how the accumulated workload speed increases with working time under different parameter configurations. Figure 7 shows the value of the searching contraction ratio.

In Table 5, the time of the flight task, the accumulated human workload, and the number of manipulation steps are the three indexes used to build the overall evaluation of the decision methods. The accumulated workload increases slightly when the inner batch frequency parameter is boosted. This result is consistent with the fact that human mental workload increases with cognitive time pressure [47]: the bigger the batch frequency, the greater the time pressure. It is worth noting that the setting batch = 20 in our experiment is an extreme situation exceeding the ordinary range [48]. In terms of human cognition, the required capability enlarges as the batch frequency increases. The compound step tuple is another critical parameter. By setting the history consideration steps (SMDP-step) and the future estimation steps (Q-step), we compose different multistep decision methods. For example, when the SMDP-step and Q-step are both equal to 1, the method is essentially a Markov decision process (MDP).

Figure 5 intuitively reflects the airplane flight height. On the whole, the descending trajectories show the optimal stationary distribution at batch = 5, where the differences between curves are small. The differences between trajectories, decided by the step tuple, increase significantly with the rehearsal frequency, which illustrates that searching contraction is relevant to the human mind rehearsing actions: the bigger the rehearsing frequency, the larger the difference caused by different multistep decision methods. From the view of trajectory smoothness, the descending trends are similar during flight stage 1, but the accumulated effect caused by different multistep decision methods starts to appear from stage 2. In Figure 5(a), the fluctuating range of S8Q3 is flatter than the others; in Figure 5(b), that of S3Q8 is flatter; in Figure 5(c), that of MDP is flatter; and in Figure 5(d), that of S8Q3 is flatter. Corresponding to the rough scope of higher local values in each subfigure, flight stages 2, 4, and 6, which cover more control tasks, show a hysteresis effect in the descending trend curves. It can also be found that all multistep decision methods in the experiments converge the airplane state to the target position.

Figure 6 shows a composite reward calculated from the airplane flight performance and the human cognitive workload performance. The reward value is more uniformly distributed when the batch frequency is bigger. When a human makes decisions after repeated estimations, the rewards caused by manipulation tend to be similar, but more repeated estimations bring a higher reward value, meaning that excessive anticipation leads to excessive cognitive workload. The reward accumulation speed increases with the batch frequency, based on the slopes of the curves. Aside from Figure 6(d), the green line (S8Q3) and the blue line (S4Q3) show the superior result on the horizontal axis (time, less is better) and the vertical axis (reward value, less is better). The gray line (SMDP) at batch = 5 performs best, which means that multiple estimations also affect the future estimation. The deepskyblue line (MDP) at batch = 10 performs best, while the yellow line (S3Q4), the green line (S8Q3), and the blue line (S4Q3) present similar effects.

Figure 7 shows the dimension variation percentage of the searching contraction ratio. We calculate the ratio from the accumulated searching result of the history decision space and the predetermined searching scope at each parameter tuple (SMDP-step, Q-step, batch or mental rehearsal frequency). The smaller this value, the higher the searching contraction ratio. Equation (23) calculates the ratio using the number of rules and an indicator function; a hedged illustration of this calculation is sketched below.
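Since the exact form of equation (23) is not reproduced here, the sketch below shows only one plausible reading consistent with the surrounding text: an indicator function marks which control rules the accumulated history decision space actually visits, and the contraction ratio is the fraction of the predetermined scope left unexplored. The function name and the "fraction unexplored" convention are assumptions for illustration, not the paper's definition.

```python
import numpy as np

def contraction_ratio(visited_rules, n_rules_total):
    """Fraction of the predetermined searching scope left unexplored (higher = stronger contraction).

    visited_rules: rule indices appearing in the accumulated history decision space.
    n_rules_total: size of the predetermined searching scope (number of rules).
    """
    indicator = np.zeros(n_rules_total)
    for r in visited_rules:
        indicator[r] = 1.0                 # indicator function over the rule set
    return 1.0 - indicator.sum() / n_rules_total

# Example: only two of four control rules were actually searched at this decision step.
print(contraction_ratio([0, 2], 4))        # -> 0.5
```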

In this way, the initially infinite searching dimension is related to the history step influence, the future step estimation, and the bootstrapping frequency. The probability determined by the mental rehearsal and the multistep transition, which is influenced by the dimension of the decision, reflects the contraction effect on the search dimension. As shown in the results, the cognitive searching optimization process shows an overall downward trend. When the batch frequency increases, the stability of the searching dimension contraction ratio gradually improves. The ratio also differs according to the type of combined decision steps. For example, when the future estimation step equals 1, as in MDP (S1Q1) and SMDP (S3Q1), the longitudinal change amplitude of the dimension percentage decreases from big to small as the batch frequency increases, independently of the other methods.

5.3. Cost and Performance Analysis

Figure 8 shows how the various indicators change with the three primary parameters of our decision-making method (the two types of decision steps and the rehearsal frequency). Under different batch parameters, Figure 8(a) shows the variation of the statistical standard deviation of each cluster's flight descent curve as the mission progresses. When the batch number is larger, the standard deviation first climbs and then declines. Figure 8(b) shows that the average accumulated workload increases with the batch frequency, which supports the view that the more decisions people anticipate, the greater the workload caused by mental rehearsal. An inflection point in the time index at batch = 5 shows that suitable mental rehearsal can decrease working time. Figure 8(c) shows that the number of manipulation steps drops as the batch increases, but the decline slows down. When the batch frequency is small (e.g., batch = 1), the number of manipulation steps is much larger. This negative correlation illustrates that non-optimal strategy steps lead to more decision steps being generated to revise earlier ones. Figures 8(d) and 8(e) show the influence of the different types of decision steps. There is a peak in each histogram group under all indicators, showing that an appropriate number of decision steps can reduce the corresponding indicator's value, whereas an inappropriate number of decision steps increases it. Figure 9 analyzes the reward values presented in Figure 6. The combined reward index defined by the flight performance and workload peaks at batch = 5, which shows that although the workload increases, the overall value at batches 10 and 20 decreases under the influence of the mission. Therefore, considering the three types of index data and the reward values, a batch frequency between 1 and 5 is more appropriate.

5.4. Transition Probability and Searching Contraction Ratio Analysis

Figure 10 shows the transition probability distribution varying across the different types of methods. The transition probability, which is calculated from inner simulated estimation, reflects the dynamical selection among rules. The changing trend is consistent with normal human decision-making behavior, because inferring transition probabilities lies at the core of human sequence knowledge [49]. It demonstrates that cognitive interaction behavior constantly attempts to infer the time-varying matrix of transition probabilities when it receives the outer observed machine states. Therefore, dynamical transition probabilities are ensured by the bootstrap re-sampling controller in the searching contraction method.

Additionally, Figure 10 shows the transition probabilities with regard to the control rules in Table 3 and the parameters in Table 5. The transition probability is dynamically changeable during the flight manipulation stages. The transition probability of the pitch control rule (Rule 1) is higher than that of the other rules on average. The transition probability of the vertical-speed control rule (Rule 2) is less concentrated than that of the height control rule (Rule 3). The configuration control rule (Rule 4) is not used until the flight altitude reaches the allowable range. At different flight stages, the rule transition probabilities vary with the specific tasks and their corresponding control rules. On the other hand, the transition probability is prominently influenced by the step tuple. The overall fluctuation of the transition probabilities strengthens as the estimated part of the decision step tuple increases. The fluctuation of the probability variation in the MDP and Q-learning methods is smaller than in the others.

On the other hand, the searching contraction change happens in the continuous multistep decision I process. It denotes the damping of the admissible decision exploitation dimension. The solution space in which the human chooses strategy rules contracts within the receding horizon controller after completing the inner inference. Research about the prefrontal cortex has also demonstrated this point: an existing high-level control area reprocesses the upcoming decision before it finally enters awareness [2]. Therefore, under the influence of the interactive environment, the degree of searching contraction is a critical factor in cognitive decision making.

We compare the searching contraction ratio from different aspects in Figure 11. First, the stability of the dynamical dimension goes down as the batch frequency gradually goes up. Figure 11(a) compares the searching contraction ratio in terms of the composition of multistep transition steps and the mental rehearsal frequency. By calculating the average change of the overall contraction ratio under the specified batch parameters, the figure shows that under higher batch frequencies the contraction ratio fluctuation decreases while the contraction percentage increases; batch parameters between 1 and 5 provide better overall performance. The S4Q3 example in Figure 11(b) emphasizes that as the batch value increases, the amplitude of the contraction percentage fluctuation decreases, but when the number of batches is greater than 5, the standard deviation of the fluctuation amplitude tends to be flat, demonstrating that further increases in batch frequency have little effect on the fluctuation amplitude. Figures 11(c) and 11(d) compare increasing numbers of history consideration steps and of future estimation steps. The results show that during the I process, the contraction ratio decreases with the number of history consideration steps and increases with the number of future estimation steps, but both indicate that too many steps suppress the improvement of the contraction ratio (e.g., when the number of steps equals 8). With the increase in batch frequency, the increment in the contraction ratio gradually slows down. This result reflects the fact that searching contraction is difficult for the human brain under excessive rehearsal numbers, such as batch = 20. The above data analysis demonstrates the feasibility of the searching contraction decision method proposed in this paper.

6. Discussion

Our experimental results illustrate that cognitive search contraction is a subconscious phenomenon that commonly occurs in the decision-making process, in which the associated multistep transition and mental rehearsal are two crucial factors. The multistep transition factor, which combines the influence of cumulative fragments and the jumpy transition interval, is the basis of interactive decision making. Under a credible admissible decision-making boundary, mental rehearsal, which covers the parallel execution of decision fragments, screens the decision again until the optimal decision-making behavior is obtained.

The analysis of the experimental results shows that different combinations of multistep transition step values and different rehearsal frequencies affect the searching contraction ratio and the decision-making reward. On the one hand, increasing the number of historically considered decision steps reduces the workload of the brain (by increasing the searching contraction ratio), whereas increasing the number of future estimation steps increases the workload of the brain (by decreasing the searching contraction ratio). At the same time, an excessive rehearsal frequency decreases the contraction efficiency of the decision search in the decision solution space, which is consistent with the finding that the cognitive channel is limited when humans complete multiple tasks [50].

Meanwhile, an increase in the rehearsal frequency reduces the fluctuation of the searching contraction index, showing that if a person can obtain more information or experience before deciding, the bias of the result will be lower. On the other hand, the decision rewards of all multistep transition step types present a trend of increasing first and then decreasing, reflecting that the number of decision-making steps is not positively correlated with the decision reward. A moderate composition of multistep transition steps brings the optimal decision reward (in interval time, workload, and number of decision steps), and the same holds for the rehearsal frequency.

Here we define the time complexity as the sum of the worst-case running time of each operation (e.g., multiplication, division, and addition) required to process an output; the growth rate is obtained by letting the parameters of the worst-case complexity tend to infinity. The memory complexity is estimated as the number of 32-bit registers needed during the learning process to store variables, and only the required worst-case memory space is considered during the processing phase. From the following complexity formulations, the time complexity grows quadratically in one of these quantities and linearly in the other, and the memory complexity of the algorithm grows linearly in both. When these quantities are very large, the memory complexity will not exceed the resources available for the training process, avoiding overflow from internal system memory to disk storage.
(i) Time complexity: for a decision sequence of a given length and an ergodic Markov state set, the time complexity is composed of the decision iteration process, and we can derive it accordingly.
(ii) Memory complexity: the memory complexity relates to the decision space, which in this paper is composed of the decision horizon space and the bootstrap sampling space.

Research studies on decision trees have considered different algorithms to optimize the decision searching tree, such as UCB1, UCT, and other non-greedy methods. However, the main index designed in our paper, the searching contraction ratio, is not comparable with those studies, which verify their efficiency through a true/false ratio. On the other hand, the method in this paper uses the Hamilton function to limit the admissible decision actions, whereas the existing studies analyzed the decision state as a discrete classification problem. In Table 5, when the parameters B, S, and Q are varied, our proposed method can be transformed into other existing methods, such as the standard Markov decision method (B1S1Q1), the standard semi-Markov decision method (B1S3Q1), and the standard Q-learning method (B1S1Q3). In Table 6, we add a comparison of the searching contraction index in terms of its mean value and variation; the time index is also listed. The table shows that the stability and time efficiency of our proposed method are better than those of the previous studies.

7. Conclusion

In this paper, we propose a semi-Markov jump decision method to optimize the decision path and a searching contraction index to indicate cognitive searching optimization. The main difference of our work is the use of the jumpy decision interval and multiple path estimation as deterministic features. Under a specific interaction target within the decision boundary, this modification optimizes the cognitive interaction decision along the time dimension and the depth dimension. The semi-Markov jump transition decision outperforms the traditional Markov method by strengthening the correlation along the dimension of the decision time interval. Mental rehearsal improves the searching depth of the decision solution space. The decision boundary filters out infeasible human decisions through the estimated admissible action boundary. Furthermore, the numerical simulation shows the characteristics of searching contraction, and our decision method can be applied to evaluate a class of decision paths with multiple element types. The reduction in the searching contraction ratio proves that a proper transition step length and mental rehearsal frequency can reduce and stabilize the searching space and the reward of the decision path in the I process. Future work will address the decision switching relations arising in semi-Markov cognitive decisions. To investigate the influence of human fatigue on control accuracy and stationarity, we will research jumpy switching control according to limited human behaviour rules. An arbitrary number of historical decision steps in decision making also deserves to be explored.

Data Availability

The numerical simulation data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.