Research Article  Open Access
Nitish Das, P. Aruna Priya, "FPGA Implementation of Reconfigurable Finite State Machine with Input Multiplexing Architecture Using Hungarian Method", International Journal of Reconfigurable Computing, vol. 2018, Article ID 6831901, 15 pages, 2018. https://doi.org/10.1155/2018/6831901
FPGA Implementation of Reconfigurable Finite State Machine with Input Multiplexing Architecture Using Hungarian Method
Abstract
The mathematical model for designing a complex digital system is a finite state machine (FSM). Applications such as digital signal processing (DSP) and built-in self-test (BIST) require specific operations to be performed only at particular instances. Hence, the optimal synthesis of such systems requires a reconfigurable FSM. The objective of this paper is to create a framework for a reconfigurable FSM with input multiplexing and state-based input selection (Reconfigurable FSMIMS) architecture. The Reconfigurable FSMIMS architecture is constructed by combining the conventional FSMIMS architecture and an optimized multiplexer bank (which defines the mode of operation). For this, the descriptions of a set of FSMs are taken for a particular application. The problem of obtaining the required optimized multiplexer bank is transformed into a weighted bipartite graph matching problem where the objective is to iteratively match the descriptions of the FSMs in the set with minimal cost. As a solution, an iterative greedy heuristic based Hungarian algorithm is proposed. The experimental results from MCNC FSM benchmarks demonstrate a speed improvement of 30.43% compared with the variation-based reconfigurable multiplexer bank (VRMUX) and of 9.14% compared with the combination-based reconfigurable multiplexer bank (CRMUX) in field programmable gate array (FPGA) implementation.
1. Introduction
Designing a complex digital system requires an efficient method that includes modeling a control unit (i.e., a controller). The operational speed of such systems depends on the speed of their controllers. The mathematical model for designing a controller for applications such as microprocessor control units, circuit testing, and digital signal processing (DSP) is a finite state machine (FSM). Consequently, designing such systems requires an efficient synthesis technique for high-speed FSMs [1, 2]. Applications such as DSP [3, 4] and built-in self-test (BIST) [5] require specific operations to be performed only at particular instances. A different control unit is required to complete each operation. Hence, to perform these operations optimally, a single control unit is defined that can configure itself depending on the applied mode of operation; this is known as a reconfigurable FSM [1]. The mode of operation of such an FSM is controlled by a counter, a timer, or any user-defined control signals, based on the application requirements. An example of a reconfigurable FSM is given in [1] as a test chip for a wireless sensor network. In this example, the Transition-Based Reconfigurable FSM (TRFSM) [1] is configured into one of the MCNC FSM benchmark circuits (i.e., dk15, s386, or cse) at different instances. Moreover, any application that requires sequential processing can be broken down into a series of instances (i.e., multistage reconfigurable signal processing) where only a particular operation is performed at each instance [3]. Hence, for such applications, efficient architectures can be created using reconfigurable FSMs. These emerging research trends necessitate a framework for the optimal synthesis of high-speed reconfigurable FSMs.
Conventional LUT-based architectures have been used for FSM implementation on an FPGA platform [6]. ROM-based architectures have also been investigated for FSM implementation. Owing to their area and speed advantages, they are an excellent alternative to their conventional LUT-based counterparts [7]. In such implementations, a considerable reduction in power consumption is obtained by disabling the embedded memory blocks (EMBs) during idle states [8, 9]. The fundamental framework for FSM with input multiplexing (FSMIM) is established in [7]; its prime objective is to shorten the depth of the ROM memory. In that approach, an input selector (which consists of a multiplexer bank) is used. The basic idea is to select only a specific set of inputs for a particular state. FSMIM with state-based input selection (FSMIMS) is proposed in [10], which further reduces the ROM memory size.
Another approach for the implementation of reconfigurable FSMs is RAM-based architectures. In the literature, there are two underlying RAM-based architectures, that is, the variation-based reconfigurable multiplexer bank (VRMUX) and the combination-based reconfigurable multiplexer bank (CRMUX) [11]. The RAM-based architectures do not serve as a novel tool for implementation of complicated FSM structures such as parallel hierarchical finite state machines (PHFSM) [12] or reversible FSM [13]. Due to the significant advantages of the FSMIMS architecture over other architectures, it is used to create a framework for the high-speed Reconfigurable FSMIMS architecture.
The Reconfigurable FSMIMS architecture is constructed by combining the conventional FSMIMS architecture [10] and an optimized multiplexer bank (which defines the mode of operation). For this, the descriptions of a set of FSMs are taken for a particular application. Hence, the problem is to obtain the optimized multiplexer bank for the given set of FSMs. It can be solved by mapping all the FSMs in that set into one large FSM (called base_ckt). The objective of this process is to perform optimal matching between base_ckt and the other FSMs in the set so that a minimum number of bits change when the mode of operation changes. This situation (i.e., performing one-to-one mapping) transforms the problem into a weighted bipartite graph matching problem where the objective is to match the descriptions of the FSMs in the set to base_ckt with minimal cost [14]. As a solution, an iterative greedy heuristic based Hungarian algorithm is proposed. In this algorithm, the weights are assigned based on the input combinations, the state code, and the output combinations to form a cost matrix. A cost matrix reduction based technique, that is, the Hungarian algorithm [15, 16], is used for matching. A greedy based heuristic (GBH) search technique [17] is combined with the Hungarian algorithm to optimize the augmenting path search. At every iteration, the descriptions of two FSMs (i.e., base_ckt and one of the FSMs in the set) are taken as inputs, and the modified descriptions of these FSMs, with the same dimension, are produced as outputs. At the end of the algorithm, a mutual XOR operation is performed among the modified descriptions, which provides the required optimized multiplexer bank.
The experimental results from MCNC FSM benchmarks illustrate the advantages of the proposed architecture over VRMUX [11]: the operating speed is enhanced by an average of 30.43% and LUT consumption is reduced by an average of 5.16% in FPGA implementation. They also show that the operating speed is improved by an average of 9.14% in comparison with CRMUX [11]. The limitation of the proposed technique is its higher LUT requirement, as it needs an average of 88.65% more LUTs than CRMUX [11] in FPGA implementation.
The rest of the paper is organized as follows. Section 2 presents the Reconfigurable FSMIMS architecture and the proposed iterative greedy heuristic based Hungarian algorithm. The experimental evaluation of the proposed algorithm, the implementation of the Reconfigurable FSMIMS architecture, and the comparison with other proposals from the literature are presented in Section 3. The concluding remarks are given in Section 4.
2. Proposed Method
As most FPGA platforms use synchronous EMBs, Mealy machines with synchronous outputs are used in this paper. Let a Mealy FSM be described by the following columns: the current state, drawn from the set of states, and the code of that state; the number of transitions per state; the state of transition (the next state) and its code; the set of input variables; the set of output variables; and the excitation functions for the flip-flops, whose number equals the number of bits in the internal state codes.
The descriptions of a set of FSMs are taken for a particular application. The fundamental idea is to obtain the description of a single FSM by mapping all the FSMs in that set into one large FSM (called base_ckt). The inputs, states, and outputs of an FSM in the set are mapped into base_ckt in their respective order. To perform such mapping, the mode bits are applied through a 2 × 1 multiplexer in those positions where the polarity of a bit differs (i.e., 1 in place of 0 and vice versa). Hence, the resultant FSM operates in two modes, where base_ckt mode is the default mode of operation. Similarly, all other FSMs in the set are mapped into base_ckt. In this way, a single FSM (i.e., base_ckt) combined with a multiplexer bank (which defines the mode of operation) acts as a reconfigurable FSM. It can be configured into a particular FSM in the set by applying the specific mode bits. Due to the numerous advantages mentioned in the literature, the FSMIMS architecture [10] is chosen to implement the FSM (i.e., base_ckt) part. Therefore, the Reconfigurable FSMIMS architecture is constructed by combining the conventional FSMIMS architecture [10] and a multiplexer bank for mode-based reconfiguration, as shown in Figure 1.
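The mode-bit idea can be illustrated with a small sketch. It assumes that each FSM description has already been flattened into a bit string of equal length; the names `mux_positions`, `select`, `base_bits`, and `recon_bits` are hypothetical and do not come from the paper. Only positions where the two strings differ need a 2 × 1 multiplexer driven by the mode bit.

```python
def mux_positions(base_bits, recon_bits):
    """Return the positions whose bit polarity differs (these need a 2x1 multiplexer)."""
    if len(base_bits) != len(recon_bits):
        raise ValueError("descriptions must have the same length")
    return [i for i, (b, r) in enumerate(zip(base_bits, recon_bits)) if b != r]


def select(base_bits, recon_bits, mode):
    """Emulate the multiplexer bank: mode 0 is base_ckt (default), mode 1 is the other FSM."""
    differing = set(mux_positions(base_bits, recon_bits))
    out = []
    for i, b in enumerate(base_bits):
        if mode == 1 and i in differing:
            out.append(recon_bits[i])  # the multiplexer passes the other FSM's bit
        else:
            out.append(b)              # shared bit, or base_ckt mode
    return "".join(out)


# Hypothetical flattened descriptions (assumption: same ordering of fields).
base_bits = "10110010"
recon_bits = "10011010"
print(mux_positions(base_bits, recon_bits))
print(select(base_bits, recon_bits, mode=1))
```

The fewer positions differ, the smaller the multiplexer bank, which is why the matching described in the rest of this section aims to minimize the number of differing bits.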
This mapping approach encounters the following two major difficulties:
(i) The complexity of the resultant multiplexer bank is very high.
(ii) It becomes difficult to define the dummy states and dummy transitions. Dummy states and dummy transitions are states and transitions that are not present in base_ckt but exist in the other FSMs in the set, and vice versa. These states and transitions lead the system to failure.
As a solution, an iterative greedy heuristic based Hungarian algorithm is proposed. The algorithm takes the descriptions of a set of FSMs as inputs and provides the optimized multiplexer bank for mode-based reconfiguration as output. It also provides the updated description (i.e., the description without dummy states and dummy transitions) of base_ckt, which is used to construct the conventional FSMIMS part of the proposed architecture. Consider the set of FSMs for a particular application. Based on the complexity of the FSM descriptions, the largest FSM is selected from the set and is called base_ckt. The remaining FSMs are called recon_ckt_b, one for each value of the index b.
Each input, state, or output of a recon_ckt_b can be mapped into any one of the inputs, states, or outputs, respectively, of base_ckt; that is, there exists a one-to-one mapping. These mappings cannot be performed independently because the inputs, states, and outputs of an FSM are interdependent. Consequently, mapping an input or state of recon_ckt_b into base_ckt is transformed into a weighted bipartite graph matching problem, or linear assignment problem (LAP) [14], as shown in Figure 2. In this LAP, the weights are assigned based on the input combinations, the state code, and the output combinations to form a cost matrix. The objective of this process is to perform matching with minimal cost so that a minimum number of bits change when the mode of operation changes. Therefore, the complexity of the multiplexer bank is reduced.
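To make the LAP concrete, the sketch below solves a tiny instance by exhaustive search. The cost values and the function name `brute_force_assignment` are illustrative assumptions; exhaustive search is practical only for very small matrices and is not the method proposed in this paper.

```python
from itertools import permutations


def brute_force_assignment(cost):
    """Try every one-to-one mapping of rows to columns and keep the cheapest one."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = list(perm), total
    return best_perm, best_cost


# Hypothetical 3 x 3 cost matrix: cost[i][j] is the number of bits that would
# change if vertex i of one FSM were mapped to vertex j of the other.
cost = [
    [2, 0, 3],
    [1, 4, 0],
    [0, 2, 2],
]
print(brute_force_assignment(cost))
```

Here row 0 is mapped to column 1, row 1 to column 2, and row 2 to column 0, for a total cost of 0.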
In the literature, the following approaches are proposed to solve a LAP:
(i) Modified Hungarian algorithm [16]
(ii) Simple greedy heuristic based algorithm [17]
(iii) Evolutionary heuristic algorithm [18].
The maximum number of inputs or states does not exceed 100 in MCNC FSM benchmarks or in FSMs used in real-world applications. So, the number of vertices in the resultant weighted bipartite graph is always low, which results in a small LAP. However, the number of LAPs formed in this process is enormous because input matching and state matching are performed together, as shown in Figure 2. Hence, the primary requirement of the algorithm used to solve the LAP is fast convergence. Therefore, a cost matrix reduction based technique, that is, the Hungarian algorithm [15, 16], is used for matching. A greedy based heuristic (GBH) search technique [17] is combined with the Hungarian algorithm to optimize the augmenting path search. The pseudocode of this technique is summarized in Algorithm 1. (Note: subscripts “base” and “recon” denote the parameters of base_ckt and recon_ckt, respectively, throughout the paper.)
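As an illustration of the matching step, the following is a minimal sketch of the classical Hungarian method in its O(n^3) potential-based form for a square cost matrix. It is not the GBH-augmented variant of Algorithm 1: the greedy vertex elimination of [17] is omitted, and the function name `hungarian` and the example matrix are hypothetical.

```python
def hungarian(cost):
    """Minimum-cost perfect assignment for a square matrix.

    Returns (assignment, total) where assignment[i] is the column given to row i.
    """
    n = len(cost)
    INF = float("inf")
    u = [0] * (n + 1)    # row potentials (1-indexed)
    v = [0] * (n + 1)    # column potentials (1-indexed)
    p = [0] * (n + 1)    # p[j] = row currently matched to column j (0 = none)
    way = [0] * (n + 1)  # way[j] = previous column on the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = INF
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            # Update the potentials so that one more reduced cost becomes zero.
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        # Flip the matching along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break

    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return assignment, total


print(hungarian([[4, 1, 3], [2, 0, 5], [3, 2, 2]]))
```

Because each LAP in this setting is small but the number of LAPs is large, a polynomial-time solver of this kind is a natural fit; the GBH step in the paper is aimed at speeding up the path search further.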

At every iteration, the descriptions of two FSMs, that is, base_ckt and recon_ckt_b, are taken as inputs. The major contributing factors for power consumption and LUT requirement in an FSM are the number of inputs and the number of internal states [8, 19]. In any FSM, the input variables and the states are interdependent. Thus, input matching and state matching are performed together between base_ckt and recon_ckt_b.
If base_ckt has at least as many input lines as recon_ckt_b, then combinations of input lines of base_ckt are generated to match the input lines of recon_ckt_b. The surplus input lines (the difference between the two input counts) act as don’t cares while the system operates in recon_ckt_b mode. Otherwise, combinations of input lines of recon_ckt_b are generated to match the input lines of base_ckt. In this case, the surplus input lines act as don’t cares while the system operates in base_ckt mode.
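The growth in the number of candidate line matchings can be sketched as follows. The sketch assumes that a “combination” means an ordered choice of lines from the larger FSM, one for each line of the smaller FSM; this reading, and the function name `input_line_matchings`, are assumptions rather than definitions taken from the paper.

```python
from itertools import permutations
from math import perm


def input_line_matchings(n_base, n_recon):
    """List every ordered choice of lines of the larger FSM for the lines of the smaller one."""
    larger, smaller = max(n_base, n_recon), min(n_base, n_recon)
    return list(permutations(range(larger), smaller))


matchings = input_line_matchings(4, 2)
print(len(matchings))          # number of candidate matchings for 4 and 2 lines
print(matchings[0])            # one candidate: which larger-FSM line serves each smaller-FSM line
print(4 - 2)                   # lines left over, which act as don't cares
print(perm(8, 8))              # candidates when both FSMs have 8 lines
```

Since state matching is repeated for every candidate, the count returned here multiplies the number of LAPs to be solved.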
Now, for each combination of input lines, state matching is performed (Algorithm 2). This situation can be seen as a LAP where the objective is to match the states of recon_ckt_b to the states of base_ckt with minimal cost [14, 17]. For this, the number of states in both FSMs is equalized. Thus, if base_ckt has at least as many states as recon_ckt_b, then dummy states equal in number to the difference between the two state counts are added to recon_ckt_b. Otherwise, the dummy states are added to base_ckt.
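The equalization step can be sketched in a few lines. State names and the `dummy` prefix are hypothetical; the sketch only shows which side receives the dummy states.

```python
def equalize_states(base_states, recon_states, dummy_prefix="dummy"):
    """Pad the FSM with fewer states so that both lists have the same length."""
    base, recon = list(base_states), list(recon_states)
    diff = len(base) - len(recon)
    shorter = recon if diff > 0 else base  # if diff == 0 nothing is appended
    shorter.extend(f"{dummy_prefix}{k}" for k in range(abs(diff)))
    return base, recon


base, recon = equalize_states(["s0", "s1", "s2", "s3"], ["t0", "t1"])
print(base)
print(recon)
```

After padding, the bipartite graph is balanced, so a square cost matrix can be formed for the assignment.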

All LAP solving algorithms require a cost matrix as an input to perform an optimal assignment. So, to form a cost matrix for this problem, a procedure named weight_assignment is proposed.
In this procedure, the combinations of input lines for base_ckt and recon_ckt_b are taken as inputs. It provides the cost matrix to map recon_ckt_b states into base_ckt states. An array is created for each transition in both base_ckt and recon_ckt_b by combining the input combination with the state information of that transition.
The basic idea that has been implemented is as follows: (i) replace the recon_ckt_b state with the base_ckt state sequentially in the recon_ckt_array; (ii) evaluate the weight by performing BitwiseXOR operation (i.e., transition matching) for that particular replacement; (iii) then, construct the cost matrix.
For each transition in recon_ckt_array, transition matching is performed. This situation can be seen as a LAP where the objective is to match the transitions of recon_ckt_b to the transitions of base_ckt with minimal cost [14, 17]. For this, the number of transitions for the particular state is equalized in both FSMs. Therefore, if the base_ckt state has at least as many transitions as the recon_ckt_b state, then dummy transitions equal in number to the difference are added to recon_ckt_array. Otherwise, the dummy transitions are added to base_ckt_array. Thus, for each transition in base_ckt_array, a BitwiseXOR operation is performed between the arrays for that particular transition. The total number of 1’s in the BitwiseXOR results is counted to create a cost matrix for transition matching. Then, the optimal assignment of transitions between base_ckt_array and recon_ckt_array is performed by the greedy based heuristic Hungarian algorithm (GBH_hungarian_algorithm). Let match_count be a variable defined as
In this way, by using match_count (from (1)), the cost matrix formation to map recon_ckt_b states into base_ckt states is completed. The pseudocode of the procedure, weight_assignment, is summarized in Algorithm 5.
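The XOR-based weighting used for transition matching can be sketched as follows. The sketch assumes each transition array is already a plain bit string of equal width (don’t-care entries and the match_count bookkeeping of Algorithm 5 are not modelled), with rows indexing base_ckt transitions and columns indexing recon_ckt_b transitions; the function names are hypothetical.

```python
def xor_weight(a, b):
    """Number of 1's in the bitwise XOR of two equal-width bit strings."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same width")
    return bin(int(a, 2) ^ int(b, 2)).count("1")


def transition_cost_matrix(base_array, recon_array):
    """cost[i][j] = differing bits between base transition i and recon transition j."""
    return [[xor_weight(b, r) for r in recon_array] for b in base_array]


# Hypothetical transition arrays for one pair of states.
base_array = ["1010", "0111"]
recon_array = ["1000", "0110"]
print(transition_cost_matrix(base_array, recon_array))
```

A low entry means that few bits would need a multiplexer if the two transitions were paired, which is exactly what the assignment step tries to exploit.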
Let the two vertex sets represent the transitions or states of recon_ckt and base_ckt, respectively. Together with the edges between them, they define a balanced weighted bipartite graph, that is, both vertex sets have the same size. The edge weights are collected in the cost matrix, and the number attached to each edge is called the cost (or weight) of that edge.
In GBH_hungarian_algorithm, the cost matrix is taken as input. It provides an optimal assignment between and as output. GBH in [17] is an iterative cost matrix reduction based approach to solve the LAP. At each iteration, a single vertex is eliminated from either or until the advent of some stopping conditions. Let be the last iteration (whereas is a positive integer). Therefore, either or () vertices are eliminated from at the last iteration.
Let and be the subsets of the remaining vertices in and , respectively, at iteration . At the first iteration, that is, , , and , respectively, the objective of the LAP is to assign resources to tasks in such a way that optimal total cost should be obtained for the assignment. The LAP can be mathematically formulated as follows:
Equation (2) represents the objective function of the LAP. The decision variable equals 1 if a resource is allocated to a task and 0 otherwise, as depicted in (5). A one-to-one mapping must hold between resources and tasks; Equations (3) and (4) ensure this.
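For reference, the textbook form of a balanced LAP is sketched below. The symbols are assumptions chosen for illustration and are not necessarily the notation used in this paper: n is the number of vertices on each side, c_{ij} is the cost of assigning resource i to task j, and x_{ij} is the decision variable. The first line plays the role of the objective in (2), the two equality constraints express the one-to-one mapping required by (3) and (4), and the last line is the binary condition of (5).

```latex
\begin{align*}
\min \quad & \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij}\, x_{ij} \\
\text{subject to} \quad & \sum_{j=1}^{n} x_{ij} = 1, \qquad i = 1, \dots, n, \\
& \sum_{i=1}^{n} x_{ij} = 1, \qquad j = 1, \dots, n, \\
& x_{ij} \in \{0, 1\}, \qquad i, j = 1, \dots, n.
\end{align*}
```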
At each iteration, there are two options to eliminate a vertex, that is, from either or . For each and , the following parameters are defined to select one of the above options:
In (6), and can act as “potential cost contribution” [17] of vertices and to in (2). Thus, the potential cost contribution is evaluated for the vertices, and if it exceeds the corresponding removal cost, then such vertices are eliminated.
If , then an attempt is made to remove one of the vertices from . From (7), if , that is, the objective function value is improved by eliminating , then is set to and the next iteration is executed.
Otherwise, one of the vertices from is eliminated. From (8), if , that is, the objective function value is improved by eliminating , then is set to and the next iteration is executed.
In this case, when the objective function value is not improved by eliminating either or , then algorithm halts and the obtained solution is . Furthermore, if , then the above steps are repeated in the opposite order. The pseudocode of this approach is devised in Algorithm 6.
Therefore, after obtaining the cost matrix from weight_assignment for state matching, GBH_hungarian_algorithm is applied to obtain the following parameters:
Thus, all the recon_ckt_b states are replaced by their assigned base_ckt states, and all the complete arrays of recon_ckt_b are arranged corresponding to order. Hence, from (9), the combinations of input lines are selected with & .
Now, binary state codes and are applied in base_ckt and recon_ckt_b. As it changes the weights of cost matrix, weight_assignment is again applied to construct a modified cost matrix. In this case, arrays are created by combining [selected_input_combination, , ]. Dummy states are replaced in matched states of base_ckt and recon_ckt_b by using Propositions 1 and 2. Then, dummy transitions are replaced by using Proposition 1. The dummy replacement algorithm is shown in Algorithm 3.

Proposition 1. Dummy transitions in a matched state of base_ckt or recon_ckt_b should be replaced with one of the existing transitions in that particular state with a minimum cost.
Proof. For each matched state (or assigned state after matching) of recon_ckt_b, if the corresponding base_ckt state has more transitions, then dummy transitions equal in number to the difference are present in the recon_ckt_b state. Hence, the same number of transitions in the corresponding state of base_ckt are unassigned. These unassigned transitions in base_ckt will lead the system to failure while operating in recon_ckt_b mode. As a solution, these unassigned transitions of base_ckt are assigned to the existing transitions of recon_ckt_b with the least cost by looking at the particular column of the modified cost matrix.
Similarly, for each matched state (or assigned state after matching) of recon_ckt_b, if the recon_ckt_b state has more transitions, then dummy transitions equal in number to the difference are present in the base_ckt state. Hence, the same number of transitions in the corresponding state of recon_ckt_b are unassigned. These unassigned transitions in recon_ckt_b will lead the system to failure while operating in base_ckt mode. As a solution, these unassigned transitions of recon_ckt_b are assigned to the existing transitions of base_ckt with the least cost by looking at the particular row of the modified cost matrix.
Consider the modified cost matrix for a matched state, where the rows and columns denote the base_ckt and recon_ckt_b transitions, respectively. Thus, the unassigned transitions in the base_ckt state can be assigned by (10) as follows:
Similarly, the unassigned transitions in recon_ckt_b state can be assigned by (11) as follows:
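The least-cost replacement of Proposition 1 can be sketched as follows for the case in which base_ckt has the surplus transitions. The sketch is an illustration under stated assumptions, not a transcription of (10) or (11): the matrix orientation (rows for base_ckt, columns for recon_ckt_b), the zero-cost dummy column, and the function name `reassign_unmatched_base` are all hypothetical.

```python
def reassign_unmatched_base(cost, assignment, real_recon):
    """Re-point base transitions that were matched to dummies.

    cost[i][j]  : cost between base transition i and recon transition j
    assignment  : assignment[i] is the recon column matched to base row i
    real_recon  : list of column indices that are genuine (non-dummy) transitions
    """
    fixed = list(assignment)
    for i, j in enumerate(assignment):
        if j not in real_recon:
            # Pick the existing recon transition with the least cost for this row.
            fixed[i] = min(real_recon, key=lambda c: cost[i][c])
    return fixed


# Hypothetical state with 3 base transitions and 2 real recon transitions;
# column 2 is a dummy transition added to balance the matrix.
cost = [
    [1, 3, 0],
    [2, 0, 0],
    [4, 2, 0],
]
print(reassign_unmatched_base(cost, [0, 1, 2], real_recon=[0, 1]))
```

After this step, two base transitions may share one recon transition, so the result is no longer one-to-one, but every base transition has a defined behaviour in recon_ckt_b mode.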
Proposition 2. If base_ckt has fewer states than recon_ckt_b, then dummy states are replaced by splitting the matched state in base_ckt.
Proof. In an FSM, splitting a state with a high number of transitions results in low power consumption [8, 19]. It also improves the operating speed [2, 20]. If base_ckt has more states than recon_ckt_b, then the surplus states of base_ckt (the difference between the two state counts) are unassigned. These unassigned states in base_ckt will lead to failure in the system while operating in recon_ckt_b mode. As base_ckt is the largest FSM in the collection and has more transitions per state than recon_ckt_b, splitting recon_ckt_b states is insignificant for the system performance. So, these unassigned states of base_ckt are assigned using Proposition 1.
If _base < _recon, then dummy states are replaced by splitting the matched state in base_ckt. Let = , where is a positive integer. Only the states for which can be split [19]. Each state can be split into nonoverlapping subsets of () transitions. Algorithm 7 is proposed to split a base_ckt state.
At this stage, the states and the input lines of both FSMs are completely matched and fixed. Hence, output matching is carried out by performing a BitwiseXOR operation and selecting the combination with the least count of 1’s. If base_ckt has at least as many output lines as recon_ckt_b, then combinations of output lines of base_ckt are generated to match the output lines of recon_ckt_b. Otherwise, combinations of output lines of recon_ckt_b are generated to match the output lines of base_ckt. Then, for each combination of output lines, a BitwiseXOR operation is performed between the corresponding output lines of base_ckt and recon_ckt_b. Let XOR_count represent the total number of 1’s in the BitwiseXOR operation for a particular combination of output lines. The combination of output lines with the minimum XOR_count is selected.
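Output matching as described above can be sketched with an exhaustive search over line orderings. The sketch assumes that the transitions of both FSMs are already aligned row by row, that base_ckt has at least as many output lines as the other FSM, and that outputs are plain bit strings; `xor_count` and `match_outputs` are hypothetical names.

```python
from itertools import permutations


def xor_count(base_rows, recon_rows, perm):
    """Total number of 1's when recon output line k is compared with base line perm[k]."""
    return sum(
        int(b[perm[k]]) ^ int(r[k])
        for b, r in zip(base_rows, recon_rows)
        for k in range(len(perm))
    )


def match_outputs(base_rows, recon_rows):
    """Return the ordering of base output lines with the least XOR count."""
    n_base, n_recon = len(base_rows[0]), len(recon_rows[0])
    return min(
        permutations(range(n_base), n_recon),
        key=lambda p: xor_count(base_rows, recon_rows, p),
    )


# Hypothetical outputs, one string per (already matched) transition.
base_rows = ["100", "010", "111"]   # 3 output lines
recon_rows = ["01", "10", "01"]     # 2 output lines
best = match_outputs(base_rows, recon_rows)
print(best, xor_count(base_rows, recon_rows, best))
```

In this example the second base line serves the first recon line and the first base line serves the second, leaving a single differing bit.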
At the end of every iteration, the description of base_ckt is updated for use in the next iteration. At the end of the last iteration, the replacement of dummy transitions and states is performed for each FSM, and the updated descriptions of base_ckt and of every recon_ckt_b are obtained. In this way, the descriptions of all FSMs are optimally matched and have the same dimension. Therefore, a mutual BitwiseXOR operation among the updated descriptions of the FSMs is conducted, which provides the optimized multiplexer bank for mode-based reconfiguration.
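The final step can be sketched as follows, interpreting the mutual XOR as a pairwise XOR over all updated descriptions whose results are OR-ed together. This interpretation, the flattened bit-string representation, and the name `mux_mask` are assumptions made for illustration.

```python
from itertools import combinations


def mux_mask(descriptions):
    """Mark with '1' every bit position where at least two descriptions disagree."""
    width = len(descriptions[0])
    if any(len(d) != width for d in descriptions):
        raise ValueError("all descriptions must have the same dimension")
    mask = 0
    for a, b in combinations(descriptions, 2):
        mask |= int(a, 2) ^ int(b, 2)  # pairwise XOR, accumulated with OR
    return format(mask, f"0{width}b")


# Hypothetical updated descriptions of base_ckt and two other FSMs.
print(mux_mask(["1100", "1110", "0100"]))
```

Positions marked with 1 need a mode-controlled multiplexer; positions marked with 0 are shared by every mode and can be hard-wired.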
3. Experimental Evaluation
Experiments have been conducted to illustrate the advantages of the proposed architecture using the FSM benchmark circuits from MCNC/LGSynth [21] as shown in Table 1.

The proposed iterative greedy heuristic based Hungarian algorithm has been implemented in the MATLAB (2016b) environment. The MATLAB HDL Coder tool is used to generate the Verilog HDL code of the multiplexer bank for mode-based reconfiguration. The Reconfigurable FSMIMS architecture is described in Verilog HDL and implemented on a Xilinx xc6vlx75t device (Virtex-6, speed grade -3) by using Xilinx ISE 14.6 [15]. All computations are performed using a computer with an Intel(R) Core(TM) i5, 8 GB RAM, and 2.67 GHz CPU.
Let be the input lines, be the output lines, and be the states of an FSM. In the proposed algorithm, at the first stage, input matching is performed along with the state matching; after that, dummy states and transitions are replaced. Then, output matching is performed (Algorithm 4).




When the number of inputs or outputs exceeds 8, the number of line combinations that must be generated for matching becomes so large that it exhausts the simulation resources. Hence, the excess input lines are discarded from input matching; the lines chosen are those that contain the maximum number of don’t cares over all transitions. Similarly, the excess output lines are discarded from output matching; the lines chosen are those that contain the minimum number of 1’s over all transitions. Because the information content of these lines is minimal, discarding them also reduces the complexity of the input selector bank and the group encoder.
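The selection of lines to discard can be sketched as follows. The sketch assumes that each transition is given as a string with one character per line and that `-` marks a don’t care; the function name `lines_to_discard` and the sample data are hypothetical.

```python
def lines_to_discard(rows, symbol, excess, most=True):
    """Pick `excess` line indices ranked by how often `symbol` occurs on that line.

    most=True  -> lines with the highest count (e.g. don't cares on inputs)
    most=False -> lines with the lowest count (e.g. 1's on outputs)
    """
    n_lines = len(rows[0])
    counts = [sum(row[k] == symbol for row in rows) for k in range(n_lines)]
    order = sorted(range(n_lines), key=lambda k: counts[k], reverse=most)
    return sorted(order[:excess])


# Hypothetical input columns of four transitions ('-' is a don't care).
inputs = ["1-0-", "0-1-", "--1-", "1-0-"]
print(lines_to_discard(inputs, "-", excess=2, most=True))

# Hypothetical output columns of three transitions.
outputs = ["101", "100", "001"]
print(lines_to_discard(outputs, "1", excess=1, most=False))
```

In the first call the two input lines that are don’t cares in every transition are dropped; in the second call the output line that never carries a 1 is dropped.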
The FSM “s1494” has been considered as base_ckt, because it consists of 48 states, 8 inputs, 19 outputs, and 250 transitions which are of higher values as compared with any of the FSMs in the collection. Hence, “s1494” is considered as an FSM included in the design at the 0th iteration. In this case, state splitting is never used for dummy state replacement, because base_ckt contains the highest number of states. All dummy states and transitions are replaced by using Proposition 1. For output matching, , and are discarded because they contain , and instances of 1’s, respectively, out of a total of 250 transitions.
In the 1st iteration, an FSM, “sand,” is included in the design. For input matching, , , and are discarded because they contain 178, 150, and 182 don’t cares, respectively, out of a total of 184 transitions. All states are matched with the states of base_ckt in respective order. For output matching, is discarded because it contains 3 instances of 1’s, out of a total of 253 transitions.
In the 2nd iteration, an FSM, “styr,” is included in the design. For input matching, is discarded because it contains 160 don’t cares, out of a total of 166 transitions. States , and are matched with , and , respectively, of base_ckt. The rest of the states are matched with the states of base_ckt in respective order. For output matching, and are discarded because they contain 5 and 6 instances of 1’s, respectively, out of a total of 254 transitions.
In the 3rd iteration, an FSM, “planet,” is included in the design. All states are matched with the states of base_ckt in respective order. For output matching, , and are discarded because they contain , and 23 instances of 1’s, respectively, out of a total of 255 transitions.
In the 4th iteration, an FSM, “s832,” is included in the design. For input matching, , , and are discarded because they contain , and 241 don’t cares, respectively, out of a total of 245 transitions. All states are matched with the states of base_ckt in respective order. For output matching, , and are discarded because they contain , and 6 instances of 1’s, respectively, out of a total of 259 transitions.
In the 5th iteration, an FSM, “cse,” is included in the design. All states of “cse” are matched with the states of base_ckt in respective order. In the 6th iteration, an FSM, “s386,” is included in the design. All states of “s386” are matched with the states of base_ckt in respective order. In the 7th iteration, an FSM, “ex6,” is included in the design. All states of “ex6” are matched with the states of base_ckt in respective order. In the 8th iteration, an FSM, “mc,” is included in the design. All states of “mc” are matched with the states of base_ckt in respective order.
In the 9th iteration, an FSM, “planet1,” is included in the design. All states are matched with the states of base_ckt in respective order. For output matching, , and are discarded because they contain , and 23 instances of 1’s, respectively, out of a total of 279 transitions.
In the 10th iteration, an FSM, “s1488,” is included in the design. All states are matched with the states of base_ckt in respective order. For output matching, , and are discarded because they contain , and 64 instances of 1’s, respectively, out of a total of 281 transitions.
In the 11th iteration, an FSM, “s208,” is included in the design. For input matching, , and are discarded because they contain 153, 153, and 153 don’t cares, respectively, out of a total of 153 transitions. All states are matched with the states of base_ckt in respective order. The input matching and output matching among FSMs are shown in Tables 2 and 3, respectively, along with the minimum assignment_cost, total_cost, and XOR_count (as defined in Algorithm 1).

