Research Article

Reinforcement Learning-Based Genetic Algorithm in Optimizing Multidimensional Data Discretization Scheme

Algorithm 2

RLGA algorithm process.
Input: Multidimensional data discretization scheme
Output: Optimal discretization scheme
Initialize: global variable = 0, local variable = 0, crossover Q-Table = null, mutation Q-Table = null, t = 0;
begin
 Get the initial breakpoints of the multidimensional data by sorting the values of each feature and removing duplicate values;
 Binary encode the initial breakpoints of multidimensional data according to the method in Part B of Section 2;
 Randomly generate initial population P(t);
 Calculate the fitness of each individual in P(t) using equation (8);
 Update global variable with the optimal individual fitness value in P(t);
 Generate the state set based on the number of features of the multidimensional data, according to the definition of state in Part C of Section 3;
 Choose a state from the state set as the initial state S(t);
while t is less than the user-specified maximum number of iterations do
  Choose an action from the action set {G, H, I} by the ε-greedy strategy, according to the definition of action in Part C of Section 3;
  Execute the selected action on the current state S(t) to jump to the next state S(t + 1);
  Perform the crossover operation, guided by the global variable, on the features contained in state S(t + 1);
  Calculate the fitness of the multidimensional data discretization scheme after crossover operation using equation (5);
  Measure the corresponding reward using equation (11), according to the definition of reward in Part C of Section 3;
  Update crossover Q-Table using equation (6);
  if the fitness of the multidimensional data discretization scheme > local variable then
   Update local variable with the fitness of the multidimensional data discretization scheme;
  end
  Perform crossover operation in P(t);
  Calculate the fitness of each individual in P(t) using equation (8);
  Update global variable with the optimal individual fitness value in P(t);
  S(t) = S(t + 1);
  Choose an action from the action set {G, H, I} by the ε-greedy strategy, according to the definition of action in Part C of Section 3;
  Execute the selected action on the current state S(t) to jump to the next state S(t + 1);
  Perform mutation operation on the features contained in state S(t + 1);
  Calculate the fitness of the multidimensional data discretization scheme after mutation operation using equation (5);
  Measure the corresponding reward using equation (11), according to the definition of reward in Part C of Section 3;
  Update mutation Q-Table using equation (6);
  if the fitness of the multidimensional data discretization scheme > local variable then
   Update local variable with the fitness of the multidimensional data discretization scheme;
  end
  Perform mutation operation in P(t);
  Calculate the fitness of each individual in P(t) using equation (8);
  Update global variable with the optimal individual fitness value in P(t);
  t = t + 1;
end
 Return Max(global variable, local variable);
end
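The initialization steps (lines 1–3 of the pseudocode) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the candidate breakpoints of each feature are its sorted, de-duplicated values, and that an individual binary-encodes a discretization scheme by marking, per feature, which candidate breakpoints are selected.

```python
import random

def initial_breakpoints(data):
    """data: list of rows (each row a list of feature values).
    Returns, per feature, the sorted unique values as candidate breakpoints."""
    n_features = len(data[0])
    return [sorted({row[j] for row in data}) for j in range(n_features)]

def random_individual(breakpoints):
    """Binary encoding: bit i of feature j is 1 iff candidate breakpoint i
    of feature j is kept in the discretization scheme."""
    return [[random.randint(0, 1) for _ in cuts] for cuts in breakpoints]

# Toy two-feature data set
data = [[1.2, 3.0], [0.5, 3.0], [1.2, 4.1], [2.0, 2.2]]
cuts = initial_breakpoints(data)   # [[0.5, 1.2, 2.0], [2.2, 3.0, 4.1]]
individual = random_individual(cuts)
```

An initial population P(t) is then simply a list of such random individuals.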
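The ε-greedy action selection and Q-Table update inside the loop can be sketched as below. The state space, the actions {G, H, I}, and the exact forms of equations (6) and (11) are defined in the paper; as a stand-in, this sketch assumes equation (6) is the standard Q-learning update Q(s, a) ← Q(s, a) + α[r + γ·max_a′ Q(s′, a′) − Q(s, a)], and the reward r is supplied by the caller.

```python
import random
from collections import defaultdict

ACTIONS = ["G", "H", "I"]

def epsilon_greedy(q_table, state, epsilon=0.1):
    """With probability epsilon explore a random action;
    otherwise exploit the action with the highest Q-value."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def q_update(q_table, state, action, reward, next_state,
             alpha=0.5, gamma=0.9):
    """Standard Q-learning update (assumed form of equation (6))."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += alpha * (
        reward + gamma * best_next - q_table[(state, action)])

# Separate Q-Tables for the crossover and mutation phases, as in the pseudocode
crossover_q = defaultdict(float)
mutation_q = defaultdict(float)

# One crossover-phase step: all Q-values start at 0, so this sets Q[(0,'G')] to 0.5
q_update(crossover_q, state=0, action="G", reward=1.0, next_state=1)
```

Maintaining two Q-Tables lets the agent learn independent policies for where crossover and where mutation pay off, which matches the two symmetric halves of the loop body.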