Research Article

Reinforcement Learning-Based Genetic Algorithm in Optimizing Multidimensional Data Discretization Scheme

Algorithm 2

RLGA algorithm process.
Input: Multidimensional data discretization scheme
Output: Optimal discretization scheme
Initialize: global variable = 0, local variable = 0, crossover Q-Table = null, mutation Q-Table = null, t = 0;
begin
 Get the initial breakpoints of the multidimensional data by sorting the values of each feature and removing duplicate values;
 Binary encode the initial breakpoints of multidimensional data according to the method in Part B of Section 2;
 Randomly generate initial population P(t);
 Calculate the fitness of each individual in P(t) using equation (8);
 Update global variable with the optimal individual fitness value in P(t);
 Generate the state set based on the number of features of the multidimensional data, according to the definition of state in Part C of Section 3;
 Choose a state from the state set as the initial state S(t);
while t is less than the user-specified maximum number of iterations do
  Choose an action from the action set {G, H, I} by the ε-greedy strategy, according to the definition of action in Part C of Section 3;
  Execute the selected action on the current state S(t) to jump to the next state S(t + 1);
  Perform the crossover operation, guided by the global variable, on the features contained in state S(t + 1);
  Calculate the fitness of the multidimensional data discretization scheme after crossover operation using equation (5);
  Measure the corresponding reward using equation (11), according to the definition of reward in Part C of Section 3;
  Update crossover Q-Table using equation (6);
  if the fitness of the multidimensional data discretization scheme > local variable then
   Update local variable with the fitness of the multidimensional data discretization scheme;
  end
  Perform crossover operation in P(t);
  Calculate the fitness of each individual in P(t) using equation (8);
  Update global variable with the optimal individual fitness value in P(t);
  S(t) = S(t + 1);
  Choose an action from the action set {G, H, I} by the ε-greedy strategy, according to the definition of action in Part C of Section 3;
  Execute the selected action on the current state S(t) to jump to the next state S(t + 1);
  Perform mutation operation on the features contained in state S(t + 1);
  Calculate the fitness of the multidimensional data discretization scheme after mutation operation using equation (5);
  Measure the corresponding reward using equation (11), according to the definition of reward in Part C of Section 3;
  Update mutation Q-Table using equation (6);
  if the fitness of the multidimensional data discretization scheme > local variable then
   Update local variable with the fitness of the multidimensional data discretization scheme;
  end
  Perform mutation operation in P(t);
  Calculate the fitness of each individual in P(t) using equation (8);
  Update global variable with the optimal individual fitness value in P(t);
  t = t + 1;
end
 Return Max(global variable, local variable);
end
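The initialization steps (lines 1–3 of the pseudocode) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the candidate breakpoints of each feature are its sorted, de-duplicated values, and that an individual binary-encodes a discretization scheme by marking, per feature, which candidate breakpoints are selected.

```python
import random

def initial_breakpoints(data):
    """data: list of rows (each row a list of feature values).
    Returns, per feature, the sorted unique values as candidate breakpoints."""
    n_features = len(data[0])
    return [sorted({row[j] for row in data}) for j in range(n_features)]

def random_individual(breakpoints):
    """Binary encoding: bit i of feature j is 1 iff candidate breakpoint i
    of feature j is kept in the discretization scheme."""
    return [[random.randint(0, 1) for _ in cuts] for cuts in breakpoints]

# Toy two-feature data set
data = [[1.2, 3.0], [0.5, 3.0], [1.2, 4.1], [2.0, 2.2]]
cuts = initial_breakpoints(data)   # [[0.5, 1.2, 2.0], [2.2, 3.0, 4.1]]
individual = random_individual(cuts)
```

An initial population P(t) is then simply a list of such random individuals.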
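The ε-greedy action selection and Q-Table update inside the loop can be sketched as below. The state space, the actions {G, H, I}, and the exact forms of equations (6) and (11) are defined in the paper; as a stand-in, this sketch assumes equation (6) is the standard Q-learning update Q(s, a) ← Q(s, a) + α[r + γ·max_a′ Q(s′, a′) − Q(s, a)], and the reward r is supplied by the caller.

```python
import random
from collections import defaultdict

ACTIONS = ["G", "H", "I"]

def epsilon_greedy(q_table, state, epsilon=0.1):
    """With probability epsilon explore a random action;
    otherwise exploit the action with the highest Q-value."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def q_update(q_table, state, action, reward, next_state,
             alpha=0.5, gamma=0.9):
    """Standard Q-learning update (assumed form of equation (6))."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += alpha * (
        reward + gamma * best_next - q_table[(state, action)])

# Separate Q-Tables for the crossover and mutation phases, as in the pseudocode
crossover_q = defaultdict(float)
mutation_q = defaultdict(float)

# One crossover-phase step: all Q-values start at 0, so this sets Q[(0,'G')] to 0.5
q_update(crossover_q, state=0, action="G", reward=1.0, next_state=1)
```

Maintaining two Q-Tables lets the agent learn independent policies for where crossover and where mutation pay off, which matches the two symmetric halves of the loop body.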