Input: factorization threshold , latent factor , maximum number of iterations
1: Calculate the percentage of visited states via Eq. (7) and the increment of that percentage via Eq. (8).
2: If  then
3: Extract the visited entries of the Q table as experience triples.
4: For each experience triple, concatenate the one-hot encodings of its state and its action to form the feature vector, and take its Q value as the corresponding target.
5: Estimate the model parameters via Eq. (12), running for at most the maximum number of iterations.
6: Complete the Q table via Eq. (9).
7: Replace the original Q table with the factorized Q table.
8: End if
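Steps 1 and 2 rely on Eq. (7) and Eq. (8), which are not reproduced in this listing. The following is a minimal sketch that assumes the percentage of visited states is the share of states with at least one visited action, and that the increment is the difference between the current and the previously recorded value. The boolean `visited` mask and both function names are hypothetical. The threshold test in step 2 is omitted because its exact form is not given here.

```python
from typing import List


def visited_state_percentage(visited: List[List[bool]]) -> float:
    """Assumed form of Eq. (7): share of states with at least one visited action.

    Returns a fraction in [0, 1]; multiply by 100 for a percentage.
    `visited[s][a]` is a hypothetical flag set once Q(s, a) has been updated.
    """
    if not visited:
        return 0.0
    n_visited = sum(1 for row in visited if any(row))
    return n_visited / len(visited)


def percentage_increment(current: float, previous: float) -> float:
    """Assumed form of Eq. (8): change in coverage since the previous check."""
    return current - previous
```

A separate mask is used instead of testing Q values for zero because a visited entry can legitimately hold a value of 0.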
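Steps 3 and 4 convert the visited part of the Q table into a supervised data set. This minimal sketch assumes that the Q table is a list of lists indexed as `q_table[state][action]`, that a hypothetical boolean `visited` mask of the same shape marks observed entries, and that each experience triple has the form (state, action, Q value). The helper names are illustrative.

```python
from typing import List, Tuple

Experience = Tuple[int, int, float]  # (state, action, Q value); assumed layout


def extract_experiences(q_table: List[List[float]],
                        visited: List[List[bool]]) -> List[Experience]:
    """Step 3: collect every visited (state, action) entry as a triple."""
    return [
        (s, a, q_table[s][a])
        for s in range(len(q_table))
        for a in range(len(q_table[s]))
        if visited[s][a]
    ]


def one_hot(index: int, size: int) -> List[float]:
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def build_training_set(experiences: List[Experience], n_states: int,
                       n_actions: int) -> Tuple[List[List[float]], List[float]]:
    """Step 4: feature = one_hot(state) + one_hot(action), target = Q value."""
    features, targets = [], []
    for s, a, q in experiences:
        features.append(one_hot(s, n_states) + one_hot(a, n_actions))
        targets.append(q)
    return features, targets
```

Each feature vector has length |S| + |A| and contains exactly two ones, one in the state block and one in the action block.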
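In steps 5 and 6, a model is fitted to the visited entries and then used to fill in the rest of the table. Eq. (9) and Eq. (12) are not reproduced here, so this sketch makes two assumptions: the model is a standard second-order factorization machine, and its parameters are estimated by plain stochastic gradient descent on squared error. The class, the `encode` helper, and the parameters `lr`, `reg`, and `seed` are hypothetical. The sketch keeps visited entries and overwrites only unvisited ones. The original step 6 may instead replace every entry with a model prediction.

```python
import random
from typing import List, Sequence


def encode(s: int, a: int, n_states: int, n_actions: int) -> List[float]:
    """Concatenated one-hot encoding of state s and action a."""
    x = [0.0] * (n_states + n_actions)
    x[s] = 1.0
    x[n_states + a] = 1.0
    return x


class FactorizationMachine:
    """Second-order FM fitted by SGD on squared error (assumed stand-in for Eqs. (9), (12))."""

    def __init__(self, n_features: int, k: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.k = k
        self.w0 = 0.0
        self.w = [0.0] * n_features
        # Small random latent factors; all-zero factors would never receive a gradient.
        self.v = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(n_features)]

    def _factor_sums(self, x: Sequence[float]) -> List[float]:
        # sum_i v_if * x_i for every latent dimension f
        return [sum(self.v[i][f] * xi for i, xi in enumerate(x)) for f in range(self.k)]

    def predict(self, x: Sequence[float]) -> float:
        # y = w0 + sum_i w_i x_i + 1/2 sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]
        y = self.w0 + sum(wi * xi for wi, xi in zip(self.w, x))
        sums = self._factor_sums(x)
        for f in range(self.k):
            sq = sum((self.v[i][f] * xi) ** 2 for i, xi in enumerate(x))
            y += 0.5 * (sums[f] ** 2 - sq)
        return y

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float],
            max_iter: int, lr: float = 0.05, reg: float = 1e-4) -> None:
        for _ in range(max_iter):  # at most max_iter passes over the data
            for x, target in zip(X, y):
                err = self.predict(x) - target  # gradient of 0.5 * err^2 w.r.t. the prediction
                sums = self._factor_sums(x)  # computed before any parameter update
                self.w0 -= lr * err
                for i, xi in enumerate(x):
                    if xi == 0.0:
                        continue  # sparse update: zero features get no loss gradient
                    self.w[i] -= lr * (err * xi + reg * self.w[i])
                    for f in range(self.k):
                        grad = err * (xi * sums[f] - self.v[i][f] * xi * xi)
                        self.v[i][f] -= lr * (grad + reg * self.v[i][f])


def complete_q_table(q_table: List[List[float]], visited: List[List[bool]],
                     fm: FactorizationMachine) -> List[List[float]]:
    """Fill unvisited entries with model predictions; the input table is not mutated."""
    n_states, n_actions = len(q_table), len(q_table[0])
    completed = [row[:] for row in q_table]
    for s in range(n_states):
        for a in range(n_actions):
            if not visited[s][a]:
                completed[s][a] = fm.predict(encode(s, a, n_states, n_actions))
    return completed
```

With this one-hot input, only the state index s and the action index |S| + a are non-zero. The pairwise term therefore reduces to the inner product of the two corresponding latent vectors. Each Q value is then modeled as a global bias plus a state effect, an action effect, and a state-action interaction, which lets unvisited pairs borrow information from visited ones.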