Research Article
Early Rumor Detection Based on Deep Recurrent Q-Learning
Input: network, environment set, experience pool

Output: trained network

| Step | Operation |
| --- | --- |
| (1) | Initialize the current network and the target network |
| (2) | for each epoch do |
| (3) |   Select an environment from the environment set |
| (4) |   Initialize the environment and get the initial state |
| (5) |   while true do |
| (6) |     Based on the current state, use the ε-greedy strategy to select an action from the action set |
| (7) |     Perform the action in the environment to get the new state and the reward |
| (8) |     if the experience pool is full then |
| (9) |       Delete the oldest experience record |
| (10) |     end if |
| (11) |     Insert the new experience record into the experience pool |
| (12) |     Set the current state to the new state |
| (13) |     if the current state is the last state then |
| (14) |       break |
| (15) |     end if |
| (16) |   end while |
| (17) |   if the experience pool is full then |
| (18) |     Randomly select a batch of records from the experience pool |
| (19) |     for each record do |
| (20) |       Use the target network to get the target Q-value |
| (21) |       Use the loss function to update the current network |
| (22) |       Update the target network with the current network's parameters every fixed number of epochs |
| (23) |     end for |
| (24) |   end if |
| (25) | end for |
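The training procedure listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the article's implementation: a plain Q-table stands in for the deep recurrent network, the loss-driven update is replaced by a simple temporal-difference step, and `ChainEnv`, `epsilon_greedy`, `train`, and all hyperparameter names and values are hypothetical. A `max_steps` guard is also added so that an episode cannot run forever.

```python
import random
from collections import deque


class ChainEnv:
    """Hypothetical toy environment: move right along a chain to reach the last state."""

    def __init__(self, length):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # Action 0 stays in place, action 1 moves one step to the right.
        self.pos = min(self.pos + action, self.length - 1)
        done = self.pos == self.length - 1
        reward = 1.0 if done else -0.1
        return self.pos, reward, done


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    return max(range(len(q[state])), key=lambda a: q[state][a])


def train(envs, n_states, n_actions=2, epochs=500, pool_size=64, batch_size=16,
          gamma=0.9, lr=0.1, epsilon=0.2, sync_every=5, max_steps=50, seed=0):
    rng = random.Random(seed)
    # Steps (1): a Q-table plays the role of the current and target networks (assumption).
    current = [[0.0] * n_actions for _ in range(n_states)]
    target = [row[:] for row in current]
    # Steps (8)-(10): a bounded deque drops the oldest record automatically when full.
    pool = deque(maxlen=pool_size)

    for epoch in range(1, epochs + 1):
        env = rng.choice(envs)                      # step (3)
        state = env.reset()                         # step (4)
        for _ in range(max_steps):                  # step (5), with a safety bound
            action = epsilon_greedy(current, state, epsilon, rng)   # step (6)
            next_state, reward, done = env.step(action)             # step (7)
            pool.append((state, action, reward, next_state, done))  # step (11)
            state = next_state                      # step (12)
            if done:                                # steps (13)-(15)
                break

        if len(pool) == pool_size:                  # step (17)
            batch = rng.sample(list(pool), batch_size)               # step (18)
            for s, a, r, s2, d in batch:            # step (19)
                # Step (20): the target table supplies the bootstrap value.
                y = r if d else r + gamma * max(target[s2])
                # Step (21): temporal-difference step in place of a gradient update.
                current[s][a] += lr * (y - current[s][a])
            if epoch % sync_every == 0:             # step (22)
                target = [row[:] for row in current]
    return current


if __name__ == "__main__":
    q_table = train([ChainEnv(4) for _ in range(3)], n_states=4)
    for s, row in enumerate(q_table):
        print(s, [round(v, 3) for v in row])
```

Copying the current table into the target table only every `sync_every` epochs keeps the bootstrap targets fixed between synchronizations, which is the role the separate target network plays in the procedure.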