Research Article
Vision-Based Deep Q-Learning Network Models to Predict Particulate Matter Concentration Levels Using Temporal Digital Image Data
Initialize model configuration
  (i) Initialize the action-value function Q with random weights.
  (ii) Construct sequence arrays of nine channels at each time step and randomly sample a bootstrap batch out of the integrated data pool (a default batch size is used).
  (iii) Initialize the sequence and the preprocessed sequence (i.e., standardization and filtering of outliers exceeding the 90th quantile).
Create difference values of two consecutive arrays
Repeat the following for each time step:
  (i) To derive the optimized action, select a random action (i.e., safe or harmful) with the exploration probability.
  (ii) Otherwise, select the action with the highest action-value Q.
  (iii) Execute the action in the predictive rule and observe the reward and the new incoming sequence.
  (iv) Set the next sequence and preprocess it, calculate the rewards determining the actions, impose the weight according to the testing outcome (i.e., true or false), and update every 10 iterations.
  (v) Set the target value, which depends on a true class label monitored via a device and on a constant set in this paper.
  (vi) Perform a gradient descent step on the loss.
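The steps of the procedure above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the reward of +1/-1 for a correct/incorrect prediction, the lower-rank quantile rule, and the discount factor `gamma=0.9` are all assumptions made for illustration, and the target shown is the standard deep Q-learning target rather than the paper-specific one, whose exact form is not recoverable here.

```python
import random
import statistics

ACTIONS = ("safe", "harmful")  # the two classes named in the procedure


def standardize(values):
    """Z-score a list of floats using the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def drop_above_quantile(values, q=0.9):
    """Keep values at or below the q-th quantile (lower-rank rule, an assumption)."""
    ordered = sorted(values)
    cutoff = ordered[int(q * (len(ordered) - 1))]
    return [v for v in values if v <= cutoff]


def consecutive_differences(arrays):
    """Element-wise difference between each array and its predecessor."""
    return [[b - a for a, b in zip(prev, curr)]
            for prev, curr in zip(arrays, arrays[1:])]


def epsilon_greedy(q_values, epsilon, rng):
    """Return a random action index with probability epsilon, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def reward_for(action, true_label):
    """Hypothetical reward: +1 if the action matches the monitored label, else -1."""
    return 1.0 if action == true_label else -1.0


def td_target(reward, next_q_values, gamma=0.9, terminal=False):
    """Standard Q-learning target: r if terminal, else r + gamma * max Q(s', a')."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)
```

In a training loop, `epsilon_greedy` would choose between "safe" and "harmful" from the network's two Q-values, `reward_for` would compare that choice with the device-monitored label, and `td_target` would supply the regression target for the gradient descent step.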