Research Article

Trading and Pricing Sensor Data in Competing Edge Servers with Double Auction Markets

Algorithm 2: MADDPG algorithm.
Input: continuous action spaces of the two markets
Output: equilibrium pricing strategies of the two markets
1 Initialization: Initialize the actor network and critic network of each of the two markets together with their respective parameters, initialize the corresponding target networks and their parameters, and initialize the replay memory
2 Initialize a random distribution N for action exploration
3 Initialize the respective market states, and set the iteration cycle
4 while the loss function of the traders has not converged do
5  Action selection:
6  The two markets select their pricing actions according to their respective strategies
7  Release the pricing actions; the traders then adjust their equilibrium trading strategies (I-PDQN algorithm) under this pricing, and each market calculates its reward and its new state
8  Store the resulting tuples in the replay memory
9  Strategy training:
10  for each of the two markets (update its strategy network) do
11   Randomly sample r tuples from the replay memory and calculate the target value
12   The critic network Q is updated by minimizing the loss function of equation (21)
13   The actor network is updated along the sampled policy gradient given by equation (20)
14   Update the target network parameters and through equation
15  end
16 end
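
The bookkeeping in steps 8, 11, and 14 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names ReplayMemory, soft_update, and td_target, the parameters tau and gamma, and the tuple layout are hypothetical. The soft-update rule and the one-step target are the forms commonly used in DDPG-style methods and are assumed here, because the corresponding equations are not reproduced in this listing. Network parameters are represented as plain lists of floats for simplicity.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity store of transition tuples (step 8)."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen drops the oldest tuple once capacity is reached.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def store(self, state, actions, rewards, next_state):
        # One joint transition covering both markets (assumed tuple layout).
        self.buffer.append((state, actions, rewards, next_state))

    def sample(self, r):
        # Step 11: draw up to r tuples uniformly, without replacement.
        return self.rng.sample(list(self.buffer), min(r, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


def soft_update(target_params, online_params, tau=0.01):
    """Step 14, assumed form: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


def td_target(reward, next_q, gamma=0.95):
    """Assumed one-step target for the critic: y = reward + gamma * Q'(s', a')."""
    return reward + gamma * next_q
```

In a full training loop, each market would call sample(r) once per update, compute td_target from its target critic, and then apply soft_update to both of its target networks.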