Review Article

Future Trends for Human-AI Collaboration: A Comprehensive Taxonomy of AI/AGI Using Multiple Intelligences and Learning Styles

Figure 12

Categorization of deep reinforcement learning algorithms. Each category comprises a number of specific algorithms (see, e.g., [50]):

- Stochastic Policy Gradient (SPG): REINFORCE, Soft Actor-Critic (SAC), and Asynchronous Advantage Actor-Critic (A3C);
- Simple Policy Gradient (SmpPG): REINFORCE, SAC, A3C, Deep Deterministic Policy Gradient (DDPG), Distributed Distributional Deep Deterministic Policy Gradients (D4PG), and Twin Delayed Deep Deterministic policy gradient (TD3);
- Deep Q-Networks: DQN, Double Deep Q-Network (DDQN), DDQN with Dueling Architecture, and DDQN with Proportional Prioritization;
- Actor-Critic (AC): SAC, A3C, DDPG, D4PG, TD3, Trust Region Policy Optimization (TRPO), and Proximal Policy Optimization (PPO);
- Monte Carlo (MC): REINFORCE, PPO, and TRPO;
- Natural Policy Gradient (NPG): TRPO, PPO, Actor-Critic using Kronecker-Factored Trust Region (ACKTR), and Actor-Critic with Experience Replay (ACER);
- Deterministic Policy Gradient (DPG): DDPG, D4PG, and TD3;
- Q-Prop: a hybrid of MC and AC.

Finally, the two classes of sophisticated RL algorithms that are most important and promising for our AI multisystems with multiple intelligences are:

- Partially Observable Markov Decision Process (POMDP): Deep Belief Q-Network (DBQN), Deep Recurrent Q-Network (DRQN), and Recurrent Deterministic Policy Gradients (RDPG);
- Multiagent (MA) learning: Multiagent Importance Sampling (MAIS), Coordinated Multiagent DQN, Multiagent Fingerprints (MAF), Counterfactual Multiagent Policy Gradient (COMAPG), and Multiagent DDPG (MADDPG)

(see [16, 21, 50] and references therein for more details).
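To make the stochastic policy-gradient family concrete, the following is a minimal, hypothetical sketch of the REINFORCE update (the simplest member of the SPG class above) on an illustrative two-armed bandit with assumed reward means of 0.2 and 0.8; it is a toy demonstration of the log-likelihood-ratio update, not a production implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over logits."""
    z = np.exp(x - x.max())
    return z / z.sum()

def run_reinforce(episodes=2000, alpha=0.1, seed=0):
    """Toy REINFORCE on a 2-armed bandit (illustrative setup).

    The policy is a softmax over two logits; each episode samples an
    action, observes a stochastic reward, and applies the REINFORCE
    update theta += alpha * r * grad_log_pi(a).
    """
    rng = np.random.default_rng(seed)
    true_means = np.array([0.2, 0.8])  # assumed reward means (illustrative)
    theta = np.zeros(2)                # policy logits

    for _ in range(episodes):
        probs = softmax(theta)
        a = rng.choice(2, p=probs)             # sample action from the policy
        r = rng.normal(true_means[a], 0.1)     # stochastic reward
        # For a softmax policy, grad log pi(a) = one_hot(a) - probs
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0
        theta += alpha * r * grad_log_pi       # REINFORCE update
    return softmax(theta)

if __name__ == "__main__":
    print(run_reinforce())  # the policy should come to favor the higher-reward arm
```

The same sampled-gradient idea underlies the more elaborate SPG and actor-critic methods listed in the figure; they add value baselines, entropy terms, or trust-region constraints on top of this core update.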