This paper introduces a reinforcement learning technique with an internal reward for a multi-agent cooperation task. The proposed method is an extension of Q-learning that replaces the ordinary (external) reward with an internal reward to promote cooperation among agents. Specifically, we propose two Q-learning methods, both of which employ the internal reward under conditions of less or no communication. To guarantee the effectiveness of the proposed methods, we theoretically derive mechanisms that answer the following questions: (1) how the internal rewards should be set to guarantee cooperation among the agents under the condition of less or no communication; and (2) how the values of the cooperative behavior types (i.e., the varieties of the agents' cooperative behaviors) should be updated under the condition of no communication. Intensive simulations on the maze problem as an agent-cooperation task reveal that our two proposed methods successfully enable the agents to acquire cooperative behaviors even with less or no communication, while the conventional method (Q-learning) always fails to acquire such behaviors.
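As a rough illustration of the idea, the following is a minimal Python sketch of tabular Q-learning in which the update is driven by an internally computed reward instead of the raw environment reward. The internal_reward function and all hyperparameters here are hypothetical placeholders for illustration only; the paper derives its actual internal-reward setting theoretically.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # illustrative hyperparameters

def internal_reward(external_reward, steps_to_goal):
    # Hypothetical shaping: discount the goal reward by the number of
    # steps taken to reach it, so nearer goals are valued more highly
    # and farther goals can be left to other agents. This is NOT the
    # paper's derived setting, only a stand-in for illustration.
    return external_reward * (GAMMA ** steps_to_goal)

class QLearningAgent:
    def __init__(self, actions):
        self.actions = actions
        self.q = defaultdict(float)  # (state, action) -> estimated value

    def act(self, state):
        # Epsilon-greedy action selection over the tabular Q-values.
        if random.random() < EPSILON:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update; `reward` is the internal reward
        # rather than the raw environment reward.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        self.q[(state, action)] += ALPHA * (
            reward + GAMMA * best_next - self.q[(state, action)])

In a training loop, calling agent.update(s, a, internal_reward(r, steps), s2) substitutes the shaped internal reward for the external one, which is the structural change the abstract describes.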
Title: Multi-Agent Cooperation Based on Reinforcement Learning with Internal Reward in Maze Problem
Authors: Fumito Uwano, Naoki Tatebe, Yusuke Tajima, Masaya Nakata, Tim Kovacs and Keiki Takadama
Journal: SICE Journal of Control, Measurement, and System Integration (JCMSI)
Details: Volume 11, Number 4, 2018, pp. 321-330
@article{uwano2018multi,
title={Multi-Agent Cooperation Based on Reinforcement Learning with Internal Reward in Maze Problem},
author={Fumito Uwano and Naoki Tatebe and Yusuke Tajima and Masaya Nakata and Tim Kovacs and Keiki Takadama},
journal={SICE Journal of Control, Measurement, and System Integration},
year={2018},
volume={11},
number={4},
pages={321--330},
publisher={SICE}
}