Utilizing Observed Information for No-Communication Multi-Agent Reinforcement Learning toward Cooperation in Dynamic Environment

Abstract

This paper proposes a multi-agent reinforcement learning method without communication for dynamic environments, called profit minimizing reinforcement learning with oblivion of memory (PMRL-OM). PMRL-OM extends PMRL and defines a memory range so that only the valuable information from the environment is utilized. Because agents do not need information observed before an environmental change, they use only the information acquired after a certain iteration, as determined by the memory range. In addition, PMRL-OM improves the update function for the goal value, which serves as the priority of a purpose, and updates the goal value based on newer information. To evaluate the effectiveness of PMRL-OM, this study compares PMRL-OM with PMRL in five dynamic maze environments: state changes for two types of cooperation, position changes for two types of cooperation, and a case combining these four. The experimental results revealed that: (a) PMRL-OM was an effective method for cooperation in all five dynamic environments examined in this study; (b) PMRL-OM was more effective than PMRL in these dynamic environments; and (c) PMRL-OM performed well with a memory range of 100 to 500.
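
The memory range idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `BoundedMemory`, the recorded quantity (steps needed to reach a goal), and the use of a minimum over the window are all assumptions made for illustration. The sketch only shows how a fixed-size window lets observations made before an environmental change drop out of memory.

```python
from collections import deque


class BoundedMemory:
    """Hypothetical helper: keeps only the latest `memory_range` observations."""

    def __init__(self, memory_range):
        # A deque with maxlen silently discards the oldest entry when full.
        self.buffer = deque(maxlen=memory_range)

    def record(self, steps_to_goal):
        self.buffer.append(steps_to_goal)

    def minimum_steps(self):
        # Smallest step count among the remembered observations (assumed statistic).
        return min(self.buffer) if self.buffer else None


def demo():
    memory = BoundedMemory(memory_range=3)
    # Before the environmental change, the goal is reached in 4 steps.
    for steps in [4, 4, 4]:
        memory.record(steps)
    before = memory.minimum_steps()
    # After the change, the goal takes 9 steps; the old entries are pushed out.
    for steps in [9, 9, 9]:
        memory.record(steps)
    after = memory.minimum_steps()
    return before, after


if __name__ == "__main__":
    print(demo())
```

With an unbounded memory, the minimum would remain at the outdated pre-change value; with the bounded window, it reflects only what was observed after the change. A window that is too short forgets useful observations, while one that is too long retains outdated ones, which is consistent with the abstract reporting good results for an intermediate range.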

Journal Information

Title: Utilizing Observed Information for No-Communication Multi-Agent Reinforcement Learning toward Cooperation in Dynamic Environment
Authors: Fumito Uwano and Keiki Takadama
Journal: SICE Journal of Control, Measurement, and System Integration (JCMSI)
Details: Volume 12, Number 5, 2019, pp. 199-208

BibTeX or Download

Fumito Uwano, Keiki Takadama. Utilizing Observed Information for No-Communication Multi-Agent Reinforcement Learning toward Cooperation in Dynamic Environment. SICE Journal of Control, Measurement, and System Integration, 12(5): 199-208, 2019.
@article{uwano2019utilizing,
  title={Utilizing Observed Information for No-Communication Multi-Agent Reinforcement Learning toward Cooperation in Dynamic Environment},
  author={Fumito Uwano and Keiki Takadama},
  journal={SICE Journal of Control, Measurement, and System Integration},
  year={2019},
  volume={12},
  number={5},
  pages={199--208},
  publisher={SICE}
}