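目的制限に基づく通信なしマルチエージェント協調行動学習とその効果の証明 (English translation: Non-Communicative Multi-Agent Cooperative Behavior Learning Based on Purpose Limitation, and a Proof of Its Effect)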

Abstract

This paper extends PMRL, a theoretically grounded, non-communicative method for two agents, and proposes PLA, a method that forces agents to learn cooperative behavior for any number of agents. In addition, the paper gives a theoretical explanation of why, under PLA, all agents achieve all purposes without spending the maximum time. Concretely, PLA limits the purposes each agent can achieve, which forces the agent to avoid the more difficult purposes that take a long time to reach, and it forces the agents to learn a cooperative policy by achieving the appropriate purpose among the limited ones. The experimental results show that (1) PLA enables the agents to learn a cooperative policy in two grid-world problems with three and five agents, and (2) PLA can force all agents to achieve all purposes in those problems in the minimum time.

Journal Information

Title: 目的制限に基づく通信なしマルチエージェント協調行動学習とその効果の証明
Authors: 上野史, 髙玉圭樹
Journal: 電気学会論文誌C (IEEJ Transactions on Electronics, Information and Systems)
Details: Vol. 140, No. 1, pp. 75-84, 2020

BibTeX or Download

上野 史, 髙玉 圭樹. 目的制限に基づく通信なしマルチエージェント協調行動学習とその効果の証明. 電気学会論文誌C, 140(1): 75-84, 2020.
@article{史2020目的制限に基づく通信なしマルチエージェント協調行動学習とその効果の証明,
  title={目的制限に基づく通信なしマルチエージェント協調行動学習とその効果の証明},
  author={上野, 史 and 髙玉, 圭樹},
  journal={電気学会論文誌C},
  year={2020},
  volume={140},
  number={1},
  pages={75--84},
  publisher={電気学会}
}

Note: If the character 「髙」 cannot be displayed, replace it with \UTF{9AD9}.