A Comprehensive Survey of Multiagent Reinforcement Learning

Benefits and Challenges in MARL

Benefits

When agents perform similar tasks, sharing experience among them helps each agent learn.
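As a rough illustration of experience sharing, the sketch below pools the transitions of two agents working on the same task and compares what one agent knows from its own data with what it knows from the pooled data. The names `SharedBuffer` and `estimate_values`, the state `"s0"`, and the reward values are all hypothetical, chosen only for this example; they are not taken from the survey.

```python
from collections import defaultdict


class SharedBuffer:
    """Pool of (agent_id, state, action, reward) tuples contributed by all agents."""

    def __init__(self):
        self.transitions = []

    def add(self, agent_id, state, action, reward):
        self.transitions.append((agent_id, state, action, reward))


def estimate_values(transitions):
    """Average observed reward for each (state, action) pair."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _agent_id, state, action, reward in transitions:
        totals[(state, action)] += reward
        counts[(state, action)] += 1
    return {key: totals[key] / counts[key] for key in totals}


buffer = SharedBuffer()
# Hypothetical data: agent 0 only tried "left", agent 1 only tried "right".
buffer.add(0, "s0", "left", 0.0)
buffer.add(1, "s0", "right", 1.0)

# Agent 0 using only its own experience knows nothing about "right".
own_only = estimate_values([t for t in buffer.transitions if t[0] == 0])
# With the shared pool, agent 0 also sees the outcome agent 1 observed.
shared = estimate_values(buffer.transitions)
```

Here `own_only` has no entry for `("s0", "right")`, while `shared` does. This only helps when the agents' tasks are similar enough that one agent's transitions are valid for the other.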

Challenges

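Curse of dimensionality: the joint state-action space grows exponentially with the number of agents.
Correlations among agents can cause them to interfere with each other's learning.
Agents keep taking actions and the environment changes in response, so the learning problem becomes nonstationary.
- Each agent must account for the other agents, not only the environment.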
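Two of the challenges above can be made concrete with a minimal sketch. It assumes a hypothetical 2x2 matrix game (the payoff table `PAYOFF_AGENT1` is made up for illustration) and shows that the value of the same action for agent 1 shifts as agent 2's policy changes, which is what makes the problem nonstationary from agent 1's point of view. The helper `joint_action_count` shows how the number of joint actions grows with the number of agents.

```python
# Payoff to agent 1: rows are agent 1's actions, columns are agent 2's actions.
# The numbers are illustrative assumptions, not values from the survey.
PAYOFF_AGENT1 = [
    [1.0, 0.0],
    [0.0, 1.0],
]


def expected_reward(action, other_policy):
    """Expected payoff of `action` for agent 1 given agent 2's action probabilities."""
    return sum(
        prob * PAYOFF_AGENT1[action][other_action]
        for other_action, prob in enumerate(other_policy)
    )


def joint_action_count(n_actions, n_agents):
    """Number of joint actions when every agent has `n_actions` choices."""
    return n_actions ** n_agents


# Same action for agent 1, evaluated before and after agent 2 changes its policy.
early = expected_reward(0, [0.9, 0.1])
late = expected_reward(0, [0.1, 0.9])

# Joint action space for 2, 4, and 8 agents with 5 actions each.
sizes = [joint_action_count(5, n) for n in (2, 4, 8)]
```

Action 0 looks good to agent 1 while agent 2 mostly plays its own action 0 and poor once agent 2 switches, even though the payoff table itself never changed.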

MARL Goals

Stability
- Convergence to a stationary policy, even in a dynamic environment
Adaptation
- Maintaining or improving the agent's own performance despite the influence of other agents

Stability Property

equilibrium learning
1. converge to a coordinated equilibrium
2. Nash equilibria
3. The relationship between stagewise convergence and Nash equilibria has been criticized as unclear [14]
- Nash equilibrium
- convergence
- opponent independent
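To make the notion of a Nash equilibrium concrete, here is a minimal sketch that enumerates the pure-strategy Nash equilibria of a two-player matrix game: a joint action is an equilibrium when neither player can gain by deviating alone. The function name `pure_nash_equilibria` and the coordination game `COORD` are assumptions made for this example. Mixed-strategy equilibria are not covered.

```python
def pure_nash_equilibria(payoff_1, payoff_2):
    """Return all (row, col) joint actions where neither player gains by deviating alone.

    payoff_1[r][c] and payoff_2[r][c] are the payoffs to player 1 (row player)
    and player 2 (column player) when they play row r and column c.
    """
    n_rows, n_cols = len(payoff_1), len(payoff_1[0])
    equilibria = []
    for r in range(n_rows):
        for c in range(n_cols):
            # Player 1 cannot do better by switching rows while the column is fixed.
            best_for_1 = all(payoff_1[r][c] >= payoff_1[alt][c] for alt in range(n_rows))
            # Player 2 cannot do better by switching columns while the row is fixed.
            best_for_2 = all(payoff_2[r][c] >= payoff_2[r][alt] for alt in range(n_cols))
            if best_for_1 and best_for_2:
                equilibria.append((r, c))
    return equilibria


# Hypothetical fully cooperative game: both players receive the same payoff.
COORD = [
    [2, 0],
    [0, 1],
]
equilibria = pure_nash_equilibria(COORD, COORD)
```

This game has two pure equilibria, `(0, 0)` and `(1, 1)`, with different payoffs. That is one reason converging to some Nash equilibrium is not enough on its own: the agents also need to coordinate on the same one.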

The need for Stability