Robotics paper index

Submodular Multi-Agent Policy Learning for Online Distributed Task Allocation in Open Multi-Agent Systems

2026-05-13 · arXiv: 2605.13269

One-line summary

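The paper presents SubMAPG, a centralized-training, decentralized-execution policy-gradient framework for online distributed task allocation with submodular team utilities, built on a new continuous relaxation called the Partition Multilinear Extension (PME).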

Engineering notes

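SubMAPG targets task allocation problems whose team utility is submodular, such as multi-robot coverage and multi-target tracking, the two settings evaluated in the paper. Execution is decentralized: each agent samples one action from its own masked categorical policy, so no central allocator is needed at run time. Graph neural network policies let the same framework handle open systems in which agents and targets join or leave over time.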
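The decentralized execution step with masked categorical policies, as described in the abstract, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `masked_softmax` and `sample_joint_action`, the use of raw logits, and the boolean mask format are all assumptions made for this example.

```python
import numpy as np

def masked_softmax(logits, mask):
    """Softmax over valid actions only; masked-out actions get probability 0.

    Assumption: `mask` is a boolean array with at least one True entry.
    """
    logits = np.asarray(logits, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("each agent needs at least one valid action")
    masked = np.where(mask, logits, -np.inf)
    weights = np.exp(masked - masked.max())  # exp(-inf) evaluates to 0
    return weights / weights.sum()

def sample_joint_action(all_logits, all_masks, rng):
    """Each agent samples one action independently from its own local policy."""
    actions = []
    for logits, mask in zip(all_logits, all_masks):
        probs = masked_softmax(logits, mask)
        actions.append(int(rng.choice(len(probs), p=probs)))
    return actions

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical example: two agents, three candidate tasks each.
    logits = [np.array([0.5, 2.0, -1.0]), np.array([1.0, 1.0, 1.0])]
    masks = [np.array([True, False, True]), np.array([True, True, False])]
    print(sample_joint_action(logits, masks, rng))
```

Because every agent picks exactly one action, the sampled joint action always satisfies the "one action per agent" constraint that the abstract describes as a partition matroid over agent-action pairs.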

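Chinese explanation (translated)

This paper studies multi-agent reinforcement learning with submodular team utilities, applied to online distributed task allocation. It proposes a new continuous relaxation, the Partition Multilinear Extension (PME), together with SubMAPG, a centralized-training, decentralized-execution policy-gradient framework that significantly improves task-allocation efficiency for multi-agent systems in dynamic environments.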
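In symbols, the PME can be written as below. The notation is an assumption made for this index entry and may differ from the paper's: there are $n$ agents, agent $i$ has action set $A_i$ and categorical policy $x_i \in \Delta(A_i)$, and $f$ is the submodular team utility defined on sets of agent-action pairs. The classical multilinear extension over a ground set $V$ is shown second for contrast; it includes each element independently, so it can select zero or several actions for the same agent.

```latex
% PME: expected team utility when each agent i draws exactly one action a_i ~ x_i
F_{\mathrm{PME}}(x)
  = \mathbb{E}_{a_1 \sim x_1, \dots, a_n \sim x_n}
      \bigl[ f\bigl(\{(i, a_i)\}_{i=1}^{n}\bigr) \bigr]
  = \sum_{(a_1, \dots, a_n) \in A_1 \times \cdots \times A_n}
      \Bigl( \prod_{i=1}^{n} x_{i, a_i} \Bigr)
      f\bigl(\{(1, a_1), \dots, (n, a_n)\}\bigr)

% Classical multilinear extension: independent Bernoulli inclusion of each element v
F_{\mathrm{ML}}(x)
  = \sum_{S \subseteq V} f(S) \prod_{v \in S} x_v \prod_{v \notin S} (1 - x_v)
```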

Original abstract

This paper studies multi-agent reinforcement learning with submodular team utilities for online distributed task allocation. In this setting, each agent selects one action from a local categorical policy, so feasible joint actions form a partition matroid over agent-action pairs. Classical multilinear extensions use independent Bernoulli sampling and therefore do not match the categorical policies executed by decentralized agents. To address this mismatch, we introduce the Partition Multilinear Extension (PME), a continuous relaxation whose value equals the expected team utility under factorized categorical policies. We prove that submodular difference rewards provide unbiased PME marginal-gradient information and yield a stagewise score-function policy-gradient estimator. Based on this connection, we propose SubMAPG, a centralized-training decentralized-execution policy-gradient framework with masked categorical policies and submodular difference-reward training signals. For the associated PME marginal-space projected stochastic-gradient dynamics, we prove a stagewise 1/2-approximation guarantee and sublinear dynamic regret in slowly varying environments, measured by the path length of the optimal PME marginals. To handle open systems with time-varying agents and targets, we instantiate SubMAPG with graph neural network policies. Experiments on multi-robot coverage and multi-target tracking show that SubMAPG outperforms local greedy and shared-reward baselines and is competitive with centralized myopic greedy strategies.
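The relationship between the PME and difference rewards can be checked numerically on a toy problem. The sketch below is not the paper's algorithm or proof: the coverage table `COVER`, the helper names, and the brute-force enumeration are all hypothetical choices for illustration. It computes the PME exactly, estimates it by sampling, and verifies one simple identity: for a fixed agent, the expected difference reward given action `a` equals the PME partial derivative for that action minus a baseline that does not depend on `a`.

```python
import itertools
import numpy as np

# Hypothetical toy problem: COVER[i][a] is the set of cells agent i covers with action a.
COVER = [
    [{0, 1}, {1, 2}, {3}],  # agent 0
    [{1, 2}, {3, 4}, {0}],  # agent 1
]

def utility(joint_action, skip=None):
    """Coverage utility (monotone submodular): number of distinct covered cells.
    If `skip` is an agent index, that agent's contribution is left out."""
    covered = set()
    for i, a in enumerate(joint_action):
        if i != skip:
            covered |= COVER[i][a]
    return len(covered)

def difference_reward(joint_action, i):
    """Team utility minus the utility obtained without agent i."""
    return utility(joint_action) - utility(joint_action, skip=i)

def expectation(x, fn):
    """Exact expectation of fn(joint_action) under factorized policies x[i][a]."""
    total = 0.0
    for joint in itertools.product(*(range(len(row)) for row in x)):
        prob = float(np.prod([x[i][a] for i, a in enumerate(joint)]))
        total += prob * fn(joint)
    return total

def pme_exact(x):
    """PME value: expected team utility under the factorized categorical policies."""
    return expectation(x, utility)

def pme_monte_carlo(x, n_samples, rng):
    """Sampling estimate of the PME: each agent draws one action per sample."""
    values = []
    for _ in range(n_samples):
        joint = [int(rng.choice(len(row), p=row)) for row in x]
        values.append(utility(joint))
    return float(np.mean(values))

def pin(x, i, a):
    """Copy of x in which agent i plays action a with probability 1."""
    pinned = [list(row) for row in x]
    pinned[i] = [1.0 if b == a else 0.0 for b in range(len(x[i]))]
    return pinned

def pme_partial(x, i, a):
    """dF/dx[i][a]; F is linear in agent i's row, so pinning the row gives it."""
    return pme_exact(pin(x, i, a))

def expected_difference_reward(x, i, a):
    """Expected difference reward of agent i given that it plays action a."""
    return expectation(pin(x, i, a), lambda joint: difference_reward(joint, i))

if __name__ == "__main__":
    x = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]
    rng = np.random.default_rng(0)
    print("exact PME:", pme_exact(x))
    print("sampled PME:", pme_monte_carlo(x, 5000, rng))
    for a in range(3):
        gap = pme_partial(x, 0, a) - expected_difference_reward(x, 0, a)
        print("action", a, "baseline gap:", gap)
```

The gap printed for each action is the expected utility of the other agents alone, which is the same for every action of the chosen agent. A shift that is constant across one agent's actions does not change the direction of a gradient step projected onto that agent's probability simplex, which is one way to read the abstract's statement that difference rewards carry PME marginal-gradient information.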

Engineering value: 5.0

Research novelty: 7.0

Business relevance: 4.0
