When Replanning Becomes the Bottleneck: Budgeted Embodied Agent Replanning
Website: nebulis-lab.com/BRACE
Abstract
BRACE = Budgeted Replanning and Coordination for Embodied-Agents.
High-frequency replanning is a dominant and under-measured bottleneck for embodied agents. Human instructions are often high-level and underspecified, interactive clarification is not always available (and is constrained when it is), and dynamic environments force repeated replanning regardless. As histories and world-state summaries grow over time (and, in multi-agent settings, across cooperating agents), replanning calls become slower and more variable, causing missed deadlines under strict token and latency budgets. We introduce BRACE, a budgeted replanning framework that treats replanning as a systems control problem: it selects replanning modes and allocates per-call token and time budgets to reduce instability such as plan churn and coordination deadlocks. BRACE is modular and composes with efficiency mechanisms on the replanning call path, including E-RECAP token pruning and retrieval-based replanning memory. On Habitat-Lab navigation under multi-agent context growth, E-RECAP reduces replanning tokens by 71–76% and replanning latency by 2.1–2.6× relative to no pruning, with minimal loss in success and SPL. We argue that budgeting and tail-aware reporting are essential for studying replanning at scale, and we propose an evaluation paradigm that reports tail latency and Service-Level Objective (SLO) violations alongside task success.
- Controller: budget assignment + commitment/cooldown for stable high-frequency replanning.
- Auditable logging: phase breakdown (context construction, compression, planner call, VLA policy call).
- Efficiency modules: E-RECAP-style context pruning, and RAG/memory ablations (optional).
- Evidence: tail latency / SLO improvements and qualitative demos across three domains.
Overview
BRACE sits on the replanning call path. On each step, the agent decides whether to replan (periodic + event triggers), then BRACE allocates a compute/latency budget and optionally applies efficiency modules (pruning, memory/RAG) before calling the planner or VLM. The controller enforces stability via commitment windows and cooldowns, and logs phase costs in a unified run schema for auditing.
- Budget axis: token/time budget assigned to each replanning call (cost control).
- Frequency axis: replanning interval/triggers (throughput control).
- Stability axis: commitment/cooldown/deadlock knobs (behavior control).
Figure 1: BRACE workflow (controller + composable modules on the replanning call path).
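The decision loop described above (periodic and event triggers, a per-call budget, and commitment/cooldown gating) can be sketched as follows. This is a minimal illustration under stated assumptions, not BRACE's actual controller: the class name `ReplanController`, its fields, and the exact gating rules are hypothetical. Here the commitment window blocks every replan for a few steps after a new plan is adopted, and the cooldown throttles only event-triggered replans.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplanController:
    """Hypothetical controller; names and defaults are illustrative assumptions."""
    period: int = 10          # periodic trigger fires every `period` steps
    commitment: int = 3       # steps a freshly adopted plan must be kept
    cooldown: int = 6         # minimum gap between event-triggered replans
    token_budget: int = 256   # per-call token budget handed to the planner
    last_replan: Optional[int] = None
    last_event_replan: Optional[int] = None

    def decide(self, step: int, event: bool) -> Optional[dict]:
        """Return a budget record if the agent should replan at `step`, else None."""
        # Commitment window: keep the current plan regardless of triggers.
        if self.last_replan is not None and step - self.last_replan < self.commitment:
            return None
        periodic = step % self.period == 0
        # Cooldown: ignore event triggers that arrive too soon after the last one.
        event_ok = event and (
            self.last_event_replan is None
            or step - self.last_event_replan >= self.cooldown
        )
        if not (periodic or event_ok):
            return None
        self.last_replan = step
        if event_ok:
            self.last_event_replan = step
        return {
            "step": step,
            "trigger": "event" if event_ok else "periodic",
            "max_tokens": self.token_budget,
        }


# Synthetic run: events at steps 1, 2, 4, 8, and 12 over 21 steps.
ctrl = ReplanController()
events = {1, 2, 4, 8, 12}
calls = [d for s in range(21) if (d := ctrl.decide(s, s in events))]
print([(c["step"], c["trigger"]) for c in calls])
```

In this synthetic run, the events at steps 1, 2, and 12 are suppressed by the commitment window and the event at step 8 by the cooldown, so only four replanning calls are issued instead of eight.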
E-RECAP: Embodied REplanning with Cost-Aware Pruning
Figure 2: E-RECAP module overview (pruning on the replanning call path).
E-RECAP is a composable acceleration module that plugs into BRACE’s replanning call path. When replanning is triggered, E-RECAP reduces the effective planner input by progressively pruning context tokens while preserving critical task, state, safety, and coordination information.
This targets replanning under context growth: as histories and multi-agent signals accumulate, attention cost grows quickly with input length (quadratically, for standard self-attention).
By enforcing an explicit per-call budget (via a keep ratio r), E-RECAP can shrink attention cost without modifying tasks, environments, controllers, or triggers.
- Drop-in: operates on the replanning context at trigger time (no changes to the underlying planner/VLM).
- Budget-aligned: helps meet per-call token budgets, reducing tail latency and SLO violations.
- Auditable: pruning overhead is logged as its own phase.
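To make the keep ratio r concrete, the sketch below prunes a replanning context in stages while never dropping protected entries (task, safety). It is an illustration only and makes several assumptions: the context is a list of text segments rather than model tokens, the importance scores are made up for the example, and the function names `prune_once` and `prune_progressive` are hypothetical rather than part of E-RECAP.

```python
import math
from typing import List, Tuple

Item = Tuple[str, float, bool]  # (text, importance score, protected?)


def prune_once(items: List[Item], r: float) -> List[Item]:
    """Keep about a fraction r of the items; protected items are always kept."""
    if not 0.0 < r <= 1.0:
        raise ValueError("keep ratio r must be in (0, 1]")
    target = math.ceil(r * len(items))
    protected = [i for i, it in enumerate(items) if it[2]]
    # Rank unprotected items by score, highest first.
    free = sorted(
        (i for i, it in enumerate(items) if not it[2]),
        key=lambda i: items[i][1],
        reverse=True,
    )
    slots = max(target - len(protected), 0)
    keep = set(protected) | set(free[:slots])
    # Preserve the original ordering of the surviving items.
    return [it for i, it in enumerate(items) if i in keep]


def prune_progressive(items: List[Item], r: float, stages: int) -> List[Item]:
    """Apply the keep ratio repeatedly, shrinking the context stage by stage."""
    for _ in range(stages):
        items = prune_once(items, r)
    return items


# Synthetic context with assumed scores; two entries are marked protected.
context: List[Item] = [
    ("goal: reach kitchen", 1.0, True),
    ("t=1 moved forward", 0.1, False),
    ("t=2 turned left", 0.2, False),
    ("t=3 door blocked", 0.9, False),
    ("agent B: holding cup", 0.6, False),
    ("t=4 moved forward", 0.15, False),
    ("safety: keep 0.5 m from humans", 1.0, True),
    ("t=5 new path via hallway", 0.8, False),
]
pruned = prune_progressive(context, r=0.5, stages=2)
print([text for text, _, _ in pruned])
```

With r = 0.5, the first stage keeps four of the eight segments and the second keeps only the two protected ones, which shows why protected entries set a floor on how far the context can shrink.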
Motivation: tail latency and deadline misses
In embodied replanning, success can saturate even when replanning repeatedly misses real-time deadlines. We therefore evaluate with tail latency (e.g., P95/P99) and SLO violation rate (fraction of replanning calls exceeding the per-domain latency SLO), alongside task success.
Example (Meta Habitat, SLO = 2500 ms): without BRACE, replanning frequently misses its deadline (85.5% SLO violations). Applying pruning on the replanning call path (as used in BRACE) reduces the replanning context from ~235 tokens to ~20 tokens and drops SLO violations to 4.7%.
Figure 3: Meta Habitat motivating snapshot: keeping replanning context tractable helps avoid deadline-miss cascades.
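The tail metrics used here can be computed from per-call replanning latencies as in the sketch below. It assumes a nearest-rank percentile definition and uses synthetic latencies chosen for illustration; the helper names `percentile` and `tail_report` are hypothetical, and none of the numbers are measurements from BRACE.

```python
import math
from typing import Dict, Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def tail_report(latencies_ms: Sequence[float], slo_ms: float) -> Dict[str, float]:
    """Summarize mean, P95, P99, and the fraction of calls exceeding the SLO."""
    n = len(latencies_ms)
    return {
        "mean_ms": sum(latencies_ms) / n,
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "slo_violation_rate": sum(x > slo_ms for x in latencies_ms) / n,
    }


# Synthetic data: 18 fast calls plus two slow outliers, with a 2500 ms SLO.
latencies = [800] * 18 + [3000, 9000]
report = tail_report(latencies, slo_ms=2500)
print(report)
```

On this synthetic sample the mean (1320 ms) sits well under the SLO while P95 (3000 ms) and P99 (9000 ms) exceed it and 10% of calls violate the deadline, which is the kind of gap that averages alone would hide.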
Demos
Below are side-by-side comparisons demonstrating BRACE's impact across three domains. Each demo shows baseline behavior (left) versus BRACE or BRACE + E-RECAP (right), highlighting improvements in replanning latency, SLO compliance, and multi-agent coordination. The clips are lightweight MP4s suitable for GitHub Pages; full-resolution artifacts are available in the public Google Drive folder (link above).
- Meta Habitat: PointGoal navigation under strict SLO constraints, showing how E-RECAP token pruning keeps replanning within deadline.
- RoboFactory: Multi-agent manipulation (PassShoe task) demonstrating coordination improvements and reduced wait times.
- Microsoft AirSim: High-frequency replanning with 8 drones navigating a shared intersection, showcasing stable coordination and collision avoidance.
Video 1: Habitat (navigation) — Tail latency / SLO bottleneck
Video 2: RoboFactory (manipulation) — Coordination (deadlock/wait)
Video 3: AirSim (vehicles/drones) — High-frequency replanning (cinematic)
Key results: tail distributions and SLO violations
Average latency can hide instability. The plots below summarize tail behavior (CDF with an SLO threshold) and SLO violation rates across platforms under a common replanning accounting definition.
- Success can saturate while deadlines are missed: across platforms, baselines can achieve high success yet frequently violate replanning SLOs.
- Token reduction is necessary but not sufficient: budgeting + phase accounting make tail behavior explicit, while composable modules (e.g., pruning) reduce tail latency and SLO violations.
- Tail-aware reporting matters: heavy tails (e.g., AirSim) motivate reporting percentiles and SLO violations, not averages alone.
Figure 7: Tail latency (CDF) with SLO threshold (Meta Habitat).
Figure 8: SLO violation snapshot across platforms.
Reproducibility
Start from the repository root:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# smoke entrypoints (require local domain dependencies)
scripts/smoke_all.sh
See docs/README.md for environment variables, run scripts, and postprocessing into markdown tables.