BRACE: When Replanning Becomes the Bottleneck

Abstract

BRACE stands for Budgeted Replanning and Coordination for Embodied Agents.

High-frequency replanning is a dominant and under-measured bottleneck for embodied agents. Human instructions are often high-level and underspecified, interactive clarification is not always available (and is constrained when it is), and dynamic environments force repeated replanning. As histories and world-state summaries grow over time (and, in multi-agent settings, across cooperating agents), replanning calls become slower and more variable, causing missed deadlines under strict token and latency budgets. We introduce BRACE, a budgeted replanning framework that treats replanning as a systems control problem: it selects replanning modes and allocates per-call token and time budgets to reduce instability such as plan churn and coordination deadlocks. BRACE is modular and composes with efficiency mechanisms on the replanning call path, including E-RECAP token pruning and retrieval-based replanning memory. On Habitat-Lab navigation under multi-agent context growth, E-RECAP reduces replanning tokens by 71–76% and replanning latency by 2.1–2.6× relative to no pruning, with minimal loss in success and SPL. We argue that budgeting and tail-aware reporting are essential for studying replanning at scale, and propose an evaluation paradigm that reports tail latency and Service-Level Objective (SLO) violations alongside task success.

Controller: budget assignment + commitment/cooldown for stable high-frequency replanning.
Auditable logging: phase breakdown (context construction, compression, planner call, VLA policy call).
Efficiency modules: E-RECAP-style context pruning, and RAG/memory ablations (optional).
Evidence: tail latency / SLO improvements and qualitative demos across three domains.

Overview

BRACE sits on the replanning call path. On each step, the agent decides whether to replan (periodic + event triggers), then BRACE allocates a compute/latency budget and optionally applies efficiency modules (pruning, memory/RAG) before calling the planner or VLM. The controller enforces stability via commitment windows and cooldowns, and logs phase costs in a unified run schema for auditing.

Budget axis: token/time budget assigned to each replanning call (cost control).
Frequency axis: replanning interval/triggers (throughput control).
Stability axis: commitment/cooldown/deadlock knobs (behavior control).
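
The three axes can be sketched as a small controller. This is a minimal illustration, not the BRACE implementation: the class name `ReplanController`, the method `decide`, and all parameter names are hypothetical. The exact semantics are assumptions too. Here a commitment window blocks every trigger for a few steps after a replan, and a cooldown suppresses further event triggers after an event-driven replan while still letting periodic triggers through.

```python
from dataclasses import dataclass
from typing import Optional

_NEVER = -10**9  # sentinel meaning "no replan has happened yet"


@dataclass
class ReplanController:
    """Hypothetical controller covering the budget, frequency, and stability axes."""
    interval: int = 10        # frequency axis: periodic trigger every N steps
    commit_steps: int = 3     # stability axis: hold a fresh plan this long
    cooldown_steps: int = 5   # stability axis: min gap between event-driven replans
    token_budget: int = 256   # budget axis: tokens granted to each replanning call
    last_replan: int = _NEVER
    last_event_replan: int = _NEVER

    def decide(self, step: int, event: bool) -> Optional[int]:
        """Return a token budget if a replan should run at this step, else None."""
        if step - self.last_replan < self.commit_steps:
            return None  # commitment window: no trigger may replace the plan
        cooling = step - self.last_event_replan < self.cooldown_steps
        if event and not cooling:
            self.last_replan = self.last_event_replan = step
            return self.token_budget
        if step % self.interval == 0:
            self.last_replan = step  # periodic trigger ignores the event cooldown
            return self.token_budget
        return None


# An event fires on every step, yet replans are spaced out by the stability knobs.
controller = ReplanController()
replan_steps = [s for s in range(12) if controller.decide(s, event=True) is not None]
print(replan_steps)  # [0, 5, 10]
```

Returning the budget from the same call that makes the trigger decision keeps cost control and stability control in one auditable place. A fuller version would presumably vary the budget per call rather than return a constant.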

Figure 1: BRACE workflow (controller + composable modules on the replanning call path).

E-RECAP: Embodied REplanning with Cost-Aware Pruning

Figure 2: E-RECAP module overview (pruning on the replanning call path).

E-RECAP is a composable acceleration module that plugs into BRACE’s replanning call path. When replanning is triggered, E-RECAP reduces the effective planner input by progressively pruning context tokens while preserving critical task, state, safety, and coordination information.

This targets replanning under context growth: as histories and multi-agent signals accumulate, attention cost grows quickly with input length (quadratically in standard self-attention). By enforcing an explicit per-call budget (via a keep ratio r), E-RECAP can shrink attention cost without modifying tasks, environments, controllers, or triggers.
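
To make the keep ratio concrete, the sketch below prunes a context to roughly a fraction r of its items while never dropping protected ones. It is an illustration under stated assumptions, not E-RECAP itself: the function `prune_context`, the per-item importance `scores`, and the `protected` flags are hypothetical inputs, and the progressive pruning described above is collapsed into a single selection step.

```python
import math


def prune_context(tokens, scores, protected, keep_ratio):
    """Keep about keep_ratio of the items: protected ones first, then by score.

    tokens, scores, and protected are parallel lists (hypothetical inputs).
    """
    n = len(tokens)
    # Never go below the number of protected items, even for a tiny keep_ratio.
    budget = max(math.ceil(keep_ratio * n), sum(protected))
    # Rank protected items first, then by descending score, then by position.
    ranked = sorted(range(n), key=lambda i: (not protected[i], -scores[i], i))
    kept = sorted(ranked[:budget])  # restore the original ordering
    return [tokens[i] for i in kept]


tokens = ["goal:kitchen", "obs:t1", "obs:t2", "safety:stairs", "chat:a", "obs:t9"]
scores = [0.9, 0.1, 0.2, 0.8, 0.05, 0.7]
protected = [True, False, False, True, False, False]

pruned = prune_context(tokens, scores, protected, keep_ratio=0.5)
print(pruned)
```

With r = 0.5 the six-item context shrinks to three items: the two protected entries plus the highest-scoring remaining observation.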

Drop-in: operates on the replanning context at trigger time (no changes to the underlying planner/VLM).
Budget-aligned: helps meet per-call token budgets, reducing tail latency and SLO violations.
Auditable: pruning overhead is logged as its own phase.
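
One way to log each phase separately is a timing context manager. The sketch below is hypothetical (the `PhaseLog` class and the record layout are not the repository's run schema). It uses the phase names listed earlier and stand-in work in place of real context construction, compression, and planner calls.

```python
import time
from contextlib import contextmanager


class PhaseLog:
    """Accumulates wall-clock milliseconds per named phase (hypothetical helper)."""

    def __init__(self):
        self.ms = {}

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.ms[name] = self.ms.get(name, 0.0) + elapsed_ms


log = PhaseLog()
with log.phase("context_construction"):
    context = ["obs"] * 1000          # stand-in for building the planner input
with log.phase("compression"):
    context = context[:100]           # stand-in for pruning; timed as its own phase
with log.phase("planner_call"):
    plan = "move_forward"             # stand-in for the planner or VLM call

record = {"phases_ms": dict(log.ms), "total_ms": sum(log.ms.values())}
print(record)
```

Timing compression as its own phase means pruning overhead counts against the same latency total it is meant to reduce.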

Motivation: tail latency and deadline misses

In embodied replanning, success can saturate even when replanning repeatedly misses real-time deadlines. We therefore evaluate with tail latency (e.g., P95/P99) and SLO violation rate (fraction of replanning calls exceeding the per-domain latency SLO), alongside task success.
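
Both metrics are simple to compute from per-call latencies. The sketch below uses the nearest-rank percentile definition, which is an assumption (the evaluation scripts may interpolate instead), and made-up latencies chosen only to show how a mean can sit well under the SLO while the tail does not.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def slo_violation_rate(latencies_ms, slo_ms):
    """Fraction of replanning calls whose latency exceeds the SLO."""
    return sum(lat > slo_ms for lat in latencies_ms) / len(latencies_ms)


# Illustrative latencies (not measured data): mostly fast calls plus two slow ones.
latencies_ms = [900] * 18 + [3000, 6000]
slo_ms = 2500

mean_ms = sum(latencies_ms) / len(latencies_ms)
p95_ms = percentile(latencies_ms, 95)
p99_ms = percentile(latencies_ms, 99)
violations = slo_violation_rate(latencies_ms, slo_ms)
print(mean_ms, p95_ms, p99_ms, violations)
```

In this toy sample the mean is 1260 ms, comfortably under the 2500 ms SLO, while P95 and P99 are 3000 ms and 6000 ms and 10% of calls violate the SLO.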

Example (Meta Habitat, SLO = 2500 ms): without BRACE, replanning frequently misses its deadline (85.5% SLO violations). Applying pruning on the replanning call path (as used in BRACE) reduces the replanning context from ~235 tokens to ~20 tokens and drops SLO violations to 4.7%.

Figure 3: Meta Habitat motivating snapshot: keeping replanning context tractable helps avoid deadline-miss cascades.

Demos

Below are side-by-side comparisons demonstrating BRACE's impact across three domains. Each demo shows baseline behavior (left) versus BRACE or BRACE + E-RECAP (right), highlighting improvements in replanning latency, SLO compliance, and multi-agent coordination. The clips are lightweight MP4s suitable for GitHub Pages; full-resolution artifacts are available in the public Google Drive folder (link above).

Meta Habitat: PointGoal navigation under strict SLO constraints, showing how E-RECAP token pruning keeps replanning within deadline.
RoboFactory: Multi-agent manipulation (PassShoe task) demonstrating coordination improvements and reduced wait times.
Microsoft AirSim: High-frequency replanning with 8 drones navigating a shared intersection, showcasing stable coordination and collision avoidance.

Video 1: Habitat (navigation) — Tail latency / SLO bottleneck

Video 2: RoboFactory (manipulation) — Coordination (deadlock/wait)

Video 3: AirSim (vehicles/drones) — High-frequency replanning (cinematic)

Key results: tail distributions and SLO violations

Average latency can hide instability. The plots below summarize tail behavior (CDF with an SLO threshold) and SLO violation rates across platforms under a common replanning accounting definition.

Success can saturate while deadlines are missed: across platforms, baselines can achieve high success yet frequently violate replanning SLOs.
Token reduction is necessary but not sufficient: budgeting + phase accounting make tail behavior explicit, while composable modules (e.g., pruning) reduce tail latency and SLO violations.
Tail-aware reporting matters: heavy tails (e.g., AirSim) motivate reporting percentiles and SLO violations, not averages alone.


Figure 7: Tail latency (CDF) with SLO threshold (Meta Habitat).

Figure 8: SLO violation snapshot across platforms.

Reproducibility

Start from the repository root:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# smoke entrypoints (require local domain dependencies)
scripts/smoke_all.sh

See docs/README.md for environment variables, run scripts, and postprocessing into markdown tables.
