Shuaijun Liu, Feiyang You, Xingwei Chen, Ningxin Su
IoT, Information Hub, The Hong Kong University of Science and Technology (Guangzhou) | ningxinsu@hkust-gz.edu.cn

Lab Website: nebulis-lab.com

Abstract

  • Tokens: 62-92% fewer replanning-call tokens across the main platforms.
  • SLO: 4.7-50.0% violation rates with BRACE + E-RECAP, down from 85.5-100.0%.
  • Habitat: 2.1-2.6x replanning-latency speedup under context growth.
  • Harder RoboFactory: 80.0% success where the open-loop, frozen-plan, and No BRACE baselines all fail.

Embodied agents replan frequently to recover from execution drift, partial observability, and coordination hazards, but each LLM-based replanning call may consume an accumulated textual context that grows over time and across agents. Once this context becomes large, replanning latency develops heavy tails and can miss real-time deadlines even when task success remains high. BRACE (Budgeted Replanning for Agentic Control in Embodied Systems) reframes replanning in embodied intelligence: instead of treating it only as a matter of planning ability, it casts it as a systematic control problem of when to replan, how to replan, and at what cost.

The BRACE controller decides whether to invoke replanning, selects the replanning mode, and allocates an explicit token budget and latency service-level objective (SLO), while accounting for optional modules on the replanning call path. We instantiate one such module, E-RECAP, to show how BRACE can compose compression, retrieval, caching, or other efficiency mechanisms to change both the real-time behavior of replanning and closed-loop performance. Across Meta Habitat, RoboFactory, and Microsoft AirSim, BRACE with E-RECAP reduces replanning-call token counts by 62-92% and lowers SLO violation rates from 85.5-100.0% to 4.7-50.0%. In a harder RoboFactory setting, it reaches 80.0% success with 4.6% SLO violations.

Overview

Figure 1. Overview of BRACE: a closed-loop controller that determines whether to invoke replanning and selects the per-call token budget and latency SLO, with E-RECAP as a composable token-pruning module on the replanning call path.

BRACE treats each potential replanning point as a controller decision, shifting the question from only improving the planner to controlling when, how, and at what cost replanning should occur. This makes replanning latency, token growth, and SLO misses first-class closed-loop signals rather than incidental implementation details.

  • Admission: trigger evaluation plus cooldown and commit windows prevent rapid replanning churn.
  • Budgeting: per-call token and latency budgets make planner cost explicit and comparable.
  • Composition: compression, retrieval, caching, or other modules can be inserted into the replanning call path and accounted for explicitly.
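The admission and budgeting mechanisms above can be sketched as a small controller. This is a minimal illustration built on stated assumptions and is not the authors' implementation. The class and field names, the step-based cooldown and commit windows, the "repair"/"full" mode labels, and the rule that halves the budget for non-critical triggers are all hypothetical. Only the 2,500 ms SLO value echoes the Habitat setting reported below.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplanDecision:
    invoke: bool
    mode: str = "skip"      # hypothetical mode labels: "skip", "repair", "full"
    token_budget: int = 0   # max planner-input tokens for this call
    slo_ms: float = 0.0     # per-call latency SLO


@dataclass
class AdmissionController:
    """Illustrative admission + budgeting logic (hypothetical, not the BRACE code)."""

    token_budget: int
    slo_ms: float
    cooldown_steps: int   # non-critical triggers are suppressed this soon after a replan
    commit_steps: int     # no trigger may replace a plan younger than this
    last_replan_step: Optional[int] = None

    def decide(self, step: int, triggered: bool, critical: bool = False) -> ReplanDecision:
        if not triggered:
            return ReplanDecision(invoke=False)
        if self.last_replan_step is not None:
            age = step - self.last_replan_step
            if age < self.commit_steps:
                return ReplanDecision(invoke=False)  # commit window: keep executing the plan
            if age < self.cooldown_steps and not critical:
                return ReplanDecision(invoke=False)  # cooldown: avoid replanning churn
        self.last_replan_step = step
        # Hypothetical mode/budget policy: critical triggers get a full replan with the whole
        # budget; other triggers get a cheaper local repair with half of it.
        if critical:
            return ReplanDecision(True, "full", self.token_budget, self.slo_ms)
        return ReplanDecision(True, "repair", self.token_budget // 2, self.slo_ms)


ctrl = AdmissionController(token_budget=1024, slo_ms=2500.0, cooldown_steps=10, commit_steps=3)
d0 = ctrl.decide(step=0, triggered=True)                 # admitted: "repair", 512 tokens
d1 = ctrl.decide(step=2, triggered=True, critical=True)  # rejected: inside commit window
d2 = ctrl.decide(step=5, triggered=True)                 # rejected: cooldown, not critical
d3 = ctrl.decide(step=5, triggered=True, critical=True)  # admitted: "full", 1024 tokens
```

In this sketch the commit window is stricter than the cooldown: even a critical trigger cannot preempt a freshly committed plan, whereas the cooldown only suppresses routine triggers.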

Motivation

A replanning call is not a lightweight query: prompts include task specifications, recent histories, failure traces, and multi-agent coordination summaries. As the context grows, transformer planners become slow and bursty; in a closed loop, slow calls delay execution and can trigger more replanning.

In Meta Habitat navigation with shortest-path-noise execution and a replanning SLO of 2500 ms, No BRACE violates the SLO on 85.5% of replanning calls despite 100.0% success. With E-RECAP on the replanning call path, the violation rate drops to 4.7% without reducing success.

Figure 3. Qualitative motivating example on Meta Habitat. As replanning history accumulates, the context grows and produces tail-latency spikes, motivating budgeted context compression.


Composable Efficiency Modules: E-RECAP

Figure 2. E-RECAP module. A lightweight predictor scores context tokens using intermediate hidden states, and pruning is applied progressively at selected layers.

E-RECAP is a context-compression module for long-horizon replanning prompts and serves as one concrete efficiency module inside BRACE. Its role is to demonstrate that the framework can plug in call-path modules and make their real-time effect measurable.

The module predicts token utility from intermediate hidden states, preserves fixed head and tail tokens, and fills the remaining budget with top-scoring context tokens so each replanning call satisfies the planner-input budget.
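The budget-filling step described above can be sketched as follows. This is a minimal sketch under stated assumptions. The function name `select_tokens` and its parameters are hypothetical. The per-token `scores` stand in for E-RECAP's learned predictor, which works on intermediate hidden states and is not reproduced here. The sketch also omits the progressive, layer-wise pruning and shows only the final budgeted selection.

```python
from typing import List, Sequence


def select_tokens(scores: Sequence[float], budget: int, n_head: int, n_tail: int) -> List[int]:
    """Indices of tokens to keep: fixed head/tail plus top-scoring middle tokens.

    `scores` stands in for a learned utility predictor (one score per context token);
    how the scores are produced is out of scope here.
    """
    n = len(scores)
    if n <= budget:
        return list(range(n))  # already within budget: nothing to prune
    if n_head + n_tail > budget:
        raise ValueError("head + tail reservation exceeds the token budget")
    head = list(range(n_head))
    tail = list(range(n - n_tail, n))
    middle = range(n_head, n - n_tail)
    # Rank middle tokens by predicted utility (ties broken by position), then fill the budget.
    ranked = sorted(middle, key=lambda i: (-scores[i], i))
    fill = ranked[: budget - n_head - n_tail]
    # Return indices in original order so the pruned prompt keeps its sequence structure.
    return sorted(head + fill + tail)


tokens = ["<task>", "t1", "t2", "t3", "t4", "t5", "t6", "<ask>"]
scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.05, 0.7, 0.4]
keep = select_tokens(scores, budget=5, n_head=1, n_tail=1)
pruned = [tokens[i] for i in keep]   # ['<task>', 't1', 't3', 't6', '<ask>']
```

Reserving the head and tail means the task specification at the start and the most recent request at the end survive regardless of their scores. Only the middle of the context competes for the remaining budget.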

  • Drop-in: operates on the replanning context before the planner call.
  • Composable: uses the same interface that can also host compression, retrieval, caching, or future efficiency modules.
  • Auditable: overhead and latency impact are logged as separate call-path phases.
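The drop-in, composable, and auditable properties above can be illustrated with a small call-path runner. This is a hypothetical sketch, not the BRACE interface. `CallPathModule`, `KeepMostRecent`, and `run_replan_call` are illustrative names. Each list item stands in for one token, and `KeepMostRecent` is a toy module, not E-RECAP. The runner times every module and the planner as separate phases so that module overhead can be reported apart from planner latency.

```python
import time
from typing import Callable, Dict, List, Protocol, Tuple


class CallPathModule(Protocol):
    """Hypothetical interface: any module that rewrites the context before the planner call."""

    name: str

    def apply(self, context: List[str], token_budget: int) -> List[str]:
        ...


class KeepMostRecent:
    """Toy stand-in module (keeps the newest items); it is NOT E-RECAP."""

    name = "keep_most_recent"

    def apply(self, context: List[str], token_budget: int) -> List[str]:
        return context[-token_budget:] if token_budget > 0 else []


def run_replan_call(
    context: List[str],
    token_budget: int,
    modules: List[CallPathModule],
    planner: Callable[[List[str]], str],
) -> Tuple[str, Dict[str, float]]:
    """Run modules then the planner, timing each as a separate call-path phase (ms)."""
    phases: Dict[str, float] = {}
    for module in modules:
        start = time.perf_counter()
        context = module.apply(context, token_budget)
        phases[module.name] = (time.perf_counter() - start) * 1000.0
    start = time.perf_counter()
    plan = planner(context)
    phases["planner"] = (time.perf_counter() - start) * 1000.0
    return plan, phases


# Usage: each list item stands in for one token; the "planner" just joins its input.
plan, phases = run_replan_call(["a", "b", "c", "d"], 2, [KeepMostRecent()], " ".join)
```

Swapping in a retrieval or caching module only requires another object with `name` and `apply`. The per-phase timings then show whether the module's own overhead outweighs the planner time it saves.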

Figures

These figures mirror the paper's visual evidence: distribution-level SLO behavior, qualitative platform comparisons, a focused physical robot example, and diagnostic views of the cost-stability tradeoff.

Figure 4. Quantitative summary of replanning stability and cost. (a) Cross-platform SLO violation rates, (b) Meta Habitat tail-latency CDF with the SLO threshold, (c) RoboFactory coordination wait, and (d) Meta Habitat token-latency tradeoff.
Figure 5a. RoboFactory TakePhoto baseline.
Figure 5b. RoboFactory TakePhoto with BRACE.
Figure 6. Qualitative comparison on AirSimNH. The baseline accumulates delayed replanning decisions within the same trigger window, whereas BRACE shortens the effective replanning path and completes the interaction with fewer deadline misses.
Figure 7. Real-robot PickFruit rollout pair on a banana instance. BRACE replans after an intermediate execution failure and recovers, while the underlying LLM+VLA stack fails to recover from a comparable failure.

Results

Following the paper, the tables below keep the main evidence in full rather than only summarizing it as prose. The central pattern is that task success can saturate while replanning repeatedly misses real-time deadlines, so BRACE reports tail latency and SLO violations at replanning-call granularity.
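The two per-call metrics defined in the Table 1 caption, P95 replanning latency and SLO violation rate, can be computed directly from a log of replanning-call latencies. This is a minimal sketch under stated assumptions. It uses the nearest-rank P95 because the exact percentile estimator is not stated here, and it counts a call as a violation only if its latency strictly exceeds the SLO. The function names and sample latencies are hypothetical.

```python
from typing import Sequence


def p95(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile (one common convention among several)."""
    if not latencies_ms:
        raise ValueError("no replanning calls logged")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n) in integer arithmetic, 1-based
    return ordered[rank - 1]


def slo_violation_rate(latencies_ms: Sequence[float], slo_ms: float) -> float:
    """Fraction of replanning calls whose latency exceeds the SLO."""
    if not latencies_ms:
        raise ValueError("no replanning calls logged")
    return sum(1 for x in latencies_ms if x > slo_ms) / len(latencies_ms)


# Hypothetical log of 20 replanning calls: 100 ms, 200 ms, ..., 2000 ms.
lat = [float(x) for x in range(100, 2100, 100)]
tail = p95(lat)                          # 1900.0
viol = slo_violation_rate(lat, 1500.0)   # 0.25 (5 of 20 calls exceed 1500 ms)
```

Because both metrics are computed per replanning call rather than per episode, an episode can succeed while many of its calls still miss the deadline. This is the gap the tables below expose.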

Table 1. Cross-platform summary of replanning cost and stability. Tokens denotes the mean number of tokens passed into the replanning call after pruning/budgeting; tail latency is P95 replanning latency; SLO violation is the fraction of replanning calls exceeding the per-domain latency SLO.
| Platform | Scenario | Method | Episodes | Success | Tokens | P95 latency | SLO | SLO viol. |
|---|---|---|---|---|---|---|---|---|
| Meta Habitat | Navigation | No BRACE | 30 | 100.0% | 235 | 2,677 ms | 2,500 ms | 85.5% |
| Meta Habitat | Navigation | BRACE + E-RECAP | 30 | 100.0% | 20 | 2,500 ms | 2,500 ms | 4.7% |
| RoboFactory | Pass-Shoe manipulation | No BRACE | 10 | 100.0% | 1,566 | 1,604 ms | 250 ms | 100.0% |
| RoboFactory | Pass-Shoe manipulation | BRACE + E-RECAP | 10 | 100.0% | 319 | 1,213 ms | 250 ms | 50.0% |
| Microsoft AirSim | K=8 intersection | No BRACE | 10 | 100.0% | 2,934 | 8,520 ms | 2,500 ms | 100.0% |
| Microsoft AirSim | K=8 intersection | BRACE + E-RECAP | 10 | 100.0% | 1,114 | 1,640 ms | 2,500 ms | 4.7% |
Table 2. E-RECAP replanning acceleration on Habitat-Lab (MP3D) PointNav at keep ratio r=0.7.
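| Task | K | Method | Keep ratio | Success | SPL | Tokens/replan | Latency | Speedup | Token reduction |
|---|---|---|---|---|---|---|---|---|---|
| PointNav | 1 | No-Pruning | 1.0 | 0.85 | 0.72 | 2,847 | 2.34 s | 1.00x | 0% |
| PointNav | 1 | Random | 0.7 | 0.78 | 0.65 | 823 | 1.12 s | 2.09x | 71.1% |
| PointNav | 1 | E-RECAP | 0.7 | 0.84 | 0.71 | 823 | 1.12 s | 2.09x | 71.1% |
| PointNav | 4 | No-Pruning | 1.0 | 0.84 | 0.71 | 12,847 | 9.87 s | 1.00x | 0% |
| PointNav | 4 | Random | 0.7 | 0.65 | 0.57 | 3,421 | 4.23 s | 2.33x | 73.4% |
| PointNav | 4 | E-RECAP | 0.7 | 0.83 | 0.70 | 3,421 | 4.23 s | 2.33x | 73.4% |
| PointNav | 8 | No-Pruning | 1.0 | 0.80 | 0.67 | 38,924 | 29.67 s | 1.00x | 0% |
| PointNav | 8 | Random | 0.7 | 0.65 | 0.58 | 9,234 | 11.23 s | 2.64x | 76.3% |
| PointNav | 8 | E-RECAP | 0.7 | 0.79 | 0.66 | 9,234 | 11.23 s | 2.64x | 76.3% |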
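The Speedup and Token reduction columns of Table 2 are consistent with simple ratios against the No-Pruning row at the same K. Speedup is the baseline latency divided by the pruned latency, and token reduction is 1 minus the ratio of pruned to baseline tokens. The sketch below recomputes both columns from the table's own numbers. The helper name `derived_columns` is hypothetical.

```python
from typing import Dict, Tuple


def derived_columns(base_tokens: int, base_latency_s: float,
                    tokens: int, latency_s: float) -> Tuple[float, float]:
    """Speedup (x) and token reduction (%) relative to the unpruned baseline."""
    speedup = base_latency_s / latency_s
    reduction_pct = 100.0 * (1.0 - tokens / base_tokens)
    return round(speedup, 2), round(reduction_pct, 1)


# (baseline tokens, baseline latency, pruned tokens, pruned latency) per K, from Table 2.
table2: Dict[int, Tuple[int, float, int, float]] = {
    1: (2847, 2.34, 823, 1.12),
    4: (12847, 9.87, 3421, 4.23),
    8: (38924, 29.67, 9234, 11.23),
}
derived = {k: derived_columns(*row) for k, row in table2.items()}
# derived == {1: (2.09, 71.1), 4: (2.33, 73.4), 8: (2.64, 76.3)}
```

Random pruning and E-RECAP share the same token count and latency at each K, so the efficiency columns are identical. The methods differ only in Success and SPL, which isolates the value of choosing which tokens to keep.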
Table 3. Open-loop and harder-setting comparisons. Habitat contrasts no-replanning with budgeted repeated replanning; RoboFactory contrasts open-loop, frozen-plan, and BRACE-based recovery under the harder Pass-Shoe setting.
| Domain | Method | Task metric | Cost metric | P95 latency | SLO viol. |
|---|---|---|---|---|---|
| Habitat | No-initial-plan | 53.3% / 0.519 SPL | 0 replans | N/A | N/A |
| Habitat | No BRACE | 53.3% / 0.515 SPL | 4.533 replans/ep | 5,492 ms | 93.4% |
| Habitat | BRACE + E-RECAP | 53.3% / 0.519 SPL | 4.533 replans/ep | 2,486 ms | 0.0% |
| RoboFactory | Open-loop | 0.0% success | 0 wait | N/A | N/A |
| RoboFactory | Frozen plan | 0.0% success | 55.6 ms wait | 72.8 ms | 0.0% |
| RoboFactory | No BRACE | 0.0% success | 6,607.3 ms wait | 312.7 ms | 27.6% |
| RoboFactory | BRACE + E-RECAP | 80.0% success | 6,241.7 ms wait | 247.2 ms | 4.6% |
Table 4. Focused single-arm real-robot results on PickFruit and PushT. The summary indicates that the same budgeting and replanning interface remains effective under physical execution.
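| Task | Method | Success rate | Replans/ep | P95 replan latency | SLO viol. |
|---|---|---|---|---|---|
| PickFruit | One-Shot | 8.0% | 0.0 | N/A | N/A |
| PickFruit | No BRACE | 24.0% | 3.4 | 29.4 s | 42.7% |
| PickFruit | BRACE + E-RECAP | 40.0% | 1.9 | 22.8 s | 18.6% |
| PushT | One-Shot | 0.0% | 0.0 | N/A | N/A |
| PushT | No BRACE | 12.0% | 3.8 | 34.7 s | 61.3% |
| PushT | BRACE + E-RECAP | 32.0% | 2.2 | 26.1 s | 27.5% |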

Reproducibility

Start from the repository root:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# smoke entrypoints (require local domain dependencies)
scripts/smoke_all.sh
```

See docs/README.md for environment variables, run scripts, and postprocessing into markdown tables.