ActiveGrounder: 3D Visual Grounding
with Object-Hull-Guided Active Observation

We present ActiveGrounder, a framework that transforms 3D visual grounding from a passive recognition task into an active exploration paradigm. Unlike existing methods that rely on static maps or single-image perception, ActiveGrounder integrates maps with object-hull-guided navigation to actively acquire informative viewpoints. In our experiments, ActiveGrounder grounds targets more accurately and reliably than a passive baseline (85.7% vs. 14.3% success rate, with no timeout cases), at the cost of a longer average run time. This is a step toward embodied agents capable of active perception and grounding in the real world.
Fig1 - Framework

| Method | Success Rate (%) ↑ | Avg. Time (s) † ↓ | Timeout Cases (#/7) |
|---|---|---|---|
| Baseline | 14.3 | 27.0 | 1 |
| ActiveGrounder (Ours) | 85.7 | 125.9 | 0 |

† Avg. time excludes timeout cases.
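
The reported success rates are consistent with 1 and 6 successful runs out of 7 (1/7 ≈ 14.3%, 6/7 ≈ 85.7%). The sketch below shows one way such metrics could be computed, with timeouts excluded from the average time as the footnote states. The `Trial` type, the `summarize` helper, the convention of marking a timeout with `None`, and all run times in the usage example are assumptions made for illustration; they are not taken from the ActiveGrounder code or data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Trial:
    """One grounding attempt (hypothetical record format)."""
    success: bool
    seconds: Optional[float]  # None marks a timeout (assumed convention)


def summarize(trials: List[Trial]) -> Tuple[float, float, int]:
    """Return (success rate in %, mean time over non-timeout trials, timeout count)."""
    n = len(trials)
    # Timeouts count against the success rate but are left out of the timing average.
    finished = [t.seconds for t in trials if t.seconds is not None]
    success_rate = 100.0 * sum(t.success for t in trials) / n
    avg_time = sum(finished) / len(finished) if finished else float("nan")
    return round(success_rate, 1), avg_time, n - len(finished)


# Placeholder run times (10.0 s and 20.0 s), NOT measured values.
active = [Trial(True, 10.0)] * 6 + [Trial(False, 10.0)]
baseline = [Trial(True, 20.0)] + [Trial(False, 20.0)] * 5 + [Trial(False, None)]

print(summarize(active))
print(summarize(baseline))
```

Only the success counts and timeout counts in this usage example mirror the table; the average times it prints reflect the placeholder durations and say nothing about the reported 27.0 s and 125.9 s.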
Simulation Environment
Fig2 - Qualitative evaluation