3D Scene Graph Generation for Embodied Agents

Published:

Duration: TBD
Affiliation: VinMotion, VinGroup · Hanoi, Vietnam

Note: Full write-up coming soon.

Overview

This project builds structured 3D scene graphs from RGB-D and multi-view perception. The graphs give embodied agents a semantic understanding of their environment, which supports high-level task planning and object-centric reasoning.
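
As a rough illustration of what "structured 3D scene graph" means here, the sketch below stores objects as nodes and relationships as (subject, predicate, object) edges. The class names, fields, and the metric world-frame centroid are assumptions made for illustration; they are not taken from the project's code.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectNode:
    """One detected object (hypothetical schema)."""
    node_id: int
    label: str
    centroid: tuple  # (x, y, z) in metres, world frame (assumed)


@dataclass
class SceneGraph:
    """Objects as nodes, relationships as directed, labelled edges."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (subject_id, predicate, object_id)

    def add_object(self, node_id, label, centroid):
        self.nodes[node_id] = ObjectNode(node_id, label, centroid)

    def add_relation(self, subject_id, predicate, object_id):
        # Reject edges that point at objects the graph does not contain.
        if subject_id not in self.nodes or object_id not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.append((subject_id, predicate, object_id))

    def relations_of(self, node_id):
        """Return (predicate, object label) pairs where node_id is the subject."""
        return [(p, self.nodes[o].label) for s, p, o in self.edges if s == node_id]


graph = SceneGraph()
graph.add_object(0, "table", (1.0, 2.0, 0.4))
graph.add_object(1, "mug", (1.1, 2.0, 0.85))
graph.add_relation(1, "on", 0)
print(graph.relations_of(1))
```

An object-centric query such as "what is the mug resting on?" then becomes a lookup over the edges leaving the mug node.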

Planned Content

  • Scene graph construction from RGB-D streams
  • Object detection, segmentation, and relationship extraction
  • Integration with LLM-based task planners
  • Evaluation on standard benchmarks (ScanScribe, 3DSSG)
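
To make the relationship-extraction and planner-integration items more concrete, here is a minimal sketch under stated assumptions: objects arrive as axis-aligned 3D boxes in a z-up frame, a single geometric rule decides the "on" relation, and the resulting triples are flattened into plain text that could be placed in an LLM prompt. The `Box3D` type, the 5 cm tolerance, and the text format are all hypothetical; a learned relationship predictor could replace the rule.

```python
from dataclasses import dataclass


@dataclass
class Box3D:
    """Axis-aligned box in a z-up world frame (assumed convention)."""
    label: str
    min_xyz: tuple
    max_xyz: tuple


def overlaps_xy(a, b):
    """True if the horizontal footprints of the two boxes intersect."""
    return all(
        a.min_xyz[i] < b.max_xyz[i] and b.min_xyz[i] < a.max_xyz[i]
        for i in (0, 1)
    )


def is_on(a, b, tol=0.05):
    """Heuristic: a rests on b if a's bottom is within tol metres of b's top."""
    vertical_gap = a.min_xyz[2] - b.max_xyz[2]
    return overlaps_xy(a, b) and abs(vertical_gap) <= tol


def extract_relations(boxes):
    """Check every ordered pair of distinct boxes for the 'on' relation."""
    triples = []
    for a in boxes:
        for b in boxes:
            if a is not b and is_on(a, b):
                triples.append((a.label, "on", b.label))
    return triples


def to_prompt(triples):
    """Serialize triples one per line for inclusion in a planner prompt."""
    return "\n".join(f"{s} {p} {o}" for s, p, o in triples)


boxes = [
    Box3D("table", (0.0, 0.0, 0.0), (1.0, 1.0, 0.75)),
    Box3D("mug", (0.4, 0.4, 0.76), (0.5, 0.5, 0.86)),
    Box3D("chair", (1.5, 0.0, 0.0), (2.0, 0.5, 0.9)),
]
triples = extract_relations(boxes)
prompt = to_prompt(triples)
print(prompt)
```

The pairwise loop is quadratic in the number of objects, which is acceptable for room-scale scenes; larger maps would likely need a spatial index to limit the candidate pairs.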

Technologies

Python · PyTorch · Open3D · VLM · 3D Perception · Scene Understanding · ROS 2