3D Scene Graph Generation for Embodied Agents
Published:
Duration: TBD
Affiliation: VinMotion, VinGroup · Hanoi, Vietnam
Note: Full write-up coming soon.
Overview
This project builds structured 3D scene graphs from RGB-D and multi-view perception, giving embodied agents a semantic understanding of their environment that supports high-level task planning and object-centric reasoning.
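To make the idea of a scene graph concrete, here is a minimal sketch of one possible representation: object nodes with 3D centroids, directed relationship edges, and a plain-text dump that a planner could consume. The class names (`ObjectNode`, `SceneGraph`), the `to_prompt` method, and the example objects are all hypothetical and chosen for illustration; they are not taken from the project itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ObjectNode:
    node_id: int
    label: str
    # Centroid in metres; the choice of world frame is an assumption.
    centroid: tuple[float, float, float]


@dataclass
class SceneGraph:
    nodes: dict[int, ObjectNode] = field(default_factory=dict)
    # Each edge is (subject_id, predicate, object_id).
    edges: list[tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, label: str, centroid: tuple[float, float, float]) -> int:
        """Register an object and return its node id."""
        node_id = len(self.nodes)
        self.nodes[node_id] = ObjectNode(node_id, label, centroid)
        return node_id

    def relate(self, subject_id: int, predicate: str, object_id: int) -> None:
        """Add a directed relationship between two existing nodes."""
        if subject_id not in self.nodes or object_id not in self.nodes:
            raise KeyError("both nodes must exist before relating them")
        self.edges.append((subject_id, predicate, object_id))

    def to_prompt(self) -> str:
        """Flatten the edges into one 'subject predicate object' line each."""
        return "\n".join(
            f"{self.nodes[s].label}#{s} {p} {self.nodes[o].label}#{o}"
            for s, p, o in self.edges
        )


# Hypothetical two-object scene: a cup resting on a table.
graph = SceneGraph()
table = graph.add_object("table", (1.0, 0.0, 0.4))
cup = graph.add_object("cup", (1.0, 0.0, 0.8))
graph.relate(cup, "on", table)
print(graph.to_prompt())  # prints "cup#1 on table#0"
```

Keeping edges as simple triples makes the graph easy to serialize as text; a real system would likely also store per-node geometry and confidence scores.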
Planned Content
- Scene graph construction from RGB-D streams
- Object detection, segmentation, and relationship extraction
- Integration with LLM-based task planners
- Evaluation on standard benchmarks (ScanScribe, 3DSSG)
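For the first planned item, a common starting point is to lift segmented depth pixels into 3D with the pinhole camera model and summarize each object by its centroid. The sketch below shows only that step, in plain Python. The function names (`backproject`, `mask_centroid`), the intrinsics values, and the dictionary-based depth lookup are assumptions made for illustration, not the project's actual pipeline.

```python
from __future__ import annotations


def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth into the camera frame (pinhole model)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def mask_centroid(mask_pixels, depth_at, fx, fy, cx, cy):
    """Average the 3D points of an object's mask, skipping invalid depth.

    mask_pixels: iterable of (u, v) pixels belonging to one segmented object.
    depth_at:    dict mapping (u, v) to depth in metres; 0 or missing = invalid.
    Returns None when no pixel in the mask has valid depth.
    """
    points = []
    for u, v in mask_pixels:
        depth_m = depth_at.get((u, v), 0.0)
        if depth_m <= 0.0:
            continue  # depth sensors commonly report 0 for missing readings
        points.append(backproject(u, v, depth_m, fx, fy, cx, cy))
    if not points:
        return None
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


# Hypothetical intrinsics and a three-pixel mask, one pixel with no depth.
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0
depth = {(320, 240): 2.0, (420, 240): 2.0, (330, 240): 0.0}
mask = [(320, 240), (420, 240), (330, 240)]
centroid = mask_centroid(mask, depth, FX, FY, CX, CY)
print(centroid)
```

In practice this step would run over dense arrays (for example with Open3D or PyTorch, both listed under Technologies) and the points would be transformed into a shared world frame using the camera pose before centroids are compared across views.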
Technologies
Python · PyTorch · Open3D · VLM · 3D Perception · Scene Understanding · ROS 2
