LLM/VLM-Based Task Planning for Embodied Robots
Duration: June 2025 – Present
Affiliation: VinMotion, VinGroup · Hanoi, Vietnam
Note: Full write-up coming soon.
Overview
A hierarchical task planning system for humanoid robots that uses Large Language Models (LLMs) and Vision-Language Models (VLMs) with multi-tool calling. It replaces traditional rule-based planners with autonomous behavior pipelines orchestrated by LangGraph.
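The following is a minimal sketch of hierarchical planning with tool calling. It assumes a two-level decomposition: task to subgoals, then subgoal to primitive tool calls, with the planner returning JSON. The tool names (`navigate_to`, `pick`, `place`), the `TOOLS` registry, and the canned `SUBGOALS`/`EXPANSIONS` responses that stand in for real LLM calls are all hypothetical. They are not the project's actual interfaces.

```python
import json
from typing import Callable, Dict, List

# Hypothetical primitive tools. On a real robot these would trigger skills;
# here they only return a description of what would be executed.
TOOLS: Dict[str, Callable[..., str]] = {
    "navigate_to": lambda location: f"navigate_to({location})",
    "pick": lambda obj: f"pick({obj})",
    "place": lambda obj, target: f"place({obj}, {target})",
}

# Canned responses standing in for two LLM calls (hypothetical).
SUBGOALS: Dict[str, List[str]] = {
    "put the cup on the table": ["fetch cup", "deliver cup to table"],
}
EXPANSIONS: Dict[str, List[dict]] = {
    "fetch cup": [
        {"tool": "navigate_to", "args": {"location": "counter"}},
        {"tool": "pick", "args": {"obj": "cup"}},
    ],
    "deliver cup to table": [
        {"tool": "navigate_to", "args": {"location": "table"}},
        {"tool": "place", "args": {"obj": "cup", "target": "table"}},
    ],
}


def llm_decompose(task: str) -> str:
    """Level 1: task -> ordered subgoals, JSON-encoded (stubbed LLM call)."""
    return json.dumps(SUBGOALS.get(task, []))


def llm_expand(subgoal: str) -> str:
    """Level 2: subgoal -> primitive tool calls, JSON-encoded (stubbed LLM call)."""
    return json.dumps(EXPANSIONS.get(subgoal, []))


def plan_and_dispatch(task: str) -> List[str]:
    """Run both planning levels, validate each tool call, then dispatch it."""
    log: List[str] = []
    for subgoal in json.loads(llm_decompose(task)):
        for call in json.loads(llm_expand(subgoal)):
            name = call["tool"]
            if name not in TOOLS:
                # Refuse tools the model invented instead of executing them.
                raise ValueError(f"unknown tool: {name}")
            log.append(TOOLS[name](**call["args"]))
    return log


if __name__ == "__main__":
    for step in plan_and_dispatch("put the cup on the table"):
        print(step)
```

Validating every call against a fixed registry before dispatch is one simple way to keep a model's unknown or hallucinated tool names from reaching the robot.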
System Architecture
- Perception Layer — VLM-based scene understanding, object detection, and state estimation
- Planning Layer — LLM task decomposition with LangGraph orchestration and multi-tool calling
- Execution Layer — ROS 2 action dispatch to manipulation and navigation controllers
- Evaluation — Benchmarked against baseline planners on household and industrial tasks
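The layer split above can be viewed as a small state machine. Each layer is a node that reads and updates a shared state, and a conditional edge after execution either finishes or routes back to perception for replanning. LangGraph expresses this pattern with a `StateGraph` of nodes and conditional edges. The sketch below reproduces the idea in plain Python so it runs without dependencies. The `State` fields, node bodies, `MAX_RETRIES` budget, and simulated first-attempt failure are hypothetical.

```python
from typing import Callable, Dict, List, Tuple, TypedDict


class State(TypedDict):
    """Shared state passed between layers (hypothetical fields)."""
    task: str
    scene: List[str]
    plan: List[str]
    attempts: int
    done: bool


MAX_RETRIES = 3  # hypothetical replanning budget


def perceive(state: State) -> State:
    # Perception layer stand-in: a VLM would produce this scene description.
    state["scene"] = ["cup on counter", "table clear"]
    return state


def plan(state: State) -> State:
    # Planning layer stand-in: an LLM would decompose the task given the scene.
    state["plan"] = ["navigate_to(counter)", "pick(cup)", "navigate_to(table)", "place(cup, table)"]
    return state


def execute(state: State) -> State:
    # Execution layer stand-in: would send ROS 2 action goals. Here the first
    # attempt is simulated as a failure to exercise the replanning edge.
    state["attempts"] += 1
    state["done"] = state["attempts"] > 1
    return state


def route_after_execute(state: State) -> str:
    # Conditional edge: stop on success or when the retry budget is used up.
    if state["done"] or state["attempts"] >= MAX_RETRIES:
        return "END"
    return "perceive"


NODES: Dict[str, Callable[[State], State]] = {
    "perceive": perceive,
    "plan": plan,
    "execute": execute,
}
EDGES: Dict[str, Callable[[State], str]] = {
    "perceive": lambda s: "plan",
    "plan": lambda s: "execute",
    "execute": route_after_execute,
}


def run(task: str, entry: str = "perceive") -> Tuple[State, List[str]]:
    """Walk the graph from `entry` until a node routes to END; return state and trace."""
    state: State = {"task": task, "scene": [], "plan": [], "attempts": 0, "done": False}
    trace: List[str] = []
    node = entry
    while node != "END":
        trace.append(node)
        state = NODES[node](state)
        node = EDGES[node](state)
    return state, trace


if __name__ == "__main__":
    final, visited = run("put the cup on the table")
    print(visited, final["done"], final["attempts"])
```

In LangGraph itself, the `EDGES` table would roughly correspond to `add_edge` and `add_conditional_edges` calls on a `StateGraph`. The retry budget prevents an action that keeps failing from looping forever.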
Technologies
Python, LangGraph, LLM, VLM, VLA, ROS 2, PyTorch, Tool Calling, Humanoid Robotics
