LLM/VLM-Based Task Planning for Embodied Robots

Duration: June 2025 – Present
Affiliation: VinMotion, VinGroup · Hanoi, Vietnam

Note: Full write-up coming soon.

Overview

A hierarchical task planning system for humanoid robots, built on Large Language Models (LLMs) and Vision-Language Models (VLMs) with multi-tool calling. It replaces traditional rule-based planners with autonomous behavior pipelines orchestrated by LangGraph.
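
To make "multi-tool calling" concrete, here is a minimal sketch in plain Python. It is not the project's implementation: the tool names (navigate_to, pick, place), the JSON call format, and the fake_llm_plan stand-in are all assumptions made for illustration, and no real LLM or robot is involved. The idea shown is that the model emits a list of named tool calls and a dispatcher validates and runs them in order.

```python
import json
from typing import Callable, Dict, List

# Hypothetical tool registry; names and signatures are illustrative only.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function under its own name so a planner can call it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def navigate_to(location: str) -> str:
    return f"arrived at {location}"


@tool
def pick(obj: str) -> str:
    return f"holding {obj}"


@tool
def place(obj: str, target: str) -> str:
    return f"placed {obj} on {target}"


def fake_llm_plan(instruction: str) -> str:
    """Stand-in for an LLM response (assumption): a fixed JSON list of tool calls."""
    return json.dumps([
        {"name": "navigate_to", "args": {"location": "kitchen"}},
        {"name": "pick", "args": {"obj": "cup"}},
        {"name": "place", "args": {"obj": "cup", "target": "table"}},
    ])


def run_tool_calls(raw: str) -> List[str]:
    """Parse the tool calls and dispatch each one, rejecting unknown tools."""
    results = []
    for call in json.loads(raw):
        fn = TOOLS.get(call["name"])
        if fn is None:
            raise ValueError(f"unknown tool: {call['name']}")
        results.append(fn(**call["args"]))
    return results


results = run_tool_calls(fake_llm_plan("put the cup on the table"))
```

Rejecting unknown tool names before dispatch is one simple guard against a model hallucinating an action the robot does not support.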

System Architecture

  • Perception Layer — VLM-based scene understanding, object detection, and state estimation
  • Planning Layer — LLM task decomposition with LangGraph orchestration and multi-tool calling
  • Execution Layer — ROS 2 action dispatch to manipulation and navigation controllers
  • Evaluation — Benchmarked against baseline planners on household and industrial tasks
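
The three layers above can be sketched as a chain of nodes that each read a shared state and return an update to it. This is a plain-Python illustration under stated assumptions, not the actual system: it does not use LangGraph or ROS 2, and PlannerState, the node functions, the scene contents, and the "verb:argument" subtask format are all hypothetical. In the real pipeline a VLM, an LLM, and ROS 2 action clients would take the place of the stubs.

```python
from typing import Callable, Dict, List, TypedDict


class PlannerState(TypedDict, total=False):
    """Shared state passed between layers (hypothetical schema)."""
    instruction: str
    scene: Dict[str, str]   # object -> location, as a VLM might report it
    subtasks: List[str]
    log: List[str]


def perceive(state: PlannerState) -> PlannerState:
    """Perception layer stub: a VLM would produce this scene description."""
    return {"scene": {"cup": "counter", "table": "dining room"}}


def plan(state: PlannerState) -> PlannerState:
    """Planning layer stub: an LLM would decompose the instruction into subtasks."""
    scene = state["scene"]
    return {"subtasks": [
        f"navigate:{scene['cup']}",
        "pick:cup",
        f"navigate:{scene['table']}",
        "place:cup",
    ]}


def execute(state: PlannerState) -> PlannerState:
    """Execution layer stub: a real system would send each subtask as an action goal."""
    return {"log": [f"done {s}" for s in state["subtasks"]]}


# Fixed linear order; a graph orchestrator could add branches or retries.
PIPELINE: List[Callable[[PlannerState], PlannerState]] = [perceive, plan, execute]


def run(instruction: str) -> PlannerState:
    """Run each node in order, merging its partial update into the state."""
    state: PlannerState = {"instruction": instruction}
    for node in PIPELINE:
        state.update(node(state))
    return state


final = run("put the cup on the table")
```

Having each node return only the keys it changes keeps the layers decoupled: the planner never needs to know how perception produced the scene, only that the key is present.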

Technologies

Python · LangGraph · LLM · VLM · VLA · ROS 2 · PyTorch · Tool Calling · Humanoid Robotics