China VLA Large Model Applications in Automotive & Robotics, 2025

VLA Large Model Applications in Automotive and Robotics: Insights from the 2025 Research Report

The VLA Large Model Applications in Automotive and Robotics Research Report, 2025 has recently been added to ResearchAndMarkets.com’s portfolio, offering a comprehensive overview of Vision-Language-Action (VLA) large models and their applications in both automotive intelligent driving and robotics. This report systematically analyzes the technical origins, development stages, implementation solutions, and core characteristics of VLA models, providing industry stakeholders with a detailed reference on this emerging field.

Overview of VLA Large Models

VLA, short for Vision-Language-Action models, has rapidly become a critical technology in autonomous driving and robotics. In 2024, the buzzword in intelligent driving was “end-to-end,” reflecting the industry’s focus on seamless integration of perception, decision-making, and control. In 2025, however, “VLA” has emerged as the defining concept, with automotive and robotics companies quickly integrating these models into practical applications.

The report categorizes VLA development into four key stages (the second and third are contrasted in the code sketch after the list):

  1. Pre-VLA (VLM as Explainer): Vision-Language Models (VLMs) are used primarily for interpretability rather than direct action.
  2. Modular VLA: Separate modules handle perception, language understanding, and action execution, offering flexibility but limited integration.
  3. End-to-End VLA: Full integration of visual, language, and action modules for streamlined performance.
  4. Augmented VLA: Enhanced VLA models incorporate external knowledge, multimodal fusion, and adaptive learning strategies to improve performance and generalization.
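
To make the stage taxonomy concrete, the sketch below contrasts the Modular and End-to-End stages in Python. It illustrates only the structural difference; the class names, toy logic, and action format are our assumptions, not designs from the report.

```python
# Minimal sketch contrasting stage 2 (Modular VLA) and stage 3 (End-to-End VLA).
# All names and the toy logic are illustrative, not taken from the report.
from dataclasses import dataclass

@dataclass
class Action:
    steering: float  # normalized to [-1, 1]
    throttle: float  # normalized to [0, 1]

class ModularVLA:
    """Stage 2: separate perception, language, and action modules joined by
    hand-designed interfaces; flexible, but integration is limited."""
    def perceive(self, image) -> dict:
        return {"obstacle_ahead": False}          # vision module -> symbolic scene

    def understand(self, instruction: str) -> str:
        return "stop" if "stop" in instruction else "go"  # language -> intent

    def plan(self, scene: dict, intent: str) -> Action:
        if intent == "stop" or scene["obstacle_ahead"]:
            return Action(steering=0.0, throttle=0.0)
        return Action(steering=0.0, throttle=0.3)

    def act(self, image, instruction: str) -> Action:
        return self.plan(self.perceive(image), self.understand(instruction))

class EndToEndVLA:
    """Stage 3: one jointly trained network maps raw pixels and text straight
    to actions, with no hand-designed intermediate representation."""
    def act(self, image, instruction: str) -> Action:
        return self.policy(image, instruction)    # single forward pass

    def policy(self, image, instruction: str) -> Action:
        return Action(steering=0.0, throttle=0.3)  # stand-in for a neural network

print(ModularVLA().act(image=None, instruction="go to the exit"))
```

The practical trade-off is visibility versus joint optimization: the modular pipeline exposes interpretable intermediate state at each interface, while the end-to-end policy gives that up in exchange for training the whole mapping jointly.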

A recent collaborative survey by McGill University, Tsinghua University, Xiaomi Corporation, and the University of Wisconsin-Madison highlights these stages in the context of autonomous driving, illustrating the evolution from modular experimentation to fully integrated, augmented VLA systems.

Implementation Solutions and Technical Architecture

The report identifies eight typical VLA implementation solutions and more than forty large model frameworks used in automotive and robotics applications. Key technical approaches include the following (the first is illustrated in a sketch after the list):

  • Classic Transformer-Based Models: Leveraging attention mechanisms for multimodal integration.
  • Pre-trained LLM/VLM-Based Solutions: Utilizing large language or vision-language models for perception and reasoning.
  • Diffusion Models and LLM + Diffusion Hybrids: Enabling scenario generation and prediction.
  • Video Generation + Inverse Kinematics Solutions: Supporting precise action planning in dynamic environments.
  • End-to-End Explicit, Implicit, and Hierarchical Solutions: Providing varying degrees of integration between perception, reasoning, and actuation.
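
As an illustration of the first approach, here is a minimal, self-contained PyTorch sketch of a transformer-based VLA: vision patch features and language tokens are projected into a shared width, fused by self-attention, and a head regresses a short horizon of actions. All dimensions and names are hypothetical; production systems use pretrained encoders and far larger models.

```python
# Toy transformer-based VLA: attention fuses vision and language tokens,
# and an action head predicts a short trajectory. Illustrative only.
import torch
import torch.nn as nn

class TinyTransformerVLA(nn.Module):
    def __init__(self, vocab=1000, d=128, horizon=8, act_dim=2):
        super().__init__()
        self.patch_proj = nn.Linear(768, d)       # projects vision patch features
        self.tok_embed = nn.Embedding(vocab, d)   # embeds language tokens
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(d, horizon * act_dim)
        self.horizon, self.act_dim = horizon, act_dim

    def forward(self, patch_feats, token_ids):
        # patch_feats: (B, num_patches, 768); token_ids: (B, seq_len)
        x = torch.cat([self.patch_proj(patch_feats),
                       self.tok_embed(token_ids)], dim=1)
        fused = self.encoder(x)        # joint self-attention over both modalities
        pooled = fused.mean(dim=1)     # simple mean pooling over all tokens
        return self.action_head(pooled).view(-1, self.horizon, self.act_dim)

model = TinyTransformerVLA()
out = model(torch.randn(1, 64, 768), torch.randint(0, 1000, (1, 12)))
print(out.shape)  # torch.Size([1, 8, 2]) -> 8 future steps of a 2-D action
```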

Within autonomous driving, VLA solutions are already being deployed by companies such as Li Auto, XPeng Motors, Chery Automobile, Geely Automobile, Xiaomi Auto, DeepRoute.ai, Baidu Apollo, Horizon Robotics, SenseTime, NVIDIA, and iMotion. These implementations focus on enhancing generalization, reducing computational overhead, and improving decision-making through multimodal perception and large-scale data training.

VLA Applications in Robotics

While VLA adoption in automotive applications has been characterized by massive scale (models with tens of billions of parameters and nearly 1,000 TOPS of computing power), the robotics field is still exploring optimal configurations. Robotic VLA models often train on far smaller datasets, on the order of 1 to 3 million samples, and may mix real-world data with synthetic simulations.

The difference in scale is influenced by two main factors:

  1. Population: Hundreds of millions of cars exist on roads worldwide, providing extensive training data, whereas deployed robots remain limited in number.
  2. Complexity: Robots focus on detailed, close-range physical interactions, requiring more intricate multimodal perception and finer-grained action planning.

Currently, over 100 VLA models and associated datasets exist in robotics, with research teams experimenting with multiple approaches. Three notable explorations include:

1. VTLA Framework Integrating Tactile Perception:
In May 2025, research teams from the Institute of Automation of the Chinese Academy of Sciences, Samsung Beijing Research Institute, Beijing Academy of Artificial Intelligence (BAAI), and the University of Wisconsin-Madison released a VTLA model for insertion manipulation tasks. This model integrates visual, tactile, and language inputs, supplemented with a temporal enhancement module and preference learning strategy. VTLA demonstrated superior performance compared to traditional imitation learning and single-modal approaches in high-precision, contact-intensive tasks.
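
The report does not reproduce VTLA's exact architecture, so the sketch below only gestures at the general recipe described above: separate vision, tactile, and language encoders whose outputs are fused and passed through a temporal module (here a GRU standing in for the temporal enhancement module) before an action head. The preference-learning component is omitted, and all dimensions and names are hypothetical.

```python
# Hypothetical vision-tactile-language fusion sketch, not the published VTLA.
import torch
import torch.nn as nn

class VTLASketch(nn.Module):
    def __init__(self, d=64, act_dim=6):
        super().__init__()
        self.vision = nn.Linear(512, d)    # stand-in for a visual encoder
        self.tactile = nn.Linear(32, d)    # stand-in for a tactile encoder
        self.language = nn.Linear(128, d)  # stand-in for a text encoder
        self.temporal = nn.GRU(3 * d, d, batch_first=True)  # temporal module
        self.head = nn.Linear(d, act_dim)  # e.g., an insertion-pose adjustment

    def forward(self, vis, tac, lang):
        # vis: (B, T, 512) and tac: (B, T, 32) per step; lang: (B, 128) once
        steps = vis.size(1)
        lang_seq = self.language(lang).unsqueeze(1).expand(-1, steps, -1)
        fused = torch.cat([self.vision(vis), self.tactile(tac), lang_seq], dim=-1)
        out, _ = self.temporal(fused)      # accumulate contact history over time
        return self.head(out[:, -1])       # act from the latest fused state

policy = VTLASketch()
a = policy(torch.randn(2, 10, 512), torch.randn(2, 10, 32), torch.randn(2, 128))
print(a.shape)  # torch.Size([2, 6])
```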

2. Multi-Robot Collaborative VLA Models:
In February 2025, Figure AI launched Helix, a generalist Embodied AI model designed for collaborative operation on humanoid robots. In demonstrations, two robots cooperated efficiently to handle fruit items, illustrating the potential of VLA models for long-horizon shared tasks. Helix runs entirely on embedded low-power GPUs, enabling immediate commercial deployment.

3. Offline On-Device VLA Models:
In June 2025, Google unveiled Gemini Robotics On-Device, a multimodal VLA model that runs locally on robots without internet connectivity. It processes visual input and language instructions, generates action outputs, and supports developer fine-tuning for custom applications.

Integration of VLA in Automotive Factories

Robotic VLA models are increasingly deployed in automobile factories, merging the macro world models of vehicles with the micro world models of robots—heralding the era of Embodied AI. Companies such as Tesla, XPeng, and Xiaomi have leveraged their automotive expertise in vision systems, sensors, and AI chips to develop robots like Tesla Optimus, XPeng IRON, and Xiaomi CyberOne. XPeng IRON, for example, integrates the AI Hawkeye vision system, end-to-end large model, Tianji AIOS, and Turing AI chip for industrial applications.

Currently, humanoid robots are primarily used in manufacturing environments, performing tasks such as assembly, logistics, equipment inspection, and factory maintenance. Tesla Optimus robots operate in Tesla's battery workshops, while Apptronik's Apollo robots work with Mercedes-Benz on assembly lines. UBTECH's Walker S2 humanoid robot recently achieved fully autonomous, hot-swappable battery replacement in just three minutes.

Public reports indicate that leading automakers, including Tesla, BMW, Mercedes-Benz, BYD, Geely Zeekr, Dongfeng Liuzhou Motor, Audi FAW, FAW Hongqi, SAIC-GM, NIO, XPeng, Xiaomi, and BAIC Off-Road Vehicle, have deployed humanoid robots across various factory operations. VLA-powered robots from Figure AI, Apptronik, UBTECH, AI Robotics, and Leju are now integral to production, logistics, inspection, and maintenance processes—laying the foundation for future unmanned factories.

Key Findings and Trends

The report outlines several key trends in VLA development:

  • Integration of Multimodal Inputs: Visual, language, tactile, and temporal data fusion is central to improving robot performance.
  • Scalable Collaboration: Multi-robot systems demonstrate that collaborative operation can enhance efficiency and task completion.
  • Offline On-Device Capabilities: Running VLA models locally improves reliability and adaptability in industrial settings.
  • Rapid Industrial Adoption: Automotive manufacturers are early adopters, leveraging factory environments as primary application scenarios for VLA robots.

Chapter Summary

Chapter 1 – Overview of VLA Large Models: Defines VLA, traces its evolution, outlines four stages of development, discusses core characteristics, and identifies technical challenges.

Chapter 2 – VLA Technical Architecture, Solutions, and Trends: Analyzes core technical architectures and decision cores; covers transformer-based, pre-trained, diffusion-based, and end-to-end solutions; and explores trends in autonomous driving and embodied AI.

Chapter 3 – VLA Applications in Automotive: Details implementations by Li Auto, XPeng Motors, Chery Automobile, Geely, Xiaomi Auto, DeepRoute.ai, Baidu Apollo, Horizon Robotics, SenseTime, NVIDIA, and iMotion.

Chapter 4 – Robotics Large Model Progress: Covers contributions from BAAI, SenseTime, Manycore Tech, Peking University, Renmin University, AgiBot, Unitree, Shanghai Jiao Tong University, Figure AI, OpenAI, Galbot, Google, and others.

Chapter 5 – Robotics VLA Application Cases: Highlights real-world applications by AgiBot, Galbot, Robot Era, Estun, Unitree, UBTECH, Tesla Optimus, Figure AI, Apptronik, Google Gemini Robotics, Agility Robotics, XPeng IRON, Xiaomi CyberOne, GAC GoMate, Chery Mornine, Leju Robotics, LimX Dynamics, AI Robotics, and Meituan.
