Robots Can Dance—but They Still Can’t Pick Up a Set of Keys


A Conversation with MCT Chairman and CEO Li Ke
After one million vehicles and five billion kilometres of validation, Li Ke has set out to lay the first foundation for Physical AI.

“What I cannot create, I do not understand.”
— Richard Feynman, 1988
It was January 2026, the opening day of CES.
The embodied intelligence exhibition area at the Las Vegas Convention Center was packed. Humanoid robots from around the world took to the stage one after another, moving with a fluidity that left the audience holding its breath.
What few people noticed was that many of these robots were powered by chips originally designed for consumer electronics, sensors borrowed from automotive supply chains, and simulation platforms adapted from autonomous driving.
They had been assembled from technologies designed for someone else’s world.
A humanoid robot costing RMB 500,000 could perform a backflip and dance the viral Chinese routine Kemusan (“Subject Three”). Yet it could not pick up a set of keys from the floor.
The image that stayed with Li Ke was very different.
If the underlying chips and foundation models could become capable enough, a robot that could pick up a dropped pill for an elderly person living alone might one day enter an ordinary household.
Between those two images lies a chasm.
That chasm is called infrastructure.
This is no punchline. It is a collective predicament now confronting the Physical AI industry. As the world talks about embodied intelligence reshaping human civilisation, a counterintuitive reality is becoming increasingly difficult to ignore: an industry with the potential to reach trillions of yuan still lacks infrastructure designed natively for its own needs.
The result is a generation of robots that dazzle on stage but struggle in everyday life.
“Physical AI needs an NVIDIA of its own. Today, that company does not yet exist,” Li Ke, chairman and CEO of MCT, told 36Kr.
Founded three years ago, MCT has independently developed automotive-grade high-precision positioning chips, IMU modules and multi-sensor fusion algorithms. Its technologies have been deployed by leading automakers, with cumulative shipments exceeding one million units and total verified mileage surpassing five billion kilometres.
These are not laboratory figures. They represent five billion kilometres accumulated by one million vehicles operating through urban canyons, tunnels and complex road conditions.
According to Frost & Sullivan, the global embodied intelligence robot market reached RMB 5.4 billion in 2025 and is expected to exceed RMB 200 billion by 2030.
On the eve of an industry-wide breakout, Li has reached a clear conclusion: the window for building infrastructure native to Physical AI has arrived.
But first, a more fundamental question must be answered.

The Physical World Is Not a Projection of the Digital World
Any discussion of what Physical AI requires must begin with a first-principles question:
What fundamentally distinguishes the physical world from the digital world?
- The digital world is a world of symbols. Its basic unit is the bit, its operating mechanism is logic, and its boundaries are defined by humans. Language models learn the statistical structure of text: the relationships between words, and the logical connections between concepts. What they master is the symbolic system through which humans describe the world.
- The physical world is a world of states. It is governed by matter, energy and causality, with boundaries set by the laws of physics. How light falls on a surface, how an object responds to force, or what an object looks like from an angle never previously captured by a camera cannot be derived from a symbolic system alone. These things can only be learned from the physical world itself.
This distinction determines why AI infrastructure for the physical world cannot simply be transplanted from the methodologies of the digital world.
Embodied intelligence is not truly “reusing” automotive technology. It is making do with it. An industry built on borrowed foundations cannot support a trillion-yuan transformation of the physical world.
Li’s answer is a five-layer architecture.
In his view, these five layers are not a collection of technologies stacked on top of one another. They form an organic whole. The absence of any single layer can cause the entire system to fail.
- The perception layer is where physical interaction begins. A robot operates across three-dimensional space, in an environment that can be more than a hundred times more complex than the one an intelligent-driving system faces. Deep fusion across vision, tactile sensing, pose, positioning, force control and other forms of sensory input is therefore essential. Once perception becomes distorted, the world model built upon it begins to collapse.
- The computing layer requires native chips. General-purpose GPUs and repurposed automotive SoCs are clearly insufficient. Embodied robots require chips designed specifically for physical interaction: chips that combine high performance, low power consumption and low-latency real-time processing, while supporting multi-sensor fusion.
- The execution layer depends on the seamless transfer of algorithms from simulation into the real world. “A robot interacting with the physical world must obey every law of physics,” Li said. “A cup can break. Milk can spill. The system has to know what to do next.”
- At the data and simulation layer, no simulator can fully capture reality, particularly the edge cases that cannot be anticipated in advance. The central requirement at this layer is a continuous feedback mechanism through which data from real physical environments flows back into simulation and corrects the models in real time. This iterative loop of perception, training and feedback must narrow the gap between simulation and reality, enabling AI systems to develop genuine robustness in the face of physical complexity.
- The safety layer runs through the entire system. Automotive-grade standards are only the starting point. A failure in either software or hardware may cause personal injury or substantial losses in industrial settings. The tolerance for error is close to zero.
All five layers are indispensable.
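The feedback loop described for the data and simulation layer can be made concrete with a toy example. The sketch below is an illustration only, not MCT's method: it assumes a simulator with a single uncertain parameter (a friction coefficient), and every name and number in it is hypothetical. Each real-world observation pulls the simulator's parameter part of the way towards the value that reality implies.

```python
G = 9.81  # gravitational acceleration, m/s^2

def simulated_stop_distance(speed, mu):
    """Distance a sliding object travels before friction stops it."""
    return speed ** 2 / (2.0 * mu * G)

def correct_friction(mu, observations, rate=0.5):
    """Nudge the simulator's friction coefficient towards each real observation.

    observations: iterable of (initial_speed_m_s, measured_stop_distance_m).
    """
    for speed, distance in observations:
        mu_observed = speed ** 2 / (2.0 * G * distance)  # friction implied by reality
        mu += rate * (mu_observed - mu)                   # partial step towards it
    return mu

# Hypothetical "real-world" logs generated with a true coefficient of 0.30,
# while the simulator starts from an optimistic guess of 0.50.
TRUE_MU = 0.30
real_logs = [(v, simulated_stop_distance(v, TRUE_MU))
             for v in (1.0, 1.5, 2.0, 2.5, 3.0, 3.5)]

mu_before = 0.50
mu_after = correct_friction(mu_before, real_logs)

# Sim-to-real gap at a test speed of 2 m/s, before and after the correction.
real_distance = simulated_stop_distance(2.0, TRUE_MU)
gap_before = abs(simulated_stop_distance(2.0, mu_before) - real_distance)
gap_after = abs(simulated_stop_distance(2.0, mu_after) - real_distance)
```

A real system would have to correct many coupled parameters from noisy data, but the shape of the loop is the same: observe, compare with the simulator, correct, repeat.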
“Intelligent driving needs to see as far ahead as possible, so that the vehicle can prepare in advance,” Li said. “A robot needs to perceive what is directly in front of it and interact with objects within reach. These are two fundamentally different philosophies of physical interaction.”
This five-layer architecture is not an isolated thesis developed by MCT.
Across both industrial practice and academic research, a growing number of independent lines of thought are converging on the same conclusion: for Physical AI, perception, computing, data, simulation, models and execution form an indivisible organic system.
Remove any one of them, and the system can no longer truly understand the physical world.

The Real Beginning Was an Anomaly at –30°C
When MCT was founded three years ago, it did not rush into embodied intelligence.
Instead, the company spent three years refining its methodology in intelligent driving, turning it into products that could withstand validation at scale.
That methodology is described internally as “data-driven, with software and hardware developed as an integrated system.”
Yet what turned the methodology into organisational muscle memory was not an ordinary success. It was a deeply memorable failure.
In early 2024, an MCT IMU module exhibited intermittent positioning drift during winter testing by a leading customer.
The team examined the hardware design and algorithm parameters, but two weeks of investigation failed to identify the root cause.
They eventually traced the issue to an obscure physical variable. At –30°C in northeastern China, the temperature-drift curve of one component deviated slightly from the calibration data provided by the supplier.
It was the kind of discrepancy that would never have surfaced in a conventional laboratory test.
“For that entire month, we entered the meeting room at nine in the morning and left at two the next morning,” Li recalled. “We repeatedly fed real-world test data back into the simulation platform and ran the error models one by one.”
“In the end, we did more than solve a problem. That project forced us to rebuild our entire calibration and validation process. Today, one of MCT’s testing requirements is that every IMU module must complete a full temperature cycle from –40°C to 85°C. That standard grew directly out of the lesson we learned.”
This is more than the story of a technical fault being corrected.
It is the story of a methodology being validated: only real-world data can calibrate a system. A simulator can never calculate a deviation that was never written into the supplier’s specification.
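To make the idea tangible, the sketch below compares bias readings from a temperature sweep against a supplier's calibration curve and flags the points where the two disagree. It is a simplified illustration under stated assumptions, not MCT's validation process: the function names, the tolerance and all of the numbers are invented.

```python
from bisect import bisect_right

def interpolate(table, temp_c):
    """Linearly interpolate a sorted (temperature, bias) table at temp_c."""
    temps = [t for t, _ in table]
    if not temps[0] <= temp_c <= temps[-1]:
        raise ValueError(f"{temp_c} degC is outside the calibrated range")
    # Index of the table row just above temp_c, clamped to the last row.
    i = min(bisect_right(temps, temp_c), len(temps) - 1)
    (t0, b0), (t1, b1) = table[i - 1], table[i]
    return b0 + (b1 - b0) * (temp_c - t0) / (t1 - t0)

def find_deviations(supplier_table, measurements, tolerance):
    """Return (temp, measured, expected) wherever the measured bias
    departs from the supplier curve by more than the tolerance."""
    flagged = []
    for temp_c, measured in measurements:
        expected = interpolate(supplier_table, temp_c)
        if abs(measured - expected) > tolerance:
            flagged.append((temp_c, measured, expected))
    return flagged

# All values below are invented for illustration (bias in deg/h).
supplier_curve = [(-40, 1.20), (-20, 0.80), (0, 0.40),
                  (25, 0.00), (60, 0.50), (85, 0.90)]
chamber_sweep = [(-40, 1.22), (-30, 1.35), (-10, 0.61),
                 (25, 0.01), (70, 0.65), (85, 0.91)]

deviations = find_deviations(supplier_curve, chamber_sweep, tolerance=0.10)
for temp_c, measured, expected in deviations:
    print(f"{temp_c} degC: measured {measured:.2f}, supplier curve {expected:.2f}")
```

In this invented data set only the reading at –30°C is flagged. The point of the sketch is that such a deviation appears only when measured data is placed beside the supplier's curve; the curve alone cannot reveal it.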
That mindset is what allowed MCT’s products to survive the unforgiving automotive-grade market.
Today, MCT’s self-developed MOJANDA series of automotive-grade high-precision positioning chips delivers positioning accuracy approximately 30% higher than comparable products from international leader u-blox in severely obstructed environments, while cutting time to first fix to a matter of seconds.
The SUMACO series of IMU modules achieves automotive-grade, industry-leading bias stability across the full operating temperature range.
Together, the two product lines have replaced overseas suppliers in multiple applications, with cumulative shipments exceeding one million units.
Yet in Li’s view, MCT’s deepest moat is not shipment volume. It is the methodology accumulated through close co-development with leading customers.
“Paradoxically, the more resources a company has, the harder this kind of deep vertical integration can become,” Li explained.
Infrastructure native to Physical AI requires chips, hardware and models to be co-designed as a single closed-loop system. Large organisations naturally gravitate towards modularity: different departments own different layers, interfaces become standardised, and each unit optimises its own module.
That approach has worked in digital AI. In Physical AI, however, it can create fatal gaps at the system level.
“From our first day, we treated these three things as one,” Li said. “That is an advantage that only time can create.”
The deeper barrier is cognitive.
MCT has worked alongside leading companies throughout the evolution from rule-based systems to end-to-end architectures, vision-language-action models and world models.
“Our customers have invested enormous resources in AI, including deep work on chips, operating systems and compiler toolchains. We have experienced and co-created that evolution alongside them. That kind of understanding cannot be bought.”
One line from Li is repeated frequently inside MCT:
“Customers are not buying our chips. They are buying the hard-won lessons from every detour we have taken alongside them.”

Every Table Could One Day Become an Intelligent Agent
Building infrastructure for an industry requires identifying the largest common denominator across a wide range of real projects.
Humanoid robots, quadruped robotic dogs, robotic lawnmowers and low-altitude aircraft may take very different forms. Yet their underlying requirements for perception, computing, data and simulation are fundamentally similar.
Water and electricity became infrastructure because nearly everyone needs them, while the cost of providing them independently is prohibitively high.
The foundational capabilities required by Physical AI share the same structural characteristics.
“These common data requirements are where infrastructure begins,” Li said. “We will not build robot bodies or specific end applications. We want to provide the industry’s basic utilities—the equivalent of water and electricity.”
He outlined three priorities.
First, build the data foundation.
Physical AI faces an acute shortage of high-quality data.
MCT plans to launch products including data-collection gloves in the near future, capturing high-precision hand movements, full-body pose, visual information and other forms of physical interaction data.
“If embodied intelligence is to replace human labour, it must first learn how humans move,” Li said.
These products will provide the raw material for chip design and software platforms, while also being commercialised as standalone offerings. The ultimate objective is a self-reinforcing flywheel connecting data, models and chips.
Second, develop chips native to Physical AI.
Based on the physical-interaction data it collects and its deeper understanding of robotic scenarios, MCT plans to develop Physical AI-native perception and computing chips purpose-built for physical interaction.
Unlike automotive-grade chips, robotic chips must support distributed multi-sensor deployment, process flexible signals such as force and tactile information, and deliver both low power consumption and high-precision synchronisation.
“These requirements simply did not exist in previous automotive applications,” Li said.
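One way to picture the synchronisation requirement is the problem of placing readings from two separately clocked sensors on a common timeline. The sketch below is a hypothetical software illustration, not a description of MCT's chips: it linearly interpolates force readings onto the timestamps of an IMU stream. The sample rates and values are assumptions, and a real device would more likely anchor this alignment to a shared hardware clock.

```python
def resample(samples, target_times):
    """Linearly interpolate (time_s, value) samples onto target timestamps.

    Both inputs must be sorted by time, with at least two samples and
    strictly increasing sample times. Targets outside the sampled span
    are skipped rather than extrapolated.
    """
    aligned = []
    j = 0
    for t in target_times:
        if t < samples[0][0] or t > samples[-1][0]:
            continue
        # Advance to the pair of samples that brackets t.
        while samples[j + 1][0] < t:
            j += 1
        (t0, v0), (t1, v1) = samples[j], samples[j + 1]
        aligned.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
    return aligned

# Hypothetical force readings (newtons) sampled every 10 ms.
force_samples = [(0.000, 0.0), (0.010, 1.0), (0.020, 2.0), (0.030, 3.0)]
# Hypothetical IMU timestamps, offset from the force sensor by 5 ms.
imu_times = [0.005, 0.015, 0.025, 0.035]

force_at_imu_times = resample(force_samples, imu_times)
```

The last IMU timestamp falls after the final force sample, so it is dropped instead of being filled with a guess. Interpolation error grows with the gap between samples, which is one reason tight synchronisation matters for fast physical contact.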
Third, build platforms and an ecosystem.
MCT plans to expand its business model beyond hardware and algorithmic solutions into areas such as simulation-platform subscriptions and data services.
The company will begin with project-based co-development, working with leading customers to iterate on technologies and products. Once the technical direction begins to converge, it will move towards standardised products and ecosystem development.
“When the Physical AI era truly arrives, every table and every teacup could become an intelligent agent,” Li said. “We hope the perception and computing capabilities beneath them will come from MCT.”
When might that vision become reality?
Li offers a clear prediction.
“Before 2030, Physical AI will complete the transition from ‘usable’ to ‘reliable’. Two curves will rise in a reinforcing spiral: world-model capabilities driven by massive volumes of egocentric data, and large-scale robotic computation graphs generated and orchestrated by digital AI agents, an approach we refer to as GAP. The inflection point will arrive when those two curves genuinely converge.”
On the other side of that inflection point lies the possibility of Physical AI becoming something built for everyone.
It returns us to the image at the beginning of this story: not a robot worth hundreds of thousands of yuan standing on an exhibition stage, but one capable of entering an ordinary home.
When asked how he would know that the mission had succeeded, Li gave a simple answer:
“One day, when people talk about Physical AI infrastructure, they will think of MCT first—and then everyone else.”
Every infrastructure era begins with a moment when the technology already exists, but no one yet understands what it will change.
Electricity began that way. So did the internet. So did the GPU.
The words that open this story were found on Feynman’s blackboard at the California Institute of Technology at the time of his death in 1988. He was by then one of the architects of quantum electrodynamics and the originator of the path-integral formulation.
He had spent a lifetime demonstrating that he understood physics at its deepest level. Yet he chose those words as a reminder that the ultimate test of understanding is the ability to create something anew.
Physical AI infrastructure demands precisely this kind of creation from first principles.
It cannot simply be transplanted from any existing field. It must be built and defined from the ground up.
And beneath that foundation lies a footnote written by one million vehicles over five billion kilometres.
While the market continues to chase embodied robot platforms, MCT has chosen to build the part of the iceberg that remains below the waterline.
That choice implies a longer development cycle. It also means that, once Physical AI enters its breakout phase, MCT’s closed-loop integration of chips, hardware and models may become a foundational layer that every industry participant must engage with.
The reliability of that foundation will not merely be written in software code.
It will be built into silicon and proven, kilometre by kilometre, on real roads.