China is now pushing its domestic AI industry to reduce reliance on NVIDIA’s CUDA platform, as officials look for long-term alternatives that give more control over both hardware and software development. The shift comes as CUDA continues to dominate the AI ecosystem due to its mature tooling and deep integration with NVIDIA GPUs, which keeps developers tied to one hardware stack and limits flexibility for competing solutions.
Wei Shaojun, an executive at the China Semiconductor Industry Association, has called for a different approach that moves away from building a direct CUDA replacement and instead focuses on a new computing model built around software-defined chips.
“Even if our own technology is not good enough at the start, it must still be used. Trial and error may not succeed, but without trying, we will certainly fall behind.” – Wei Shaojun
What software-defined chips change
Software-defined chips change how computation is handled at a fundamental level: developers no longer depend on a fixed hardware instruction set or a layer like CUDA to run workloads. Instead, the compiler generates a configuration bitstream that directly maps how data moves across the chip, which gives more control over execution and reduces dependence on predefined architectures.
This design shifts the focus from scheduling work in hardware at run time to deterministic compilation, where every operation is planned in advance down to precise timing. That lets developers fine-tune performance in ways traditional GPUs do not allow.
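To make the idea of deterministic compilation concrete, here is a minimal sketch in Python of a compile-time scheduler. It is an illustration only, not how any real software-defined chip or its compiler works: the `Op` type, the `static_schedule` function, the latencies, and the two-unit chip are all hypothetical. The point it shows is that every operation receives a fixed compute unit and start cycle before anything runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    latency: int       # cycles the op occupies a compute unit (assumed value)
    deps: tuple = ()   # names of ops whose results this op consumes


def static_schedule(ops, num_units):
    """Assign every op a (unit, start_cycle) ahead of time.

    `ops` must be listed in dependency order (a topological order).
    Returns the plan and the total cycle count.
    """
    unit_free = [0] * num_units   # first cycle at which each unit is idle
    finish = {}                   # op name -> cycle at which its result is ready
    plan = {}                     # op name -> (unit, start_cycle)
    for op in ops:
        # An op cannot start before all of its inputs are ready.
        ready = max((finish[d] for d in op.deps), default=0)
        # Pick the unit that allows the earliest start; ties go to the lowest index.
        unit = min(range(num_units), key=lambda u: (max(unit_free[u], ready), u))
        start = max(unit_free[unit], ready)
        plan[op.name] = (unit, start)
        finish[op.name] = start + op.latency
        unit_free[unit] = finish[op.name]
    return plan, max(finish.values(), default=0)


# Hypothetical workload: two loads feed a multiply, which feeds an add.
ops = [
    Op("load_a", 2),
    Op("load_b", 2),
    Op("mul", 3, ("load_a", "load_b")),
    Op("add", 1, ("mul",)),
]
plan, total_cycles = static_schedule(ops, num_units=2)
```

In this toy example the two loads land on separate units at cycle 0, the multiply starts at cycle 2, the add at cycle 5, and the whole program takes 6 cycles. Running the scheduler again on the same input always yields the same plan, which is the property the article describes: timing is decided by the compiler rather than discovered by hardware at run time.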
Why China wants this shift
China sees the cost of building a full CUDA-like ecosystem as too high. The existing ecosystem already locks developers into NVIDIA hardware, which makes competition difficult at both the software and hardware levels. Wei Shaojun argues that trying to replicate CUDA wastes resources, while software-defined chips offer a cleaner path that avoids those constraints.
At the same time, this approach brings serious engineering challenges. The compiler becomes far more complex, because it must handle routing, branching, and structural changes that traditional chip designs avoid. Existing examples such as SambaNova’s RDUs and Groq’s LPUs show that software-defined architectures can work, but they still target specific workloads rather than replacing GPUs entirely.