TL;DR

Chinese developer Zhipu AI (Z.ai) says it trained a new multimodal model, GLM-Image, using only Huawei hardware, including Ascend AI chips and Kunpeng CPUs. The company shared architecture and model-size details but did not disclose how many servers or accelerators were used, or how long training took.

What happened

Zhipu AI announced GLM-Image, a multimodal model it describes as using an "autoregressive + diffusion decoder" hybrid to jointly generate text and images. The firm said the full pipeline, from data preprocessing to large-scale training, ran on Huawei's Ascend Atlas 800T A2 servers, which pair Kunpeng 920 CPUs (48- or 64-core variants) with Ascend 910 AI accelerators. Architecture notes Zhipu published on Hugging Face describe a 9-billion-parameter autoregressive generator, initialized from GLM-4-9B-0414, that produces a compact visual encoding before expansion, plus a 7-billion-parameter diffusion-style decoder with a Glyph Encoder to improve text rendering in images. Zhipu also released GLM-Image as open source. The company did not reveal the number of machines or accelerator cards, nor the training time or cost, leaving questions about performance and efficiency unanswered.
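The two-stage design described above can be sketched in toy form. Everything below is illustrative and assumed, not Zhipu's actual implementation: the function names, latent dimension, upsampling-by-repetition, and the crude denoising loop are stand-ins that only mirror the reported shape of the pipeline (a ~256-token compact encoding, expansion toward the 1K–4K range, then diffusion-style refinement).

```python
import numpy as np

rng = np.random.default_rng(0)

def autoregressive_encode(prompt: str, n_tokens: int = 256, dim: int = 16) -> np.ndarray:
    """Toy stand-in for the 9B autoregressive generator: emit a compact
    sequence of visual latent tokens, each conditioned on the previous state."""
    tokens = np.zeros((n_tokens, dim))
    state = rng.standard_normal(dim)  # pretend this encodes the prompt
    for i in range(n_tokens):
        state = np.tanh(state + 0.1 * rng.standard_normal(dim))  # next token depends on prior state
        tokens[i] = state
    return tokens

def expand_tokens(tokens: np.ndarray, factor: int = 4) -> np.ndarray:
    """Expand ~256 compact tokens toward the 1K-4K range by simple repetition,
    a placeholder for whatever expansion the real model performs."""
    return np.repeat(tokens, factor, axis=0)

def diffusion_decode(latents: np.ndarray, steps: int = 10) -> np.ndarray:
    """Toy stand-in for the 7B diffusion decoder: start from noise and
    iteratively pull it toward the conditioning latents."""
    x = rng.standard_normal(latents.shape)
    for _ in range(steps):
        x = x + 0.3 * (latents - x)  # crude denoising step toward the target
    return x

compact = autoregressive_encode("a red bicycle", n_tokens=256)
expanded = expand_tokens(compact, factor=4)            # 256 -> 1024 tokens
image_latents = diffusion_decode(expanded, steps=10)
print(compact.shape, expanded.shape, image_latents.shape)  # (256, 16) (1024, 16) (1024, 16)
```

The point of the hybrid is visible even in this sketch: the autoregressive stage produces a cheap, compact plan of the image, and the diffusion stage does the heavy lifting of turning it into high-resolution output.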

Why it matters

  • If reproducible at competitive cost or speed, training without Nvidia/AMD hardware could alter demand dynamics for datacenter GPUs.
  • The claim highlights progress in China’s full-stack AI ambitions, tying compute, CPUs and accelerators from a domestic supplier into a single training pipeline.
  • Open-source release of GLM-Image makes the model immediately available for study and reuse, broadening its technical and geopolitical footprint.
  • Unclear training scale and speed mean the announcement raises questions rather than settles them about Huawei hardware’s competitiveness.

Key facts

  • Zhipu AI (branded Z.ai) announced a model called GLM-Image.
  • Zhipu says it trained the model using Huawei Ascend Atlas 800T A2 servers.
  • Atlas servers cited combine Kunpeng 920 CPUs (48- or 64-core variants) and Ascend 910 AI processors.
  • GLM-Image uses a hybrid "autoregressive + diffusion decoder" architecture.
  • Autoregressive generator: 9 billion parameters, initialized from GLM-4-9B-0414; produces ~256-token compact encodings expanded to 1K–4K tokens for high-resolution outputs.
  • Diffusion decoder: 7 billion parameters based on a single-stream DiT latent-space decoder, including a Glyph Encoder to improve text rendering in images.
  • Huawei claims its Ascend 910C (2025) can reach about 800 TFLOPS FP16 per card, roughly 80% of the FP16 throughput Nvidia quoted for the H100 at its 2022 launch; this is a vendor claim, not an independently verified benchmark.
  • GLM-Image is available as open source on model-sharing platforms (Hugging Face).
  • Zhipu did not disclose the number of servers/accelerators used, training duration or cost.
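The "roughly 80%" figure in the key facts is simple arithmetic against Nvidia's published number. The H100 figure below is an assumption drawn from Nvidia's public spec sheet (FP16 Tensor Core, dense, SXM variant), not from the source article:

```python
ascend_910c_fp16 = 800.0   # TFLOPS, Huawei's claim per the article
h100_fp16_dense = 989.5    # TFLOPS, assumed from Nvidia's published H100 SXM FP16 Tensor Core (dense) spec

ratio = ascend_910c_fp16 / h100_fp16_dense
print(f"{ratio:.0%}")      # ≈ 81%, consistent with the article's "roughly 80%"
```

Note that peak TFLOPS is a per-card ceiling; real training throughput also depends on memory bandwidth, interconnect, and software stack, none of which Zhipu has disclosed.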

What to watch next

  • Independent benchmarks comparing training throughput, wall-clock time and cost between Huawei Ascend/Kunpeng setups and GPU-based clusters: not confirmed in the source.
  • Disclosure from Zhipu about the scale of the training cluster (number of Atlas servers and Ascend cards) and elapsed training time: not confirmed in the source.
  • Adoption and downstream use of the open-source GLM-Image model in research and products: not confirmed in the source.

Quick glossary

  • Autoregressive model: A model that generates each element of an output sequence conditioned on previously generated elements, commonly used for language generation.
  • Diffusion decoder: A model component that iteratively refines noisy latent representations into high-fidelity outputs, often used in image generation.
  • FP16: A 16-bit floating-point numeric format used in ML computations to speed processing and reduce memory use compared with 32-bit floats.
  • TFLOPS: Tera (trillion) floating-point operations per second, a metric for raw compute throughput of processors.
  • Open source: Software or models released with licenses that allow study, modification and redistribution by others.
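The FP16 glossary entry can be made concrete with a few lines of NumPy: casting to 16 bits halves memory use but drops precision, which is exactly the trade-off ML training exploits.

```python
import numpy as np

# Precision: float16 has ~10 mantissa bits, so small differences vanish.
x32 = np.float32(1.0001)
x16 = np.float16(x32)           # rounds to the nearest representable float16
print(x32, x16)                 # the extra digit is lost: float16 gives 1.0

# Memory: 1000 values at 16 bits take half the bytes of 32-bit values.
a32 = np.ones(1000, dtype=np.float32)
a16 = np.ones(1000, dtype=np.float16)
print(a32.nbytes, a16.nbytes)   # 4000 vs 2000 bytes
```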

Reader FAQ

Did Zhipu really train GLM-Image entirely on Huawei hardware?
That is the company's claim. Zhipu asserts the complete training pipeline ran on Huawei Ascend Atlas 800T A2 servers combining Kunpeng CPUs and Ascend AI chips, but it has not published details that would allow independent verification.

How many servers or accelerators did Zhipu use and how long did training take?
Not confirmed in the source.

Is GLM-Image available for others to use?
Yes. According to the source, Zhipu published GLM-Image as open source on model-hosting platforms such as Hugging Face.

Does this prove Huawei hardware matches Nvidia H100 performance?
Not confirmed in the source; Huawei provides a claim about Ascend 910C throughput relative to the H100, but Zhipu did not publish comparative training metrics.

Sources

  • "China's Z.ai claims it trained a model using only Huawei hardware. Hasn't revealed how much kit did the job, so Nvidia can probably rest easy" (AI + ML), by Simon Sharwood
