This Week: ERNIE 5.1 Tops Search Charts, DeepSeek Record Funding, Google Veo 3 Bridges Audio-Visual Gap
Extracting real signals from the AI noise
It was a busy week. On the Chinese side, Baidu launched ERNIE 5.1 and DeepSeek’s record-breaking funding dominated headlines. Internationally, Google’s Veo 3 pushed AI video to new heights while NVIDIA’s Jim Fan publicly declared VLA architecture “dead.”
Let’s go through the stories worth paying attention to.
Baidu ERNIE 5.1 Released
On May 9, Baidu officially released ERNIE 5.1.
Two highlights stand out. First, the “multi-dimensional elastic pre-training” technology — a single training run produces models at multiple scales, compressing total parameters to about 1/3 and active parameters to about 1/2, with pre-training costs at just 6% of comparable models. That cost control is genuinely impressive.
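Baidu hasn't published how "multi-dimensional elastic pre-training" works, but the "one run, many scales" claim is in the spirit of slimmable or Matryoshka-style networks, where a smaller model is a prefix of the larger one's weights. The sketch below is a guess at that mechanism, not a description of ERNIE's actual training; all names are illustrative.

```python
# Toy illustration of "one training run, many model sizes": the first k
# weights of the layer form a standalone smaller model. This is a guess
# at the mechanism behind elastic pre-training, not Baidu's actual code.
import random

random.seed(1)

class ElasticLinear:
    """A linear layer whose leading weights form a valid smaller model."""
    def __init__(self, width):
        self.w = [random.uniform(-1, 1) for _ in range(width)]

    def forward(self, xs, active_width):
        # Only the first `active_width` weights participate, so the small
        # model is a prefix of the big one and both train in one run.
        return sum(w * x for w, x in zip(self.w[:active_width], xs))

def train_step(layer, xs, target, lr=0.01):
    # Each step optimizes a randomly sampled width, keeping every prefix
    # usable as a standalone model after training.
    k = random.choice([2, 4, 8])
    err = layer.forward(xs, k) - target
    for i in range(k):
        layer.w[i] -= lr * err * xs[i]

layer = ElasticLinear(width=8)
full = layer.forward([1.0] * 8, active_width=8)   # "full" model
small = layer.forward([1.0] * 8, active_width=4)  # half-width sub-model
```

Under this reading, the 1/3 total-parameter and 1/2 active-parameter figures would come from shipping a prefix sub-model rather than training a separate small model from scratch.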
Second, the results. LMArena’s latest ranking shows ERNIE 5.1 scoring 1223 points, reaching #1 domestically and #4 globally on the search leaderboard — the only Chinese-made model on the list. Agent capabilities improved significantly, surpassing DeepSeek-V4-Pro; creative writing matches Gemini 3.1 Pro, and reasoning approaches the top closed-source models.
The preview version released in late April already scored 1476 on the text leaderboard, surpassing GPT-5.5 and DeepSeek-V4-Pro to take China’s #1. The real performance of the official release awaits more detailed benchmark data.
Baidu also previewed its Create 2026 developer conference on May 13-14, where a wave of application-level announcements are expected.
DeepSeek’s $7 Billion Funding
According to QbitAI, DeepSeek completed a record-breaking first funding round of 50 billion yuan ($7 billion), with Liang Wenfeng personally contributing 20 billion yuan. A V4.1 release is already scheduled for June.
If confirmed, this is likely the largest AI industry funding round this year. Since DeepSeek-R1 made waves in the model community at the start of the year, its low inference cost and solid results have prompted many labs to rethink their training strategies.
American Researcher’s Trip to China
Allen Institute for AI (Ai2) researcher Nathan Lambert recently visited China, spending 36 hours on an intensive tour of Moonshot AI, Zhipu AI, Tsinghua, Meituan, Xiaomi, and Alibaba’s Qwen team. He wrote a long article about the experience.
His conclusion is interesting: Chinese labs’ ability to catch up isn’t due to any single genius researcher’s flash of inspiration — it’s because every detail across the full stack, from data to architecture to RL algorithms, gets squeezed for incremental gains, and these scattered improvements are assembled into a multi-objective optimization scheme.
He also noted one phrase: “All labs are afraid of ByteDance.” ByteDance’s engineering capabilities and resource investment are genuinely unique in China.
AI Co-Mathematician
This one is cool. Google DeepMind released an asynchronous workspace system called “AI Co-Mathematician” specifically designed to assist mathematical research.
Oxford professor Marc Lackenby used the system to solve problem 21.10 from the Kourovka Notebook — a group theory question that had remained unsolved for decades. The process was interesting too — the AI’s first proof attempt had a flaw, which was caught by a review Agent in the system. Lackenby reviewed it, got inspired by the attempt, figured out how to fill the gap, and ultimately completed the proof in collaboration with the AI.
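The workflow behind that anecdote — a generator proposes a proof, a reviewer agent flags gaps, and a human or the generator patches them — can be sketched as a simple loop. Everything below is illustrative; DeepMind hasn't published the system's internals.

```python
# Minimal sketch of the propose-review-repair loop described above.
# The step format and the trivial "justified" check are assumptions.
def review(proof_steps):
    """Return indices of steps that fail a (toy) justification check."""
    return [i for i, step in enumerate(proof_steps) if not step["justified"]]

def repair_loop(proof_steps, patch, max_rounds=3):
    """Alternate review and repair until no gaps remain."""
    for _ in range(max_rounds):
        gaps = review(proof_steps)
        if not gaps:
            return proof_steps  # proof accepted
        for i in gaps:
            proof_steps[i] = patch(proof_steps[i])
    raise RuntimeError("unresolved gaps after review rounds")

draft = [{"claim": "lemma 1", "justified": True},
         {"claim": "key step", "justified": False}]  # the flawed step
fixed = repair_loop(draft, patch=lambda s: {**s, "justified": True})
```

In the Kourovka case, the `patch` role was played by Lackenby himself, who used the flawed attempt as a starting point rather than discarding it.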
On the hardest math AI benchmark, FrontierMath Tier 4, the system scored 48%, setting a new state of the art and surpassing GPT-5.5 Pro’s 39.6%.
AI-assisted mathematical research is moving from concept to practical use. Several Erdős problems have also been solved by GPT in recent months.
Jim Fan: VLA Is Dead
At Sequoia AI Ascent 2026, NVIDIA’s robotics lead Jim Fan spent 20 minutes holding “funerals” for two directions: VLA (Vision-Language-Action models) and teleoperation.
His proposed new paradigm is called the “World Action Model,” which essentially copies the LLM playbook:
- Pre-training to simulate the next world state (corresponds to next token prediction)
- Action fine-tuning to calibrate the valuable parts of real robots (corresponds to supervised fine-tuning)
- Reinforcement learning to optimize strategy (corresponds to RLHF)
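The three stages above can be shrunk to a toy one-dimensional "world" to make the recipe concrete. Everything in this sketch — class names, the training setup, and reducing stage 3 to greedy one-step planning in place of a real RL loop — is an illustrative assumption, not NVIDIA's published method.

```python
# Toy 1-D world with dynamics next_state = 0.9 * state + 0.1.
# Stage 1 pretrains on passive trajectories, stage 2 fine-tunes on a
# smaller "real robot" subset, stage 3 picks actions via the model.
import random

random.seed(0)
GOAL = 1.0  # where we want the toy world to end up

class WorldActionModel:
    """Predicts the next world state as w * state + b."""
    def __init__(self):
        self.w, self.b = random.random(), random.random()

    def predict(self, state):
        return self.w * state + self.b

    def sgd_step(self, state, target, lr):
        # One squared-error gradient step (analogue of next-token loss).
        err = self.predict(state) - target
        self.w -= lr * err * state
        self.b -= lr * err

def pretrain(model, trajectories, epochs=500):
    """Stage 1: next-state prediction from passive trajectories."""
    for _ in range(epochs):
        for s, s_next in trajectories:
            model.sgd_step(s, s_next, lr=0.05)

def action_finetune(model, robot_data, epochs=100):
    """Stage 2: calibrate on the (smaller) real-robot dataset."""
    for _ in range(epochs):
        for s, s_next in robot_data:
            model.sgd_step(s, s_next, lr=0.005)

def rl_policy(model, state, actions):
    """Stage 3, reduced to greedy one-step planning against the model
    (a stand-in for a real RL loop)."""
    reward = lambda s: -abs(s - GOAL)
    return max(actions, key=lambda a: reward(model.predict(state + a)))

trajectories = [(s / 10, 0.9 * (s / 10) + 0.1) for s in range(10)]
model = WorldActionModel()
pretrain(model, trajectories)
action_finetune(model, trajectories[:3])
best = rl_policy(model, 0.5, [-0.2, 0.0, 0.2])  # action moving toward GOAL
```

The point of the analogy: once the model predicts consequences well, action selection can lean on it, just as instruction-following leans on a well-pretrained LLM.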
NVIDIA has been progressively releasing work including EgoScale, DreamDojo, and Dream Zero. Jim Fan’s speech essentially outlined the direction for embodied AI in 2026.
Google Veo 3
Google released Veo 3 last week, a new video generation model that produces 8-second 720p videos with synchronized sound effects and spoken dialogue — a first for Google’s AI tools.
Also launched is an online AI filmmaking tool called Flow, which integrates Veo 3 with the Imagen 4 image generator and Gemini language model. You can describe scenes in natural language and manage characters, locations, and visual styles in a web interface.
Ars Technica ran several rounds of tests and found that Veo 3’s output has genuinely stepped up in the realism of character movements and expressions. People on TikTok are already passing off Veo 3-generated content as real footage for attention.
On pricing, the Google AI Ultra subscription ($250/month) includes 12,500 credits, with each Veo 3 video consuming 150 credits — roughly $3 per video, or about 83 videos a month.
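The per-video figure follows directly from the plan arithmetic (taking the reported plan numbers at face value):

```python
# Back-of-envelope Veo 3 cost per video under the Google AI Ultra plan,
# using the figures reported in the story above.
plan_price = 250.0      # USD per month
plan_credits = 12_500   # credits included in the plan
video_cost = 150        # credits consumed per Veo 3 generation

price_per_credit = plan_price / plan_credits   # $0.02 per credit
videos_per_month = plan_credits // video_cost  # 83 videos
usd_per_video = video_cost * price_per_credit  # ~$3.00 per video
```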
Hugging Face’s Open-Source Robots
Hugging Face announced two humanoid robot products. One is HopeJR, priced at around $3,000, with 66 degrees of freedom and capable of walking and manipulating objects. It was co-designed with French robotics company The Robot Studio and is open source.
The other is Reachy Mini, which looks like a Wall-E-style half-torso sculpture that can turn its head and speak, primarily for AI developers to test with, priced at $250-300.
Hugging Face CEO Clem Delangue was blunt about it: robotics can’t be monopolized by a few big players using black-box systems. Anyone should be able to assemble, understand, and rebuild.
Qwen AI Glasses Upgrade
Qwen rolled out a major upgrade for its AI glasses S1, claiming an industry first in spatial 3D display: navigation prompts and information cards are now rendered with depth rather than on a flat plane.
New features include proactive reminders (bring an umbrella, stretch your neck, etc.), with ride-hailing, flash delivery, and photo-based Q&A coming this month. The AI glasses race ultimately comes down to large model capabilities.
Other Notable Stories
- Cloudflare publicly admitted AI made 1,100 jobs obsolete, even as company revenue hit new highs. Most companies wouldn’t say this out loud.
- OpenAI launched new voice intelligence features in its API, allowing developers to integrate more natural voice interactions into their apps.
- Chrome’s built-in 4GB AI model — not exactly new technology, but genuinely confusing for users — does it run locally or in the cloud?
- Wired reported on Nick Bostrom’s plans for humanity’s “great retirement” — characteristically controversial views.
- Thousands of “vibe-coded” apps exposed corporate and personal data — the side effects of AI-assisted programming are starting to show.
- Anthropic raised Claude Code usage limits, reportedly tied to a new partnership with SpaceX.
One-Sentence Summary
China is competing on cost and results, while the West pushes on video, robotics, and embodied AI. The directions differ, but both are accelerating. Create 2026 this month and DeepSeek V4.1 in June are both worth watching.