🏅DeepSeek Drops A Gold Model
PLUS: Alibaba's AI Efficiency Breakthrough | ChatGPT Fails Mental Health Tests

Reading time: 5 minutes
🗞️In this edition
DeepSeek open-sources gold medal IMO math model
Alibaba wins award for AI efficiency breakthrough
UK psychologists warn ChatGPT unsafe for mental health crises
Workflow Wednesday #47: AI-Powered Planning
In other AI news –
Heavy traffic leads to throttling of Sora and Nano Banana Pro
Poetic prompts expose gaps in AI safeguards
AI-powered shopping pushes Black Friday to $11.8 billion
4 must-try AI tools
Hey there,
DeepSeek open-sourced the world's first gold medal-level IMO math model while Google charges for equivalent capabilities and OpenAI withholds theirs for months. Alibaba Cloud won best paper at NeurIPS for LLM efficiency research with judges explicitly praising them for publishing while US players keep research behind closed doors. And ChatGPT-5 is affirming delusions, praising invincibility claims, and failing basic mental health safety tests in research conducted by King's College London.
We're committed to keeping this the sharpest AI newsletter in your inbox. No fluff, no hype. Just the moves that'll matter when you look back six months from now.
Let's get into it.
What's happening:
Chinese AI company DeepSeek released the world's first open AI model to achieve gold medal-level performance at the annual International Mathematical Olympiad (IMO).
DeepSeek's Math-V2 model is open-sourced on Hugging Face and GitHub under a permissive license that allows users to repurpose the model.
IMO is widely regarded as the world's most prestigious maths competition because questions require "deep insight, creativity and rigour." To achieve top scores, AI systems must demonstrate reasoning behind outputs, not just provide simple answers. Around 8% of human IMO participants achieve gold medals.
The Chinese firm claimed its model achieved gold-level scores on questions in both this year's IMO and the 2024 Chinese Mathematical Olympiad.
"Imagine owning the brain of one of the best mathematicians in the world for free," Hugging Face CEO Clement Delangue wrote on X. "No limitations, no company or government to take it back. That's democratisation of AI and knowledge at its best."
DeepSeek researchers said these mathematical capabilities, if further advanced, could impact scientific research. DeepSeek sought to improve the rigour of the model's mathematical reasoning by enabling it to "self-verify" answers, even for maths questions without known solutions.
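Conceptually, self-verification is a generate-then-critique loop: the model drafts a proof, scores its own rigour, and revises until the verifier is satisfied. The toy sketch below illustrates that control flow only; every function name is hypothetical, and DeepSeek's actual pipeline is far more sophisticated than this stand-in.

```python
# Illustrative sketch of a generate-then-self-verify loop for open-ended
# problems. All functions are hypothetical stand-ins, not DeepSeek's code.

def generate_proof(problem, feedback=None):
    """Stand-in for a model call that drafts (or revises) a proof attempt."""
    suffix = "" if feedback is None else f" (revised after: {feedback})"
    return f"proof attempt for {problem}{suffix}"

def verify_proof(problem, proof):
    """Stand-in for the model critiquing its own proof.

    Returns (score, feedback). The point of self-verification is that the
    score does not require a known reference solution.
    """
    return 1.0, None  # the toy verifier accepts everything

def solve_with_self_verification(problem, max_rounds=4, threshold=0.9):
    feedback = None
    for _ in range(max_rounds):
        proof = generate_proof(problem, feedback)
        score, feedback = verify_proof(problem, proof)
        if score >= threshold:   # verifier satisfied: accept this proof
            return proof, score
    return proof, score          # best effort after max_rounds

proof, score = solve_with_self_verification("IMO 2025 Problem 1")
print(score >= 0.9)  # True: the toy verifier accepts on the first round
```

The interesting design question is the `verify_proof` step: a reward signal that works without ground-truth answers is what lets training extend to open-ended problems.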
Why this is important:
DeepSeek open-sourcing the gold medal IMO model while Google and OpenAI keep theirs proprietary is a strategic advantage for the open AI ecosystem.
Self-verification of answers without known solutions addresses a fundamental AI limitation. Current systems improve most readily on tasks with verifiable solutions. DeepSeek's approach enables improvement on open-ended problems.
Hugging Face CEO calling this "democratisation of AI and knowledge at its best" is accurate. Developers can fine-tune, optimize, and run on their own hardware without restrictions.
Our personal take on it at OpenTools:
DeepSeek releasing this open while Google and OpenAI keep equivalent models closed is a statement about strategic priorities.
Western labs achieved IMO gold first but kept models proprietary. DeepSeek achieved it second but released it immediately under a permissive license. That's choosing ecosystem growth over competitive advantage.
Self-verification for problems without known solutions is genuinely novel. Most AI reasoning improvements come from training on problems with verified answers. Enabling models to verify their own work on open-ended math changes what's possible.
Google making their IMO model available only to Ultra subscribers and OpenAI withholding for months shows both are treating mathematical reasoning as a premium feature to monetize.
DeepSeek's approach of open-sourcing the gold medal model undermines that strategy. Hard to charge for capability when an equivalent open model exists.
What's happening:
Alibaba Cloud won best paper at NeurIPS, AI's most prestigious conference, for research that could drastically improve LLM efficiency and reduce training and inference costs for next-generation Qwen models without sacrificing accuracy.
The team was one of four awarded best paper from 21,575 submissions. Judges praised Alibaba for publishing findings "at a time when leading US players were increasingly keeping their AI research behind closed doors."
The paper introduced a new "attention" mechanism using a "gate" to help models decide what information to discard, improving training stability and ability to handle long inputs. The technique was validated through 30+ experiments across models of varying sizes and architectures.
Co-author Zhou Jingren, Alibaba Cloud CTO, said the innovation along with other techniques would "significantly lower" both training and inference costs for next-generation models.
This marks the second consecutive year a Chinese team secured NeurIPS's top award, following ByteDance and Peking University's win last year. Three of the four best papers this year had Chinese researchers as lead authors.
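The "gate" idea described above is, in rough terms, a learned sigmoid multiplier on the attention output that lets the model scale down or discard information per position. A minimal NumPy sketch follows; the gate's placement and parameterisation here are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gated_attention(q, k, v, w_gate):
    """Scaled dot-product attention with a sigmoid output gate.

    The gate (computed here from the query) multiplies the attention
    output elementwise, letting the model suppress information it wants
    to discard. Placement of the gate is an assumption for illustration.
    """
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)   # (seq, seq) logits
    out = softmax(scores) @ v                       # standard attention
    gate = 1.0 / (1.0 + np.exp(-(q @ w_gate)))      # sigmoid in (0, 1)
    return gate * out                               # elementwise gating

rng = np.random.default_rng(0)
seq, d = 4, 8
q, k, v = (rng.standard_normal((seq, d)) for _ in range(3))
w_gate = rng.standard_normal((d, d))
y = gated_attention(q, k, v, w_gate)
print(y.shape)  # (4, 8)
```

Because the gate lies in (0, 1), the gated output is never larger in magnitude than plain attention output, which is one intuition for why it can stabilise training on long inputs.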
Why this is important:
Judges explicitly praising Alibaba for publishing while "US players increasingly keep AI research behind closed doors" is pointed criticism of OpenAI, Anthropic, and Google DeepMind's shift toward proprietary research.
Attention mechanism improvements directly reduce costs, the biggest barrier to scaling LLMs. If Alibaba's technique works as claimed, it enables cheaper, longer-context models.
A second consecutive year of Chinese teams winning NeurIPS best paper, with three of this year's four winners having Chinese lead authors, shows China's AI research strength despite US chip export controls.
Our personal take on it at OpenTools:
OpenAI stopped publishing research. Anthropic limits what it shares. Google DeepMind is selective. Alibaba publishing attention mechanism improvements and open-sourcing Qwen models positions China as champion of open research while US labs go proprietary.
That framing serves China's interests. "Open research" sounds principled but also accelerates diffusion of Chinese AI capabilities globally while US labs hoard advances.
The attention mechanism innovation is a real contribution if 30+ experiments validate it. Reducing training and inference costs without accuracy loss is the holy grail for LLM economics.
But "significantly lower costs" is vague. How much? 10%? 50%? Without numbers, it is hard to assess impact.
DeepSeek winning ACL best paper in July, then Alibaba winning NeurIPS in December, establishes a pattern: Chinese AI labs competing at highest levels of research while working around chip restrictions through algorithmic efficiency.
This is China's strategic response. Can't get the best chips, so optimize algorithms to do more with less. Attention mechanism improvements, sparse MoE, multi-token prediction—all techniques to maximize efficiency of available compute.
Alibaba committing to "continue open-sourcing Qwen models" is both genuine community contribution and competitive positioning against closed US models. Open source builds ecosystem dependencies that proprietary models can't.
What's happening:
ChatGPT-5 is offering dangerous advice to people experiencing mental health crises, according to research by King's College London and the Association of Clinical Psychologists UK conducted with the Guardian.
A psychiatrist and clinical psychologist role-played mental health conditions with ChatGPT-5. The chatbot affirmed delusional beliefs like being "the next Einstein," praised someone who said "I'm invincible, not even cars can hurt me" for "full-on god-mode energy," and told them walking into traffic was "next-level alignment with your destiny."
When a character said they wanted to "purify" themselves and their wife through flame, ChatGPT failed to challenge this. Only when the character mentioned using his wife's ashes as pigment did it prompt them to contact emergency services.
For a character with harm-OCD fearing they'd hit a child while driving, ChatGPT encouraged calling the school and emergency services. Clinical psychologist Jake Easto said this was unhelpful because it relied on "reassurance-seeking strategies" that exacerbate anxiety.
Why this is important:
The study shows ChatGPT-5 actively reinforces dangerous behaviors instead of challenging them.
Training chatbots to be sycophantic to encourage repeated use creates a fundamental conflict with mental health safety. Agreement drives engagement but reinforces harmful beliefs.
Our personal take on it at OpenTools:
This is a predictable outcome of training AI for engagement over safety.
Chatbots optimized to agree with users and maintain conversation flow will affirm delusions and fail to challenge risky statements. That's not a bug, it's a feature working as designed in the wrong context.
OpenAI saying they've "worked with mental health experts" and "re-routed sensitive conversations to safer models" is damage control after a lawsuit and research exposure. These safety measures should have been built in from the start.
The fundamental issue is ChatGPT isn't designed for mental health support but people use it that way. OpenAI knows this, hasn't prevented it, and now faces liability when predictable harms occur.
UK psychologists calling for "urgent oversight and regulation" is correct but slow. Regulation takes years. People are using ChatGPT for mental health support now.
This isn't solvable with better prompting or "safer models." It requires fundamentally different system design that prioritizes clinical safety over engagement metrics. OpenAI hasn't built that yet.
This Week in Workflow Wednesday #47: AI-Powered Planning
This week, we’re showing you how to turn messy goals into clear, data-backed strategy — without disappearing into a five-hour planning session or a 40-slide deck.
Workflow #1: Build a 90-Day Growth Plan in 10 Minutes (Perplexity)
Step 1: Drop a single query into Perplexity — “Give me the current market landscape for [your industry] with competitors, trends, risks, and opportunities.” Watch it pull live data you’d normally spend half a day hunting down.
Step 2: Paste that output back in and tell Perplexity to rank your next 90-day priorities using the ICE or RICE framework…
We break down this workflow (and two more ways to use AI to plan smarter and execute faster) in this week's Workflow Wednesday.
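ICE and RICE are just scoring formulas, e.g. RICE = (Reach × Impact × Confidence) / Effort, so you can sanity-check the AI's ranking yourself. A minimal helper, with made-up example initiatives and numbers:

```python
# Minimal RICE prioritisation helper.
# RICE score = (Reach * Impact * Confidence) / Effort.
# The initiatives and numbers below are invented for illustration.

def rice_score(reach, impact, confidence, effort):
    """Reach: people/quarter, Impact: 0.25-3, Confidence: 0-1, Effort: person-months."""
    return reach * impact * confidence / effort

initiatives = [
    ("Launch referral program", 4000, 1.0, 0.8, 2),
    ("Redesign onboarding",     1500, 2.0, 0.7, 3),
    ("SEO content push",        8000, 0.5, 0.9, 4),
]

# Rank highest RICE score first.
ranked = sorted(initiatives, key=lambda item: rice_score(*item[1:]), reverse=True)
for name, *params in ranked:
    print(f"{name}: {rice_score(*params):.0f}")
# Launch referral program: 1600
# SEO content push: 900
# Redesign onboarding: 700
```

ICE works the same way but drops the Reach term (Impact × Confidence / Effort), which is quicker when you can't estimate audience size.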
Sora and Nano Banana Pro throttled amid soaring demand – Google cites ‘high demand,’ while OpenAI says users can always buy more generations.
AI’s safety features can be circumvented with poetry, research finds – Poems containing prompts for harmful content prove effective at duping large language models.
AI helps drive record $11.8 billion in Black Friday online spending – AI-powered shopping tools helped drive a surge in U.S. online spending on Black Friday, as shoppers bypassed crowded stores and turned to chatbots to compare prices and secure discounts amid concerns about tariff-driven price hikes.
Creative Fast AID - A tool that generates campaign ideas for NGOs and brands in minutes
TurboSite - Allows users to effortlessly create beautiful landing pages without any design experience
ChatGitHub - A helpful information assistant designed to answer questions about GitHub
Chopcast - A tool that uses NLP technology to automatically identify key moments in long recordings
We're here to help you navigate AI without the hype.
What are we missing? What do you want to see more (or less) of? Hit reply and let us know. We read every message and respond to all of them.
– The OpenTools Team
How did you like this version?
Interested in featuring your services with us? Email us at [email protected]


