😌Grok Gets Personality That Cares More
PLUS: Gemini 3 Goes Live | AI Takes Over Microsoft Taskbar

Reading time: 5 minutes
🗞️In this edition
Elon Musk's Grok 4.1 leads likability rankings
Gemini 3 makes its big debut
AI agents arrive in Windows taskbar
Workflow Wednesday #46: AI & Personal Productivity
In other AI news –
AI guides tunnel work in China
Intuit tools now on ChatGPT
TikTok lets users adjust AI content
4 must-try AI tools
Hey there,
xAI topped LMArena by optimizing for personality over technical chops, claiming emotional intelligence matters more than raw capability. Google just launched Gemini 3 Pro and gave everyone free access on day one while OpenAI's still cleaning up GPT-5's mess. And Microsoft's embedding AI agents directly into the Windows 11 taskbar, betting the OS becomes the platform layer for all agents.
We're committed to keeping this the sharpest AI newsletter in your inbox. No fluff, no hype. Just the moves that'll matter when you look back six months from now.
Let's get into it.
What's happening:
xAI released Grok 4.1, emphasizing how likeable its writing is rather than its technical capabilities. The model holds the top two positions on the LMArena leaderboard for text models, meaning users preferred it over the competition in blind tests.
Grok 4.1 Thinking scored 1483 and Grok 4.1 non-thinking scored 1465, ahead of third-place Gemini 2.5 Pro at 1452.
The model leads in emotional intelligence and creative writing benchmarks. "It is more perceptive to nuanced intent, compelling to speak with, and coherent in personality, while fully retaining razor-sharp intelligence and reliability of its predecessors," xAI claimed.
On EQ-Bench, which evaluates emotional intelligence, the Grok 4.1 models occupied the top two positions. On the Creative Writing v3 benchmark, Grok 4.1 Thinking and Grok 4.1 were among the top three models tested. xAI also claims the model hallucinates less.
"We used the same large-scale reinforcement learning infrastructure that powered Grok 4 and applied it to optimise style, personality, helpfulness, and alignment of the model," xAI stated.
The LMArena leaderboard is a crowdsourced, subjective ranking system. It shows users two responses to a given prompt and asks them to pick the one they prefer. It has been criticized as easy to game by bigger companies, which can try unreleased models until one gets a good score, then release only the best-scoring ones.
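To make the mechanics concrete, here is a minimal sketch of how pairwise votes can be turned into ratings, assuming a plain Elo-style update. This is an illustration only: the function names and the K-factor of 32 are our assumptions, and LMArena's actual scoring method may differ. Under that assumption, an 18-point gap (1483 vs. 1465) works out to roughly a 53% expected win rate, barely better than a coin flip.

```python
def expected_win_prob(rating_a: float, rating_b: float) -> float:
    """Probability that A's response is preferred over B's (Elo-style assumption)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return new (rating_a, rating_b) after one blind head-to-head vote.

    k is a hypothetical step size chosen for illustration.
    """
    expected_a = expected_win_prob(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta


# Scores quoted above: Grok 4.1 Thinking (1483) vs. Grok 4.1 non-thinking (1465).
p = expected_win_prob(1483, 1465)
print(f"Expected win rate for the higher-rated model: {p:.1%}")
```

The takeaway: small gaps at the top of a preference leaderboard translate into small differences in how often one model actually wins a vote.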
Why this is important:
xAI optimizing for personality and emotional intelligence over technical capability is a strategic pivot.
Most AI labs optimize for reasoning, coding, and math. xAI's optimizing for "compelling to speak with" and "coherent in personality." That's a different success metric.
LMArena being subjective and gameable undermines the achievement. Critics note larger companies can test multiple unreleased models until one scores well, then release only that version.
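The selection effect critics describe is easy to illustrate. The sketch below is a toy simulation, not a model of how any lab actually tests: it assumes every private variant has the same true rating (1450, hypothetical) and that each leaderboard measurement carries Gaussian noise (standard deviation 15, also hypothetical). Publishing only the best of several noisy scores inflates the reported number even though the model is no better.

```python
import random
import statistics


def best_of_n_score(true_rating, noise_sd, n_variants, rng):
    """Highest measured score among n equally good variants."""
    return max(rng.gauss(true_rating, noise_sd) for _ in range(n_variants))


def average_published_score(n_variants, trials=2000, true_rating=1450.0,
                            noise_sd=15.0, seed=0):
    """Average score published if a lab always releases its top-scoring variant.

    All parameter defaults are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    return statistics.mean(
        best_of_n_score(true_rating, noise_sd, n_variants, rng)
        for _ in range(trials)
    )


for n in (1, 5, 20):
    print(f"{n:>2} private variants -> average published score "
          f"{average_published_score(n):.0f}")
```

The more variants tested privately, the higher the published score drifts above the true rating, which is why a leaderboard lead alone says little about real capability gains.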
Making Grok 4.1 available to all users, including the free tier, is aggressive distribution. Most frontier models restrict access or charge premium pricing.
The lower-hallucination claim matters if true, but it lacks independent verification. A company claiming its own model hallucinates less needs third-party testing to back that up.
Our personal take on it at OpenTools:
Optimizing for likability is an admission that xAI can't compete on raw capability.
"Compelling to speak with" and "coherent in personality" are soft metrics. They matter for consumer chatbots. They don't matter for enterprise use cases like coding, analysis, or reasoning.
LMArena being gameable is important context. xAI can test dozens of model versions privately and release only the one that scores highest. That's optimization for the benchmark, not general capability.
The emotional intelligence focus is interesting positioning. If users prefer talking to Grok because it "cares about your dead cat," that's user retention even if technical capability lags competitors.
But EQ-Bench and Creative Writing benchmarks are narrow. They don't predict performance on coding, math, scientific reasoning, or other high-value tasks enterprises pay for.
Making Grok 4.1 available to free users is smart distribution but doesn't generate revenue. xAI needs a monetization strategy beyond a likable chatbot.
The "razor-sharp intelligence and reliability of predecessors" claim is marketing language. If Grok 4.1 actually matched Claude, Gemini, or GPT on technical tasks, xAI would lead with those benchmarks instead of personality metrics.
This is a pivot to consumer emotional engagement because xAI can't win on enterprise capability. That might work. Consumers do prefer chatbots that feel empathetic. But it's not a path to defending against Claude or GPT in high-value markets.
What's happening:
Google launched Gemini 3 Pro today, calling it its "most intelligent" and "factually accurate" AI system yet. For the first time, Google's giving everyone access to its flagship model on day one through the Gemini app.
Gemini 3 Pro is natively multimodal, processing text, images, and audio simultaneously instead of separately. It can translate recipe photos into cookbooks or create interactive flashcards from video lectures.
New features include a Canvas workspace for building programs, "generative interfaces" that create magazine-style visual layouts, and upgraded search using an improved "query fan-out technique" that better understands intent.
Google took a shot at OpenAI, describing Gemini 3 Pro as offering "smart, concise and direct" responses that trade "cliche and flattery for genuine insight." The company says it shows "reduced sycophancy," an issue OpenAI addressed with ChatGPT earlier this year.
Gemini 3 Pro also powers an experimental Gemini Agent feature for AI Ultra subscribers that can handle tasks like organizing emails or booking travel. A Deep Think mode enhances reasoning further but is currently limited to safety testers.
Why this is important:
Google's launching Gemini 3 Pro while OpenAI scrambles to fix GPT-5's failed launch with a 5.1 update. That's a momentum shift.
Giving everyone access to the flagship model on day one is an aggressive move. OpenAI typically reserves its best models for paid tiers. Google's competing on accessibility.
Our personal take on it at OpenTools:
The timing couldn't be better for Google.
GPT-5 launched three months ago to widespread disappointment. Microsoft started using Anthropic models instead of OpenAI. Now Google ships a model that beats everything on LMArena while OpenAI's firefighting with a 5.1 update.
The "reduced sycophancy" jab at ChatGPT is pointed. Everyone knows ChatGPT over-flatters. Google explicitly positioning against that shows they're targeting OpenAI's weaknesses.
Gemini Agent competing with ChatGPT Atlas and Perplexity's Comet means the assistant wars are heating up. Whoever builds the best agent layer wins users regardless of underlying model quality.
Free access to a flagship model pressures OpenAI's monetization. If Google gives away what OpenAI charges $20/month for, that's a pricing-power problem for ChatGPT Plus.
What's happening:
Microsoft is integrating AI agents directly into the Windows 11 taskbar, turning the operating system into what it calls an "agentic OS." The agents can research data in the background, access files and folders, and automate tasks while you work on something else.
When you ask an agent to complete tasks, it shifts into the taskbar and runs in the background. Hovering over the taskbar icon shows what the agent's doing. Yellow exclamation points signal it needs help. Green checkmarks mean it's done.
The new Ask Copilot feature in the taskbar combines local file search with Copilot capabilities and lets you launch AI agents directly from there. A floating window appears for interaction instead of a full app.
Microsoft's using Model Context Protocol (MCP) to embed agent capabilities into core Windows functions. Agents operate in their own workspace, separate from your desktop, each using its own Windows account for security.
Copilot is also coming to File Explorer to summarize documents, answer questions about files, or draft emails based on document content. Click to Do on Copilot Plus PCs can now convert any table into an Excel document.
The features are opt-in. "We want customers to have full control over when and how they engage with Copilot and these agents," says Navjot Virk, corporate VP of Windows experiences.
Why this is important:
Microsoft's building agent infrastructure into Windows itself, not just shipping standalone apps. That's a platform-level commitment to agentic AI.
Taskbar integration means agents become part of the OS experience instead of separate tools. Status indicators and background execution make agents ambient rather than the primary interface.
Model Context Protocol giving agents a standardized framework to discover tools and other agents is critical. Without standardization, every agent needs custom integration. MCP makes Windows the platform layer for third-party agents.
A separate agent workspace with individual Windows accounts addresses security and accuracy problems. When agents mess up, the damage is contained. That's a practical acknowledgment that AI makes mistakes.
Our personal take on it at OpenTools:
This is Microsoft betting Windows becomes the agent operating system.
ChatGPT Atlas is a browser. Perplexity's Comet is a browser. Microsoft's building agents into the OS itself. If successful, that's deeper integration than competitors can match without forking operating systems.
The taskbar agent UI is smart. Yellow exclamation points and green checkmarks are universal status indicators everyone understands. No learning curve.
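Model Context Protocol doing the heavy lifting means third-party developers can build agents that integrate natively. That's an ecosystem play. MCP is an open protocol that originated at Anthropic, so Microsoft doesn't own it, but if it becomes the standard, Windows becomes the default place those agents run.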
The hybrid local and cloud AI approach makes sense. Simple tasks run on device. Complex tasks hit cloud models. That balances performance with capability.
But "canvas for AI" and "agentic OS" are marketing terms. What matters is whether agents actually complete useful tasks reliably. If they fail frequently, users will ignore them regardless of taskbar integration.
This Week in Workflow Wednesday #46: AI & Personal Productivity
This week, we’re showing you how to create pro-level content without losing a single minute to editing timelines or creative burnout.
Workflow #1: Produce a High-Converting Sales Video Without Touching a Timeline (Pictory.ai)
Step 1: Start with your script — or let AI write it for you. Keep it tight, under 90 seconds, and punchy.
Step 2: Drop it into Pictory’s Script-to-Video tool and let the …
We break down this workflow (and two more ways to optimize your daily routine with AI) in this week’s Workflow Wednesday.
In central China, AI is telling humans how to build a high-speed rail tunnel – Machines trained on complex mountain data carefully navigate fault lines, fractures, caves and sinkholes to pick the best excavation method.
Intuit signs $100M+ deal with OpenAI to bring its apps to ChatGPT – Intuit said its tools, such as TurboTax, Credit Karma, QuickBooks, and Mailchimp, will be accessible through ChatGPT, allowing users to ask questions and complete tasks such as estimating tax refunds, reviewing credit options, or managing business finances.
TikTok now lets you choose how much AI-generated content you want to see – The new AI-generated content (AIGC) control is rolling out within the app’s “Manage Topics” tool, which lets users choose what they see on TikTok.
GetMax - An AI-powered content marketing assistant that helps businesses plan, create, and optimize their content strategies
Any Summary - An AI-powered tool that quickly summarizes long interview audio or video files
PlayHT - An AI-powered text-to-speech (TTS) tool that can generate realistic audio using synthetic voices
Civitai - An online platform that makes it easy for people to share and discover resources for creating AI art
We're here to help you navigate AI without the hype.
What are we missing? What do you want to see more (or less) of? Hit reply and let us know. We read every message and respond to all of them.
– The OpenTools Team
Interested in featuring your services with us? Email us at [email protected]


