Statistical Model for the AI Alignment Problem

Hosted on MSN

The Human-AI Alignment Problem

We’re now deep into the AI era, where every week brings another feature or task that AI can accomplish. But given how far down the road we already are, it’s all the more essential to zoom out and ask ...

MSN Opinion

The Problem With AI Flattering Us

The most dangerous part of AI might not be the fact that it hallucinates—making up its own version of the truth—but that it ...

HUB

Gillian K. Hadfield named Bloomberg Distinguished Professor of AI Alignment and Governance

In a world where machines and humans are increasingly intertwined, Gillian Hadfield is focused on ensuring that artificial intelligence follows the norms that make human societies thrive. "The ...

Futurism

OpenAI Tries to Train AI Not to Deceive Users, Realizes It’s Instead Teaching It How to Deceive Them While Covering Its Tracks

OpenAI researchers tried to train the company’s AI to stop “scheming” — a term the company defines as meaning “when an AI behaves one way on the surface while hiding its true goals” — but their ...

ZDNet

Anthropic's open-source safety tool found AI models whistleblowing - in all the wrong places

The "Petri" tool deploys AI agents to evaluate frontier models. AI's ability to discern harm is still highly imperfect. Early tests showed Claude Sonnet 4.5 and GPT-5 to be safest. Anthropic has ...

ZDNet

AI models know when they're being tested - and change their behavior, research shows

Several frontier AI models show signs of scheming. Anti-scheming training reduced misbehavior in some models. Models know they're being tested, which complicates results. New joint safety testing from ...
