Anthropic scientists hacked Claude’s brain — and it noticed. Here’s why that’s huge
When researchers at Anthropic injected the concept of "betrayal" into their Claude AI model's neural networks and asked if it noticed anything unusual, the system paused before responding: "I'm experiencing something that feels like an intrusive thought about 'betrayal'." The exchange, detailed in new research published Wednesday, marks what scientists say is the first rigorous…
