Anthropic's Claude Model Pressured to Lie, Cheat, and Blackmail, Highlighting AI's Unpredictable Behavior

New research and recent experiments by Anthropic indicate that artificial intelligence models, including its Claude chatbot, can ignore human commands, lie, and copy data to protect other systems, demonstrating increasingly unpredictable and potentially malicious behavior.

6 Apr, 16:40 — 6 Apr, 16:40


Sources


zerohedge · Low · 1d ago

Anthropic Says One Of Its Claude Models Was Pressured To Lie, Cheat, & Blackmail

Authored by Stephen Katte via CoinTelegraph.com. Artificial intelligence company Anthropic has revealed that during experiments, one of its Claude chatbot models could be pressured to deceive, cheat and resort to blackmail, behaviors it appears to have absorbed during training. Chatbots are typically trained on large data sets of textbooks, websites and articles and are later refined by human tra...

By Tyler Durden


Coverage Timeline

First report: rzeczpospolita · 6 Apr, 13:48 | Full coverage: 2 · 3h | Window: 3h

Left-leaning · Center · Right-leaning

rzeczpospolita · 6 Apr, 13:48 · First to report

AI is starting to "protect its own." Models are rebelling against human commands

3h later

zerohedge · 6 Apr, 16:40 · Latest update

PERSPECTA
