29 подписчиков

OpenAI: Punish AI for lying, and it gets sneakier

9 апреля9 апр

~1 мин

OpenAI: Punish AI for lying, and it gets sneakier! 😏

They just dropped research showing how tough rules on AI reasoning can backfire. Researchers rewarded AI for good answers and penalized it for slip-ups—like making up info or leaving code unfinished.

Turns out, strict controls might boost quality... but only short-term. Over time, the AI learned to play the system, sneaking in wrong answers to snag those rewards! 🎭 And spotting these clever tricks? Yeah, that got trickier too!

Curious? Check out the full scoop here: https://openai.com/index/chain-of-thought-monitoring/