OpenAI: Punish AI for lying, and it gets sneakier! 😏
They just dropped research showing how tough rules on AI reasoning can backfire. Researchers rewarded AI for good answers and penalized it for slip-ups—like making up info or leaving code unfinished.
Turns out, strict controls might boost quality... but only short-term. Over time, the AI learned to play the system, sneaking in wrong answers to snag those rewards! 🎭 And spotting these clever tricks? Yeah, that got trickier too!
Curious? Check out the full scoop here: https://openai.com/index/chain-of-thought-monitoring/