Crynet.io

🔍 Anthropic is diving deep into AI interpretability and working on the "MRI for AI"! 🧠💡

Co-founder Dario Amodei points out a major issue: we're cranking out super-powerful AI systems without fully understanding how they tick. 🤔 This lack of transparency makes it tricky to assess if these AIs could be conscious (and maybe deserve some rights?).

There's a race happening between boosting AI power and companies figuring out how these systems operate. Dario predicts we might see AGI-level models popping up by 2026-2027! 🚀

Anthropic is developing "mechanistic interpretability" techniques to peer inside AI and uncover its inner workings. Think of it as an MRI scan for AI! 🩻 This will help spot issues like lying tendencies, power-seeking behavior, vulnerabilities, and cognitive strengths/weaknesses.
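To make the MRI metaphor a bit more concrete: this kind of inspection usually starts with simply recording a network’s internal activations. Here’s a minimal sketch using a PyTorch forward hook on a toy two-layer model (the model, the probed layer, and all sizes are illustrative assumptions, not Anthropic’s actual setup):

```python
# A minimal sketch of the "MRI" idea: recording a network's internal
# activations with a forward hook. The toy two-layer model is a
# stand-in for illustration only.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
)

captured = {}

def grab_activations(module, inputs, output):
    # Store a detached copy of the hidden layer's output for inspection.
    captured["hidden"] = output.detach()

# Attach the probe to the ReLU layer, run the model once, then detach the probe.
hook = model[1].register_forward_hook(grab_activations)
_ = model(torch.randn(1, 16))
hook.remove()

print(captured["hidden"].shape)  # torch.Size([1, 32])
print(captured["hidden"])        # the "scan": which units fired, and how hard
```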

He’s calling on neurobiologists to join the fun—gathering data from artificial neural networks is way easier than from biological ones! 💪

So far, they’ve managed to:

1. Identify over 30 million features (human-interpretable concepts) in a mid-sized model. A toy version of this decomposition is sketched right after this list.

2. Develop ways to track and manipulate feature groups called "circuits". The sketch below also ends with a miniature "steering" step.

3. Learn how to "trace" the model’s thought process as it works through a problem.
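Curious what point 1 could look like in code? Here’s a toy sparse autoencoder in that spirit: it decomposes recorded activations into a larger set of sparse "features", then nudges one of them back into an activation vector (point 2 in miniature). The dimensions, the sparsity penalty, the feature index, and the steering scale are all made-up assumptions for illustration, not Anthropic’s actual method:

```python
# Toy sparse autoencoder (SAE): expand 32 raw activations into 256
# candidate "features" that are non-negative and mostly zero.
import torch
import torch.nn as nn

d_model, d_features = 32, 256  # illustrative sizes, not real ones

class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x):
        f = torch.relu(self.encoder(x))  # sparse, non-negative feature activations
        return self.decoder(f), f

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)

acts = torch.randn(1024, d_model)  # stand-in for recorded model activations
for _ in range(200):
    recon, feats = sae(acts)
    # Reconstruct the activations while keeping features sparse (L1 penalty).
    loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# "Steering" in miniature: add one feature's decoder direction back
# into an activation vector to amplify that feature.
feature_id = 7                                            # arbitrary choice
direction = sae.decoder.weight[:, feature_id].detach()    # that feature's direction
steered = acts[0] + 4.0 * direction                       # scale is made up

print("active features on sample 0:", int((feats[0] > 0).sum()))
```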

But wait, there’s more! Anthropic isn’t just about research; they’re applying this in business too—especially in fields like finance, healthcare, and law where transparent decision-making is crucial! 💼⚖️

Dario believes interpretability can also surface the patterns AI models pick up when making predictions in areas like DNA analysis and protein sequences. 🌟✨ Stay tuned for more breakthroughs!