
🌟 Meet MiniCPM-V 4.5: the compact powerhouse ready to take on the big dogs in multimodal AI

September 12, 2025




OpenBMB has just dropped this game-changer, built on Qwen3-8B & SigLIP2-400M. It understands single images, image series, and videos, and guess what? 🌍 It runs on mobile devices and handles over 30 languages!

OpenBMB is a non-profit arm of ModelBest, backed by Tsinghua University. 🚀 ModelBest, the parent company, has some big-name investors like Hubble (Huawei), Primavera Capital Group, and Shenzhen Guozhong Venture Capital Management.

🟡 The killer feature? Video handling like a boss! 💪 With a unified 3D-Resampler, this model hits a 96x compression rate on video: six 448x448 frames turn into just 64 tokens. Most MLLMs would need a whopping 1536 for the same clip, which is 24 times more! This means it can process footage at up to 10 FPS without breaking a sweat 🏃‍♂️ and it posts top scores on Video-MME, LVBench, and MLVU.
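Curious where 96x comes from? Here is a quick back-of-the-envelope sketch in Python. One loud assumption: the post never states the vision encoder's patch size, so the 14-pixel patch below is a guess on our part (it is the value that makes the numbers line up), and all names in the snippet are made up for illustration. Under that assumption, six 448x448 frames are 6144 raw patches, and squeezing them into 64 tokens gives exactly 96x.

```python
# Back-of-the-envelope check of the video token figures quoted in the post.
# ASSUMPTION: a 14-pixel ViT patch size. The post does not state it; it is
# simply the value that reproduces the 96x figure.

FRAMES = 6                  # frames packed together (from the post)
FRAME_SIDE = 448            # frame width and height in pixels (from the post)
PATCH_SIDE = 14             # hypothetical patch size, see assumption above
RESAMPLER_TOKENS = 64       # tokens after the 3D-Resampler (from the post)
TYPICAL_MLLM_TOKENS = 1536  # tokens "most MLLMs" need (from the post)


def patches_per_frame(frame_side: int, patch_side: int) -> int:
    """Number of square patches a square frame is cut into."""
    per_side = frame_side // patch_side
    return per_side * per_side


raw_patches = FRAMES * patches_per_frame(FRAME_SIDE, PATCH_SIDE)
compression_vs_patches = raw_patches / RESAMPLER_TOKENS
saving_vs_typical = TYPICAL_MLLM_TOKENS / RESAMPLER_TOKENS

print(f"raw patches for {FRAMES} frames: {raw_patches}")
print(f"compression vs raw patches: {compression_vs_patches:.0f}x")
print(f"token saving vs a typical MLLM: {saving_vs_typical:.0f}x")
```

So the two numbers measure different things: 96x compares against raw patches (under the assumed patch size), while the 1536 vs 64 comparison is a 24x saving against what a typical MLLM would feed its language model.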

Thanks to its LLaVA-UHD architecture, it tackles images up to 1.8 MP with any aspect ratio using only a quarter of the visual tokens most MLLMs need! 📸 Plus, it’s got two flexible modes, switched on demand: quick reasoning for everyday tasks or deep dives for the complex stuff!

With 8 billion parameters under the hood, MiniCPM-V 4.5 scored an impressive 77.0 on OpenCompass's benchmark 🎯, beating out previous versions and even surpassing GPT-4o-latest & Gemini-2.0 Pro. It’s also setting a new standard for MLLMs on OmniDocBench! 🏆 #AI #TechNews #Innovation