
⚡️ Big news from vLLM: they just dropped a new Sleep Mode for lightning-fast model switching

22 November 2025



⚡️ Big news from vLLM: they just dropped a new Sleep Mode for lightning-fast model switching! 🚀

In their latest blog post, they explain how this game-changing feature allows you to switch between language models in seconds—no more waiting 30-100 seconds for reloads. 😴✨ Just "put them to sleep" and "wake them up" without losing your progress!

There are two snooze options:

1️⃣ Level 1: Weights are moved out of GPU memory and kept in CPU RAM for quick wake-ups (but it eats up that precious system memory).

2️⃣ Level 2: Full unload of the weights for minimal RAM use, though waking up takes a bit longer.
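To make the trade-off between the two levels concrete, here is a tiny toy model in Python. It is only a sketch and not vLLM's code: `ToyEngine`, `load_from_disk`, and `disk_loads` are made-up names for illustration, and the "weights" are just a dict. The real thing also involves more steps than this toy shows, so treat it as a mental model only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Weights = Dict[str, List[float]]


@dataclass
class ToyEngine:
    """Toy stand-in for a model server (hypothetical, NOT vLLM's implementation)."""

    name: str
    load_from_disk: Callable[[], Weights]  # the slow path we want to avoid
    gpu: Optional[Weights] = None          # weights currently "on the GPU"
    cpu: Optional[Weights] = None          # weights parked in "CPU RAM"
    disk_loads: int = 0                    # counts how often the slow path ran

    def __post_init__(self) -> None:
        self._reload()  # initial cold start

    def _reload(self) -> None:
        self.gpu = self.load_from_disk()
        self.disk_loads += 1

    def sleep(self, level: int = 1) -> None:
        if level not in (1, 2):
            raise ValueError("level must be 1 or 2")
        # Level 1 keeps a copy in CPU RAM; level 2 throws the weights away.
        self.cpu = self.gpu if level == 1 else None
        self.gpu = None  # either way, GPU memory is freed

    def wake_up(self) -> None:
        if self.gpu is not None:
            return  # already awake
        if self.cpu is not None:
            # Level 1 wake-up: cheap copy back from CPU RAM.
            self.gpu, self.cpu = self.cpu, None
        else:
            # Level 2 wake-up: weights have to be loaded again.
            self._reload()


if __name__ == "__main__":
    engine = ToyEngine("model-a", lambda: {"layer0": [0.1, 0.2]})
    engine.sleep(level=1)
    engine.wake_up()
    print("disk loads after level 1 cycle:", engine.disk_loads)
    engine.sleep(level=2)
    engine.wake_up()
    print("disk loads after level 2 cycle:", engine.disk_loads)
```

In vLLM itself, the offline `LLM` class takes `enable_sleep_mode=True` and exposes `sleep(level=...)` and `wake_up()`; check the blog post and docs for the exact details before relying on them.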

Both methods supercharge performance: model switches are now 18-200x faster, and inference after waking is 61-88% faster than after a cold start! 🏎💨 Perfect for juggling multiple models on anything from a mid-tier A4000 to an A100.

Check out the blog for all the deets: https://blog.vllm.ai/2025/10/26/sleep-mode.html 🌐