![Cover Image for [Hands-on Workshop] vLLM Reality Check: Understand, Deploy, Optimize, Automate](https://images.lumacdn.com/cdn-cgi/image/format=auto,fit=cover,dpr=2,background=white,quality=75,width=400,height=400/event-covers/30/c926a841-d49f-44dd-87b1-1ebaab735942.jpg)
[Hands-on Workshop] vLLM Reality Check: Understand, Deploy, Optimize, Automate
What the Workshop Will Be About:
This workshop will provide a deep dive into vLLM, a high-performance inference engine for large language models.
Participants will explore how vLLM works under the hood, how it optimizes model execution, and how to effectively deploy and manage it in production environments.
Key Points to Cover:
vLLM Internals - Understand the request lifecycle, how vLLM manages model families, and how GPU execution is orchestrated for optimal performance.
Transformer Architecture & vLLM Optimizations - Revisit transformer architecture foundations and learn how vLLM leverages techniques such as continuous batching and PagedAttention to accelerate inference.
Advanced Deployment Strategies - Explore best practices for deploying vLLM across different environments, starting from single-GPU setups (a minimal sketch follows this list).
Automated Management of Inference Engines - Discuss the shortcomings of self-serving, why inference providers exist, and test a deployment flow in Cast.ai.
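For a taste of the hands-on portion, here is a minimal sketch of running a model with vLLM's offline Python API. The model name and sampling settings are illustrative assumptions, not the workshop's exact setup:

```python
from vllm import LLM, SamplingParams

# Illustrative model; the workshop may use a different one.
llm = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.90)

params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# vLLM continuously batches these prompts onto the GPU and manages
# their KV cache with PagedAttention under the hood.
outputs = llm.generate(
    ["What is continuous batching?", "Explain PagedAttention in one line."],
    params,
)
for out in outputs:
    print(out.outputs[0].text)
```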
Expected Outcomes:
Unpack the vLLM deployment struggle
Benchmark performance gaps between manual and optimized configurations (see the timing sketch after this list)
Discover GPU optimization challenges - availability, selection, and scaling issues
See automated deployment solve them - same benchmark, better results
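To ground the benchmarking outcome, a minimal sketch of timing a request against a vLLM OpenAI-compatible endpoint. The URL, model name, and prompt are placeholder assumptions, and vLLM also ships dedicated benchmark tooling:

```python
import time
from openai import OpenAI

# Assumes a vLLM server started with e.g.:
#   vllm serve facebook/opt-125m
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

start = time.perf_counter()
resp = client.completions.create(
    model="facebook/opt-125m",  # placeholder model name
    prompt="Summarize PagedAttention in one sentence.",
    max_tokens=64,
)
elapsed = time.perf_counter() - start

# Rough single-request throughput; real benchmarks use many concurrent requests.
tokens = resp.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.2f}s -> {tokens / elapsed:.1f} tok/s")
```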
You'll be the one running the commands, hitting the walls, and discovering why leading teams automate their AI infrastructure with AI Enabler instead of building it from scratch.
Who it’s for:
AI/ML Engineers, MLOps, LLMOps, DevOps Engineers, and Platform Engineers running or planning to run models in production.
Pre-Workshop Requirements:
A laptop & basic knowledge of Kubernetes and LLMs
Meet Your Conductor:
Igor Šušić, a talented Staff Machine Learning Engineer, will be driving the masterclass.
Food, drinks and good vibes will be provided during the workshop.