We are excited to invite you to join an online meetup co-hosted by SGLang, FlashInfer, and MLC LLM! These three closely collaborating projects will share their perspectives on efficient LLM deployment and serving. The meetup is also an opportunity for members of the community to connect with one another.
Tentative Agenda:
4:00 - 4:45 pm PST: SGLang overview, updates, Q&A
Speakers: Liangsheng Yin, Lianmin Zheng, Ke Bao
Featured topics: Low-CPU-overhead scheduling in SGLang,
DeepSeek MLA optimizations, Fast JSON decoding
4:50 - 5:35 pm PST: FlashInfer overview, updates, Q&A
Speaker: Zihao Ye
Featured topics: Kernel generation for high-performance LLM serving
5:40 - 6:25 pm PST: MLC LLM overview, updates, Q&A
Speakers: Ruihang Lai, Yixin Dong, Tianqi Chen
Featured topics: Universal LLM deployment, Low-latency serving,
Fast grammar-based decoding