vLLM-Kunlun (Baidu Kunlun Chip LLM Inference Framework)
LLM inference framework for Baidu Kunlun XPU that adapts capabilities such as PagedAttention to Kunlun P800, enabling high-performance inference for mainstream models such as Qwen.
Browse selected work across AI infrastructure, embodied AI, systems engineering, and web development
7 projects
High-performance LLM serving and inference framework focused on optimized runtime and multi-backend deployment.
Research framework for reinforcement learning and LLM training workflows with integrations such as verl.
Omni and multimodal inference extension built on vLLM for speech and text generation scenarios.
Distributed training framework for RLHF and alignment experiments.
Flagship Hugging Face library for model implementations plus training and inference utilities.
RAG System Construction and LLM Evaluation (Algorithm)
Intelligent administrative knowledge base for the Haikou Police system, using RAG to improve answer quality for administrative knowledge queries.