vLLM-Kunlun (Baidu Kunlun Chip Inference Framework)
vLLM inference framework for Kunlun XPU, used to bring high-performance LLM serving to Baidu Kunlun P800 hardware.
Browse selected work across AI infrastructure, embodied AI, systems engineering, and digital applications.
15 projects
End-to-end speech recognition toolkit and open-source pretrained model library.
High-throughput LLM inference and serving engine.
Cross-platform personal AI assistant.
On-device AI runtime for PyTorch.
AI inference serving framework maintained by Lightning AI for multi-API and multi-model deployments.
Microsoft toolkit for model finetuning, conversion, quantization, and deployment optimization.
Open-source anomaly detection library for industrial inspection and vision workloads.
PyTorch model optimization and quantization toolkit for compiler and inference workflows.
High-performance LLM and multimodal serving framework focused on optimized runtime and deployment.
RL-for-LLM research framework with integrations such as verl for training workflows.
Omni and multimodal inference extension built on vLLM.
Distributed RLHF training framework for alignment experiments on large models.
Prefill/Decode (PD) deployment, benchmarking, and operator-level performance debugging
Built Prefill/Decode-disaggregated deployments for DeepSeek-V3.2 and GLM5 on Kunlun P800 clusters with the AIAK-customized SGLang stack.
Hugging Face's flagship model library for model implementations, training, and inference tooling.