- LLM Inference Performance Engineering: Best Practices:https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices
- Best Practices for LLM Inference Performance Engineering and Optimization (in Chinese):https://mp.weixin.qq.com/s?__biz=MzU3Mzg5ODgxMg==&mid=2247486293&idx=1&sn=2b47cbbe189953599e254158fd78a18d&chksm=fd3be206ca4c6b109667e8813623db42a53b7ac0cd628a6cfd57f36334cb53e9ee33d49dd2dc&scene=21#wechat_redirect
- LLM Inference Performance Engineering: Best Practices (in Chinese):https://zhuanlan.zhihu.com/p/663282469
- Reproducible Performance Metrics for LLM inference:https://www.anyscale.com/blog/reproducible-performance-metrics-for-llm-inference
- Reproducible Performance Metrics for LLM Inference (in Chinese):https://zhuanlan.zhihu.com/p/667612787
- Implementing an AI Inference Engine from Scratch