Don't use a K8s Service for LLM Serving!

Relying solely on standard Kubernetes Services for load balancing can lead to suboptimal performance when serving LLMs. That's because engines like vLLM provide prefix caching, which can speed up the in...
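Here's a toy Python simulation of the intuition. It assumes three hypothetical replicas (`vllm-0` to `vllm-2`) and treats the first 32 characters of a prompt as its "prefix". Each replica remembers which prefixes it has already seen, which crudely stands in for its prefix cache. A content-blind router, roughly like a Service spreading traffic across pods, is compared with a prefix-affinity router that hashes the prefix to pick a replica. This is a simplified model, not how vLLM's cache (which works on token blocks) or kube-proxy actually behave.

```python
import hashlib
import random
from collections import defaultdict

REPLICAS = ["vllm-0", "vllm-1", "vllm-2"]  # hypothetical pod names
PREFIX_CHARS = 32  # assumption: first 32 characters = the shared prefix


def random_route(prompt: str, rng: random.Random) -> str:
    # Content-blind: picks any replica, ignoring what the prompt contains.
    return rng.choice(REPLICAS)


def prefix_route(prompt: str) -> str:
    # Prefix affinity: the same prefix always hashes to the same replica.
    key = prompt[:PREFIX_CHARS].encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return REPLICAS[int.from_bytes(digest[:8], "big") % len(REPLICAS)]


def simulate(router, prompts) -> int:
    # Count "cache hits": requests whose prefix the chosen replica has seen before.
    caches = defaultdict(set)
    hits = 0
    for p in prompts:
        replica = router(p)
        key = p[:PREFIX_CHARS]
        if key in caches[replica]:
            hits += 1
        else:
            caches[replica].add(key)
    return hits


# Two workloads, each sharing a long system prompt (>= 32 characters).
SYSTEM_A = "You are a helpful support agent for ACME. "
SYSTEM_B = "Translate the following text into French: "
prompts = [SYSTEM_A + f"question {i}" for i in range(10)] + [
    SYSTEM_B + f"sentence {i}" for i in range(10)
]

if __name__ == "__main__":
    rng = random.Random(0)
    print("content-blind hits:", simulate(lambda p: random_route(p, rng), prompts))
    print("prefix-aware hits: ", simulate(prefix_route, prompts))  # -> 18
```

With prefix affinity, only the first request for each system prompt misses (18 of 20 hit). The content-blind router has to warm the same prefix on up to every replica, so it can miss up to six times. Real prefix-aware routers also need to weigh replica load, which this sketch ignores.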