Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models
Source: https://arxiv.org/pdf/2412.12687.pdf
U-HLM lets the on-device SLM opportunistically skip uplink transmission and server-side LLM verification for low unce…
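
The skip decision described above can be illustrated with a minimal sketch. It is not the paper's method: it assumes, going by the title, that the skip is driven by the SLM's own uncertainty about its draft token. Shannon entropy stands in as a hypothetical uncertainty score, and `threshold`, `hybrid_step`, and `verify_remotely` are illustrative names that do not come from the source.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution.

    Assumption: entropy is used here only as a stand-in uncertainty score.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_skip_verification(probs, threshold):
    """True when the draft is confident enough to skip uplink and verification."""
    return token_entropy(probs) < threshold


def hybrid_step(probs, threshold, verify_remotely):
    """One decoding step of a hypothetical hybrid loop.

    probs           -- SLM next-token probabilities, indexed by token id
    threshold       -- hypothetical uncertainty cutoff (assumed, not from the source)
    verify_remotely -- callable standing in for the server-side LLM check

    Returns (token_id, used_uplink).
    """
    # Greedy draft token from the on-device SLM.
    draft = max(range(len(probs)), key=probs.__getitem__)
    if should_skip_verification(probs, threshold):
        # Low uncertainty: accept locally, nothing is transmitted.
        return draft, False
    # Otherwise send the draft and its distribution for remote verification.
    return verify_remotely(draft, probs), True
```

With a sharply peaked distribution such as `[0.97, 0.01, 0.01, 0.01]` and a cutoff of `0.5`, the step keeps the local token and reports that no uplink was used; with a uniform distribution it defers to `verify_remotely`. The cutoff trades communication savings against the risk of keeping a token the larger model would have rejected.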