The paper introduces IMMACULATE, a framework for auditing black-box LLM API services to detect economically motivated deviations like model substitution or token overbilling. It leverages verifiable computation to selectively audit a small fraction of requests, providing detection guarantees without requiring trusted hardware or access to model internals. Experiments on dense and MoE models demonstrate that IMMACULATE can reliably distinguish between benign and malicious executions with minimal throughput overhead (under 1%).
Audit black-box LLM APIs for cheating (model substitution, overbilling) using verifiable computation, with under 1% throughput overhead.
Commercial large language models are typically deployed as black-box API services, requiring users to trust providers to execute inference correctly and report token usage honestly. We present IMMACULATE, a practical auditing framework that detects economically motivated deviations (such as model substitution, quantization abuse, and token overbilling) without trusted hardware or access to model internals. IMMACULATE selectively audits a small fraction of requests using verifiable computation, achieving strong detection guarantees while amortizing cryptographic overhead. Experiments on dense and MoE models show that IMMACULATE reliably distinguishes between benign and malicious executions with under 1% throughput overhead. Our code is published at https://github.com/guo-yanpei/Immaculate.
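The abstract does not give the sampling analysis behind its detection guarantees. The sketch below offers only a generic illustration of why auditing a small fraction of requests can still catch a provider who cheats repeatedly. It assumes that each request is audited independently with probability `p`, and that an audited deviating request is always detected (an idealization of the verifiable-computation check). The function names are hypothetical, and this is not IMMACULATE's actual analysis or code.

```python
def detection_probability(audit_rate: float, cheated_requests: int) -> float:
    """Probability that at least one of `cheated_requests` deviating requests
    is audited, assuming each request is audited independently with
    probability `audit_rate` and every audited deviation is caught
    (hypothetical idealized model, not the paper's analysis)."""
    if not 0.0 <= audit_rate <= 1.0:
        raise ValueError("audit_rate must be in [0, 1]")
    if cheated_requests < 0:
        raise ValueError("cheated_requests must be non-negative")
    # Escaping detection requires every cheated request to go unaudited.
    return 1.0 - (1.0 - audit_rate) ** cheated_requests


def min_audit_rate(target: float, cheated_requests: int) -> float:
    """Smallest per-request audit rate that detects at least one of
    `cheated_requests` deviations with probability >= `target`
    under the same idealized model."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    if cheated_requests <= 0:
        raise ValueError("cheated_requests must be positive")
    # Solve 1 - (1 - p)^k >= target for p:  p >= 1 - (1 - target)^(1/k)
    return 1.0 - (1.0 - target) ** (1.0 / cheated_requests)


if __name__ == "__main__":
    # Auditing 1% of requests against a provider that cheats on 500 of them.
    print(round(detection_probability(0.01, 500), 4))
    # Audit rate needed to catch such a provider with 99% probability.
    print(round(min_audit_rate(0.99, 500), 5))
```

Under this model, auditing about 1% of requests detects a provider that deviates on 500 requests with probability of roughly 0.993. The required audit rate falls as the number of deviations grows, which is the general intuition for why sparse auditing can amortize expensive proofs. The real guarantees depend on details of IMMACULATE that the abstract does not state.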