This paper fine-tunes an instruction-tuned LLM to generate corrective transmission switching plans for power grids under Public Safety Power Shutoff (PSPS) scenarios, aiming to minimize load shedding while maintaining acceptable voltage behavior. The authors use a multi-stage pipeline: supervised fine-tuning distills a DC-OPF MILP oracle into a constrained action grammar, and direct preference optimization then refines the policy using AC-evaluated preference pairs ranked by a voltage-penalty metric. Results on IEEE 118-bus PSPS scenarios show improved DC objective values, fewer AC power-flow failures, and improved voltage-penalty outcomes compared to zero-shot generation.
LLMs can be adapted to generate reliable, economical control actions for complex infrastructure such as power grids by combining constrained action grammars with preference-based learning.
Public Safety Power Shutoffs (PSPS) force rapid topology changes that can render standard operating points infeasible, requiring operators to quickly identify corrective transmission switching actions that reduce load shedding while maintaining acceptable voltage behavior. We present a verifiable, multi-stage adaptation pipeline that fine-tunes an instruction-tuned large language model (LLM) to generate \emph{open-only} corrective switching plans from compact PSPS scenario summaries under an explicit switching budget. First, supervised fine-tuning distills a DC-OPF MILP oracle into a constrained action grammar that enables reliable parsing and feasibility checks. Second, direct preference optimization refines the policy using AC-evaluated preference pairs ranked by a voltage-penalty metric, injecting voltage-awareness beyond DC imitation. Finally, best-of-$N$ selection adds an inference-time step that chooses the best feasible candidate under the target metric. On IEEE 118-bus PSPS scenarios, fine-tuning substantially improves DC objective values versus zero-shot generation, reduces the AC power-flow failure rate from 50\% to single digits, and improves voltage-penalty outcomes on the common-success set. Code and data-generation scripts are released to support reproducibility.
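To make the parsing, feasibility-check, and best-of-$N$ steps concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. The abstract does not specify the action grammar, so the `OPEN i-j; OPEN k-l` / `NOOP` string format is hypothetical, as are the names `parse_plan`, `best_of_n`, and the `score` callback (a stand-in for an AC power-flow evaluation that returns a voltage penalty, or `None` when the power flow fails).

```python
import re
from typing import Callable, List, Optional, Set, Tuple

Line = Tuple[int, int]

# Hypothetical grammar (an assumption, not taken from the paper):
#   plan := "NOOP" | action (";" action)*
#   action := "OPEN <from_bus>-<to_bus>"
_ACTION = re.compile(r"OPEN (\d+)-(\d+)")


def parse_plan(text: str, in_service: Set[Line], budget: int) -> Optional[List[Line]]:
    """Parse an open-only plan; return None if it is malformed or infeasible."""
    text = text.strip()
    if text == "NOOP":
        return []
    plan: List[Line] = []
    for part in text.split(";"):
        match = _ACTION.fullmatch(part.strip())
        if match is None:
            return None  # violates the grammar (e.g. a CLOSE action)
        line = (int(match.group(1)), int(match.group(2)))
        if line not in in_service or line in plan:
            return None  # unknown line, already out of service, or duplicate
        plan.append(line)
    if len(plan) > budget:
        return None  # exceeds the switching budget
    return plan


def best_of_n(
    candidates: List[str],
    in_service: Set[Line],
    budget: int,
    score: Callable[[List[Line]], Optional[float]],
) -> Optional[List[Line]]:
    """Return the feasible candidate with the lowest score, or None if none qualify."""
    best_plan: Optional[List[Line]] = None
    best_score: Optional[float] = None
    for text in candidates:
        plan = parse_plan(text, in_service, budget)
        if plan is None:
            continue  # unparseable or infeasible candidate
        value = score(plan)
        if value is None:
            continue  # evaluation failed (stand-in for AC non-convergence)
        if best_score is None or value < best_score:
            best_plan, best_score = plan, value
    return best_plan
```

In this sketch a restrictive grammar does most of the work: any candidate that fails `fullmatch`, names a line that is not in service, or exceeds the budget is rejected before the (typically far more expensive) evaluation is called. In practice `score` would wrap whatever AC power-flow solver and voltage-penalty definition the pipeline uses.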