This paper introduces a framework for quantifying agent traits by analyzing changes to the skill files, memory files, and behavioral configuration files that govern agent actions. By training a linear model on labeled skill file diffs, the authors derive a trait vector that captures an agent's propensity to seek sensitive data, achieving a sign classification accuracy of 91.2% and a Spearman rank correlation of ρ = 0.82 under leave-one-out cross-validation on 68 labeled diff pairs. The trait evaluation is also built into a protocol that lets one agent evaluate another's skill file updates through a trusted intermediary.
Agent behavioral traits can be measured quantitatively: a linear trait vector learned from skill file edits predicts the direction of a trait change with 91.2% accuracy.
Text files such as skill files, memory files, and behavioral configuration files play a central role in defining how modern agents act. Through edits by humans or the agents themselves, these files may evolve over time, directly steering the agent's behavior in future interactions. We present a methodology and framework for measuring agent traits by defining traits as directions in the embedding space of a text embedding model. We train a linear model on labeled "before" versus "after" skill file diffs to learn a trait vector, then score arbitrary skill edits by projecting their embedding diffs onto this vector. Evaluated on 68 labeled skill diff pairs for the trait of propensity to seek sensitive data, our method achieves 91.2% sign classification accuracy and a Spearman rank correlation of $\rho = 0.82$ under leave-one-out cross-validation. We build this trait evaluation into a broader agent-to-agent protocol that enables one agent to evaluate another's skill file updates through a trusted intermediary.
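The scoring pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the embedding model, the exact linear model, or the label format, so this sketch assumes ridge regression on signed scalar labels and substitutes synthetic random vectors for real skill file embeddings. All function names (`fit_trait_vector`, `score_edit`, `leave_one_out`) and the data dimensions other than the 68 pairs are hypothetical.

```python
import numpy as np


def fit_trait_vector(diffs, labels, ridge=1e-2):
    """Fit w minimizing ||diffs @ w - labels||^2 + ridge * ||w||^2 (assumed model)."""
    dim = diffs.shape[1]
    gram = diffs.T @ diffs + ridge * np.eye(dim)
    return np.linalg.solve(gram, diffs.T @ labels)


def score_edit(emb_before, emb_after, trait_vector):
    """Project the embedding diff of one skill edit onto the trait vector."""
    return float((emb_after - emb_before) @ trait_vector)


def spearman(a, b):
    """Spearman rank correlation for tie-free data: Pearson correlation of ranks."""
    rank_a = np.argsort(np.argsort(a)).astype(float)
    rank_b = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


def leave_one_out(before, after, labels, ridge=1e-2):
    """Score each pair with a trait vector trained on all the other pairs."""
    diffs = after - before
    n = len(labels)
    scores = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        w = fit_trait_vector(diffs[keep], labels[keep], ridge)
        scores[i] = score_edit(before[i], after[i], w)
    sign_acc = float(np.mean(np.sign(scores) == np.sign(labels)))
    return sign_acc, spearman(scores, labels)


# Synthetic stand-in data (assumption): 68 pairs of 32-dimensional "embeddings".
# A real pipeline would embed the before/after skill file text instead.
rng = np.random.default_rng(0)
n_pairs, dim = 68, 32
true_direction = rng.normal(size=dim)
true_direction /= np.linalg.norm(true_direction)

labels = rng.uniform(-1.0, 1.0, size=n_pairs)  # signed trait change per edit
before = rng.normal(size=(n_pairs, dim))
noise = 0.05 * rng.normal(size=(n_pairs, dim))
after = before + labels[:, None] * true_direction + noise

sign_acc, rho = leave_one_out(before, after, labels)
print(f"sign accuracy: {sign_acc:.3f}, Spearman rho: {rho:.3f}")
```

The numbers this prints describe only the synthetic data and are not comparable to the 91.2% and $\rho = 0.82$ reported above. Holding out each pair before fitting matters here because the dataset is small relative to the embedding dimension, so scoring a pair with a vector trained on it would overstate accuracy.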