The paper introduces POLAR, a per-user lexical association test that operates within the embedding space of a masked language model to analyze author-level variations in text. Authors are represented by unique tokens, which POLAR projects onto predefined lexical axes to quantify associations and report statistically significant effects. Experiments on Twitter and an extremist forum demonstrate POLAR's ability to differentiate between bot and human accounts, quantify alignment with specific lexicons, and track shifts in user behavior over time.
Uncover hidden biases and track evolving viewpoints: POLAR reveals individual-level associations in text data that are masked by traditional aggregate analyses.
Most intrinsic association probes operate at the word, sentence, or corpus level, obscuring author-level variation. We present POLAR (Per-user On-axis Lexical Association Report), a per-user lexical association test that runs in the embedding space of a lightly adapted masked language model. Authors are represented by private deterministic tokens; POLAR projects these vectors onto curated lexical axes and reports standardized effects with permutation p-values and Benjamini–Hochberg control. On a balanced bot–human Twitter benchmark, POLAR cleanly separates LLM-driven bots from organic accounts; on an extremist forum, it quantifies strong alignment with slur lexicons and reveals rightward drift over time. The method is modular to new attribute sets and provides concise, per-author diagnostics for computational social science. All code is publicly available at https://github.com/pedroaugtb/POLAR-A-Per-User-Association-Test-in-Embedding-Space.
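The pipeline in the abstract — project an author's token vector onto a lexical axis, standardize the effect against a null distribution, attach a permutation-style p-value, and control the false discovery rate with Benjamini–Hochberg — can be sketched as below. This is a minimal NumPy illustration, not the authors' implementation: the axis construction (difference of pole means), the choice of background vectors for the null, and all function names are assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def lexical_axis(pos_vecs, neg_vecs):
    # Assumed axis construction: difference of mean pole
    # embeddings (e.g., slur vs. neutral lexicon), unit-normalized.
    axis = pos_vecs.mean(axis=0) - neg_vecs.mean(axis=0)
    return axis / np.linalg.norm(axis)

def standardized_effect(author_vec, background_vecs, axis, n_perm=10_000):
    # Observed projection of the author's token onto the axis.
    obs = author_vec @ axis
    # Null distribution: projections of background vectors
    # (assumed here to be random vocabulary embeddings).
    null = background_vecs @ axis
    # Standardized effect: z-score of the author against the null.
    z = (obs - null.mean()) / null.std()
    # Two-sided permutation-style p-value from resampled null draws.
    draws = rng.choice(null, size=n_perm, replace=True)
    p = (np.abs(draws - null.mean()) >= np.abs(obs - null.mean())).mean()
    return z, max(p, 1.0 / n_perm)  # avoid reporting exactly zero

def benjamini_hochberg(pvals, alpha=0.05):
    # BH step-up procedure: reject the k smallest p-values, where k is
    # the largest index with p_(k) <= alpha * k / m.
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject
```

Per author, this yields a (z, p) pair for each axis; running `benjamini_hochberg` over the pooled p-values then flags which author–axis associations survive multiple-testing correction.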