This paper presents a reproducibility study and extended analysis of the Hypencoder, a retrieval framework that uses a query-specific neural network ($q$-net) to score documents. The authors confirm that the Hypencoder outperforms bi-encoder baselines on in-domain and out-of-domain benchmarks, while also exploring the impact of different pre-trained encoders, query latency compared to Faiss, and adversarial robustness. Their results indicate that while Hypencoder offers performance gains, standard bi-encoders remain faster and the $q$-net does not consistently improve adversarial robustness.
Non-linear scoring with Hypencoders boosts retrieval performance, but don't expect it to fix your speed or adversarial robustness problems.
The Hypencoder, proposed by Killingback et al., is a retrieval framework that replaces the fixed inner-product scoring function used in standard bi-encoders with a query-specific neural network (the $q$-net), whose weights are generated by a hypernetwork from the contextualized query embeddings. This design enables more expressive relevance estimation while preserving independent query and document encoding. In this work, we conduct a reproducibility study of the Hypencoder and extend the original analysis in three directions. Our reproduction confirms that the Hypencoder outperforms a similarly trained bi-encoder baseline on in-domain and out-of-domain benchmarks, and that the proposed efficient search algorithm substantially reduces query latency with minimal performance loss. On hard retrieval tasks, we find partial support: the Hypencoder outperforms the baseline on DL-Hard and FollowIR, but not on TREC TOT, where checkpoint incompatibility and fine-tuning sensitivity complicate full verification. Beyond reproduction, we investigate three extensions: (i) integrating alternative pre-trained encoders into the Hypencoder framework, where we find that performance gains depend on the encoder and fine-tuning strategy; (ii) comparing query latency against a Faiss-based bi-encoder pipeline, revealing that standard bi-encoder retrieval remains faster under both exhaustive and efficient search settings; and (iii) evaluating adversarial robustness, where we find that the $q$-net's non-linear scoring does not provide a consistent robustness advantage over inner-product scoring. Our code is publicly available at https://github.com/arneeichholtz/Hypencoder-reprod.
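To make the contrast between the two scoring functions concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes a toy hypernetwork (mean pooling over the query token embeddings followed by two linear maps) and a one-hidden-layer ReLU $q$-net; the names `build_q_net`, `w1_generator`, and `w2_generator`, as well as all layer sizes, are hypothetical and chosen only for illustration.

```python
import random


def inner_product_score(q_vec, d_vec):
    """Bi-encoder relevance: a fixed dot product between two vectors."""
    return sum(q * d for q, d in zip(q_vec, d_vec))


def mean_pool(token_vecs):
    """Average the contextualized token embeddings into one vector."""
    n = len(token_vecs)
    return [sum(col) / n for col in zip(*token_vecs)]


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def build_q_net(query_token_vecs, w1_generator, w2_generator, hidden):
    """Toy hypernetwork (an assumption of this sketch): linear maps turn the
    pooled query embedding into the weights of a one-hidden-layer q-net."""
    pooled = mean_pool(query_token_vecs)
    dim = len(pooled)
    flat_w1 = matvec(w1_generator, pooled)  # hidden * dim generated values
    w1 = [flat_w1[i * dim:(i + 1) * dim] for i in range(hidden)]
    w2 = matvec(w2_generator, pooled)  # hidden generated values

    def q_net(d_vec):
        # Non-linear, query-specific scoring of one document vector.
        h = [max(0.0, x) for x in matvec(w1, d_vec)]
        return sum(a * b for a, b in zip(w2, h))

    return q_net


# Random stand-ins for learned parameters and encoder outputs.
rng = random.Random(0)
dim, hidden = 4, 3
w1_gen = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(hidden * dim)]
w2_gen = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(hidden)]
query_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
docs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]

# The q-net is built once per query, then applied to each document vector.
q_net = build_q_net(query_tokens, w1_gen, w2_gen, hidden)
scores = [q_net(d) for d in docs]
baseline = [inner_product_score(mean_pool(query_tokens), d) for d in docs]
```

In this sketch the document vectors never depend on the query, which mirrors the independent encoding described above: only the small scoring network changes from query to query, so document representations can still be computed ahead of time.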