The paper introduces TopKGraphs, a novel method for estimating node similarity in networks using Jaccard-biased random walks to sample structurally similar neighborhoods. This approach generates partial node rankings from each walk, which are then aggregated using robust rank aggregation to construct node affinity matrices. Experiments on synthetic and real-world graphs demonstrate that TopKGraphs achieves competitive or superior performance compared to standard similarity measures, diffusion-based methods, and embedding-based approaches, particularly in sparse, noisy, or heterogeneous networks.
Jaccard-biased random walks plus rank aggregation yield robust node affinities that match or outperform personalized PageRank and Node2Vec across diverse graph types.
Estimating node similarity is a fundamental task in network analysis and graph-based machine learning, with applications in clustering, community detection, classification, and recommendation. We propose TopKGraphs, a method based on start-node-anchored random walks that bias transitions toward nodes with structurally similar neighborhoods, measured via Jaccard similarity. Rather than computing stationary distributions, walks are treated as stochastic neighborhood samplers, producing partial node rankings that are aggregated using robust rank aggregation to construct interpretable node-to-node affinity matrices. TopKGraphs provides a non-parametric, interpretable, and general-purpose representation of node similarity that can be applied in both network analysis and machine learning workflows. We evaluate the method on synthetic graphs (stochastic block models, Lancichinetti-Fortunato-Radicchi benchmark graphs), k-nearest-neighbor graphs from tabular datasets, and a curated high-confidence protein-protein interaction network. Across all scenarios, TopKGraphs achieves competitive or superior performance compared to standard similarity measures (Jaccard, Dice), a diffusion-based method (personalized PageRank), and an embedding-based approach (Node2Vec), demonstrating robustness in sparse, noisy, or heterogeneous networks. These results suggest that TopKGraphs is a versatile and interpretable tool for bridging simple local similarity measures with more complex embedding-based approaches, facilitating both data mining and network analysis applications.
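To make the pipeline described above concrete, here is a minimal sketch of the idea, not the authors' implementation. Several details are assumptions that the abstract does not specify: transitions are weighted by the Jaccard similarity between each candidate's neighborhood and the start node's neighborhood, each walk's partial ranking is the order of first visit truncated to the top `k`, and a simple Borda-style average stands in for the robust rank aggregation used in the paper. The names `jaccard`, `biased_walk`, `affinity_row`, and the parameters `n_walks`, `length`, `k`, and `eps` are all hypothetical.

```python
import random
from collections import defaultdict


def jaccard(adj, u, v):
    """Jaccard similarity between the neighbor sets of u and v."""
    a, b = adj[u], adj[v]
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def biased_walk(adj, start, length, rng, eps=1e-3):
    """Run one walk from `start`; return nodes in order of first visit.

    Assumption: each candidate neighbor is weighted by its Jaccard
    similarity to the START node (plus a small eps so that no neighbor
    has zero probability).
    """
    order, seen, current = [], {start}, start
    for _ in range(length):
        nbrs = sorted(adj[current])  # sorted for reproducibility
        if not nbrs:
            break
        weights = [jaccard(adj, start, n) + eps for n in nbrs]
        current = rng.choices(nbrs, weights=weights, k=1)[0]
        if current not in seen:
            seen.add(current)
            order.append(current)
    return order


def affinity_row(adj, start, n_walks=200, length=10, k=5, seed=0):
    """Aggregate top-k partial rankings from many walks into affinities.

    Assumption: a Borda-style average (position 0 scores 1, position
    k-1 scores 1/k, unranked nodes score 0) is used here as a simple
    stand-in for robust rank aggregation.
    """
    rng = random.Random(seed)
    scores = defaultdict(float)
    for _ in range(n_walks):
        ranking = biased_walk(adj, start, length, rng)[:k]
        for pos, node in enumerate(ranking):
            scores[node] += (k - pos) / k
    return {node: s / n_walks for node, s in scores.items()}


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
row = affinity_row(adj, start=0)
for node, score in sorted(row.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))
```

Calling `affinity_row` once per node and stacking the results would give one row per node of an affinity matrix. In the toy graph, nodes in the start node's own triangle should receive higher affinity than nodes across the bridge, since the walk rarely crosses to structurally dissimilar neighborhoods.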