This paper introduces an operator-level factorial benchmark to dissect the impact of different message-passing operators within molecular MPNNs. By systematically evaluating 84 configurations across ten MoleculeNet datasets, the study reveals that performance is primarily driven by message construction, particularly message-seed initialization and node-edge fusion, rather than the complexity of node update functions. Further analysis using a representation probe highlights the advantages of concatenation-based mixing in differentiating chemical features and resisting oversmoothing.
Forget complex updates: the secret to better molecular property prediction in MPNNs lies in how you initialize and fuse messages.
Message-passing neural networks (MPNNs) are widely used for molecular property prediction, but their deployment as monolithic architectures makes it difficult to identify how specific message-passing operators affect performance. We present an operator-level factorial benchmark that decomposes 2D molecular MPNNs into three operator families: message-seed initialization, node-edge fusion, and node update. The resulting 84 configurations are benchmarked on ten MoleculeNet datasets under a shared experimental setup and statistical analysis protocol. Across this controlled design, performance variation is associated primarily with message construction rather than update complexity. Message-seed initialization shows significant family-level effects for both regression and classification; node-edge fusion shows a significant family-level effect for regression, with descriptive advantages for concatenation-based mixing; and the update family shows no statistically supported effect for either regression or classification. A representation probe of the Quinethazone molecule further demonstrates that concatenation-based mixing differentiates chemically distinct heteroatoms and withstands oversmoothing better than Hadamard gating. Representative configurations selected separately for classification and regression recover competitive performance relative to established molecular graph neural network (GNN) baselines, ranking numerically best on eight of ten benchmark datasets. These empirical results are interpreted through concise mechanistic analyses of representative node-edge fusion and update operators. Our findings provide empirical design heuristics for molecular MPNNs by turning model design from a search over monolithic architectures into a targeted assessment of where and how chemical information enters the message-passing pipeline.
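To make the contrast between the two node-edge fusion styles named above concrete, the following is a minimal NumPy sketch of one message-passing step with a pluggable fusion operator. It is an illustration under stated assumptions, not the benchmark's implementation: the function names (`fuse_concat`, `fuse_hadamard`, `message_passing_step`), the tanh and sigmoid nonlinearities, the sum aggregation, the residual update, and the toy three-atom graph are all hypothetical choices, and message-seed initialization is not modeled.

```python
import numpy as np

rng = np.random.default_rng(0)


def fuse_concat(h_src, e, W):
    """Concatenation-based mixing: a linear map over [h_src || e], then tanh.

    Assumed form for illustration; the benchmark's exact operator may differ.
    """
    return np.tanh(np.concatenate([h_src, e], axis=-1) @ W)


def fuse_hadamard(h_src, e, W_gate):
    """Hadamard gating: edge features yield an elementwise sigmoid gate
    that rescales the source-node state (assumed form for illustration).
    """
    gate = 1.0 / (1.0 + np.exp(-(e @ W_gate)))
    return h_src * gate


def message_passing_step(h, edges, edge_feats, fuse, W):
    """Fuse node and edge features per directed edge (src -> dst),
    sum-aggregate messages at each destination, and apply a residual update.
    """
    src, dst = edges[:, 0], edges[:, 1]
    messages = fuse(h[src], edge_feats, W)
    agg = np.zeros_like(h)
    np.add.at(agg, dst, messages)  # unbuffered scatter-add over destinations
    return h + agg


# Toy graph (hypothetical): three atoms in a chain 0-1-2, edges in both directions.
edges = np.array([[0, 1], [1, 0], [1, 2], [2, 1]])
d_node, d_edge = 4, 3
h0 = rng.normal(size=(3, d_node))
edge_feats = rng.normal(size=(len(edges), d_edge))
W_concat = rng.normal(size=(d_node + d_edge, d_node)) * 0.1
W_gate = rng.normal(size=(d_edge, d_node)) * 0.1

h_concat = message_passing_step(h0, edges, edge_feats, fuse_concat, W_concat)
h_gated = message_passing_step(h0, edges, edge_feats, fuse_hadamard, W_gate)
```

In this sketch the gated message is always an elementwise rescaling of the source-node state, whereas the concatenation variant can mix node and edge channels through its weight matrix. That structural difference is the kind of distinction the abstract's representation probe examines, though the sketch itself makes no claim about downstream performance.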