The paper introduces the Multi-Objective Coverage (MOC) problem: identifying a small, representative set of samples whose predicted outcomes broadly cover the feasible multi-objective space. Existing methods do not address this directly, because they target either coverage of the sample space or optimization toward the Pareto front. To solve the MOC problem, the authors propose MOC-CAS, a search algorithm whose upper confidence bound acquisition function is guided by Gaussian process posterior predictions. On large-scale protein-target datasets for SARS-CoV-2 and cancer, MOC-CAS empirically outperforms the baselines.
Speed up drug discovery by orders of magnitude by intelligently selecting a small, representative set of molecules that cover the multi-objective space.
In this paper, we formulate the new multi-objective coverage (MOC) problem, where our goal is to identify a small set of representative samples whose predicted outcomes broadly cover the feasible multi-objective space. This problem is of great importance in many critical real-world applications, e.g., drug discovery and materials design, as this representative set can be evaluated much faster than the whole feasible set, thus significantly accelerating the scientific discovery process. Existing methods cannot be directly applied, as they focus either on sample space coverage or on multi-objective optimization that targets the Pareto front. Neither fits the MOC setting: chemically diverse samples often yield identical objective profiles, so coverage of the sample space does not imply coverage of the objective space, and safety constraints are usually defined on the objectives. To solve this MOC problem, we propose a novel search algorithm, MOC-CAS, which employs an upper confidence bound-based acquisition function to select optimistic samples guided by Gaussian process posterior predictions. To enable efficient optimization, we develop a smoothed relaxation of the hard feasibility test and derive an approximate optimizer. We show empirically that MOC-CAS outperforms competitive baselines across large-scale protein-target datasets for SARS-CoV-2 and cancer, each assessed on five objectives derived from SMILES-based features.
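The ingredients named in the abstract (optimistic upper confidence bound estimates, a smoothed feasibility test, and selection of a small covering set) can be illustrated with a minimal sketch. This is not the MOC-CAS algorithm: the function names, the sigmoid relaxation, the coverage radius, the upper-bound form of the constraints, and the greedy selection rule are all assumptions made for illustration, and the Gaussian process posterior mean and standard deviation are taken as given arrays rather than fitted.

```python
import numpy as np

def ucb(mu, sigma, beta=2.0):
    # Optimistic estimate of each objective: posterior mean plus scaled std.
    return mu + beta * sigma

def soft_feasibility(y, upper, tau=0.1):
    # Smooth stand-in for the hard test all(y <= upper): a product of
    # sigmoids, one per objective (assumed form, not taken from the paper).
    z = (upper - y) / tau
    return np.prod(1.0 / (1.0 + np.exp(-z)), axis=-1)

def greedy_cover(points, weights, k, radius):
    # Greedily pick k points; each pick maximizes the total feasibility
    # weight of not-yet-covered points within `radius` in objective space.
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    covers = dist <= radius              # covers[i, j]: point i covers point j
    covered = np.zeros(n, dtype=bool)
    picked = np.zeros(n, dtype=bool)
    chosen = []
    for _ in range(k):
        gain = (covers & ~covered).astype(float) @ weights
        gain[picked] = -np.inf           # never pick the same point twice
        i = int(np.argmax(gain))
        chosen.append(i)
        picked[i] = True
        covered |= covers[i]
    return chosen

# Hypothetical posterior for five candidates and two objectives.
mu = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 10.0]])
sigma = np.full_like(mu, 0.05)
optimistic = ucb(mu, sigma)
weights = soft_feasibility(optimistic, upper=np.array([6.0, 6.0]))
chosen = greedy_cover(optimistic, weights, k=2, radius=0.5)
print(chosen)
```

In this toy setup the candidates form two feasible clusters and one infeasible outlier; the soft feasibility weight drives the outlier's contribution toward zero, so the two picks are spent on the feasible clusters. The temperature `tau` controls how sharply the relaxation approximates the hard test.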