Journal of Chemical Information and Modeling, Journal Year: 2025, Volume and Issue: unknown
Published: April 29, 2025
The rapid adoption of big data, machine learning (ML), and generative artificial intelligence (AI) in chemical discovery has heightened the importance of quantifying molecular similarity. Molecular similarity, commonly assessed as the distance between molecular fingerprints, is integral to applications such as database curation, diversity analysis, and property prediction. AI tools frequently rely on these similarity measures to cluster molecules under the assumption that structurally similar molecules exhibit similar properties. However, this assumption is not universally valid, particularly for continuous properties like electronic structure properties. Despite the prevalence of fingerprint-based similarity measures, their evaluation has largely depended on biological activity data sets and qualitative metrics, limiting their relevance to nonbiological domains. To address this gap, we propose a framework to evaluate the correlation between molecular similarity measures and molecular properties. Our approach builds on the concept of neighborhood behavior and incorporates kernel density estimation (KDE) analysis to quantify how well similarity measures capture property relationships. Using a data set of over 350 million molecule pairs with electronic structure, redox, and optical properties, we systematically evaluate several molecular fingerprint generators, distance functions, and their correlation with these properties. Both the curated data set and the evaluation framework are publicly available.
Language: English
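
The abstract's two main ingredients, a fingerprint distance and a KDE over pairwise property differences, can be illustrated with a minimal sketch. This is not the authors' framework: the Tanimoto distance is only one common choice of distance function, and the fingerprints (sets of on-bit indices), property values, bandwidth, and the 0.5 distance cutoff below are all hypothetical toy values chosen for illustration.

```python
import math
from itertools import combinations


def tanimoto_distance(fp_a, fp_b):
    """Return 1 - |A & B| / |A | B| for fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention assumed here: two empty fingerprints are identical
    return 1.0 - len(a & b) / union


def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate as a callable."""
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical toy data: (on-bit indices, made-up continuous property value).
molecules = {
    "mol_a": ({1, 2, 3, 4}, 1.10),
    "mol_b": ({1, 2, 3, 5}, 1.15),
    "mol_c": ({1, 2, 6, 7}, 1.60),
    "mol_d": ({8, 9, 10, 11}, 2.40),
}

# For every molecule pair, record (fingerprint distance, |property difference|).
pairs = [
    (tanimoto_distance(fp_i, fp_j), abs(prop_i - prop_j))
    for (fp_i, prop_i), (fp_j, prop_j) in combinations(molecules.values(), 2)
]

# Compare the property-difference density of "near" pairs with that of all pairs.
all_diffs = [d_prop for _, d_prop in pairs]
near_diffs = [d_prop for d_fp, d_prop in pairs if d_fp <= 0.5]  # assumed cutoff

density_all = gaussian_kde(all_diffs, bandwidth=0.1)
density_near = gaussian_kde(near_diffs, bandwidth=0.1)

print("density at small property difference (near pairs):", density_near(0.05))
print("density at small property difference (all pairs): ", density_all(0.05))
```

If a similarity measure captures a property well, pairs at small fingerprint distance should concentrate at small property differences, so the "near" density should sit above the all-pairs density near zero. On real data one would likely use a cheminformatics toolkit to generate fingerprints and a library KDE with a principled bandwidth rather than these hand-rolled helpers.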