For the article "CHEMOGRAPHY: SEARCHING FOR HIDDEN TREASURES"
Yuliana Zabolotna, Arkadii Lin, Dragos Horvath, Gilles Marcou, Dmitriy M. Volochnyuk, Alexandre Varnek
Corresponding Author
Prof. A. Varnek, E-mail: varnek@unistra.fr
Abstract
The days when medicinal chemistry was limited to a few series of compounds of therapeutic interest are long gone. Nowadays, no one can hope to acquire a complete overview of the more than a billion existing or feasible compounds within which the potential “blockbuster drugs” are well hidden, and yet only a few mouse clicks away. To reach these “hidden treasures”, we adapted Generative Topographic Mapping to enable efficient navigation through the chemical space, from a global overview to structural pattern detection, covering, for the first time, the complete ZINC library of purchasable compounds, relative to 1.6 million biologically relevant ChEMBL molecules. About 40 000 hierarchical maps of the chemical space were constructed. Structural motifs inherent to only one library were identified. Roughly 20 000 off-market ChEMBL compound families represent incentives to enrich commercial catalogs. Conversely, 125 000 ZINC-specific compound classes, absent from structure-activity databases, are novel paths to explore in medicinal chemistry.
We are pleased to share with you a complete list of the above-mentioned ZINC- and ChEMBL-specific chemotypes. The link will be provided after registration below.
Please note that the ChEMBL and ZINC datasets were split into four subsets (Fragment-Like, Lead-Like, Drug-Like and PPI-Like) and then analyzed pairwise (for example, Fragment-Like ChEMBL against Fragment-Like ZINC). Thus, Fragment-Like ChEMBL-specific chemotypes are absent from Fragment-Like ZINC but may still be present in the unfiltered version of ZINC.
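The consequence of the pairwise analysis can be illustrated with a toy sketch. It is not the procedure used in the article: the chemotype labels and subset contents below are invented for illustration, and the "unfiltered" ZINC set is assumed here to be simply the union of the four toy subsets.

```python
# Toy illustration only: all chemotype labels and subset contents are hypothetical.
SUBSETS = ("Fragment-Like", "Lead-Like", "Drug-Like", "PPI-Like")

chembl = {
    "Fragment-Like": {"chemotype_A", "chemotype_B"},
    "Lead-Like": {"chemotype_C"},
    "Drug-Like": {"chemotype_D"},
    "PPI-Like": set(),
}
zinc = {
    "Fragment-Like": {"chemotype_B"},
    "Lead-Like": {"chemotype_A", "chemotype_C"},
    "Drug-Like": {"chemotype_E"},
    "PPI-Like": set(),
}


def library_specific(own, other):
    """For each subset, keep chemotypes found in `own` but not in the matching subset of `other`."""
    return {name: own[name] - other[name] for name in SUBSETS}


chembl_specific = library_specific(chembl, zinc)
zinc_specific = library_specific(zinc, chembl)

# Assumption: the unfiltered library is modelled as the union of the toy subsets.
zinc_unfiltered = set().union(*zinc.values())

for chemotype in sorted(chembl_specific["Fragment-Like"]):
    elsewhere = chemotype in zinc_unfiltered
    print(f"{chemotype}: ChEMBL-specific in Fragment-Like, present elsewhere in ZINC: {elsewhere}")
```

In this toy data, chemotype_A counts as Fragment-Like ChEMBL-specific because it is missing from Fragment-Like ZINC, even though it occurs in Lead-Like ZINC. A chemotype from the shared list should therefore be checked against the full ZINC library before it is treated as commercially unavailable.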