Article ; Online: Large-Scale Pretraining Improves Sample Efficiency of Active Learning-Based Virtual Screening.
Journal of chemical information and modeling
2024 Volume 64, Issue 6, Page(s) 1882–1891
Abstract | Virtual screening of large compound libraries to identify potential hit candidates is one of the earliest steps in drug discovery. As the size of commercially available compound collections grows exponentially to the scale of billions, active learning and Bayesian optimization have recently been proven as effective methods of narrowing down the search space. An essential component of those methods is a surrogate machine learning model that predicts the desired properties of compounds. An accurate model can achieve high sample efficiency by finding hits with only a fraction of the entire library being virtually screened. In this study, we examined the performance of a pretrained transformer-based language model and graph neural network in a Bayesian optimization active learning framework. The best pretrained model identifies 58.97% of the top-50,000 compounds after screening only 0.6% of an ultralarge library containing 99.5 million compounds, improving 8% over the previous state-of-the-art baseline. Through extensive benchmarks, we show that the superior performance of pretrained models persists in both structure-based and ligand-based drug discovery. Pretrained models can serve as a boost to the accuracy and sample efficiency of active learning-based virtual screening. |
---|---|
MeSH term(s) | Bayes Theorem ; Small Molecule Libraries/pharmacology ; Small Molecule Libraries/chemistry ; Drug Discovery/methods ; Neural Networks, Computer ; Machine Learning |
Chemical Substances | Small Molecule Libraries |
Language | English |
Publishing date | 2024-03-05 |
Publishing country | United States |
Document type | Journal Article |
ZDB-ID | 190019-5 |
ISSN | 1549-960X ; 0095-2338 |
ISSN (online) | 1549-960X |
DOI | 10.1021/acs.jcim.3c01938 |
Database | MEDLINE (Medical Literature Analysis and Retrieval System Online) |
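
The active learning loop described in the abstract (train a surrogate model on the compounds scored so far, use it to choose the next batch to screen, repeat) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the article: the one-dimensional feature, the linear least-squares surrogate (a stand-in for the pretrained transformer or graph neural network), the greedy acquisition rule, the "lower score is better" convention, and all function names.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (1-D toy surrogate)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0.0:
        return 0.0, mean_y  # degenerate case: fall back to a constant predictor
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def active_learning_screen(features, oracle, init_size, batch_size, rounds, seed=0):
    """Greedy active learning: score a random seed set, then repeatedly
    refit the surrogate and score the compounds it predicts to be best.

    `oracle(i)` stands for the expensive scoring step (assumed lower = better).
    Returns a dict mapping screened compound index -> oracle score.
    """
    rng = random.Random(seed)
    n = len(features)
    labeled = {i: oracle(i) for i in rng.sample(range(n), init_size)}
    for _ in range(rounds):
        idx = list(labeled)
        slope, intercept = fit_line([features[i] for i in idx],
                                    [labeled[i] for i in idx])
        # Rank the unscreened pool by predicted score, best (lowest) first.
        pool = [i for i in range(n) if i not in labeled]
        pool.sort(key=lambda i: slope * features[i] + intercept)
        for i in pool[:batch_size]:
            labeled[i] = oracle(i)
    return labeled


def top_k_recall(labeled, true_scores, k):
    """Fraction of the true top-k (lowest-scoring) compounds that were screened."""
    order = sorted(range(len(true_scores)), key=lambda i: true_scores[i])
    return len(set(order[:k]) & set(labeled)) / k


def demo():
    """Synthetic library: 2,000 compounds whose score depends noisily on one feature."""
    rng = random.Random(42)
    features = [rng.random() for _ in range(2000)]
    true_scores = [-5.0 * x + rng.gauss(0.0, 0.2) for x in features]
    labeled = active_learning_screen(features, lambda i: true_scores[i],
                                     init_size=20, batch_size=20, rounds=4)
    recall = top_k_recall(labeled, true_scores, k=100)
    screened_fraction = len(labeled) / len(features)
    return recall, screened_fraction
```

Calling `demo()` returns the top-100 recall together with the fraction of the toy library that was screened; sample efficiency in the abstract's sense means the first number is much larger than the second. For scale, the figures quoted in the abstract correspond to screening roughly 597,000 of 99.5 million compounds (0.6%) and recovering roughly 29,485 of the top 50,000 (58.97%).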