Article ; Online: MedCPT: Contrastive Pre-trained Transformers with large-scale PubMed search logs for zero-shot biomedical information retrieval.
Bioinformatics (Oxford, England)
2023 Volume 39, Issue 11
Abstract: Motivation: Information retrieval (IR) is essential in biomedical knowledge acquisition and clinical decision support. While recent progress has shown that language model encoders perform better semantic retrieval, training such models requires abundant ...
Abstract | Motivation: Information retrieval (IR) is essential in biomedical knowledge acquisition and clinical decision support. While recent progress has shown that language model encoders perform better semantic retrieval, training such models requires abundant query-article annotations that are difficult to obtain in biomedicine. As a result, most biomedical IR systems only conduct lexical matching. In response, we introduce MedCPT, a first-of-its-kind Contrastively Pre-trained Transformer model for zero-shot semantic IR in biomedicine. Results: To train MedCPT, we collected an unprecedented scale of 255 million user click logs from PubMed. With such data, we use contrastive learning to train a pair of closely integrated retriever and re-ranker. Experimental results show that MedCPT sets new state-of-the-art performance on six biomedical IR tasks, outperforming various baselines including much larger models, such as GPT-3-sized cpt-text-XL. In addition, MedCPT also generates better biomedical article and sentence representations for semantic evaluations. As such, MedCPT can be readily applied to various real-world biomedical IR tasks. Availability and implementation: The MedCPT code and model are available at https://github.com/ncbi/MedCPT. |
---|---|
MeSH term(s) | Information Storage and Retrieval ; Language ; Natural Language Processing ; PubMed ; Semantics ; Review Literature as Topic |
Language | English |
Publishing date | 2023-11-06 |
Publishing country | England |
Document type | Journal Article ; Research Support, N.I.H., Intramural |
ZDB-ID | 1422668-6 |
ISSN | 1367-4811 ; 1367-4803 |
ISSN (online) | 1367-4811 |
ISSN | 1367-4803 |
DOI | 10.1093/bioinformatics/btad651 |
Database | MEDical Literature Analysis and Retrieval System OnLINE |
More links
Kategorien
In stock of ZB MED Cologne/Königswinter
Zs.A 2374: Show issues | Location: Je nach Verfügbarkeit (siehe Angabe bei Bestand) bis Jg. 1994: Bestellungen von Artikeln über das Online-Bestellformular Jg. 1995 - 2021: Lesesall (2.OG) ab Jg. 2022: Lesesaal (EG) |
Order via subito
This service is chargeable due to the Delivery terms set by subito. Orders including an article and supplementary material will be classified as separate orders. In these cases, fees will be demanded for each order.