LIVIVO - Search results -

Result 1 - 10 of total 39

Article ; Online: MedCPT: Contrastive Pre-trained Transformers with large-scale PubMed search logs for zero-shot biomedical information retrieval.

Jin, Qiao / Kim, Won / Chen, Qingyu / Comeau, Donald C / Yeganova, Lana / Wilbur, W John / Lu, Zhiyong

2023 Volume 39, Issue 11

Abstract	Motivation: Information retrieval (IR) is essential in biomedical knowledge acquisition and clinical decision support. While recent progress has shown that language model encoders perform better semantic retrieval, training such models requires abundant query-article annotations that are difficult to obtain in biomedicine. As a result, most biomedical IR systems only conduct lexical matching. In response, we introduce MedCPT, a first-of-its-kind Contrastively Pre-trained Transformer model for zero-shot semantic IR in biomedicine. Results: To train MedCPT, we collected an unprecedented scale of 255 million user click logs from PubMed. With such data, we use contrastive learning to train a pair of closely integrated retriever and re-ranker. Experimental results show that MedCPT sets new state-of-the-art performance on six biomedical IR tasks, outperforming various baselines including much larger models, such as GPT-3-sized cpt-text-XL. In addition, MedCPT also generates better biomedical article and sentence representations for semantic evaluations. As such, MedCPT can be readily applied to various real-world biomedical IR tasks. Availability and implementation: The MedCPT code and model are available at https://github.com/ncbi/MedCPT.
MeSH term(s)	Information Storage and Retrieval ; Language ; Natural Language Processing ; PubMed ; Semantics ; Review Literature as Topic
Language	English
Publishing date	2023-11-06
Publishing country	England
Document type	Journal Article ; Research Support, N.I.H., Intramural
ZDB-ID	1422668-6
ISSN	1367-4811 ; 1367-4803
ISSN (online)	1367-4811
ISSN	1367-4803
DOI	10.1093/bioinformatics/btad651
Database	MEDical Literature Analysis and Retrieval System OnLINE

In stock of ZB MED Cologne/Königswinter

Zs.A 2374: Show issues

Location:
Depending on availability (see the holdings information)
Volumes up to 1994: order articles via the online order form
Volumes 1995 - 2021: reading room (2nd floor)
Volumes from 2022: reading room (ground floor)

Order via subito

This service is subject to a fee under the delivery terms set by subito. An order that includes both an article and supplementary material is treated as separate orders, and a fee is charged for each.
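
The MedCPT abstract above describes contrastive learning on query-article pairs taken from click logs. The sketch below illustrates one common form of that idea, an in-batch contrastive loss in which article i is the positive for query i and every other article in the batch serves as a negative. It is an illustration under assumptions, not the authors' implementation: the function names, the dot-product similarity, the `temperature` parameter, and the toy vectors are all hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_contrastive_loss(query_vecs, article_vecs, temperature=1.0):
    """Mean cross-entropy over the batch.

    Assumption: article_vecs[i] is the clicked (positive) article for
    query_vecs[i]; all other articles in the batch act as negatives.
    """
    losses = []
    for i, q in enumerate(query_vecs):
        logits = [dot(q, d) / temperature for d in article_vecs]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (hypothetical): two queries and two articles.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each query matches its own article
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each query matches the other article

loss_aligned = in_batch_contrastive_loss(queries, aligned)
loss_swapped = in_batch_contrastive_loss(queries, swapped)
```

On this toy batch the loss is lower when each query embedding points at its own article than when the pairs are swapped, which is the behaviour such a training objective rewards.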


Article ; Online: Towards a unified search: Improving PubMed retrieval with full text.

Kim, Won / Yeganova, Lana / Comeau, Donald C / Wilbur, W John / Lu, Zhiyong

Journal of biomedical informatics

2022 Volume 134, Page(s) 104211

Abstract	Objective: A significant number of recent articles in PubMed have full text available in PubMed Central®, and the availability of full texts has been consistently growing. However, it is not currently possible for a user to simultaneously query the contents of both databases and receive a single integrated search result. In this study, we investigate how to score full text articles given a multitoken query and how to combine those full text article scores with scores originating from abstracts and achieve an overall improved retrieval performance. Materials and methods: For scoring full text articles, we propose a method to combine information coming from different sections by converting the traditionally used BM25 scores into log odds ratio scores which can be treated uniformly. We further propose a method that successfully combines scores from two heterogeneous retrieval sources - full text articles and abstract only articles - by balancing the contributions of their respective scores through a probabilistic transformation. We use PubMed click data that consists of queries sampled from PubMed user logs along with a subset of retrieved and clicked documents to train the probabilistic functions and to evaluate retrieval effectiveness. Results and conclusions: Random ranking achieves 0.579 MAP score on our PubMed click data. BM25 ranking on PubMed abstracts improves the MAP by 10.6%. For full text documents, experiments confirm that BM25 section scores are of different value depending on the section type and are not directly comparable. Naïvely using the body text of articles along with abstract text degrades the overall quality of the search. The proposed log odds ratio scores normalize and combine the contributions of occurrences of query tokens in different sections. By including full text where available, we gain another 0.67%, or 7% relative improvement over abstract alone. We find an advantage in the more accurate estimate of the value of BM25 scores depending on the section from which they were produced. Taking the sum of top three section scores performs the best.
MeSH term(s)	Data Management ; Information Storage and Retrieval ; PubMed
Language	English
Publishing date	2022-09-21
Publishing country	United States
Document type	Journal Article ; Research Support, N.I.H., Intramural
ZDB-ID	2057141-0
ISSN	1532-0480 ; 1532-0464
ISSN (online)	1532-0480
ISSN	1532-0464
DOI	10.1016/j.jbi.2022.104211
Database	MEDical Literature Analysis and Retrieval System OnLINE

Order via subito
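
Two ideas from the abstract above lend themselves to a short sketch: summing the top three per-section scores of a full-text article, and evaluating a ranking with mean average precision (MAP) against click data. The code below is only an illustration. The section names, score values, and function names are hypothetical, the per-section scores are assumed to be already converted to a comparable scale, and average precision is normalized by the number of relevant documents present in the ranked list.

```python
def combine_section_scores(section_scores, top_k=3):
    """Sum the top_k largest per-section scores of one article."""
    return sum(sorted(section_scores.values(), reverse=True)[:top_k])


def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance lists True/False per ranked document, best rank first.
    Assumption: every relevant document appears somewhere in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs):
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in runs) / len(runs)


# Hypothetical per-section scores for one article.
scores = {"title": 2.0, "abstract": 1.5, "methods": 0.25, "results": 0.5}
combined = combine_section_scores(scores)

# Hypothetical click judgments for two queries.
map_score = mean_average_precision([[True, False, True], [False, True]])
```

Here `combined` adds the title, abstract, and results scores and ignores the weakest section, and `map_score` averages the two per-query values.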

Article ; Online: Opportunities and challenges for ChatGPT and large language models in biomedicine and health.

Tian, Shubo / Jin, Qiao / Yeganova, Lana / Lai, Po-Ting / Zhu, Qingqing / Chen, Xiuying / Yang, Yifan / Chen, Qingyu / Kim, Won / Comeau, Donald C / Islamaj, Rezarta / Kapoor, Aadit / Gao, Xin / Lu, Zhiyong

Briefings in bioinformatics

2024 Volume 25, Issue 1

Abstract	ChatGPT has drawn considerable attention from both the general public and domain experts with its remarkable text generation capabilities. This has subsequently led to the emergence of diverse applications in the field of biomedicine and health. In this work, we examine the diverse applications of large language models (LLMs), such as ChatGPT, in biomedicine and health. Specifically, we explore the areas of biomedical information retrieval, question answering, medical text summarization, information extraction and medical education and investigate whether LLMs possess the transformative power to revolutionize these tasks or whether the distinct complexities of biomedical domain presents unique challenges. Following an extensive literature survey, we find that significant advances have been made in the field of text generation tasks, surpassing the previous state-of-the-art methods. For other applications, the advances have been modest. Overall, LLMs have not yet revolutionized biomedicine, but recent rapid progress indicates that such methods hold great potential to provide valuable means for accelerating discovery and improving health. We also find that the use of LLMs, like ChatGPT, in the fields of biomedicine and health entails various risks and challenges, including fabricated information in its generated responses, as well as legal and privacy concerns associated with sensitive patient data. We believe this survey can provide a comprehensive and timely overview to biomedical researchers and healthcare practitioners on the opportunities and challenges associated with using ChatGPT and other LLMs for transforming biomedicine and health.
MeSH term(s)	Humans ; Information Storage and Retrieval ; Language ; Privacy ; Research Personnel
Language	English
Publishing date	2024-04-17
Publishing country	England
Document type	Journal Article
ZDB-ID	2068142-2
ISSN	1477-4054 ; 1467-5463
ISSN (online)	1477-4054
ISSN	1467-5463
DOI	10.1093/bib/bbad493
Database	MEDical Literature Analysis and Retrieval System OnLINE

In stock of ZB MED Cologne/Königswinter

Zs.A 6262: Show issues

Location:
Depending on availability (see the holdings information)
Volumes up to 2021: order articles via the online order form
Volumes from 2022: reading room (ground floor)

Order via subito


Article ; Online: Opportunities and Challenges for ChatGPT and Large Language Models in Biomedicine and Health.

ArXiv

2023

Abstract	ChatGPT has drawn considerable attention from both the general public and domain experts with its remarkable text generation capabilities. This has subsequently led to the emergence of diverse applications in the field of biomedicine and health. In this work, we examine the diverse applications of large language models (LLMs), such as ChatGPT, in biomedicine and health. Specifically we explore the areas of biomedical information retrieval, question answering, medical text summarization, information extraction, and medical education, and investigate whether LLMs possess the transformative power to revolutionize these tasks or whether the distinct complexities of biomedical domain presents unique challenges. Following an extensive literature survey, we find that significant advances have been made in the field of text generation tasks, surpassing the previous state-of-the-art methods. For other applications, the advances have been modest. Overall, LLMs have not yet revolutionized biomedicine, but recent rapid progress indicates that such methods hold great potential to provide valuable means for accelerating discovery and improving health. We also find that the use of LLMs, like ChatGPT, in the fields of biomedicine and health entails various risks and challenges, including fabricated information in its generated responses, as well as legal and privacy concerns associated with sensitive patient data. We believe this survey can provide a comprehensive and timely overview to biomedical researchers and healthcare practitioners on the opportunities and challenges associated with using ChatGPT and other LLMs for transforming biomedicine and health.
Language	English
Publishing date	2023-10-17
Publishing country	United States
Document type	Preprint
ISSN	2331-8422
ISSN (online)	2331-8422
Database	MEDical Literature Analysis and Retrieval System OnLINE

Order via subito

Inter-library loan at ZB MED

Your chosen title can be delivered directly to the ZB MED Cologne location if you are registered as a user there.

Book ; Online: MedCPT

Jin, Qiao / Kim, Won / Chen, Qingyu / Comeau, Donald C. / Yeganova, Lana / Wilbur, W. John / Lu, Zhiyong

Contrastive Pre-trained Transformers with Large-scale PubMed Search Logs for Zero-shot Biomedical Information Retrieval

2023

Abstract	Information retrieval (IR) is essential in biomedical knowledge acquisition and clinical decision support. While recent progress has shown that language model encoders perform better semantic retrieval, training such models requires abundant query-article annotations that are difficult to obtain in biomedicine. As a result, most biomedical IR systems only conduct lexical matching. In response, we introduce MedCPT, a first-of-its-kind Contrastively Pre-trained Transformer model for zero-shot semantic IR in biomedicine. To train MedCPT, we collected an unprecedented scale of 255 million user click logs from PubMed. With such data, we use contrastive learning to train a pair of closely-integrated retriever and re-ranker. Experimental results show that MedCPT sets new state-of-the-art performance on six biomedical IR tasks, outperforming various baselines including much larger models such as GPT-3-sized cpt-text-XL. In addition, MedCPT also generates better biomedical article and sentence representations for semantic evaluations. As such, MedCPT can be readily applied to various real-world biomedical IR tasks. Comment: The MedCPT code and API are available at https://github.com/ncbi/MedCPT
Keywords	Computer Science - Information Retrieval ; Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Quantitative Biology - Quantitative Methods
Subject code	004
Publishing date	2023-07-02
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Full text online

Full text

Inter-library loan at ZB MED

Your chosen title can be delivered directly to the ZB MED Cologne location if you are registered as a user there.

Article ; Online: Better synonyms for enriching biomedical search.

Yeganova, Lana / Kim, Sun / Chen, Qingyu / Balasanov, Grigory / Wilbur, W John / Lu, Zhiyong

Journal of the American Medical Informatics Association : JAMIA

2020 Volume 27, Issue 12, Page(s) 1894–1902

Abstract	Objective: In a biomedical literature search, the link between a query and a document is often not established, because they use different terms to refer to the same concept. Distributional word embeddings are frequently used for detecting related words by computing the cosine similarity between them. However, previous research has not established either the best embedding methods for detecting synonyms among related word pairs or how effective such methods may be. Materials and methods: In this study, we first create the BioSearchSyn set, a manually annotated set of synonyms, to assess and compare 3 widely used word-embedding methods (word2vec, fastText, and GloVe) in their ability to detect synonyms among related pairs of words. We demonstrate the shortcomings of the cosine similarity score between word embeddings for this task: the same scores have very different meanings for the different methods. To address the problem, we propose utilizing pool adjacent violators (PAV), an isotonic regression algorithm, to transform a cosine similarity into a probability of 2 words being synonyms. Results: Experimental results using the BioSearchSyn set as a gold standard reveal which embedding methods have the best performance in identifying synonym pairs. The BioSearchSyn set also allows converting cosine similarity scores into probabilities, which provides a uniform interpretation of the synonymy score over different methods. Conclusions: We introduced the BioSearchSyn corpus of 1000 term pairs, which allowed us to identify the best embedding method for detecting synonymy for biomedical search. Using the proposed method, we created PubTermVariants2.0: a large, automatically extracted set of synonym pairs that have augmented PubMed searches since the spring of 2019.
MeSH term(s)	Algorithms ; Biomedical Research ; Information Storage and Retrieval/methods ; Linguistics ; Probability ; PubMed ; Terminology as Topic
Language	English
Publishing date	2020-10-15
Publishing country	England
Document type	Journal Article ; Research Support, N.I.H., Intramural
ZDB-ID	1205156-1
ISSN	1527-974X ; 1067-5027
ISSN (online)	1527-974X
ISSN	1067-5027
DOI	10.1093/jamia/ocaa151
Database	MEDical Literature Analysis and Retrieval System OnLINE

In stock of ZB MED Cologne/Königswinter

Zs.A 4128: Show issues
Location:
Depending on availability (see the holdings information)
Volumes up to 1994: order articles via the online order form
Volumes 1995 - 2021: reading room (2nd floor)
Volumes from 2022: reading room (ground floor)
Zs.MO 312: Show issues

Order via subito
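
The abstract above proposes pool adjacent violators (PAV), an isotonic regression algorithm, to turn a cosine similarity into a probability that two words are synonyms. A minimal sketch of PAV follows, assuming annotated pairs are sorted by cosine similarity and labelled 1 (synonym) or 0 (not a synonym). The toy data, the function names, and the step-function lookup are assumptions for illustration and are not taken from the paper.

```python
import bisect


def pav_fit(labels):
    """Pool adjacent violators.

    labels must be ordered by increasing score. Returns one fitted value
    per label; the fitted values are non-decreasing.
    """
    blocks = []  # each block is [sum_of_labels, count]
    for y in labels:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def pav_predict(sorted_scores, fitted, score):
    """Step-function lookup: fitted value of the largest training score <= score."""
    idx = bisect.bisect_right(sorted_scores, score) - 1
    return fitted[max(idx, 0)]


# Hypothetical annotated pairs: (cosine similarity, is_synonym).
pairs = [(0.2, 0), (0.3, 0), (0.4, 1), (0.5, 0), (0.6, 1), (0.8, 1)]
cosines = [c for c, _ in pairs]
probabilities = pav_fit([y for _, y in pairs])
p_new = pav_predict(cosines, probabilities, 0.45)
```

Because the mapping is fitted separately for each embedding method, the same probability can be read the same way across methods even when the raw cosine scores are not comparable.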


Article: PDC - a probabilistic distributional clustering algorithm: a case study on suicide articles in PubMed.

Islamaj, Rezarta / Yeganova, Lana / Kim, Won / Xie, Natalie / Wilbur, W John / Lu, Zhiyong

AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science

2020 Volume 2020, Page(s) 259–268

Abstract	The need to organize a large collection in a manner that facilitates human comprehension is crucial given the ever-increasing volumes of information. In this work, we present PDC (probabilistic distributional clustering), a novel algorithm that, given a document collection, computes disjoint term sets representing topics in the collection. The algorithm relies on probabilities of word co-occurrences to partition the set of terms appearing in the collection of documents into disjoint groups of related terms. In this work, we also present an environment to visualize the computed topics in the term space and retrieve the most related PubMed articles for each group of terms. We illustrate the algorithm by applying it to PubMed documents on the topic of suicide. Suicide is a major public health problem identified as the tenth leading cause of death in the US. In this application, our goal is to provide a global view of the mental health literature pertaining to the subject of suicide, and through this, to help create a rich environment of multifaceted data to guide health care researchers in their endeavor to better understand the breadth, depth and scope of the problem. We demonstrate the usefulness of the proposed algorithm by providing a web portal that allows mental health researchers to peruse the suicide-related literature in PubMed.
Language	English
Publishing date	2020-05-30
Publishing country	United States
Document type	Journal Article
ZDB-ID	2676378-3
ISSN	2153-4063
ISSN	2153-4063
Database	MEDical Literature Analysis and Retrieval System OnLINE

Order via subito

Article ; Online: Topics in machine learning for biomedical literature analysis and text retrieval.

Islamaj Doğan, Rezarta / Yeganova, Lana

Journal of biomedical semantics

2012 Volume 3 Suppl 3, Page(s) S1

Language	English
Publishing date	2012-10-05
Publishing country	England
Document type	Journal Article
ZDB-ID	2548651-2
ISSN	2041-1480 ; 2041-1480
ISSN (online)	2041-1480
ISSN	2041-1480
DOI	10.1186/2041-1480-3-S3-S1
Database	MEDical Literature Analysis and Retrieval System OnLINE

Order via subito

Article ; Online: Evolving use of ancestry, ethnicity, and race in genetics research-A survey spanning seven decades.

Byeon, Yen Ji Julia / Islamaj, Rezarta / Yeganova, Lana / Wilbur, W John / Lu, Zhiyong / Brody, Lawrence C / Bonham, Vence L

American journal of human genetics

2021 Volume 108, Issue 12, Page(s) 2215–2223

Abstract	To inform continuous and rigorous reflection about the description of human populations in genomics research, this study investigates the historical and contemporary use of the terms "ancestry," "ethnicity," "race," and other population labels in The American Journal of Human Genetics from 1949 to 2018. We characterize these terms' frequency of use and assess their odds of co-occurrence with a set of social and genetic topical terms. Throughout The Journal's 70-year history, "ancestry" and "ethnicity" have increased in use, appearing in 33% and 26% of articles in 2009-2018, while the use of "race" has decreased, occurring in 4% of articles in 2009-2018. Although its overall use has declined, the odds of "race" appearing in the presence of "ethnicity" has increased relative to the odds of occurring in its absence. Forms of population descriptors "Caucasian" and "Negro" have largely disappeared from The Journal (<1% of articles in 2009-2018). Conversely, the continental labels "African," "Asian," and "European" have increased in use and appear in 18%, 14%, and 42% of articles from 2009-2018, respectively. Decreasing uses of the terms "race," "Caucasian," and "Negro" are indicative of a transition away from the field's history of explicitly biological race science; at the same time, the increasing use of "ancestry," "ethnicity," and continental labels should serve to motivate ongoing reflection as the terminology used to describe genetic variation continues to evolve.
MeSH term(s)	Ethnicity ; Genetic Research/history ; History, 20th Century ; History, 21st Century ; Human Genetics/history ; Human Genetics/trends ; Humans ; Publishing/history ; Racial Groups
Language	English
Publishing date	2021-12-02
Publishing country	United States
Document type	Historical Article ; Journal Article ; Research Support, N.I.H., Intramural
ZDB-ID	219384-x
ISSN	1537-6605 ; 0002-9297
ISSN (online)	1537-6605
ISSN	0002-9297
DOI	10.1016/j.ajhg.2021.10.008
Database	MEDical Literature Analysis and Retrieval System OnLINE

Full text online

Accessible to users with ZB MED library card

In stock of ZB MED Cologne/Königswinter

Zs.A 107: Show issues

Location:
Depending on availability (see the holdings information)
Volumes up to 1994: order articles via the online order form
Volumes 1995 - 2021: reading room (1st floor)
Volumes from 2022: reading room (ground floor)

Order via subito

Article ; Online: Discovering themes in biomedical literature using a projection-based algorithm.

Yeganova, Lana / Kim, Sun / Balasanov, Grigory / Wilbur, W John

BMC bioinformatics

2018 Volume 19, Issue 1, Page(s) 269

Abstract	Background: The need to organize any large document collection in a manner that facilitates human comprehension has become crucial with the increasing volume of information available. Two common approaches to provide a broad overview of the information space are document clustering and topic modeling. Clustering aims to group documents or terms into meaningful clusters. Topic modeling, on the other hand, focuses on finding coherent keywords for describing topics appearing in a set of documents. In addition, there have been efforts for clustering documents and finding keywords simultaneously. Results: We present an algorithm to analyze document collections that is based on a notion of a theme, defined as a dual representation based on a set of documents and key terms. In this work, a novel vector space mechanism is proposed for computing themes. Starting with a single document, the theme algorithm treats terms and documents as explicit components, and iteratively uses each representation to refine the other until the theme is detected. The method heavily relies on an optimization routine that we refer to as the projection algorithm which, under specific conditions, is guaranteed to converge to the first singular vector of a data matrix. We apply our algorithm to a collection of about sixty thousand PubMed [...] Conclusions: This study presents a contribution on theoretical and algorithmic levels, as well as demonstrates the feasibility of the method for large scale applications. The evaluation of our system on benchmark datasets demonstrates that our method compares favorably with the current state-of-the-art methods in computing clusters of documents with coherent topic terms.
MeSH term(s)	Algorithms ; Cluster Analysis ; Databases, Genetic ; Humans ; Polymorphism, Single Nucleotide/genetics ; Publications
Language	English
Publishing date	2018-07-16
Publishing country	England
Document type	Journal Article
ZDB-ID	2041484-5
ISSN	1471-2105 ; 1471-2105
ISSN (online)	1471-2105
ISSN	1471-2105
DOI	10.1186/s12859-018-2240-0
Database	MEDical Literature Analysis and Retrieval System OnLINE

Order via subito
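
The abstract above says that the projection algorithm converges, under specific conditions, to the first singular vector of a data matrix. The paper's own routine is not reproduced in this listing, so the sketch below swaps in standard power iteration, a well-known way to reach the same vector. It alternates between term weights and document scores, loosely mirroring the idea of each representation refining the other. The document-term matrix, the function names, and the iteration count are hypothetical.

```python
import math


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def transpose(matrix):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*matrix)]


def normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def first_right_singular_vector(matrix, iterations=100):
    """Power iteration on A^T A, starting from a uniform term vector."""
    transposed = transpose(matrix)
    term_weights = normalize([1.0] * len(matrix[0]))
    for _ in range(iterations):
        doc_scores = matvec(matrix, term_weights)                   # terms -> documents
        term_weights = normalize(matvec(transposed, doc_scores))    # documents -> terms
    return term_weights


# Hypothetical document-term counts: rows are documents, columns are terms.
# Documents 0 and 1 share terms 0 and 1; document 2 uses only term 2.
doc_term = [
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]
theme_terms = first_right_singular_vector(doc_term)
```

On this toy matrix the weight concentrates on terms 0 and 1, the pair shared by the two related documents, while the weight of term 2 shrinks towards zero.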
