LIVIVO - The Search Portal for Life Sciences

zur deutschen Oberfläche wechseln
Advanced search

Search results

Result 1 - 2 of total 2

Search options

  1. Article ; Online: Better synonyms for enriching biomedical search.

    Yeganova, Lana / Kim, Sun / Chen, Qingyu / Balasanov, Grigory / Wilbur, W John / Lu, Zhiyong

    Journal of the American Medical Informatics Association : JAMIA

    2020  Volume 27, Issue 12, Page(s) 1894–1902

    Abstract: Objective: In a biomedical literature search, the link between a query and a document is often not established, because they use different terms to refer to the same concept. Distributional word embeddings are frequently used for detecting related words ...

    Abstract Objective: In a biomedical literature search, the link between a query and a document is often not established, because they use different terms to refer to the same concept. Distributional word embeddings are frequently used for detecting related words by computing the cosine similarity between them. However, previous research has not established either the best embedding methods for detecting synonyms among related word pairs or how effective such methods may be.
    Materials and methods: In this study, we first create the BioSearchSyn set, a manually annotated set of synonyms, to assess and compare 3 widely used word-embedding methods (word2vec, fastText, and GloVe) in their ability to detect synonyms among related pairs of words. We demonstrate the shortcomings of the cosine similarity score between word embeddings for this task: the same scores have very different meanings for the different methods. To address the problem, we propose utilizing pool adjacent violators (PAV), an isotonic regression algorithm, to transform a cosine similarity into a probability of 2 words being synonyms.
    Results: Experimental results using the BioSearchSyn set as a gold standard reveal which embedding methods have the best performance in identifying synonym pairs. The BioSearchSyn set also allows converting cosine similarity scores into probabilities, which provides a uniform interpretation of the synonymy score over different methods.
    Conclusions: We introduced the BioSearchSyn corpus of 1000 term pairs, which allowed us to identify the best embedding method for detecting synonymy for biomedical search. Using the proposed method, we created PubTermVariants2.0: a large, automatically extracted set of synonym pairs that have augmented PubMed searches since the spring of 2019.
    MeSH term(s) Algorithms ; Biomedical Research ; Information Storage and Retrieval/methods ; Linguistics ; Probability ; PubMed ; Terminology as Topic
    Language English
    Publishing date 2020-10-15
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Intramural
    ZDB-ID 1205156-1
    ISSN 1527-974X ; 1067-5027
    ISSN (online) 1527-974X
    ISSN 1067-5027
    DOI 10.1093/jamia/ocaa151
    Database MEDical Literature Analysis and Retrieval System OnLINE

    More links

    Kategorien

  2. Article ; Online: Discovering themes in biomedical literature using a projection-based algorithm.

    Yeganova, Lana / Kim, Sun / Balasanov, Grigory / Wilbur, W John

    BMC bioinformatics

    2018  Volume 19, Issue 1, Page(s) 269

    Abstract: Background: The need to organize any large document collection in a manner that facilitates human comprehension has become crucial with the increasing volume of information available. Two common approaches to provide a broad overview of the information ... ...

    Abstract Background: The need to organize any large document collection in a manner that facilitates human comprehension has become crucial with the increasing volume of information available. Two common approaches to provide a broad overview of the information space are document clustering and topic modeling. Clustering aims to group documents or terms into meaningful clusters. Topic modeling, on the other hand, focuses on finding coherent keywords for describing topics appearing in a set of documents. In addition, there have been efforts for clustering documents and finding keywords simultaneously.
    Results: We present an algorithm to analyze document collections that is based on a notion of a theme, defined as a dual representation based on a set of documents and key terms. In this work, a novel vector space mechanism is proposed for computing themes. Starting with a single document, the theme algorithm treats terms and documents as explicit components, and iteratively uses each representation to refine the other until the theme is detected. The method heavily relies on an optimization routine that we refer to as the projection algorithm which, under specific conditions, is guaranteed to converge to the first singular vector of a data matrix. We apply our algorithm to a collection of about sixty thousand PubMed
    Conclusions: This study presents a contribution on theoretical and algorithmic levels, as well as demonstrates the feasibility of the method for large scale applications. The evaluation of our system on benchmark datasets demonstrates that our method compares favorably with the current state-of-the-art methods in computing clusters of documents with coherent topic terms.
    MeSH term(s) Algorithms ; Cluster Analysis ; Databases, Genetic ; Humans ; Polymorphism, Single Nucleotide/genetics ; Publications
    Language English
    Publishing date 2018-07-16
    Publishing country England
    Document type Journal Article
    ZDB-ID 2041484-5
    ISSN 1471-2105 ; 1471-2105
    ISSN (online) 1471-2105
    ISSN 1471-2105
    DOI 10.1186/s12859-018-2240-0
    Database MEDical Literature Analysis and Retrieval System OnLINE

    More links

    Kategorien

To top