LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 55

  1. Article: Word centrality constrained representation for keyphrase extraction.

    Gero, Zelalem / Ho, Joyce C

    Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting

    2022  Volume 2021, Page(s) 155–161

    Abstract To keep pace with the increased generation and digitization of documents, automated methods that can improve search, discovery and mining of the vast body of literature are essential. Keyphrases provide a concise representation by identifying salient concepts in a document. Various supervised approaches model keyphrase extraction using local context to predict the label for each token and perform much better than the unsupervised counterparts. Unfortunately, this method fails for short documents where the context is unclear. Moreover, keyphrases, which are usually the gist of a document, need to be the central theme. We propose a new extraction model that introduces a centrality constraint to enrich the word representation of a bidirectional long short-term memory (BiLSTM) network. Performance evaluation on two publicly available datasets demonstrates that our model outperforms existing state-of-the-art approaches. Our model is publicly available at https://github.com/ZHgero/keyphrases_centrality.git.
    Language English
    Publishing date 2022-06-24
    Publishing country United States
    Document type Journal Article
    DOI 10.18653/v1/2021.bionlp-1.17
    Database MEDical Literature Analysis and Retrieval System OnLINE

  2. Book ; Online: PGB

    Lee, Eric W / Ho, Joyce C

    A PubMed Graph Benchmark for Heterogeneous Network Representation Learning

    2023  

    Abstract There has been rapid growth in biomedical literature, yet capturing the heterogeneity of the bibliographic information of these articles remains relatively understudied. Although graph mining research via heterogeneous graph neural networks has taken center stage, it remains unclear whether these approaches capture the heterogeneity of the PubMed database, a vast digital repository containing over 33 million articles. We introduce PubMed Graph Benchmark (PGB), a new benchmark dataset for evaluating heterogeneous graph embeddings for biomedical literature. The benchmark contains rich metadata including abstract, authors, citations, MeSH terms, MeSH hierarchy, and some other information. The benchmark contains three different evaluation tasks encompassing systematic reviews, node classification, and node clustering. In PGB, we aggregate the metadata associated with the biomedical articles from PubMed into a unified source and make the benchmark publicly available for future work.
    Keywords Computer Science - Machine Learning ; Computer Science - Social and Information Networks
    Subject code 006
    Publishing date 2023-05-04
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  3. Article: Uncertainty-based Self-training for Biomedical Keyphrase Extraction.

    Gero, Zelalem / Ho, Joyce C

    ... IEEE-EMBS International Conference on Biomedical and Health Informatics. IEEE-EMBS International Conference on Biomedical and Health Informatics

    2021  Volume 2021

    Abstract To keep pace with the increased generation and digitization of documents, automated methods that can improve search, discovery and mining of the vast body of literature are essential. Keyphrases provide a concise representation by identifying salient concepts in a document. Various supervised approaches model keyphrase extraction using local context to predict the label for each token and perform much better than the unsupervised counterparts. However, existing supervised datasets have limited annotated examples to train better deep learning models. In contrast, many domains have large amounts of unannotated data that can be leveraged to improve model performance in keyphrase extraction. We introduce a self-learning-based model that incorporates uncertainty estimates to select instances from large-scale unlabeled data to augment the small labeled training set. Performance evaluation on a publicly available biomedical dataset demonstrates that our method improves the performance of keyphrase extraction over state-of-the-art models.
    Language English
    Publishing date 2021-08-10
    Publishing country United States
    Document type Journal Article
    ISSN 2641-3590
    DOI 10.1109/bhi50953.2021.9508592
    Database MEDical Literature Analysis and Retrieval System OnLINE

  4. Article: CATAN: Chart-aware temporal attention network for adverse outcome prediction.

    Gero, Zelalem / Ho, Joyce C

    IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics

    2021  Volume 2021, Page(s) 83–92

    Abstract There is an increased adoption of electronic health record systems by a variety of hospitals and medical centers. This provides an opportunity to leverage automated computer systems in assisting healthcare workers. One of the least utilized but rich sources of patient information is the unstructured clinical text. In this work, we develop CATAN, a chart-aware temporal attention network for learning patient representations from clinical notes. We introduce a novel representation where each note is considered a single unit, like a sentence, and composed of attention-weighted words. The notes in turn are aggregated into a patient representation using a second weighting unit, note attention. Unlike standard attention computations, which focus only on the content of the note, we incorporate the chart-time for each note as a constraint for attention calculation. This allows our model to focus on notes closer to the prediction time. Using the MIMIC-III dataset, we empirically show that our patient representation and attention calculation achieve the best performance in comparison with various state-of-the-art baselines for one-year mortality prediction and 30-day hospital readmission. Moreover, the attention weights can be used to offer transparency into our model's predictions.
    Language English
    Publishing date 2021-10-15
    Publishing country United States
    Document type Journal Article
    ISSN 2575-2626
    DOI 10.1109/ichi52183.2021.00024
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Article ; Online: Evaluating Natural Language Processing Packages for Predicting Hospital-Acquired Pressure Injuries From Clinical Notes.

    Gu, Siyi / Lee, Eric W / Zhang, Wenhui / Simpson, Roy L / Hertzberg, Vicki Stover / Ho, Joyce C

    Computers, informatics, nursing : CIN

    2024  Volume 42, Issue 3, Page(s) 184–192

    Abstract Incidence of hospital-acquired pressure injury, a key indicator of nursing quality, is directly proportional to adverse outcomes, increased hospital stays, and economic burdens on patients, caregivers, and society. Thus, predicting hospital-acquired pressure injury is important. Prediction models use structured data more often than unstructured notes, although the latter often contain useful patient information. We hypothesize that unstructured notes, such as nursing notes, can predict hospital-acquired pressure injury. We evaluate the impact of using various natural language processing packages to identify salient patient information from unstructured text. We use named entity recognition to identify keywords, which comprise the feature space of our classifier for hospital-acquired pressure injury prediction. We compare scispaCy and Stanza, two different named entity recognition models, using unstructured notes in Medical Information Mart for Intensive Care III, a publicly available ICU data set. To assess the impact of vocabulary size reduction, we compare the use of all clinical notes with only nursing notes. Our results suggest that named entity recognition extraction using nursing notes can yield accurate models. Moreover, the extracted keywords play a significant role in the prediction of hospital-acquired pressure injury.
    MeSH term(s) Humans ; Natural Language Processing ; Pressure Ulcer/diagnosis ; Critical Care ; Hospitals
    Language English
    Publishing date 2024-03-01
    Publishing country United States
    Document type Journal Article
    ZDB-ID 2078463-6
    ISSN (online) 1538-9774
    ISSN 1538-2931
    DOI 10.1097/CIN.0000000000001053
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Article ; Online: Hypergraph Transformers for EHR-based Clinical Predictions.

    Xu, Ran / Ali, Mohammed K / Ho, Joyce C / Yang, Carl

    AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science

    2023  Volume 2023, Page(s) 582–591

    Abstract Electronic health records (EHR) data contain rich information about patients' health conditions including diagnoses, procedures, medications, etc., which have been widely used to facilitate digital medicine. Despite its importance, it is often non-trivial to learn useful representations for patients' visits that support downstream clinical predictions, as each visit contains massive and diverse medical codes. As a result, the complex interactions among medical codes are often not captured, which leads to substandard predictions. To better model these complex relations, we leverage hypergraphs, which go beyond pairwise relations to jointly learn the representations for visits and medical codes. We also propose to use the self-attention mechanism to automatically identify the most relevant medical codes for each visit based on the downstream clinical predictions with better generalization power. Experiments on two EHR datasets show that our proposed method not only yields superior performance, but also provides reasonable insights towards the target tasks.
    Language English
    Publishing date 2023-06-16
    Publishing country United States
    Document type Journal Article
    ZDB-ID 2676378-3
    ISSN (online) 2153-4063
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article: PubMed Author-assigned Keyword Extraction (PubMedAKE) Benchmark.

    Sheng, Jiasheng / Gero, Zelalem / Ho, Joyce C

    Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management

    2022  Volume 2022, Page(s) 4470–4474

    Abstract With the ever-increasing abundance of biomedical articles, improving the accuracy of keyword search results becomes crucial for ensuring reproducible research. However, keyword extraction for biomedical articles is hard due to the existence of obscure keywords and the lack of a comprehensive benchmark. PubMedAKE is an author-assigned keyword extraction dataset that contains the title, abstract, and keywords of over 843,269 articles from the PubMed open access subset database. This dataset, publicly available on Zenodo, is the largest keyword extraction benchmark with sufficient samples to train neural networks. Experimental results using state-of-the-art baseline methods illustrate the need for developing automatic keyword extraction methods for biomedical literature.
    Language English
    Publishing date 2022-10-17
    Publishing country United States
    Document type Journal Article
    ISSN 2155-0751
    DOI 10.1145/3511808.3557675
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Article: CaliForest: Calibrated Random Forest for Health Data.

    Park, Yubin / Ho, Joyce C

    Proceedings of the ACM Conference on Health, Inference, and Learning

    2020  Volume 2020, Page(s) 40–50

    Abstract Real-world predictive models in healthcare should be evaluated in terms of discrimination, the ability to differentiate between high and low risk events, and calibration, or the accuracy of the risk estimates. Unfortunately, calibration is often neglected and only discrimination is analyzed. Calibration is crucial for personalized medicine as these models play an increasing role in the decision-making process. Since random forest is a popular model for many healthcare applications, we propose CaliForest, a new calibrated random forest. Unlike existing calibration methodologies, CaliForest utilizes the out-of-bag samples to avoid the explicit construction of a calibration set. We evaluated CaliForest on two risk prediction tasks obtained from the publicly available MIMIC-III database. Evaluation on these binary prediction tasks demonstrates that CaliForest can achieve the same discriminative power as random forest while obtaining a better-calibrated model evaluated across six different metrics. CaliForest is published on the standard Python software repository and the code is openly available on GitHub.
    Language English
    Publishing date 2020-04-02
    Document type Journal Article
    DOI 10.1145/3368555.3384461
    Database MEDical Literature Analysis and Retrieval System OnLINE

  9. Article: FuzzyGap: Sequential Pattern Mining for Predicting Chronic Heart Failure in Clinical Pathways.

    Lee, Eric W / Ho, Joyce C

    AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science

    2019  Volume 2019, Page(s) 222–231

    Abstract The rapid growth of electronic health records (EHRs) facilitates the use of clinical pathways, actionable plans for patients that are represented as sequences of diagnostic records ordered by visit dates. We propose to extract discriminative and representative clinical pathways from EHRs using sequential pattern mining. However, existing sequential pattern mining methods cannot efficiently extract patterns due to patient variations in length and time period between visits. To resolve this problem, we propose FuzzyGap, a sequential pattern mining-based framework that extracts a discriminative subsequent pattern from the proper representation of the sequence of encounters and also emphasizes the last visit, which is more significant than the others. We demonstrate FuzzyGap using a case study of heart failure and show the effectiveness of sequential pattern mining.
    Language English
    Publishing date 2019-05-06
    Publishing country United States
    Document type Journal Article
    ZDB-ID 2676378-3
    ISSN 2153-4063
    Database MEDical Literature Analysis and Retrieval System OnLINE

  10. Article: Improving length of stay prediction using a hidden Markov model.

    Sotoodeh, Mani / Ho, Joyce C

    AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science

    2019  Volume 2019, Page(s) 425–434

    Abstract Estimating the length of stay of intensive care unit patients is crucial to reducing health care costs. This can help physicians intervene at the right time to prevent adverse outcomes for the patients. Moreover, resource allocation can be optimized to ensure appropriate hospital staff levels. Yet length of stay prediction is very hard, as physicians can only accurately estimate half of their patient population. As electronic health records have become more prevalent, researchers can harness the power of machine learning to accurately predict the length of stay. We propose a hidden Markov model-based framework to predict the length of stay using some of the patients' physiological measurements during the first 48 hours of their admission to the intensive care unit. We show that this model can succinctly capture temporal patient representations. We demonstrate the potential of our framework on real ICU data in consistently outperforming most of the existing baselines.
    Language English
    Publishing date 2019-05-06
    Publishing country United States
    Document type Journal Article
    ZDB-ID 2676378-3
    ISSN 2153-4063
    Database MEDical Literature Analysis and Retrieval System OnLINE
