LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 10 of 23 in total

  1. Article: Transcriptome Complexity Disentangled: A Regulatory Molecules Approach.

    Asiaee, Amir / Abrams, Zachary B / Coombes, Kevin R

    bioRxiv : the preprint server for biology

    2023  

    Abstract Gene regulatory networks play a critical role in understanding cell states, gene expression, and biological processes. Here, we investigated the utility of transcription factors (TFs) and microRNAs (miRNAs) in creating a low-dimensional representation of cell states and predicting gene expression across 31 cancer types. We identified 28 clusters of miRNAs and 28 clusters of TFs, demonstrating that they can differentiate tissue of origin. Using a simple SVM classifier, we achieved an average accuracy of 92.8% in tissue classification. We also predicted the entire transcriptome using Tissue-Agnostic and Tissue-Aware models, with average
    Language English
    Publication date 2023-04-21
    Country of publication United States
    Document type Preprint
    DOI 10.1101/2023.04.17.537241
    Data source MEDical Literature Analysis and Retrieval System OnLINE

  2. Article: SillyPutty: Improved clustering by optimizing the silhouette width.

    Bombina, Polina / Tally, Dwayne / Abrams, Zachary B / Coombes, Kevin R

    bioRxiv : the preprint server for biology

    2023  

    Abstract Unsupervised clustering is an important task in biomedical science. We developed a new clustering method, called SillyPutty, for unsupervised clustering. As test data, we generated a series of datasets using the Umpire R package. Using these datasets, we compared SillyPutty to several existing algorithms using multiple metrics (Silhouette Width, Adjusted Rand Index, Entropy, Normalized Within-group Sum of Square errors, and Perfect Classification Count). Our findings revealed that SillyPutty is a valid standalone clustering method, comparable in accuracy to the best existing methods. We also found that the combination of hierarchical clustering followed by SillyPutty has the best overall performance in terms of both accuracy and speed.
    Language English
    Publication date 2023-11-11
    Country of publication United States
    Document type Preprint
    DOI 10.1101/2023.11.07.566055
    Data source MEDical Literature Analysis and Retrieval System OnLINE

  3. Article: Leveraging GPT-4 for Identifying Cancer Phenotypes in Electronic Health Records: A Performance Comparison between GPT-4, GPT-3.5-turbo, Flan-T5 and spaCy's Rule-based & Machine Learning-based methods.

    Bhattarai, Kriti / Oh, Inez Y / Sierra, Jonathan Moran / Tang, Jonathan / Payne, Philip R O / Abrams, Zachary B / Lai, Albert M

    bioRxiv : the preprint server for biology

    2024  

    Abstract Objective: Accurately identifying clinical phenotypes from Electronic Health Records (EHRs) provides additional insights into patients' health, especially when such information is unavailable in structured data. This study evaluates the application of OpenAI's Generative Pre-trained Transformer (GPT)-4 model to identify clinical phenotypes from EHR text in non-small cell lung cancer (NSCLC) patients. The goal was to identify disease stages, treatments and progression utilizing GPT-4, and compare its performance against GPT-3.5-turbo, Flan-T5-xl, Flan-T5-xxl, and two rule-based and machine learning-based methods, namely, scispaCy and medspaCy.
    Materials and methods: Phenotypes such as initial cancer stage, initial treatment, evidence of cancer recurrence, and affected organs during recurrence were identified from 13,646 records for 63 NSCLC patients from Washington University in St. Louis, Missouri. The performance of the GPT-4 model is evaluated against GPT-3.5-turbo, Flan-T5-xxl, Flan-T5-xl, medspaCy and scispaCy by comparing precision, recall, and micro-F1 scores.
    Results: GPT-4 achieved higher F1 score, precision, and recall than Flan-T5-xl, Flan-T5-xxl, medspaCy and scispaCy's models. GPT-3.5-turbo performed similarly to GPT-4. GPT and Flan-T5 models were not constrained by explicit rule requirements for contextual pattern recognition. SpaCy models relied on predefined patterns, leading to their suboptimal performance.
    Discussion and conclusion: GPT-4 improves clinical phenotype identification due to its robust pre-training and remarkable pattern recognition capability on the embedded tokens. It demonstrates data-driven effectiveness even with limited context in the input. While rule-based models remain useful for some tasks, GPT models offer improved contextual understanding of the text, and robust clinical phenotype extraction.
    Language English
    Publication date 2024-04-06
    Country of publication United States
    Document type Preprint
    DOI 10.1101/2023.09.27.559788
    Data source MEDical Literature Analysis and Retrieval System OnLINE

  4. Article ; Online: RCytoGPS: an R package for reading and visualizing cytogenetics data.

    Abrams, Zachary B / Tally, Dwayne G / Abruzzo, Lynne V / Coombes, Kevin R

    Bioinformatics (Oxford, England)

    2021  Volume 37, Issue 23, Page(s) 4589–4590

    Abstract Summary: Cytogenetics data, or karyotypes, are among the most common clinically used forms of genetic data. Karyotypes are stored as standardized text strings using the International System for Human Cytogenomic Nomenclature (ISCN). Historically, these data have not been used in large-scale computational analyses due to limitations in the ISCN text format and structure. Recently developed computational tools such as CytoGPS have enabled large-scale computational analyses of karyotypes. To further enable such analyses, we have now developed RCytoGPS, an R package that takes JSON files generated from CytoGPS.org and converts them into objects in R. This conversion facilitates the analysis and visualizations of karyotype data. In effect this tool streamlines the process of performing large-scale karyotype analyses, thus advancing the field of computational cytogenetic pathology.
    Availability and implementation: Freely available at https://CRAN.R-project.org/package=RCytoGPS. The code for the underlying CytoGPS software can be found at https://github.com/i2-wustl/CytoGPS.
    MeSH term(s) Humans ; Reading ; Karyotyping ; Software ; Karyotype
    Language English
    Publication date 2021-09-27
    Country of publication England
    Document type Journal Article ; Research Support, N.I.H., Extramural
    ZDB-ID 1422668-6
    ISSN (online) 1367-4811
    ISSN 1367-4803
    DOI 10.1093/bioinformatics/btab683
    Data source MEDical Literature Analysis and Retrieval System OnLINE

  5. Article ; Online: Mercator: a pipeline for multi-method, unsupervised visualization and distance generation.

    Abrams, Zachary B / Coombes, Caitlin E / Li, Suli / Coombes, Kevin R

    Bioinformatics (Oxford, England)

    2021  Volume 37, Issue 17, Page(s) 2780–2781

    Abstract Summary: Unsupervised machine learning provides tools for researchers to uncover latent patterns in large-scale data, based on calculated distances between observations. Methods to visualize high-dimensional data based on these distances can elucidate subtypes and interactions within multi-dimensional and high-throughput data. However, researchers can select from a vast number of distance metrics and visualizations, each with their own strengths and weaknesses. The Mercator R package facilitates selection of a biologically meaningful distance from 10 metrics, together appropriate for binary, categorical and continuous data, and visualization with 5 standard and high-dimensional graphics tools. Mercator provides a user-friendly pipeline for informaticians or biologists to perform unsupervised analyses, from exploratory pattern recognition to production of publication-quality graphics.
    Availability and implementation: Mercator is freely available at the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/Mercator/index.html).
    Language English
    Publication date 2021-01-08
    Country of publication England
    Document type Journal Article
    ZDB-ID 1422668-6
    ISSN (online) 1367-4811
    ISSN 1367-4803
    DOI 10.1093/bioinformatics/btab037
    Data source MEDical Literature Analysis and Retrieval System OnLINE

  6. Article ; Online: Electronic health record data quality assessment and tools: a systematic review.

    Lewis, Abigail E / Weiskopf, Nicole / Abrams, Zachary B / Foraker, Randi / Lai, Albert M / Payne, Philip R O / Gupta, Aditi

    Journal of the American Medical Informatics Association : JAMIA

    2023  Volume 30, Issue 10, Page(s) 1730–1740

    Abstract Objective: We extended a 2013 literature review on electronic health record (EHR) data quality assessment approaches and tools to determine recent improvements or changes in EHR data quality assessment methodologies.
    Materials and methods: We completed a systematic review of PubMed articles from 2013 to April 2023 that discussed the quality assessment of EHR data. We screened and reviewed papers for the dimensions and methods defined in the original 2013 manuscript. We categorized papers as data quality outcomes of interest, tools, or opinion pieces. We abstracted and defined additional themes and methods through an iterative review process.
    Results: We included 103 papers in the review, of which 73 were data quality outcomes of interest papers, 22 were tools, and 8 were opinion pieces. The most common dimension of data quality assessed was completeness, followed by correctness, concordance, plausibility, and currency. We abstracted conformance and bias as 2 additional dimensions of data quality and structural agreement as an additional methodology.
    Discussion: There has been an increase in EHR data quality assessment publications since the original 2013 review. Consistent dimensions of EHR data quality continue to be assessed across applications. Despite consistent patterns of assessment, there still does not exist a standard approach for assessing EHR data quality.
    Conclusion: Guidelines are needed for EHR data quality assessment to improve the efficiency, transparency, comparability, and interoperability of data quality assessment. These guidelines must be both scalable and flexible. Automation could be helpful in generalizing this process.
    MeSH term(s) Electronic Health Records ; Data Accuracy
    Language English
    Publication date 2023-06-29
    Country of publication England
    Document type Systematic Review ; Journal Article
    ZDB-ID 1205156-1
    ISSN (online) 1527-974X
    ISSN 1067-5027
    DOI 10.1093/jamia/ocad120
    Data source MEDical Literature Analysis and Retrieval System OnLINE

  7. Article ; Online: Simulation-derived best practices for clustering clinical data.

    Coombes, Caitlin E / Liu, Xin / Abrams, Zachary B / Coombes, Kevin R / Brock, Guy

    Journal of biomedical informatics

    2021  Volume 118, Page(s) 103788

    Abstract Introduction: Clustering analyses in clinical contexts hold promise to improve the understanding of patient phenotype and disease course in chronic and acute clinical medicine. However, work remains to ensure that solutions are rigorous, valid, and reproducible. In this paper, we evaluate best practices for dissimilarity matrix calculation and clustering on mixed-type, clinical data.
    Methods: We simulate clinical data to represent problems in clinical trials, cohort studies, and EHR data, including single-type datasets (binary, continuous, categorical) and 4 data mixtures. We test 5 single distance metrics (Jaccard, Hamming, Gower, Manhattan, Euclidean) and 3 mixed distance metrics (DAISY, Supersom, and Mercator) with 3 clustering algorithms (hierarchical (HC), k-medoids, self-organizing maps (SOM)). We quantitatively and visually validate by Adjusted Rand Index (ARI) and silhouette width (SW). We applied our best methods to two real-world data sets: (1) 21 features collected on 247 patients with chronic lymphocytic leukemia, and (2) 40 features collected on 6000 patients admitted to an intensive care unit.
    Results: HC outperformed k-medoids and SOM by ARI across data types. DAISY produced the highest mean ARI for mixed data types for all mixtures except unbalanced mixtures dominated by continuous data. Compared to other methods, DAISY with HC uncovered superior, separable clusters in both real-world data sets.
    Discussion: Selecting an appropriate mixed-type metric allows the investigator to obtain optimal separation of patient clusters and get maximum use of their data. Superior metrics for mixed-type data handle multiple data types using multiple, type-focused distances. Better subclassification of disease opens avenues for targeted treatments, precision medicine, clinical decision support, and improved patient outcomes.
    MeSH term(s) Algorithms ; Cluster Analysis ; Computer Simulation ; Humans ; Leukemia, Lymphocytic, Chronic, B-Cell
    Language English
    Publication date 2021-04-20
    Country of publication United States
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't
    ZDB-ID 2057141-0
    ISSN (online) 1532-0480
    ISSN 1532-0464
    DOI 10.1016/j.jbi.2021.103788
    Data source MEDical Literature Analysis and Retrieval System OnLINE

  8. Article ; Online: Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia.

    Coombes, Caitlin E / Abrams, Zachary B / Li, Suli / Abruzzo, Lynne V / Coombes, Kevin R

    Journal of the American Medical Informatics Association : JAMIA

    2020  Volume 27, Issue 7, Page(s) 1019–1027

    Abstract Objective: Unsupervised machine learning approaches hold promise for large-scale clinical data. However, the heterogeneity of clinical data raises new methodological challenges in feature selection, choosing a distance metric that captures biological meaning, and visualization. We hypothesized that clustering could discover prognostic groups from patients with chronic lymphocytic leukemia, a disease that provides biological validation through well-understood outcomes.
    Methods: To address this challenge, we applied k-medoids clustering with 10 distance metrics to 2 experiments ("A" and "B") with mixed clinical features collapsed to binary vectors and visualized with both multidimensional scaling and t-stochastic neighbor embedding. To assess prognostic utility, we performed survival analysis using a Cox proportional hazard model, log-rank test, and Kaplan-Meier curves.
    Results: In both experiments, survival analysis revealed a statistically significant association between clusters and survival outcomes (A: overall survival, P = .0164; B: time from diagnosis to treatment, P = .0039). Multidimensional scaling separated clusters along a gradient mirroring the order of overall survival. Longer survival was associated with mutated immunoglobulin heavy-chain variable region gene (IGHV) status, absent ZAP70 expression, female sex, and younger age.
    Conclusions: This approach to mixed-type data handling and selection of distance metric captured well-understood, binary, prognostic markers in chronic lymphocytic leukemia (sex, IGHV mutation status, ZAP70 expression status) with high fidelity.
    MeSH term(s) Adult ; Aged ; Aged, 80 and over ; Female ; Humans ; Immunoglobulin Heavy Chains/genetics ; Kaplan-Meier Estimate ; Leukemia, Lymphocytic, Chronic, B-Cell/immunology ; Leukemia, Lymphocytic, Chronic, B-Cell/metabolism ; Leukemia, Lymphocytic, Chronic, B-Cell/mortality ; Male ; Middle Aged ; Mutation ; Prognosis ; Proportional Hazards Models ; Unsupervised Machine Learning ; ZAP-70 Protein-Tyrosine Kinase/metabolism
    Chemical substances Immunoglobulin Heavy Chains ; ZAP-70 Protein-Tyrosine Kinase (EC 2.7.10.2) ; ZAP70 protein, human (EC 2.7.10.2)
    Language English
    Publication date 2020-06-01
    Country of publication England
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't
    ZDB-ID 1205156-1
    ISSN (online) 1527-974X
    ISSN 1067-5027
    DOI 10.1093/jamia/ocaa060
    Data source MEDical Literature Analysis and Retrieval System OnLINE

  9. Article ; Online: Thresher: determining the number of clusters while removing outliers.

    Wang, Min / Abrams, Zachary B / Kornblau, Steven M / Coombes, Kevin R

    BMC bioinformatics

    2018  Volume 19, Issue 1, Page(s) 9

    Abstract Background: Cluster analysis is the most common unsupervised method for finding hidden groups in data. Clustering presents two main challenges: (1) finding the optimal number of clusters, and (2) removing "outliers" among the objects being clustered. Few clustering algorithms currently deal directly with the outlier problem. Furthermore, existing methods for identifying the number of clusters still have some drawbacks. Thus, there is a need for a better algorithm to tackle both challenges.
    Results: We present a new approach, implemented in an R package called Thresher, to cluster objects in general datasets. Thresher combines ideas from principal component analysis, outlier filtering, and von Mises-Fisher mixture models in order to select the optimal number of clusters. We performed a large Monte Carlo simulation study to compare Thresher with other methods for detecting outliers and determining the number of clusters. We found that Thresher had good sensitivity and specificity for detecting and removing outliers. We also found that Thresher is the best method for estimating the optimal number of clusters when the number of objects being clustered is smaller than the number of variables used for clustering. Finally, we applied Thresher and eleven other methods to 25 sets of breast cancer data downloaded from the Gene Expression Omnibus; only Thresher consistently estimated the number of clusters to lie in the range of 4-7 that is consistent with the literature.
    Conclusions: Thresher is effective at automatically detecting and removing outliers. By thus cleaning the data, it produces better estimates of the optimal number of clusters when there are more variables than objects. When we applied Thresher to a variety of breast cancer datasets, it produced estimates that were both self-consistent and consistent with the literature. We expect Thresher to be useful for studying a wide variety of biological datasets.
    MeSH term(s) Algorithms ; Breast Neoplasms/metabolism ; Breast Neoplasms/pathology ; Cluster Analysis ; Female ; Humans ; Monte Carlo Method ; Principal Component Analysis
    Language English
    Publication date 2018-01-08
    Country of publication England
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, U.S. Gov't, Non-P.H.S.
    ZDB-ID 2041484-5
    ISSN (online) 1471-2105
    DOI 10.1186/s12859-017-1998-9
    Data source MEDical Literature Analysis and Retrieval System OnLINE

  10. Article ; Online: Pattern recognition in lymphoid malignancies using CytoGPS and Mercator.

    Abrams, Zachary B / Tally, Dwayne G / Zhang, Lin / Coombes, Caitlin E / Payne, Philip R O / Abruzzo, Lynne V / Coombes, Kevin R

    BMC bioinformatics

    2021  Volume 22, Issue 1, Page(s) 100

    Abstract Background: There have been many recent breakthroughs in processing and analyzing large-scale data sets in biomedical informatics. For example, the CytoGPS algorithm has enabled the use of text-based karyotypes by transforming them into a binary model. However, such advances are accompanied by new problems of data sparsity, heterogeneity, and noisiness that are magnified by the large-scale multidimensional nature of the data. To address these problems, we developed the Mercator R package, which processes and visualizes binary biomedical data. We use Mercator to address biomedical questions of cytogenetic patterns relating to lymphoid hematologic malignancies, which include a broad set of leukemias and lymphomas. Karyotype data are one of the most common forms of genetic data collected on lymphoid malignancies, because karyotyping is part of the standard of care in these cancers.
    Results: In this paper we combine the analytic power of CytoGPS and Mercator to perform a large-scale multidimensional pattern recognition study on 22,741 karyotype samples in 47 different hematologic malignancies obtained from the public Mitelman database.
    Conclusion: Our findings indicate that Mercator was able to identify both known and novel cytogenetic patterns across different lymphoid malignancies, furthering our understanding of the genetics of these diseases.
    MeSH term(s) Chromosome Aberrations ; Hematologic Diseases ; Humans ; Karyotype ; Karyotyping ; Neoplasms
    Language English
    Publication date 2021-03-01
    Country of publication England
    Document type Journal Article
    ZDB-ID 2041484-5
    ISSN (online) 1471-2105
    DOI 10.1186/s12859-021-03992-1
    Data source MEDical Literature Analysis and Retrieval System OnLINE
