LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 33 total

  1. Article: Prediction and outlier detection in classification problems.

    Guan, Leying / Tibshirani, Robert

    Journal of the Royal Statistical Society. Series B, Statistical methodology

    2022  Volume 84, Issue 2, Page(s) 524–546

    Abstract We consider the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction sets). BCOPS constructs a prediction set ...
    Language English
    Publishing date 2022-02-15
    Publishing country England
    Document type Journal Article
    ZDB-ID 1490719-7
    ISSN 1467-9868 ; 1369-7412 ; 0035-9246
    ISSN (online) 1467-9868
    ISSN 1369-7412 ; 0035-9246
    DOI 10.1111/rssb.12443
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)

  2. Article: Prediction and outlier detection in classification problems

    Guan, Leying / Tibshirani, Robert

    Journal of the Royal Statistical Society. Series B, Statistical methodology. 2022 Apr., v. 84, no. 2

    2022  

    Abstract We consider the multi‐class classification problem when the training data and the out‐of‐sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction sets). BCOPS constructs a prediction set C(x) as a subset of class labels, possibly empty. It tries to optimize the out‐of‐sample performance, aiming to include the correct class and to detect outliers x as often as possible. BCOPS returns no prediction (corresponding to C(x) equal to the empty set) if it infers x to be an outlier. The proposed method combines supervised learning algorithms with conformal prediction to minimize a misclassification loss averaged over the out‐of‐sample distribution. The constructed prediction sets have a finite sample coverage guarantee without distributional assumptions. We also propose a method to estimate the outlier detection rate of a given procedure. We prove asymptotic consistency and optimality of our proposals under suitable assumptions and illustrate our methods on real data examples.
    Keywords algorithms ; journals ; prediction ; sampling ; testing
    Language English
    Dates of publication 2022-04
    Size p. 524-546.
    Publisher John Wiley & Sons, Ltd
    Document type Article
    Note JOURNAL ARTICLE
    ZDB-ID 1490719-7
    ISSN 1467-9868 ; 0035-9246 ; 1369-7412
    ISSN (online) 1467-9868
    ISSN 0035-9246 ; 1369-7412
    DOI 10.1111/rssb.12443
    Database NAL-Catalogue (AGRICOLA)

  3. Article ; Online: Predictive overfitting in immunological applications: Pitfalls and solutions.

    Gygi, Jeremy P / Kleinstein, Steven H / Guan, Leying

    Human vaccines & immunotherapeutics

    2023  Volume 19, Issue 2, Page(s) 2251830

    Abstract Overfitting describes the phenomenon where a highly predictive model on the training data generalizes poorly to future observations. It is a common concern when applying machine learning techniques to contemporary medical applications, such as predicting vaccination response and disease status in infectious disease or cancer studies. This review examines the causes of overfitting and offers strategies to counteract it, focusing on model complexity reduction, reliable model evaluation, and harnessing data diversity. Through discussion of the underlying mathematical models and illustrative examples using both synthetic data and published real datasets, our objective is to equip analysts and bioinformaticians with the knowledge and tools necessary to detect and mitigate overfitting in their research.
    MeSH term(s) Machine Learning ; Vaccination
    Language English
    Publishing date 2023-09-12
    Publishing country United States
    Document type Journal Article ; Review ; Research Support, N.I.H., Extramural ; Research Support, U.S. Gov't, Non-P.H.S.
    ZDB-ID 2664176-8
    ISSN 2164-554X ; 2164-5515
    ISSN (online) 2164-554X
    ISSN 2164-5515
    DOI 10.1080/21645515.2023.2251830
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)

  4. Book ; Online: GAN-based Vertical Federated Learning for Label Protection in Binary Classification

    Han, Yujin / Guan, Leying

    2023  

    Abstract Split learning (splitNN) has emerged as a popular strategy for addressing the high computational costs and low modeling efficiency in Vertical Federated Learning (VFL). However, despite its popularity, vanilla splitNN lacks encryption protection, leaving it vulnerable to privacy leakage issues, especially Label Leakage from Gradients (LLG). Motivated by the LLG issue resulting from the use of labels during training, we propose the Generative Adversarial Federated Model (GAFM), a novel method designed specifically to enhance label privacy protection by integrating splitNN with Generative Adversarial Networks (GANs). GAFM leverages GANs to indirectly utilize label information by learning the label distribution rather than relying on explicit labels, thereby mitigating LLG. GAFM also employs an additional cross-entropy loss based on the noisy labels to further improve the prediction accuracy. Our ablation experiment demonstrates that the combination of GAN and the cross-entropy loss component is necessary to enable GAFM to mitigate LLG without significantly compromising the model utility. Empirical results on various datasets show that GAFM achieves a better and more robust trade-off between model utility and privacy compared to all baselines across multiple random runs. In addition, we provide experimental justification to substantiate GAFM's superiority over splitNN, demonstrating that it offers enhanced label protection through gradient perturbation relative to splitNN.
    Keywords Computer Science - Machine Learning
    Subject code 006
    Publishing date 2023-02-04
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  5. Article ; Online: A supervised Bayesian factor model for the identification of multi-omics signatures.

    Gygi, Jeremy P / Konstorum, Anna / Pawar, Shrikant / Aron, Edel / Kleinstein, Steven H / Guan, Leying

    Bioinformatics (Oxford, England)

    2024  

    Abstract Motivation: Predictive biological signatures provide utility as biomarkers for disease diagnosis and prognosis, as well as prediction of responses to vaccination or therapy. These signatures are identified from high-throughput profiling assays through a combination of dimensionality reduction and machine learning techniques. The genes, proteins, metabolites, and other biological analytes that compose signatures also generate hypotheses on the underlying mechanisms driving biological responses, thus improving biological understanding. Dimensionality reduction is a critical step in signature discovery to address the large number of analytes in omics datasets, especially for multi-omics profiling studies with tens of thousands of measurements. Latent factor models, which can account for the structural heterogeneity across diverse assays, effectively integrate multi-omics data and reduce dimensionality to a small number of factors that capture correlations and associations among measurements. These factors provide biologically interpretable features for predictive modeling. However, multi-omics integration and predictive modeling are generally performed independently in sequential steps, leading to suboptimal factor construction. Combining these steps can yield better multi-omics signatures that are more predictive while still being biologically meaningful.
    Results: We developed a supervised variational Bayesian factor model that extracts multi-omics signatures from high-throughput profiling datasets that can span multiple data types. Signature-based multiPle-omics intEgration via lAtent factoRs (SPEAR) adaptively determines factor rank, emphasis on factor structure, data relevance and feature sparsity. The method improves the reconstruction of underlying factors in synthetic examples and prediction accuracy of COVID-19 severity and breast cancer tumor subtypes.
    Availability: SPEAR is a publicly available R-package hosted at https://bitbucket.org/kleinstein/SPEAR.
    Supplementary information: Supplementary data are available at Bioinformatics online.
    Language English
    Publishing date 2024-04-11
    Publishing country England
    Document type Journal Article
    ZDB-ID 1422668-6
    ISSN 1367-4811 ; 1367-4803
    ISSN (online) 1367-4811
    ISSN 1367-4803
    DOI 10.1093/bioinformatics/btae202
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)

  6. Article: Leveraging genetic correlations and multiple populations to improve genetic risk prediction for non-European populations.

    Xu, Leqi / Zhou, Geyu / Jiang, Wei / Guan, Leying / Zhao, Hongyu

    Research square

    2023  

    Abstract The disparity in genetic risk prediction accuracy between European and non-European individuals highlights a critical challenge in health inequality. To bridge this gap, we introduce JointPRS, a novel method that models multiple populations jointly to improve genetic risk predictions for non-European individuals. JointPRS has three key features. First, it encompasses all diverse populations to improve prediction accuracy, rather than relying solely on the target population with a singular auxiliary European group. Second, it autonomously estimates and leverages chromosome-wise cross-population genetic correlations to infer the effect sizes of genetic variants. Lastly, it provides an auto version that has comparable performance to the tuning version to accommodate the situation with no validation dataset. Through extensive simulations and real data applications to 22 quantitative traits and four binary traits in East Asian populations, nine quantitative traits and one binary trait in African populations, and four quantitative traits in South Asian populations, we demonstrate that JointPRS outperforms state-of-the-art methods, improving the prediction accuracy for both quantitative and binary traits in non-European populations.
    Language English
    Publishing date 2023-12-25
    Publishing country United States
    Document type Preprint
    DOI 10.21203/rs.3.rs-3741763/v1
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)

  7. Article: Leveraging genetic correlations and multiple populations to improve genetic risk prediction for non-European populations.

    Xu, Leqi / Zhou, Geyu / Jiang, Wei / Guan, Leying / Zhao, Hongyu

    bioRxiv : the preprint server for biology

    2023  

    Abstract The disparity in genetic risk prediction accuracy between European and non-European individuals highlights a critical challenge in health inequality. To bridge this gap, we introduce JointPRS, a novel method that models multiple populations jointly to improve genetic risk predictions for non-European individuals. JointPRS has three key features. First, it encompasses all diverse populations to improve prediction accuracy, rather than relying solely on the target population with a singular auxiliary European group. Second, it autonomously estimates and leverages chromosome-wise cross-population genetic correlations to infer the effect sizes of genetic variants. Lastly, it provides an auto version that has comparable performance to the tuning version to accommodate the situation with no validation dataset. Through extensive simulations and real data applications to 22 quantitative traits and four binary traits in East Asian populations, nine quantitative traits and one binary trait in African populations, and four quantitative traits in South Asian populations, we demonstrate that JointPRS outperforms state-of-the-art methods, improving the prediction accuracy for both quantitative and binary traits in non-European populations.
    Language English
    Publishing date 2023-12-12
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2023.10.29.564615
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)

  8. Article: Supervised learning via the "hubNet" procedure.

    Guan, Leying / Fan, Zhou / Tibshirani, Robert

    Statistica Sinica

    2022  Volume 28, Issue 3, Page(s) 1225–1243

    Abstract We propose a new method for supervised learning. The ...
    Language English
    Publishing date 2022-06-01
    Publishing country China (Republic : 1949- )
    Document type Journal Article
    ISSN 1017-0405
    DOI 10.5705/ss.202016.0482
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)

  9. Article: Post model-fitting exploration via a "Next-Door" analysis.

    Guan, Leying / Tibshirani, Robert

    The Canadian journal of statistics = Revue canadienne de statistique

    2020  Volume 48, Issue 3, Page(s) 447–470

    Abstract We propose a simple method for evaluating the model that has been chosen by an adaptive regression procedure, our main focus being the lasso. This procedure deletes each chosen predictor and refits the lasso to get a set of models that are "close" to the chosen "base model," and compares the error rates of the base model with those of nearby models. If the deletion of a predictor leads to significant deterioration in the model's predictive power, the predictor is called indispensable; otherwise, the nearby model is called acceptable and can serve as a good alternative to the base model. This provides both an assessment of the predictive contribution of each variable and a set of alternative models that may be used in place of the chosen model. We call this procedure "Next-Door analysis" since it examines models "next" to the base model. It can be applied to supervised learning problems with ...
    Language English
    Publishing date 2020-03-05
    Publishing country Canada
    Document type Journal Article
    ZDB-ID 2007833-X
    ISSN 1708-945X ; 0319-5724
    ISSN (online) 1708-945X
    ISSN 0319-5724
    DOI 10.1002/cjs.11542
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)

  10. Book ; Online: Conformalized semi-supervised random forest for classification and abnormality detection

    Han, Yujin / Xu, Mingwenchan / Guan, Leying

    2023  

    Abstract Traditional classifiers infer labels under the premise that the training and test samples are generated from the same distribution. This assumption can be problematic for safety-critical applications such as medical diagnosis and network attack detection. In this paper, we consider the multi-class classification problem when the training data and the test data may have different distributions. We propose conformalized semi-supervised random forest (CSForest), which constructs set-valued predictions C(x) to include the correct class label with desired probability while detecting outliers efficiently. We compare the proposed method to other state-of-the-art methods in both a synthetic example and a real data application to demonstrate the strength of our proposal.
    Keywords Computer Science - Machine Learning
    Publishing date 2023-02-04
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

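Entries 1, 2 and 10 above describe set-valued predictions C(x) that can come back empty when a test point x looks like an outlier. The sketch below illustrates that idea with a generic class-conditional split-conformal construction; it is not the BCOPS or CSForest procedure from those papers. The class-mean scorer, the function names (fit_class_means, score, conformal_p_value, prediction_set) and the toy data are hypothetical, chosen only to keep the example self-contained.

```python
from statistics import mean
from typing import Dict, List, Set, Tuple

Labeled = List[Tuple[float, str]]


def fit_class_means(train: Labeled) -> Dict[str, float]:
    """Hypothetical toy 'classifier': the mean feature value of each class."""
    by_class: Dict[str, List[float]] = {}
    for x, y in train:
        by_class.setdefault(y, []).append(x)
    return {k: mean(v) for k, v in by_class.items()}


def score(means: Dict[str, float], x: float, k: str) -> float:
    """Conformity score: larger means x looks more typical of class k."""
    return -abs(x - means[k])


def conformal_p_value(cal_scores: List[float], s: float) -> float:
    """Split-conformal p-value: (1 + #{calibration scores <= s}) / (n + 1)."""
    return (1 + sum(c <= s for c in cal_scores)) / (len(cal_scores) + 1)


def prediction_set(means: Dict[str, float], cal: Labeled, x: float, alpha: float) -> Set[str]:
    """Keep every class whose conformal p-value exceeds alpha; the set may be empty."""
    # Class-conditional calibration: score each calibration point only against its own label.
    cal_scores: Dict[str, List[float]] = {}
    for xi, yi in cal:
        cal_scores.setdefault(yi, []).append(score(means, xi, yi))
    return {k for k, s_k in cal_scores.items()
            if conformal_p_value(s_k, score(means, x, k)) > alpha}


# Toy 1-D data: the training split fits the scorer, the calibration split sets the cutoffs.
train = [(0.0, "a"), (0.2, "a"), (-0.2, "a"), (5.0, "b"), (5.2, "b"), (4.8, "b")]
cal = [(0.1, "a"), (-0.3, "a"), (0.25, "a"), (-0.05, "a"),
       (4.9, "b"), (5.3, "b"), (4.7, "b"), (5.05, "b")]
means = fit_class_means(train)

for x in (0.0, 5.1, 10.0):
    print(x, prediction_set(means, cal, x, alpha=0.2))
```

With alpha = 0.2, the point 0.0 gets {'a'}, the point 5.1 gets {'b'}, and 10.0, far from both classes, gets the empty set, which is how an empty C(x) flags a likely outlier in this sketch. When the calibration points of a class and a test point from that class are exchangeable, each class-conditional p-value is valid, so the true class is kept with probability at least 1 - alpha.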
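
Entry 9 describes Next-Door analysis: delete each predictor of a chosen base model, refit, and compare the error of each nearby model with that of the base model. The sketch below mirrors those steps under loudly stated assumptions: ordinary least squares stands in for the lasso refit, and a fixed relative tolerance (`tol`, hypothetical) stands in for whatever criterion the paper uses to judge a deterioration significant. The helper names (solve, fit_ols, mse, next_door, make_data) and the simulated data are likewise hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_ols(X: Matrix, y: List[float], feats: Sequence[int]) -> List[float]:
    """Least-squares coefficients (no intercept) via the normal equations.

    Assumption: OLS replaces the lasso refit described in the abstract.
    """
    A = [[sum(row[j] * row[k] for row in X) for k in feats] for j in feats]
    b = [sum(row[j] * yi for row, yi in zip(X, y)) for j in feats]
    return solve(A, b)


def mse(X: Matrix, y: List[float], feats: Sequence[int], w: List[float]) -> float:
    """Mean squared prediction error of the model that uses columns `feats`."""
    preds = [sum(wj * row[j] for wj, j in zip(w, feats)) for row in X]
    return sum((p - yi) ** 2 for p, yi in zip(preds, y)) / len(y)


def next_door(Xtr: Matrix, ytr: List[float], Xva: Matrix, yva: List[float],
              base: Sequence[int], tol: float = 0.10) -> Dict[int, str]:
    """Delete each base predictor, refit, and compare validation error with the base model."""
    base_err = mse(Xva, yva, base, fit_ols(Xtr, ytr, base))
    labels: Dict[int, str] = {}
    for j in base:
        feats = [k for k in base if k != j]  # the "next-door" model without predictor j
        err = mse(Xva, yva, feats, fit_ols(Xtr, ytr, feats))
        # Hypothetical rule: a relative error increase above `tol` counts as deterioration.
        labels[j] = "indispensable" if err > (1 + tol) * base_err else "acceptable"
    return labels


# Simulated data: the response depends on predictor 0 only; predictor 1 is pure noise.
rng = random.Random(0)


def make_data(n: int) -> Tuple[Matrix, List[float]]:
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    y = [3.0 * row[0] + rng.gauss(0, 0.1) for row in X]
    return X, y


Xtr, ytr = make_data(500)
Xva, yva = make_data(500)
labels = next_door(Xtr, ytr, Xva, yva, base=[0, 1])
print(labels)
```

Because only predictor 0 drives the simulated response, deleting it makes the validation error jump and marks it indispensable, while deleting predictor 1 leaves the error nearly unchanged, so that nearby model is labeled acceptable and could replace the base model.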
