LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 58

  1. Article ; Online: SMOTE-CD: SMOTE for compositional data.

    Nguyen, Teo / Mengersen, Kerrie / Sous, Damien / Liquet, Benoit

    PloS one

    2023  Volume 18, Issue 6, Page(s) e0287705

    Abstract Compositional data are a special kind of data, represented as proportions carrying relative information. Although this type of data is widespread, no solution exists to deal with cases where the classes are not well balanced. After describing compositional data imbalance, this paper proposes an adaptation of the original Synthetic Minority Oversampling TEchnique (SMOTE) to deal with compositional data imbalance. The new approach, called SMOTE for Compositional Data (SMOTE-CD), generates synthetic examples by computing a linear combination of selected existing data points, using compositional data operations. The performance of SMOTE-CD is tested with three different regressors (Gradient Boosting tree, Neural Networks, Dirichlet regressor) applied to two real datasets and to synthetically generated data, and the performance is evaluated using accuracy, cross-entropy, F1-score, R2 score and RMSE. The results show improvements across all metrics, but the impact of oversampling on performance varies depending on the model and the data. In some cases, oversampling may lead to a decrease in performance for the majority class. However, for the real data, the best performance across all models is achieved when oversampling is used. Notably, the F1-score is consistently increased with oversampling. Unlike with the original technique, performance is not improved when oversampling of the minority classes is combined with undersampling of the majority class. The Python package smote-cd implements the method and is available online.
    MeSH term(s) Acclimatization ; Benchmarking ; Entropy ; Minority Groups ; Neural Networks, Computer
    Language English
    Publishing date 2023-06-29
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 2267670-3
    ISSN 1932-6203 ; 1932-6203
    ISSN (online) 1932-6203
    DOI 10.1371/journal.pone.0287705
    Database MEDical Literature Analysis and Retrieval System OnLINE

  2. Article: Improving performances of MCMC for Nearest Neighbor Gaussian Process models with full data augmentation

    Coube-Sisqueille, Sébastien / Liquet, Benoît

    Computational statistics & data analysis. 2022 Apr., v. 168

    2022  

    Abstract Even though Nearest Neighbor Gaussian Processes (NNGP) considerably ease the MCMC implementation of Bayesian space-time models, they do not solve the convergence problems caused by high model dimension. Frugal alternatives such as response or collapsed algorithms are one answer. An alternative approach is to keep full data augmentation but try to make it more efficient. Two strategies are presented. The first is to pay particular attention to the seemingly trivial fixed effects of the model. Empirical exploration shows that re-centering the latent field on the intercept critically improves chain behavior. Theoretical elements support those observations. Besides the intercept, other fixed effects may have trouble mixing. This problem is addressed by interweaving, a simple method that requires no tuning while remaining affordable thanks to the sparsity of NNGPs. The second strategy accelerates sampling of the random field using chromatic samplers. This method boils long sequential simulation down to group-parallelized or group-vectorized sampling. The attractive possibility of parallelizing the NNGP density can therefore be carried over to field sampling. An R implementation of the two methods for Gaussian fields is freely available, along with an extensive vignette. The presented implementation is run on two synthetic toy examples, alongside the state-of-the-art package spNNGP. Finally, the methods are applied to a real data set of lead contamination on the mainland of the United States of America.
    Keywords Bayesian theory ; data analysis ; data collection ; lead ; models ; normal distribution ; space and time
    Language English
    Dates of publication 2022-04
    Publishing place Elsevier B.V.
    Document type Article
    ZDB-ID 1478763-5
    ISSN 0167-9473
    DOI 10.1016/j.csda.2021.107368
    Database NAL-Catalogue (AGRICOLA)

  3. Article ; Online: A new method to explicitly estimate the shift of optimum along gradients in multispecies studies

    Mourguiart, Bastien / Liquet, Benoît / Mengersen, Kerrie / Couturier, Thibaut / Mansons, Jérôme / Braud, Yoan / Besnard, Aurélien

    Journal of Biogeography. 2023 May, v. 50, no. 5 p.1000-1011

    2023  

    Abstract AIM: Optimum shifts in species–environment relationships are intensively studied across a wide range of ecological topics, including climate change and species invasion. Numerous statistical methods are used to study optimum shifts, but, to our knowledge, none estimates them explicitly. We extended an existing model to explicitly estimate optimum shifts for multiple species having symmetrical response curves. We called this new Bayesian hierarchical model the Explicit Hierarchical Model of Optimum Shifts (EHMOS). LOCATION: All locations. TAXON: All taxa. METHODS: In a simulation study, we compared the accuracy of EHMOS to that of a mean comparison method and a Bayesian generalized linear mixed model (GLMM). Specifically, we tested whether the accuracy of the methods was sensitive to (1) sampling design, (2) species optimum position and (3) species ecological specialization. In addition, we compared the three methods using a real dataset of optimum shifts in 24 Orthopteran species between two time periods along an elevation gradient. RESULTS: Across all simulated scenarios, EHMOS was the most accurate method. GLMM was the method most sensitive to species optimum position, providing unreliable estimates in the presence of marginal species, that is, species with an optimum close to a sampling boundary. The mean comparison method was also sensitive to species optimum position and ecological specialization, especially in an unbalanced sampling design, with high negative bias and low interval coverage compared to EHMOS. The case study results obtained with EHMOS were consistent with what is expected given ongoing climate change, with mostly upward shifts, which further increased confidence in the accuracy of the EHMOS method. MAIN CONCLUSIONS: The Explicit Hierarchical Model of Optimum Shifts could be used for a wide range of topics and extended to produce new insights, especially in climate change studies. Explicit estimation of optimum shifts notably allows investigation of ecological assumptions that could explain interspecific variability in these shifts.
    Keywords Bayesian theory ; Orthoptera ; altitude ; biogeography ; case studies ; climate change ; data collection ; interspecific variation ; statistical models
    Language English
    Dates of publication 2023-05
    Size p. 1000-1011.
    Publishing place John Wiley & Sons, Ltd
    Document type Article ; Online
    Note JOURNAL ARTICLE
    ZDB-ID 188963-1
    ISSN 0305-0270
    DOI 10.1111/jbi.14570
    Database NAL-Catalogue (AGRICOLA)

  4. Article ; Online: Leveraging pleiotropic association using sparse group variable selection in genomics data.

    Sutton, Matthew / Sugier, Pierre-Emmanuel / Truong, Therese / Liquet, Benoit

    BMC medical research methodology

    2022  Volume 22, Issue 1, Page(s) 9

    Abstract Background: Genome-wide association studies (GWAS) have identified genetic variants associated with multiple complex diseases. We can leverage this phenomenon, known as pleiotropy, to integrate multiple data sources in a joint analysis. Often integrating additional information such as gene pathway knowledge can improve statistical efficiency and biological interpretation. In this article, we propose statistical methods which incorporate both gene pathway and pleiotropy knowledge to increase statistical power and identify important risk variants affecting multiple traits.
    Methods: We propose novel feature selection methods for group variable selection in the multi-task regression problem. We develop penalised likelihood methods exploiting different penalties to induce structured sparsity at the gene (or pathway) and SNP levels across all studies. We implement an alternating direction method of multipliers (ADMM) algorithm for our penalised regression methods. The performance of our approaches is compared to that of a subset-based meta-analysis approach on simulated data sets. A bootstrap sampling strategy is provided to explore the stability of the penalised methods.
    Results: Our methods are applied to identify potential pleiotropy in an application considering the joint analysis of thyroid and breast cancers. The methods were able to detect eleven potential pleiotropic SNPs and six pathways. A simulation study found that our method was able to detect more true signals than a popular competing method while retaining a similar false discovery rate.
    Conclusion: We developed feature selection methods for jointly analysing multiple logistic regression tasks where prior grouping knowledge is available. Our method performed well on both simulation studies and when applied to a real data analysis of multiple cancers.
    MeSH term(s) Algorithms ; Genome-Wide Association Study ; Genomics/methods ; Humans ; Phenotype ; Polymorphism, Single Nucleotide
    Language English
    Publishing date 2022-01-07
    Publishing country England
    Document type Journal Article ; Meta-Analysis ; Research Support, Non-U.S. Gov't
    ZDB-ID 2041362-2
    ISSN 1471-2288 ; 1471-2288
    ISSN (online) 1471-2288
    DOI 10.1186/s12874-021-01491-8
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Article ; Online: CPMCGLM: an R package for p-value adjustment when looking for an optimal transformation of a single explanatory variable in generalized linear models.

    Liquet, Benoit / Riou, Jérémie

    BMC medical research methodology

    2019  Volume 19, Issue 1, Page(s) 79

    Abstract Background: In medical research, explanatory continuous variables are frequently transformed or converted into categorical variables. If the coding is unknown, many tests can be used to identify the "optimal" transformation. This common process, involving the problems of multiple testing, requires a correction of the significance level. Liquet and Commenges proposed an asymptotic correction of significance level in the context of generalized linear models (GLM) (Liquet and Commenges, Stat Probab Lett 71:33-38, 2005). This procedure has been developed for dichotomous and Box-Cox transformations. Furthermore, Liquet and Riou suggested the use of resampling methods to estimate the significance level for transformations into categorical variables with more than two levels (Liquet and Riou, BMC Med Res Methodol 13:75, 2013).
    Results: CPMCGLM provides users with both methods of p-value adjustment. Furthermore, they are available for a large set of transformations. This paper aims to give the user an overview of the methodological context and to explain in detail the use of the CPMCGLM R package through its application to a real epidemiological dataset.
    Conclusion: We present here the CPMCGLM R package, which provides efficient methods for the correction of the type-I error rate in the context of generalized linear models. This is the first and only available package in R providing such methods in this context. The package is designed to help researchers, principally in the fields of biostatistics and epidemiology, analyze their data in the context of optimal cutoff point determination.
    MeSH term(s) Algorithms ; Biometry/methods ; Cholesterol, HDL/blood ; Computational Biology/methods ; Dementia/blood ; Female ; Humans ; Linear Models ; Male ; Reproducibility of Results
    Chemical Substances Cholesterol, HDL
    Language English
    Publishing date 2019-04-16
    Publishing country England
    Document type Journal Article
    ISSN 1471-2288
    ISSN (online) 1471-2288
    DOI 10.1186/s12874-019-0711-2
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Article ; Online: Estimation of semi-Markov multi-state models: a comparison of the sojourn times and transition intensities approaches.

    Asanjarani, Azam / Liquet, Benoit / Nazarathy, Yoni

    The international journal of biostatistics

    2021  Volume 18, Issue 1, Page(s) 243–262

    Abstract Semi-Markov models are widely used for survival analysis and reliability analysis. In general, there are two competing parameterizations, each with its own interpretation and inference properties. On the one hand, a semi-Markov process can be defined based on the distribution of sojourn times, often via hazard rates, together with the transition probabilities of an embedded Markov chain. On the other hand, transition intensity functions may be used, often referred to as the hazard rates of the semi-Markov process. We summarize and contrast these two parameterizations from both a probabilistic and an inference perspective, and we highlight relationships between the two approaches. In general, the transition-intensity-based approach allows the likelihood to be split into likelihoods of two-state models having fewer parameters, allowing efficient computation and the use of many survival analysis tools. Nevertheless, in certain cases the sojourn-time-based approach is natural and has been exploited extensively in applications. In contrasting the two approaches and relevant contemporary R packages used for inference, we use two real datasets highlighting the probabilistic and inference properties of each approach. This analysis is accompanied by an R vignette.
    MeSH term(s) Markov Chains ; Probability ; Reproducibility of Results ; Survival Analysis
    Language English
    Publishing date 2021-01-06
    Publishing country Germany
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ISSN 1557-4679
    ISSN (online) 1557-4679
    DOI 10.1515/ijb-2020-0083
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article ; Online: Penalized partial least squares for pleiotropy.

    Broc, Camilo / Truong, Therese / Liquet, Benoit

    BMC bioinformatics

    2021  Volume 22, Issue 1, Page(s) 86

    Abstract Background: The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose a novel gene- and pathway-level approach for the case where several independent GWAS on independent traits are available. The method is based on a generalization of sparse group Partial Least Squares (sgPLS) that takes groups of variables into account, with a Lasso penalization that links all independent data sets. This method, called joint-sgPLS, is able to convincingly detect signal at both the variable level and the group level.
    Results: Our method has the advantage of proposing a globally readable model while accounting for the architecture of the data. It can outperform traditional methods and provides wider insight in terms of a priori information. We compared the performance of the proposed method to other benchmark methods on simulated data and gave an example of application on real data, with the aim of highlighting common susceptibility variants for breast and thyroid cancers.
    Conclusion: The joint-sgPLS shows interesting properties for detecting a signal. As an extension of PLS, the method is suited to data with a large number of variables. The choice of Lasso penalization copes with architectures of groups of variables and observation sets. Furthermore, although the method has been applied to a genetic study, its formulation is adapted to any data with a high number of variables and a known a priori architecture in other application fields.
    MeSH term(s) Genome-Wide Association Study ; Least-Squares Analysis ; Phenotype ; Polymorphism, Single Nucleotide
    Language English
    Publishing date 2021-02-24
    Publishing country England
    Document type Journal Article
    ZDB-ID 2041484-5
    ISSN 1471-2105 ; 1471-2105
    ISSN (online) 1471-2105
    DOI 10.1186/s12859-021-03968-1
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Article: Mapping of Coral Reefs with Multispectral Satellites: A Review of Recent Papers

    Nguyen, Teo / Liquet, Benoît / Mengersen, Kerrie / Sous, Damien

    Remote Sensing. 2021 Nov. 07, v. 13, no. 21

    2021  

    Abstract Coral reefs are an essential source of marine biodiversity, but they are declining at an alarming rate under the combined effects of global change and human pressure. Precise mapping of coral reef habitat at high spatial and temporal resolution has become a necessary step for monitoring their health and evolution. This mapping can be achieved remotely thanks to satellite imagery coupled with machine-learning algorithms. In this paper, we review the different satellites used in recent literature, as well as the most common and efficient machine-learning methods. To account for the recent explosion of published research on coral reef mapping, we focus especially on papers published between 2018 and 2020. Our review indicates that object-based methods provide more accurate results than pixel-based ones, and that the most accurate methods are Support Vector Machine and Random Forest. We emphasize that the satellites with the highest spatial resolution provide the best images for benthic habitat mapping. We also highlight that preprocessing steps (water column correction, sunglint removal, etc.) and additional inputs (bathymetry data, aerial photographs, etc.) can significantly improve mapping accuracy.
    Keywords benthic ecosystems ; biodiversity ; coral reefs ; corals ; evolution ; global change ; humans ; remote sensing ; support vector machines
    Language English
    Dates of publication 2021-11-07
    Publishing place Multidisciplinary Digital Publishing Institute
    Document type Article
    ZDB-ID 2513863-7
    ISSN 2072-4292
    DOI 10.3390/rs13214470
    Database NAL-Catalogue (AGRICOLA)

  9. Book ; Online: A Spectral Library and Method for Sparse Unmixing of Hyperspectral Images in Fluorescence Guided Resection of Brain Tumors

    Black, David / Liquet, Benoit / Kaneko, Sadahiro / Di Ieva, Antonio / Stummer, Walter / Molina, Eric Suero

    2024  

    Abstract Through spectral unmixing, hyperspectral imaging (HSI) in fluorescence-guided brain tumor surgery has enabled detection and classification of tumor regions invisible to the human eye. Prior unmixing work has focused on determining a minimal set of viable fluorophore spectra known to be present in the brain and effectively reconstructing human data without overfitting. With these endmembers, non-negative least squares regression (NNLS) was used to compute the abundances. However, HSI images are heterogeneous, so one small set of endmember spectra may not fit all pixels well. Additionally, NNLS is the maximum likelihood estimator only if the measurement is normally distributed, and it does not enforce sparsity, which leads to overfitting and unphysical results. Here, we analyzed 555666 HSI fluorescence spectra from 891 ex vivo measurements of patients with brain tumors to show that a Poisson distribution models the measured data 82% better than a Gaussian in terms of the Kullback-Leibler divergence and that the endmember abundance vectors are sparse. With this knowledge, we introduce (1) a library of 9 endmember spectra, (2) a sparse, non-negative Poisson regression algorithm to perform physics-informed unmixing with this library without overfitting, and (3) a highly realistic spectral measurement simulation with known endmember abundances. The new unmixing method was then tested on the human and simulated data and compared to four other candidate methods. It outperforms previous methods with 25% lower error in the computed abundances on the simulated data than NNLS, lower reconstruction error on human data, better sparsity, and 31 times faster runtime than state-of-the-art Poisson regression. This method and library of endmember spectra can enable more accurate spectral unmixing to better aid the surgeon during brain tumor resection.

    Comment: 17 pages, 4 tables, 6 figures; Under review
    Keywords Electrical Engineering and Systems Science - Image and Video Processing ; Quantitative Biology - Quantitative Methods
    Subject code 571
    Publishing date 2024-01-30
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  10. Article: Multi-Site and Multi-Year Remote Records of Operative Temperatures with Biomimetic Loggers Reveal Spatio-Temporal Variability in Mountain Lizard Activity and Persistence Proxy Estimates

    Hugon, Florèn / Liquet, Benoit / D’Amico, Frank

    Remote Sensing. 2020 Sept. 08, v. 12, no. 18

    2020  

    Abstract Commonly, when studies deal with the effects of climate change on biodiversity, the mean value is used more often than other parameters. However, climate change also leads to greater temperature variability, and many papers have demonstrated its importance in the implementation of biodiversity response strategies. We studied the spatio-temporal variability of activity time and persistence index, calculated from operative temperatures measured at three sites over three years, for a mountain endemic species. Temperatures were recorded with biomimetic loggers, an original remote sensing technology that offers the same advantages as other remote sensing tools but is suited to recording data relevant to biological organisms. Among the 42 tests conducted, 71% were significant for spatial variability and 28% for temporal variability. The differences in daily activity times and in persistence indices demonstrated the effects of micro-habitat, habitat, slope, altitude, hydrography, and year. These observations highlight the great variability in the environmental temperatures experienced by lizard populations. Thus, our study underlines the importance of implementing multi-year and multi-site studies to quantify this variability and produce more representative results. Such studies can be facilitated by the use of biomimetic loggers, for which a user guide is provided in the last part of this paper.
    Keywords altitude ; ambient temperature ; biodiversity ; biomimetics ; climate change ; hydrology ; indigenous species ; lizards ; microhabitats ; paper ; population ; remote sensing ; spatial variation ; temporal variation ; testing ; variability
    Language English
    Dates of publication 2020-09-08
    Publishing place Multidisciplinary Digital Publishing Institute
    Document type Article
    Note NAL-light
    ZDB-ID 2513863-7
    ISSN 2072-4292
    DOI 10.3390/rs12182908
    Database NAL-Catalogue (AGRICOLA)
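
Result 1 describes SMOTE-CD as generating synthetic examples through a linear combination of existing data points using compositional data operations. The sketch below illustrates one way such a combination can work, assuming the operations are Aitchison perturbation and powering: the combination x ⊕ λ ⊙ (y ⊖ x) then reduces to closure(x^(1-λ) · y^λ), and neighbours are found with the Aitchison (clr-Euclidean) distance. The function names, the uniform draw of λ and the uniform choice among k neighbours are illustrative assumptions; this is not the API or the exact procedure of the smote-cd package.

```python
import numpy as np

def closure(x):
    """Rescale positive parts so each composition sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centred log-ratio transform; Euclidean distance here is the Aitchison distance."""
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)

def compositional_interpolate(x, y, lam):
    """Point on the Aitchison geodesic from x to y: x (+) lam (.) (y (-) x)."""
    # Perturbation and powering collapse to a closed weighted geometric mean.
    return closure(x ** (1.0 - lam) * y ** lam)

def smote_cd_sketch(minority, n_new, k=3, seed=0):
    """Hypothetical SMOTE-style oversampler for compositions (assumes k < number of rows)."""
    rng = np.random.default_rng(seed)
    minority = closure(minority)
    z = clr(minority)
    # Pairwise Aitchison distances; exclude each point from its own neighbour list.
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        j = rng.choice(neighbours[i])
        lam = rng.uniform()
        synthetic.append(compositional_interpolate(minority[i], minority[j], lam))
    return np.array(synthetic)

data = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.5, 0.3, 0.2], [0.65, 0.25, 0.1]])
new_points = smote_cd_sketch(data, n_new=5, k=2)
```

Because every synthetic point is a closed product of positive parts, it stays strictly positive and sums to 1, which an ordinary Euclidean SMOTE interpolation would also do here but without respecting the relative (log-ratio) geometry of the data.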
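
Result 4 fits penalised regressions with structured sparsity at the gene (or pathway) and SNP levels using ADMM. In an ADMM solver, the update of the split variable typically reduces to the proximal operator of the penalty. The sketch below shows that building block for a standard sparse group lasso penalty, lam1·||b||_1 + lam2·Σ_g ||b_g||_2 with non-overlapping groups, whose proximal operator is elementwise soft-thresholding followed by group-wise shrinkage. This is an assumption for illustration: the penalties in the paper act across several studies and may be defined differently, and the group-label encoding is hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_group_prox(v, groups, lam1, lam2):
    """Proximal operator of lam1 * ||b||_1 + lam2 * sum_g ||b_g||_2.

    `groups` holds one integer group label per coefficient (hypothetical encoding).
    """
    u = soft_threshold(v, lam1)  # SNP-level (within-group) sparsity
    out = np.zeros_like(u)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(u[idx])
        if norm > lam2:
            # Shrink the whole group; groups with a small norm stay exactly zero.
            out[idx] = (1.0 - lam2 / norm) * u[idx]
    return out

v = np.array([3.0, -0.5, 0.2, 0.1])
groups = np.array([0, 0, 1, 1])
b = sparse_group_prox(v, groups, lam1=0.3, lam2=0.5)
```

Inside a scaled-form ADMM loop, this function would be applied to the current coefficient estimate plus the scaled dual variable, with both penalty weights divided by the ADMM step parameter.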
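
Result 5 concerns correcting the significance level when several codings of one explanatory variable are tried and the "optimal" one is kept. The sketch below illustrates the general resampling idea in Python with a permutation max-statistic: the observed best statistic over all candidate cut-points is compared with the best statistic obtained after permuting the outcome. It assumes a simple |correlation| statistic and dichotomisation at given cut-points; it is not the CPMCGLM procedure, which is an R package working with generalized linear models and also offers the asymptotic correction of Liquet and Commenges.

```python
import numpy as np

def max_abs_corr(x, y, cutpoints):
    """Largest |correlation| between y and x dichotomised at each candidate cut-point."""
    stats = []
    for c in cutpoints:
        d = (x > c).astype(float)
        stats.append(abs(np.corrcoef(d, y)[0, 1]))
    return max(stats)

def permutation_adjusted_pvalue(x, y, cutpoints, n_perm=999, seed=0):
    """P-value for the best coding, adjusted for having searched over all cut-points."""
    rng = np.random.default_rng(seed)
    observed = max_abs_corr(x, y, cutpoints)
    count = 0
    for _ in range(n_perm):
        # Permuting y removes any x-y association but keeps the codings' mutual correlation.
        if max_abs_corr(x, rng.permutation(y), cutpoints) >= observed:
            count += 1
    # Add-one estimator so the p-value is never exactly zero.
    return (count + 1) / (n_perm + 1)
```

Taking the maximum over codings inside every permutation is what accounts for the multiple testing: a naive p-value for the single best cut-point would ignore that the cut-point was chosen by looking at the data.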
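
Result 6 contrasts the sojourn-time and transition-intensity parameterizations of a semi-Markov process. Under standard notation assumed here rather than taken from the paper (p_{ij} for the embedded chain's transition probabilities, F_{ij} and f_{ij} for the distribution and density of the sojourn time in state i given that the next state is j, and d for the time since entry into i), the two parameterizations are linked as follows:

```latex
\begin{aligned}
S_i(d) &= \sum_{k} p_{ik}\,\bigl[1 - F_{ik}(d)\bigr], \\
\alpha_{ij}(d) &= \frac{p_{ij}\, f_{ij}(d)}{S_i(d)}, \\
\sum_{j} \alpha_{ij}(d) &= h_i(d), \qquad
S_i(d) = \exp\Bigl(-\int_0^d \sum_{j} \alpha_{ij}(u)\,du\Bigr), \\
p_{ij} &= \int_0^\infty \alpha_{ij}(u)\, S_i(u)\, du .
\end{aligned}
```

Here S_i is the probability of still being in state i at duration d and h_i is the overall hazard of leaving i. When the sojourn distribution does not depend on the next state (F_{ij} = F_i), the intensities factorize as \alpha_{ij}(d) = p_{ij} h_i(d). Each \alpha_{ij} behaves like a cause-specific hazard, which is why the intensity-based likelihood separates into two-state pieces.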
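
Result 9 replaces NNLS with a sparse, non-negative Poisson regression for unmixing fluorescence spectra. The abstract does not specify the solver, so the sketch below uses a swapped-in, well-known technique: multiplicative majorization-minimization (EM-style) updates for the L1-penalised Poisson objective Σ_i[(Ax)_i − y_i log (Ax)_i] + lam·Σ_j x_j with x ≥ 0, where the columns of A are endmember spectra. The function names, the starting point and the small `eps` guard are assumptions; this is not the authors' algorithm.

```python
import numpy as np

def poisson_objective(A, y, x, lam):
    """Penalised Poisson negative log-likelihood (up to a constant in y)."""
    mu = A @ x
    return float(np.sum(mu - y * np.log(mu)) + lam * np.sum(x))

def sparse_poisson_unmix(A, y, lam=1.0, n_iter=500, eps=1e-12):
    """Multiplicative MM updates; assumes A > 0 and y >= 0 with y not all zero."""
    n_end = A.shape[1]
    # Positive, uniform starting abundances.
    x = np.full(n_end, y.sum() / A.sum())
    col_sums = A.sum(axis=0)
    for _ in range(n_iter):
        mu = A @ x + eps
        # Each update keeps x non-negative and does not increase the objective.
        x *= (A.T @ (y / mu)) / (col_sums + lam)
    return x

rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(50, 5))  # 50 spectral channels, 5 endmembers
true_x = np.array([20.0, 0.0, 0.0, 5.0, 0.0])
y = rng.poisson(A @ true_x).astype(float)
x_hat = sparse_poisson_unmix(A, y, lam=1.0, n_iter=300)
```

The L1 term enters only through the denominator, so it shrinks small abundances towards zero without ever making them exactly zero; a thresholding step or a different solver would be needed for exact sparsity.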
