LIVIVO - The Search Portal for Life Sciences


Search results

Results 1 - 10 of 15 in total

  1. Article ; Online: SMOTE-CD

    Teo Nguyen / Kerrie Mengersen / Damien Sous / Benoit Liquet

    PLoS ONE, Vol 18, Iss 6, p e

    SMOTE for compositional data.

    2023  Volume 0287705

    Abstract Compositional data are a special kind of data, represented as a proportion carrying relative information. Although this type of data is widely spread, no solution exists to deal with the cases where the classes are not well balanced. After describing compositional data imbalance, this paper proposes an adaptation of the original Synthetic Minority Oversampling TEchnique (SMOTE) to deal with compositional data imbalance. The new approach, called SMOTE for Compositional Data (SMOTE-CD), generates synthetic examples by computing a linear combination of selected existing data points, using compositional data operations. The performance of the SMOTE-CD is tested with three different regressors (Gradient Boosting tree, Neural Networks, Dirichlet regressor) applied to two real datasets and to synthetic generated data, and the performance is evaluated using accuracy, cross-entropy, F1-score, R2 score and RMSE. The results show improvements across all metrics, but the impact of oversampling on performance varies depending on the model and the data. In some cases, oversampling may lead to a decrease in performance for the majority class. However, for the real data, the best performance across all models is achieved when oversampling is used. Notably, the F1-score is consistently increased with oversampling. Unlike the original technique, the performance is not improved when combining oversampling of the minority classes and undersampling of the majority class. The Python package smote-cd implements the method and is available online.
    Keywords Medicine ; R ; Science ; Q
    Subject code 780
    Language English
    Publishing date 2023-01-01T00:00:00Z
    Publisher Public Library of Science (PLoS)
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
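
The abstract above describes generating synthetic examples as a linear combination of existing points using compositional data operations. The following is a minimal sketch of that idea, assuming the operations are the standard Aitchison perturbation and powering on the simplex; it is not taken from the smote-cd package, and the function names (`closure`, `interpolate`, `synthetic_point`) are hypothetical. It also assumes all parts are strictly positive.

```python
import random


def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def interpolate(x, neighbour, lam):
    """Move from x towards neighbour by a fraction lam in Aitchison geometry.

    This is x (+) lam (.) (neighbour (-) x), which simplifies to the
    closure of x_i^(1 - lam) * neighbour_i^lam. Assumes strictly positive parts.
    """
    return closure([xi ** (1.0 - lam) * ni ** lam for xi, ni in zip(x, neighbour)])


def synthetic_point(x, neighbour, rng):
    """Draw one synthetic composition between x and one of its neighbours."""
    lam = rng.random()  # interpolation weight in [0, 1), as in classical SMOTE
    return interpolate(x, neighbour, lam)


# Hypothetical three-part compositions, for illustration only.
x = [0.7, 0.2, 0.1]
neighbour = [0.5, 0.3, 0.2]
rng = random.Random(0)
new_point = synthetic_point(x, neighbour, rng)
```

Because the interpolation is carried out with simplex operations, the synthetic point is again a valid composition (positive parts summing to 1), which a plain Euclidean combination of log-ratios or unnormalised parts would not guarantee without an extra closure step.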

  2. Article ; Online: Leveraging pleiotropic association using sparse group variable selection in genomics data

    Matthew Sutton / Pierre-Emmanuel Sugier / Therese Truong / Benoit Liquet

    BMC Medical Research Methodology, Vol 22, Iss 1, Pp 1-

    2022  Volume 12

    Abstract Background: Genome-wide association studies (GWAS) have identified genetic variants associated with multiple complex diseases. We can leverage this phenomenon, known as pleiotropy, to integrate multiple data sources in a joint analysis. Often, integrating additional information such as gene pathway knowledge can improve statistical efficiency and biological interpretation. In this article, we propose statistical methods which incorporate both gene pathway and pleiotropy knowledge to increase statistical power and identify important risk variants affecting multiple traits. Methods: We propose novel feature selection methods for group variable selection in the multi-task regression problem. We develop penalised likelihood methods exploiting different penalties to induce structured sparsity at a gene (or pathway) and SNP level across all studies. We implement an alternating direction method of multipliers (ADMM) algorithm for our penalised regression methods. The performance of our approaches is compared to a subset-based meta-analysis approach on simulated data sets. A bootstrap sampling strategy is provided to explore the stability of the penalised methods. Results: Our methods are applied to identify potential pleiotropy in an application considering the joint analysis of thyroid and breast cancers. The methods were able to detect eleven potential pleiotropic SNPs and six pathways. A simulation study found that our method was able to detect more true signals than a popular competing method while retaining a similar false discovery rate. Conclusion: We developed feature selection methods for jointly analysing multiple logistic regression tasks where prior grouping knowledge is available. Our method performed well both in simulation studies and when applied to a real data analysis of multiple cancers.
    Keywords Genetic epidemiology ; High dimensional data ; Lasso penalization ; Oncology ; Pathway analysis ; Pleiotropy ; Medicine (General) ; R5-920
    Subject code 310
    Language English
    Publishing date 2022-01-01T00:00:00Z
    Publisher BMC
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
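
The abstract above mentions penalties that induce sparsity at both the group (gene or pathway) level and the SNP level, fitted with ADMM. As a rough illustration only, here is a sketch of the proximal step for one common penalty of this kind, the sparse group lasso (an L1 term plus an unsquared L2 term per group). The choice of this specific penalty is an assumption; the abstract does not say which penalties the authors use, and the function names are hypothetical.

```python
import math


def soft_threshold(value, threshold):
    """Shrink a single coefficient towards zero by threshold (lasso step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_group_prox(group, lam_l1, lam_group):
    """Proximal operator of lam_l1 * ||b||_1 + lam_group * ||b||_2 for one group.

    Element-wise soft-thresholding gives within-group (SNP-level) sparsity;
    the group-wise shrinkage that follows can set the whole group to zero.
    """
    shrunk = [soft_threshold(v, lam_l1) for v in group]
    norm = math.sqrt(sum(v * v for v in shrunk))
    if norm <= lam_group:
        return [0.0] * len(group)  # the whole gene/pathway is dropped
    scale = 1.0 - lam_group / norm
    return [scale * v for v in shrunk]


# Hypothetical coefficients for two groups, for illustration only.
strong_group = sparse_group_prox([3.0, -0.5, 4.0], lam_l1=1.0, lam_group=1.0)
weak_group = sparse_group_prox([0.5, -0.2], lam_l1=1.0, lam_group=1.0)
```

In the first group the small middle coefficient is zeroed while the other two survive with shrinkage; the second group is removed entirely. This two-level behaviour is what "structured sparsity" refers to in such penalties.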

  3. Article ; Online: CPMCGLM

    Benoit Liquet / Jérémie Riou

    BMC Medical Research Methodology, Vol 19, Iss 1, Pp 1-

    an R package for p-value adjustment when looking for an optimal transformation of a single explanatory variable in generalized linear models

    2019  Volume 8

    Abstract Background: In medical research, explanatory continuous variables are frequently transformed or converted into categorical variables. If the coding is unknown, many tests can be used to identify the “optimal” transformation. This common process, involving the problems of multiple testing, requires a correction of the significance level. Liquet and Commenges proposed an asymptotic correction of significance level in the context of generalized linear models (GLM) (Liquet and Commenges, Stat Probab Lett 71:33–38, 2005). This procedure has been developed for dichotomous and Box-Cox transformations. Furthermore, Liquet and Riou suggested the use of resampling methods to estimate the significance level for transformations into categorical variables with more than two levels (Liquet and Riou, BMC Med Res Methodol 13:75, 2013). Results: CPMCGLM provides users with both methods of p-value adjustment. Furthermore, they are available for a large set of transformations. This paper aims to give the user an overview of the methodological context and to explain in detail the use of the CPMCGLM R package through its application to a real epidemiological dataset. Conclusion: We present here the CPMCGLM R package providing efficient methods for the correction of type-I error rate in the context of generalized linear models. This is the first and the only available package in R providing such methods applied to this context. This package is designed to help researchers, who work principally in the field of biostatistics and epidemiology, to analyze their data in the context of optimal cutoff point determination.
    Keywords R package ; Generalized linear model ; Resampling ; p-value adjustment ; Multiple testing ; Union intersection test ; Medicine (General) ; R5-920
    Subject code 310
    Language English
    Publishing date 2019-04-01T00:00:00Z
    Publisher BMC
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
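
The abstract above refers to resampling-based correction when several codings of one variable are tested and the best one is reported. The sketch below illustrates the general idea with a permutation test on the maximum statistic over candidate cutpoints. It is not the CPMCGLM API (that package is written in R), it uses a simple difference in group means rather than a GLM test statistic, and all names and data are hypothetical.

```python
import random
import statistics


def max_abs_mean_diff(x, y, cutpoints):
    """Largest absolute difference in mean outcome over all candidate cutpoints."""
    best = 0.0
    for c in cutpoints:
        low = [yi for xi, yi in zip(x, y) if xi <= c]
        high = [yi for xi, yi in zip(x, y) if xi > c]
        if low and high:  # skip cutpoints that leave one side empty
            best = max(best, abs(statistics.fmean(high) - statistics.fmean(low)))
    return best


def adjusted_p_value(x, y, cutpoints, n_perm=999, seed=0):
    """Permutation p-value for the best cutpoint, accounting for the search.

    The outcome is shuffled to break any link with x, and the maximum
    statistic is recomputed each time, so the null distribution reflects
    having tried every cutpoint.
    """
    rng = random.Random(seed)
    observed = max_abs_mean_diff(x, y, cutpoints)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if max_abs_mean_diff(x, y_perm, cutpoints) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical data: the outcome switches from 0 to 1 halfway along x.
x = list(range(20))
y = [0] * 10 + [1] * 10
p_adj = adjusted_p_value(x, y, cutpoints=[4.5, 9.5, 14.5])
```

Comparing the observed maximum with the permutation distribution of the maximum, rather than with a single-test reference distribution, is what keeps the type-I error rate under control when the cutpoint is chosen after looking at the data.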

  4. Article ; Online: Mapping of Coral Reefs with Multispectral Satellites

    Teo Nguyen / Benoît Liquet / Kerrie Mengersen / Damien Sous

    Remote Sensing, Vol 13, Iss 4470, p

    A Review of Recent Papers

    2021  Volume 4470

    Abstract Coral reefs are an essential source of marine biodiversity, but they are declining at an alarming rate under the combined effects of global change and human pressure. A precise mapping of coral reef habitat with high spatial and time resolutions has become a necessary step for monitoring their health and evolution. This mapping can be achieved remotely thanks to satellite imagery coupled with machine-learning algorithms. In this paper, we review the different satellites used in recent literature, as well as the most common and efficient machine-learning methods. To account for the recent explosion of published research on coral reef mapping, we especially focus on the papers published between 2018 and 2020. Our review study indicates that object-based methods provide more accurate results than pixel-based ones, and that the most accurate methods are Support Vector Machine and Random Forest. We emphasize that the satellites with the highest spatial resolution provide the best images for benthic habitat mapping. We also highlight that preprocessing steps (water column correction, sunglint removal, etc.) and additional inputs (bathymetry data, aerial photographs, etc.) can significantly improve the mapping accuracy.
    Keywords coral mapping ; coral reefs ; machine learning ; remote sensing ; satellite imagery ; Science ; Q
    Subject code 333
    Language English
    Publishing date 2021-11-01T00:00:00Z
    Publisher MDPI AG
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  5. Article ; Online: Penalized partial least squares for pleiotropy

    Camilo Broc / Therese Truong / Benoit Liquet

    BMC Bioinformatics, Vol 22, Iss 1, Pp 1-

    2021  Volume 31

    Abstract Background: The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose a novel gene- and pathway-level approach for the case where several independent GWAS on independent traits are available. The method is based on a generalization of the sparse group Partial Least Squares (sgPLS) to take into account groups of variables, and a Lasso penalization that links all independent data sets. This method, called joint-sgPLS, is able to convincingly detect signal at the variable level and at the group level. Results: Our method has the advantage of proposing a global, readable model while coping with the architecture of the data. It can outperform traditional methods and provides a wider insight in terms of a priori information. We compared the performance of the proposed method to other benchmark methods on simulated data and gave an example of application on real data with the aim of highlighting common susceptibility variants to breast and thyroid cancers. Conclusion: The joint-sgPLS shows interesting properties for detecting a signal. As an extension of the PLS, the method is suited for data with a large number of variables. The choice of Lasso penalization copes with architectures of groups of variables and observation sets. Furthermore, although the method has been applied to a genetic study, its formulation is adapted to any data with a high number of variables and an exposed a priori architecture in other application fields.
    Keywords Genetic epidemiology ; High dimensional data ; Lasso Penalization ; Meta-analysis ; Oncology ; Partial Least Square ; Computer applications to medicine. Medical informatics ; R858-859.7 ; Biology (General) ; QH301-705.5
    Subject code 519
    Language English
    Publishing date 2021-02-01T00:00:00Z
    Publisher BMC
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  6. Article ; Online: Multi-Site and Multi-Year Remote Records of Operative Temperatures with Biomimetic Loggers Reveal Spatio-Temporal Variability in Mountain Lizard Activity and Persistence Proxy Estimates

    Florèn Hugon / Benoit Liquet / Frank D’Amico

    Remote Sensing, Vol 12, Iss 2908, p

    2020  Volume 2908

    Abstract Commonly, when studies deal with the effects of climate change on biodiversity, the mean value is used more than other parameters. However, climate change also leads to greater temperature variability, and many papers have demonstrated its importance in the implementation of biodiversity response strategies. We studied the spatio-temporal variability of activity time and persistence index, calculated from operative temperatures measured at three sites over three years, for a mountain endemic species. Temperatures were recorded with biomimetic loggers, an original remote sensing technology, which has the same advantages as these tools but is suitable for recording data on biological organisms. Among the 42 tests conducted, 71% were significant for spatial variability and 28% for temporal variability. The differences in daily activity times and in persistence indices demonstrated the effects of the micro-habitat, habitat, slope, altitude, hydrography, and year. These observations have highlighted the great variability in the environmental temperatures experienced by lizard populations. Thus, our study underlines the importance of implementing multi-year and multi-site studies to quantify the variability and produce more representative results. These studies can be facilitated by the use of biomimetic loggers, for which a user guide is provided in the last part of this paper.
    Keywords activity time ; biomimetic model ; biomimetic logger ; data logger ; operative temperature ; persistence ; Science ; Q
    Subject code 333
    Language English
    Publishing date 2020-09-01T00:00:00Z
    Publisher MDPI AG
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  7. Article ; Online: Automatic Creation of Storm Impact Database Based on Video Monitoring and Convolutional Neural Networks

    Aurelien Callens / Denis Morichon / Pedro Liria / Irati Epelde / Benoit Liquet

    Remote Sensing, Vol 13, Iss 1933, p

    2021  Volume 1933

    Abstract Data about storm impacts are essential for the disaster risk reduction process, but unlike data about storm characteristics, they are not routinely collected. In this paper, we demonstrate the high potential of convolutional neural networks to automatically constitute storm impact database using timestacks images provided by coastal video monitoring stations. Several convolutional neural network architectures and methods to deal with class imbalance were tested on two sites (Biarritz and Zarautz) to find the best practices for this classification task. This study shows that convolutional neural networks are well adapted for the classification of timestacks images into storm impact regimes. Overall, the most complex and deepest architectures yield better results. Indeed, the best performances are obtained with the VGG16 architecture for both sites with F-scores of 0.866 for Biarritz and 0.858 for Zarautz. For the class imbalance problem, the method of oversampling shows best classification accuracy with F-scores on average 30% higher than the ones obtained with cost sensitive learning. The transferability of the learning method between sites is also investigated and shows conclusive results. This study highlights the high potential of convolutional neural networks to enhance the value of coastal video monitoring data that are routinely recorded on many coastal sites. Furthermore, it shows that this type of deep neural network can significantly contribute to the setting up of risk databases necessary for the determination of storm risk indicators and, more broadly, for the optimization of risk-mitigation measures.
    Keywords convolutional neural networks ; storm impact database ; transfer learning ; video monitoring ; Science ; Q
    Subject code 006
    Language English
    Publishing date 2021-05-01T00:00:00Z
    Publisher MDPI AG
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  8. Article ; Online: Understanding links between water-quality variables and nitrate concentration in freshwater streams using high frequency sensor data.

    Claire Kermorvant / Benoit Liquet / Guy Litt / Kerrie Mengersen / Erin E Peterson / Rob J Hyndman / Jeremy B Jones / Catherine Leigh

    PLoS ONE, Vol 18, Iss 6, p e

    2023  Volume 0287640

    Abstract Real-time monitoring using in-situ sensors is becoming a common approach for measuring water-quality within watersheds. High-frequency measurements produce big datasets that present opportunities to conduct new analyses for improved understanding of water-quality dynamics and more effective management of rivers and streams. Of primary importance is enhancing knowledge of the relationships between nitrate, one of the most reactive forms of inorganic nitrogen in the aquatic environment, and other water-quality variables. We analysed high-frequency water-quality data from in-situ sensors deployed in three sites from different watersheds and climate zones within the National Ecological Observatory Network, USA. We used generalised additive mixed models to explain the nonlinear relationships at each site between nitrate concentration and conductivity, turbidity, dissolved oxygen, water temperature, and elevation. Temporal auto-correlation was modelled with an auto-regressive-moving-average (ARIMA) model and we examined the relative importance of the explanatory variables. Total deviance explained by the models was high for all sites (99%). Although variable importance and the smooth regression parameters differed among sites, the models explaining the most variation in nitrate contained the same explanatory variables. This study demonstrates that building a model for nitrate using the same set of explanatory water-quality variables is achievable, even for sites with vastly different environmental and climatic characteristics. Applying such models will assist managers to select cost-effective water-quality variables to monitor when the goals are to gain a spatial and temporal in-depth understanding of nitrate dynamics and adapt management plans accordingly.
    Keywords Medicine ; R ; Science ; Q
    Subject code 550
    Language English
    Publishing date 2023-01-01T00:00:00Z
    Publisher Public Library of Science (PLoS)
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  9. Article ; Online: Understanding links between water-quality variables and nitrate concentration in freshwater streams using high frequency sensor data

    Claire Kermorvant / Benoit Liquet / Guy Litt / Kerrie Mengersen / Erin E. Peterson / Rob J. Hyndman / Jeremy B. Jones / Catherine Leigh

    PLoS ONE, Vol 18, Iss

    2023  Volume 6

    Abstract Real-time monitoring using in-situ sensors is becoming a common approach for measuring water-quality within watersheds. High-frequency measurements produce big datasets that present opportunities to conduct new analyses for improved understanding of water-quality dynamics and more effective management of rivers and streams. Of primary importance is enhancing knowledge of the relationships between nitrate, one of the most reactive forms of inorganic nitrogen in the aquatic environment, and other water-quality variables. We analysed high-frequency water-quality data from in-situ sensors deployed in three sites from different watersheds and climate zones within the National Ecological Observatory Network, USA. We used generalised additive mixed models to explain the nonlinear relationships at each site between nitrate concentration and conductivity, turbidity, dissolved oxygen, water temperature, and elevation. Temporal auto-correlation was modelled with an auto-regressive–moving-average (ARIMA) model and we examined the relative importance of the explanatory variables. Total deviance explained by the models was high for all sites (99%). Although variable importance and the smooth regression parameters differed among sites, the models explaining the most variation in nitrate contained the same explanatory variables. This study demonstrates that building a model for nitrate using the same set of explanatory water-quality variables is achievable, even for sites with vastly different environmental and climatic characteristics. Applying such models will assist managers to select cost-effective water-quality variables to monitor when the goals are to gain a spatial and temporal in-depth understanding of nitrate dynamics and adapt management plans accordingly.
    Keywords Medicine ; R ; Science ; Q
    Subject code 550
    Language English
    Publishing date 2023-01-01T00:00:00Z
    Publisher Public Library of Science (PLoS)
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  10. Article ; Online: Multi-Index Ecoacoustics Analysis for Terrestrial Soundscapes

    Marina D. A. Scarpelli / Benoit Liquet / David Tucker / Susan Fuller / Paul Roe

    Frontiers in Ecology and Evolution, Vol

    A New Semi-Automated Approach Using Time-Series Motif Discovery and Random Forest Classification

    2021  Volume 9

    Abstract High rates of biodiversity loss caused by human-induced changes in the environment require new methods for large scale fauna monitoring and data analysis. While ecoacoustic monitoring is increasingly being used and shows promise, analysis and interpretation of the big data produced remains a challenge. Computer-generated acoustic indices potentially provide a biologically meaningful summary of sound; however, temporal autocorrelation, difficulties in statistical analysis of multi-index data and lack of consistency or transferability in different terrestrial environments have hindered the application of those indices in different contexts. To address these issues we investigate the use of time-series motif discovery and random forest classification of multi-indices through two case studies. We use a semi-automated workflow combining time-series motif discovery and random forest classification of multi-index (acoustic complexity, temporal entropy, and events per second) data to categorize sounds in unfiltered recordings according to the main source of sound present (birds, insects, geophony). Our approach showed more than 70% accuracy in label assignment in both datasets. The categories assigned were broad, but we believe this is a great improvement on traditional single index analysis of environmental recordings as we can now give ecological meaning to recordings in a semi-automated way that does not require expert knowledge and manual validation is only necessary for a small subset of the data. Furthermore, temporal autocorrelation, which is largely ignored by researchers, has been effectively eliminated through the time-series motif discovery technique applied here for the first time to ecoacoustic data. We expect that our approach will greatly assist researchers in the future as it will allow large datasets to be rapidly processed and labeled, enabling the screening of recordings for undesired sounds, such as wind, or target biophony (insects and birds) for biodiversity monitoring or bioacoustics research.
    Keywords acoustic complexity index ; acoustic ecology ; acoustic indices ; ecoacoustics ; terrestrial soundscapes ; Evolution ; QH359-425 ; Ecology ; QH540-549.5
    Subject code 006
    Language English
    Publishing date 2021-12-01T00:00:00Z
    Publisher Frontiers Media S.A.
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
