LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 27

  1. Article ; Online: Redundancy-aware unsupervised ranking based on game theory

    Chiara Balestra / Carlo Maj / Emmanuel Müller / Andreas Mayr

    PLoS ONE, Vol 18, Iss 3 (2023)

    Ranking pathways in collections of gene sets

    Abstract In Genetics, gene sets are grouped in collections concerning their biological function. This often leads to high-dimensional, overlapping, and redundant families of sets, thus precluding a straightforward interpretation of their biological meaning. In Data Mining, it is often argued that techniques to reduce the dimensionality of data could increase the maneuverability and consequently the interpretability of large data. In recent years, moreover, we have witnessed an increasing awareness of the importance of understanding data and interpretable models in the machine learning and bioinformatics communities. On the one hand, there exist techniques aiming to aggregate overlapping gene sets to create larger pathways. While these methods could partly solve the problem of the collections' large size, modifying biological pathways is hardly justifiable in this biological context. On the other hand, the representation methods proposed so far to increase the interpretability of collections of gene sets have proved to be insufficient. Inspired by this Bioinformatics context, we propose a method to rank sets within a family of sets based on the distribution of the singletons and their size. We obtain sets' importance scores by computing Shapley values; making use of microarray games, we do not incur the typical exponential computational complexity. Moreover, we address the challenge of constructing redundancy-aware rankings where, in our case, redundancy is a quantity proportional to the size of intersections among the sets in the collections. We use the obtained rankings to reduce the dimension of the families, thereby obtaining lower redundancy among sets while still preserving a high coverage of their elements.
    We finally evaluate our approach on collections of gene sets and apply Gene Set Enrichment Analysis techniques to the now smaller collections: as expected, the unsupervised nature of the proposed rankings allows for unremarkable differences in the number of significant gene sets for specific ...
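
The abstract does not spell out the game, so the following is only a sketch under an assumption: it uses the standard microarray-game construction with the sets as players and the elements (genes) as columns, where a coalition earns an element's share of value only if it contains every set holding that element. For such a game the Shapley value has a closed form (each element's unit is split equally among the sets containing it), which is why no exponential enumeration is needed. The brute-force function is included purely to check the closed form on a toy family; the function names and the toy family are hypothetical, and the authors' actual game and redundancy-aware ranking may differ.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def microarray_shapley(family):
    """Closed-form Shapley values for a microarray-style game on a family of sets.

    family: dict mapping a set name to the set of elements (e.g. genes) it contains.
    Each element carries 1/n of the total value (n = number of distinct elements)
    and shares it equally among all sets that contain it.
    """
    universe = set().union(*family.values())
    n = len(universe)
    # number of sets that contain each element
    counts = {g: sum(g in members for members in family.values()) for g in universe}
    return {
        name: sum((Fraction(1, counts[g]) for g in members), Fraction(0)) / n
        for name, members in family.items()
    }


def brute_force_shapley(family):
    """Exponential-time Shapley values of the same game, used only as a check."""
    players = list(family)
    p = len(players)
    universe = set().union(*family.values())

    def value(coalition):
        # v(C): share of elements whose containing sets all lie in C
        covered = sum(
            all(name in coalition for name, members in family.items() if g in members)
            for g in universe
        )
        return Fraction(covered, len(universe))

    phi = {}
    for i in players:
        others = [q for q in players if q != i]
        total = Fraction(0)
        for r in range(len(others) + 1):
            for subset in combinations(others, r):
                coalition = set(subset)
                weight = Fraction(factorial(r) * factorial(p - r - 1), factorial(p))
                total += weight * (value(coalition | {i}) - value(coalition))
        phi[i] = total
    return phi


family = {"A": {"g1", "g2", "g3"}, "B": {"g3", "g4"}, "C": {"g4"}}
scores = microarray_shapley(family)
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C']
```

Here set A scores 5/8 because two of its genes are unique to it, while B and C split the shared gene g4; the scores sum to 1, as the efficiency property of the Shapley value requires.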
    Keywords Medicine ; R ; Science ; Q
    Subject code 612
    Language English
    Publishing date 2023-01-01T00:00:00Z
    Publisher Public Library of Science (PLoS)
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  2. Article ; Online: Redundancy-aware unsupervised ranking based on game theory

    Chiara Balestra / Carlo Maj / Emmanuel Müller / Andreas Mayr

    PLoS ONE, Vol 18, Iss 3, p e0282699 (2023)

    Ranking pathways in collections of gene sets

    Abstract In Genetics, gene sets are grouped in collections concerning their biological function. This often leads to high-dimensional, overlapping, and redundant families of sets, thus precluding a straightforward interpretation of their biological meaning. In Data Mining, it is often argued that techniques to reduce the dimensionality of data could increase the maneuverability and consequently the interpretability of large data. In recent years, moreover, we have witnessed an increasing awareness of the importance of understanding data and interpretable models in the machine learning and bioinformatics communities. On the one hand, there exist techniques aiming to aggregate overlapping gene sets to create larger pathways. While these methods could partly solve the problem of the collections' large size, modifying biological pathways is hardly justifiable in this biological context. On the other hand, the representation methods proposed so far to increase the interpretability of collections of gene sets have proved to be insufficient. Inspired by this Bioinformatics context, we propose a method to rank sets within a family of sets based on the distribution of the singletons and their size. We obtain sets' importance scores by computing Shapley values; making use of microarray games, we do not incur the typical exponential computational complexity. Moreover, we address the challenge of constructing redundancy-aware rankings where, in our case, redundancy is a quantity proportional to the size of intersections among the sets in the collections. We use the obtained rankings to reduce the dimension of the families, thereby obtaining lower redundancy among sets while still preserving a high coverage of their elements.
    We finally evaluate our approach on collections of gene sets and apply Gene Set Enrichment Analysis techniques to the now smaller collections: as expected, the unsupervised nature of the proposed rankings allows for unremarkable differences in the number of significant gene sets for specific ...
    Keywords Medicine ; R ; Science ; Q
    Subject code 612
    Language English
    Publishing date 2023-01-01T00:00:00Z
    Publisher Public Library of Science (PLoS)
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  3. Article ; Online: Randomized boosting with multivariable base-learners for high-dimensional variable selection and prediction

    Christian Staerk / Andreas Mayr

    BMC Bioinformatics, Vol 22, Iss 1, Pp 1-28 (2021)

    Abstract Background Statistical boosting is a computational approach to select and estimate interpretable prediction models for high-dimensional biomedical data, leading to implicit regularization and variable selection when combined with early stopping. Traditionally, the set of base-learners is fixed for all iterations and consists of simple regression learners including only one predictor variable at a time. Furthermore, the number of iterations is typically tuned by optimizing the predictive performance, leading to models which often include unnecessarily large numbers of noise variables. Results We propose three consecutive extensions of classical component-wise gradient boosting. In the first extension, called Subspace Boosting (SubBoost), base-learners can consist of several variables, allowing for multivariable updates in a single iteration. To compensate for the larger flexibility, the ultimate selection of base-learners is based on information criteria, leading to an automatic stopping of the algorithm. As the second extension, Random Subspace Boosting (RSubBoost) additionally includes a random preselection of base-learners in each iteration, enabling scalability to high-dimensional data. In a third extension, called Adaptive Subspace Boosting (AdaSubBoost), an adaptive random preselection of base-learners is considered, focusing on base-learners which have proven to be predictive in previous iterations. Simulation results show that the multivariable updates in the three subspace algorithms are particularly beneficial in cases of high correlations among signal covariates. In several biomedical applications, the proposed algorithms tend to yield sparser models than classical statistical boosting, while also showing very competitive predictive performance compared to penalized regression approaches like the (relaxed) lasso and the elastic net.
    Conclusions The proposed randomized boosting approaches with multivariable base-learners are promising extensions of statistical boosting, particularly suited ...
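
For orientation, the classical component-wise gradient boosting that the three extensions build on can be sketched for squared-error loss as follows. This is the textbook baseline, not SubBoost, RSubBoost or AdaSubBoost: each iteration fits every single-variable least-squares learner to the current residuals and updates only the best one, shrunk by a step length. The function name, the centring assumption (so no intercept is needed) and the fixed number of iterations are illustrative choices.

```python
def componentwise_l2_boosting(X, y, nu=0.1, n_iter=100):
    """Classical component-wise L2 boosting with simple linear base-learners.

    X: list of predictor columns (lists of floats), assumed centred and non-constant.
    y: response, assumed centred, so no intercept is fitted.
    Returns the coefficient of each predictor after n_iter iterations.
    """
    coefs = [0.0] * len(X)
    residuals = list(y)
    sq_norms = [sum(v * v for v in col) for col in X]
    for _ in range(n_iter):
        best_j, best_gain, best_beta = None, -1.0, 0.0
        for j, col in enumerate(X):
            dot = sum(u * v for u, v in zip(residuals, col))
            beta = dot / sq_norms[j]   # least-squares slope of residuals on x_j
            gain = dot * beta          # reduction in residual sum of squares
            if gain > best_gain:
                best_j, best_gain, best_beta = j, gain, beta
        # update only the best-fitting component, shrunk by the step length nu
        coefs[best_j] += nu * best_beta
        residuals = [u - nu * best_beta * v for u, v in zip(residuals, X[best_j])]
    return coefs


x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [1.0, -1.0, 0.0, -1.0, 1.0]  # orthogonal to x1 and unrelated to y
y = [2.0 * v for v in x1]
coefs = componentwise_l2_boosting([x1, x2], y, nu=0.1, n_iter=100)
print(coefs)
```

Because only one variable enters per iteration and the iteration count is fixed here, the sketch illustrates the two limitations named in the abstract: single-variable updates and a stopping rule that has to be tuned externally.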
    Keywords Boosting ; Feature selection ; High-dimensional data ; Information criteria ; Sparsity ; Variable selection ; Computer applications to medicine. Medical informatics ; R858-859.7 ; Biology (General) ; QH301-705.5
    Subject code 519
    Language English
    Publishing date 2021-09-01T00:00:00Z
    Publisher BMC
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  4. Article ; Online: Estimating the course of the COVID-19 pandemic in Germany via spline-based hierarchical modelling of death counts

    Tobias Wistuba / Andreas Mayr / Christian Staerk

    Scientific Reports, Vol 12, Iss 1, Pp 1-9 (2022)

    Abstract We consider a retrospective modelling approach for estimating effective reproduction numbers based on death counts during the first year of the COVID-19 pandemic in Germany. The proposed Bayesian hierarchical model incorporates splines to estimate reproduction numbers flexibly over time while adjusting for varying effective infection fatality rates. The approach also provides estimates of dark figures regarding undetected infections. Results for Germany illustrate that our estimates based on death counts are often similar to classical estimates based on confirmed cases; however, considering death counts allows one to disentangle effects of adapted testing policies from transmission dynamics. In particular, during the second wave of infections, classical estimates suggest a flattening infection curve following the “lockdown light” in November 2020, while our results indicate that infections continued to rise until the “second lockdown” in December 2020. This observation is associated with more stringent testing criteria introduced concurrently with the “lockdown light”, which is reflected in subsequently increasing dark figures of infections estimated by our model. In light of progressive vaccinations, shifting the focus from modelling confirmed cases to reported deaths, with the possibility to incorporate effective infection fatality rates, might be of increasing relevance for the future surveillance of the pandemic.
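
The intuition for why death counts expose dark figures can be shown with a deliberately crude back-calculation. This is not the authors' spline-based Bayesian hierarchical model: it assumes a fixed infection-to-death delay `lag` and a known effective IFR per day, divides deaths by the IFR to estimate infections, and compares the result with confirmed cases. Both helper names and all numbers are hypothetical.

```python
def back_calculate_infections(deaths, ifr, lag):
    """Crude back-calculation (hypothetical helper, not the paper's model).

    deaths[t]: reported deaths on day t; ifr[t]: assumed effective IFR for
    infections on day t; lag: assumed fixed delay (days) from infection to death.
    Returns estimated infections for days 0 .. len(deaths) - lag - 1.
    """
    return [deaths[t + lag] / ifr[t] for t in range(len(deaths) - lag)]


def dark_figure(infections, confirmed):
    """Ratio of estimated infections to confirmed cases on the same days."""
    return [inf / conf for inf, conf in zip(infections, confirmed)]


deaths = [0, 0, 10, 20]  # illustrative numbers only
ifr = [0.01, 0.01, 0.01, 0.01]
infections = back_calculate_infections(deaths, ifr, lag=2)  # about [1000, 2000]
ratios = dark_figure(infections, confirmed=[250, 400])       # about [4.0, 5.0]
print(ratios)
```

A rising ratio, as in this toy example, is the kind of signal that stricter testing criteria would produce: confirmed cases lag behind infections implied by deaths.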
    Keywords Medicine ; R ; Science ; Q
    Subject code 310
    Language English
    Publishing date 2022-06-01T00:00:00Z
    Publisher Nature Portfolio
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  5. Article ; Online: Estimating effective infection fatality rates during the course of the COVID-19 pandemic in Germany

    Christian Staerk / Tobias Wistuba / Andreas Mayr

    BMC Public Health, Vol 21, Iss 1, Pp 1-9 (2021)

    Abstract Background The infection fatality rate (IFR) of the Coronavirus Disease 2019 (COVID-19) is one of the most discussed figures in the context of this pandemic. In contrast to the case fatality rate (CFR), the IFR depends on the total number of infected individuals, not just on the number of confirmed cases. In order to estimate the IFR, several seroprevalence studies have been or are currently being conducted. Methods Using German COVID-19 surveillance data and age-group specific IFR estimates from multiple international studies, this work investigates time-dependent variations in effective IFR over the course of the pandemic. Three different methods for estimating (effective) IFRs are presented: (a) population-averaged IFRs based on the assumption that the infection risk is independent of age and time, (b) effective IFRs based on the assumption that the age distribution of confirmed cases approximately reflects the age distribution of infected individuals, and (c) effective IFRs accounting for age- and time-dependent dark figures of infections. Results Effective IFRs in Germany are estimated to vary over time, as the age distributions of confirmed cases and estimated infections change during the course of the pandemic. In particular, during the first and second waves of infections in spring and autumn/winter 2020, there was a pronounced shift in the age distribution of confirmed cases towards older age groups, resulting in larger effective IFR estimates. The temporary increase in effective IFR during the first wave is estimated to be smaller, but still remains, when adjusting for age- and time-dependent dark figures. A comparison of effective IFRs with observed CFRs indicates that a substantial fraction of the time-dependent variability in observed mortality can be explained by changes in the age distribution of infections.
    Furthermore, a vanishing gap between effective IFRs and observed CFRs is apparent after the first infection wave, while an increasing gap can be observed during the second wave. ...
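
The three estimators described in the Methods all reduce to a weighted average of age-specific IFRs; they differ only in the age weights. A minimal sketch, assuming three age groups and made-up IFR values and counts (not the study's estimates): method (a) weights by the population, method (b) by confirmed cases in a period, and method (c) by estimated infections, here taken as cases times an assumed age-specific dark-figure factor.

```python
def effective_ifr(age_ifr, age_weights):
    """Weighted average of age-specific IFRs.

    age_ifr: IFR per age group; age_weights: counts or shares per age group,
    normalised internally.
    """
    total = sum(age_weights)
    return sum(f * w for f, w in zip(age_ifr, age_weights)) / total


# Hypothetical inputs for three age groups (young, middle, old); not study estimates.
age_ifr = [0.0001, 0.002, 0.05]
population = [30, 50, 20]       # method (a): population age distribution
cases_week_a = [400, 100, 0]    # method (b): cases concentrated in younger groups
cases_week_b = [100, 300, 100]  # method (b): shift towards older groups
dark_factors = [4.0, 2.0, 1.0]  # method (c): assumed age-specific dark figures

ifr_pop = effective_ifr(age_ifr, population)
ifr_a = effective_ifr(age_ifr, cases_week_a)
ifr_b = effective_ifr(age_ifr, cases_week_b)
infections_b = [c * d for c, d in zip(cases_week_b, dark_factors)]
ifr_b_adjusted = effective_ifr(age_ifr, infections_b)
print(ifr_pop, ifr_a, ifr_b, ifr_b_adjusted)
```

The shift of cases towards the oldest group raises the effective IFR from 0.00048 to 0.01122, while assuming larger dark figures among younger people pulls the adjusted value back down without removing the increase, mirroring the qualitative pattern the abstract reports.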
    Keywords COVID-19 ; SARS-CoV-2 ; Infection fatality rate ; Mortality ; Dark figures ; Public aspects of medicine ; RA1-1270
    Language English
    Publishing date 2021-06-01T00:00:00Z
    Publisher BMC
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  6. Article ; Online: The relative area score for sublingual varices reliability measurement

    Christian R. Klein / David Stoppenbrink / Jannik Geier / Andreas Mayr / Helmut Stark

    BMC Oral Health, Vol 23, Iss 1, Pp 1-8 (2023)

    a diagnostic study

    Abstract Background Sublingual varices (SV) and their predictive potential for other clinical parameters are a much-studied topic in oral medicine. SVs have been well studied as predictive markers for many common conditions and risk factors such as arterial hypertension, cardiovascular disease, smoking, type 2 diabetes mellitus and age. Despite many prevalence studies, it is still unclear how the reliability of SV inspection affects its predictive power. The aim of this study was to quantify the inspection reliability of SV. Methods In a diagnostic study, the clinical inspection of 78 patients by 23 clinicians was examined for the diagnosis of SV. Digital images of the underside of the tongue were taken from each patient. The physicians were then asked to rate them for the presence of sublingual varices (0/1) in an online inspection experiment. Statistical analysis for inter-item and inter-rater reliability was performed in a τ-equivalent measurement model with Cronbach's α and Fleiss' κ. Results The inter-rater reliability for sublingual varices was relatively low with κ = 0.397. The internal consistency of image findings for SV was relatively high with α ≈ 0.937. This shows that although SV inspection is possible in principle, it has a low reliability R. This means that the inspection finding (0/1) of individual images often cannot be reproduced stably. Therefore, SV inspection is a difficult task of clinical investigation. The reliability R of SV inspection also limits the maximum linear correlation $r_{\max}$ of SV with an arbitrary other parameter Y. The reliability of SV inspection, R = 0.847, limits the maximum correlation to $r_{\max}(\mathrm{SV}, Y) = 0.920$; a 100% correlation was a priori not achievable in our sample.
    To overcome the problem of low reliability in SV inspection, we propose the RA (relative area) score as a continuous classification system for SV, which normalises the area of visible sublingual veins to the square of the length of the tongue, providing a dimensionless measure of SV. ...
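
The reported quantities can be reproduced with short formulas. The sketch below uses the textbook Fleiss' kappa, and the classical attenuation bound $r_{\max} = \sqrt{R_X R_Y}$ under the assumption that Y is measured without error ($R_Y = 1$), which turns R = 0.847 into 0.920 as stated. The RA score follows the verbal definition above (vein area divided by squared tongue length, in consistent units); the name `ra_score` and the example inputs are hypothetical.

```python
import math


def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = raters assigning subject i to category j.

    Every row must sum to the same number of raters n >= 2.
    """
    N = len(counts)
    n = sum(counts[0])
    k = len(counts[0])
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]  # category shares
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]  # agreement per subject
    P_bar = sum(P_i) / N
    P_e = sum(p * p for p in p_j)  # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)


def max_correlation(reliability_x, reliability_y=1.0):
    """Classical attenuation bound: |r_XY| <= sqrt(R_X * R_Y)."""
    return math.sqrt(reliability_x * reliability_y)


def ra_score(vein_area, tongue_length):
    """Hypothetical helper: visible vein area / squared tongue length (same length unit)."""
    return vein_area / tongue_length ** 2


print(round(max_correlation(0.847), 3))  # 0.92
```

For binary ratings, each row of `counts` would be `[raters saying 0, raters saying 1]` for one image.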
    Keywords Sublingual varices ; Reliability ; Clinical inspection ; Aging ; Oral science ; Maximum correlation ; Dentistry ; RK1-715
    Subject code 600
    Language English
    Publishing date 2023-06-01T00:00:00Z
    Publisher BMC
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  7. Article ; Online: Latent class distributional regression for the estimation of non-linear reference limits from contaminated data sources

    Tobias Hepp / Jakob Zierk / Manfred Rauh / Markus Metzler / Andreas Mayr

    BMC Bioinformatics, Vol 21, Iss 1, Pp 1-15 (2020)

    Abstract Background Medical decision making based on quantitative test results depends on reliable reference intervals, which represent the range of physiological test results in a healthy population. Current methods for the estimation of reference limits focus either on modelling the age-dependent dynamics of different analytes directly in a prospective setting or on extracting independent distributions from contaminated data sources, e.g. data with latent heterogeneity due to unlabeled pathologic cases. In this article, we propose a new method to estimate indirect reference limits with non-linear dependencies on covariates from contaminated datasets by combining the framework of mixture models and distributional regression. Results Simulation results based on mixtures of Gaussian and gamma distributions suggest accurate approximation of the true quantiles that improves with increasing sample size and decreasing overlap between the mixture components. Due to the high flexibility of the framework, initialization of the algorithm requires careful consideration regarding appropriate starting weights. Estimated quantiles from the extracted distribution of healthy hemoglobin concentration in boys and girls provide clinically useful pediatric reference limits similar to solutions obtained using different approaches which require more samples and are computationally more expensive. Conclusions Latent class distributional regression models represent the first method to estimate indirect non-linear reference limits from a single model fit, but the general scope of applications can be extended to other scenarios with latent heterogeneity.
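
The mixture-model idea behind indirect reference limits can be illustrated without covariates. The sketch below is a plain EM algorithm for a two-component Gaussian mixture, not the paper's latent class distributional regression (which lets the distribution parameters depend non-linearly on covariates such as age). It assumes the larger component is the healthy one and reads the 2.5% and 97.5% quantiles off it; the simulated values and starting weights are hypothetical, and as the abstract notes, poor starting values can send EM to a poor solution.

```python
import math
import random
from statistics import NormalDist


def em_two_gaussians(x, mu, sigma, weight, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture from the given starting values."""
    mu, sigma, weight = list(mu), list(sigma), list(weight)
    for _ in range(n_iter):
        dists = [NormalDist(mu[k], sigma[k]) for k in range(2)]
        # E-step: posterior probability of each component for each observation
        resp = []
        for xi in x:
            dens = [weight[k] * dists[k].pdf(xi) for k in range(2)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted mixing weights, means and standard deviations
        for k in range(2):
            rk = [r[k] for r in resp]
            nk = sum(rk)
            weight[k] = nk / len(x)
            mu[k] = sum(r * xi for r, xi in zip(rk, x)) / nk
            var = sum(r * (xi - mu[k]) ** 2 for r, xi in zip(rk, x)) / nk
            sigma[k] = math.sqrt(max(var, 1e-6))
    return mu, sigma, weight


# Simulated contaminated sample: mostly "healthy" values plus unlabeled pathologic cases
rng = random.Random(1)
data = [rng.gauss(13.0, 1.0) for _ in range(2000)] + [rng.gauss(9.0, 1.5) for _ in range(300)]

mu, sigma, weight = em_two_gaussians(data, mu=[12.0, 10.0], sigma=[2.0, 2.0], weight=[0.8, 0.2])
h = max(range(2), key=lambda k: weight[k])  # assume the larger component is healthy
ref = NormalDist(mu[h], sigma[h])
lower, upper = ref.inv_cdf(0.025), ref.inv_cdf(0.975)
print(round(lower, 2), round(upper, 2))
```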
    Keywords Latent class regression ; Finite mixture models ; Distributional regression ; Reference limits ; Computer applications to medicine. Medical informatics ; R858-859.7 ; Biology (General) ; QH301-705.5
    Subject code 310
    Language English
    Publishing date 2020-11-01T00:00:00Z
    Publisher BMC
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  8. Article ; Online: Corrigendum to “Probing for Sparse and Fast Variable Selection with Model-Based Boosting”

    Janek Thomas / Tobias Hepp / Andreas Mayr / Bernd Bischl

    Computational and Mathematical Methods in Medicine, Vol 2018 (2018)

    Keywords Computer applications to medicine. Medical informatics ; R858-859.7
    Language English
    Publishing date 2018-01-01T00:00:00Z
    Publisher Hindawi Limited
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  9. Article ; Online: Predictive Modelling Based on Statistical Learning in Biomedicine

    Olaf Gefeller / Benjamin Hofner / Andreas Mayr / Elisabeth Waldmann

    Computational and Mathematical Methods in Medicine, Vol 2017 (2017)

    Keywords Computer applications to medicine. Medical informatics ; R858-859.7
    Language English
    Publishing date 2017-01-01T00:00:00Z
    Publisher Hindawi Publishing Corporation
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  10. Article ; Online: Boosting the concordance index for survival data--a unified framework to derive and evaluate biomarker combinations.

    Andreas Mayr / Matthias Schmid

    PLoS ONE, Vol 9, Iss 1, p e84483 (2014)

    Abstract The development of molecular signatures for the prediction of time-to-event outcomes is a methodologically challenging task in bioinformatics and biostatistics. Although there are numerous approaches for the derivation of marker combinations and their evaluation, the underlying methodology often suffers from the problem that different optimization criteria are mixed during the feature selection, estimation and evaluation steps. This might result in marker combinations that are suboptimal regarding the evaluation criterion of interest. To address this issue, we propose a unified framework to derive and evaluate biomarker combinations. Our approach is based on the concordance index for time-to-event data, which is a non-parametric measure to quantify the discriminatory power of a prediction rule. Specifically, we propose a gradient boosting algorithm that results in linear biomarker combinations that are optimal with respect to a smoothed version of the concordance index. We investigate the performance of our algorithm in a large-scale simulation study and in two molecular data sets for the prediction of survival in breast cancer patients. Our numerical results show that the new approach is not only methodologically sound but can also lead to a higher discriminatory power than traditional approaches for the derivation of gene signatures.
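
The objective being boosted can be made concrete with a small sketch. It computes a Harrell-type concordance index (a pair is comparable when the earlier time is an observed event; it is concordant when the marker assigns that subject the higher risk) and a smoothed version in which the step indicator is replaced by a sigmoid with bandwidth `sigma`, so the criterion becomes differentiable in the marker values. This is an assumption-laden simplification: it omits any censoring weights a full estimator would use, and it does not implement the boosting algorithm itself.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def concordance_index(times, events, eta):
    """Harrell-type C-index without censoring weights; ties in eta count one half."""
    num, den = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # only subjects with an observed event start a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                den += 1
                if eta[i] > eta[j]:
                    num += 1.0
                elif eta[i] == eta[j]:
                    num += 0.5
    return num / den


def smoothed_concordance_index(times, events, eta, sigma=0.1):
    """Same comparable pairs, with the indicator eta[i] > eta[j] replaced by a sigmoid."""
    num, den = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                den += 1
                num += _sigmoid((eta[i] - eta[j]) / sigma)
    return num / den


times = [1, 2, 3, 4]
events = [1, 1, 1, 1]
print(concordance_index(times, events, [4, 3, 2, 1]))  # 1.0
```

Smaller values of `sigma` approximate the unsmoothed index more closely but make the gradient steeper, which is the usual trade-off when optimizing a smoothed rank criterion.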
    Keywords Medicine ; R ; Science ; Q
    Subject code 006
    Language English
    Publishing date 2014-01-01T00:00:00Z
    Publisher Public Library of Science (PLoS)
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)