LIVIVO - The Search Portal for Life Sciences

Search results

Results 1–8 of 8

  1. Book ; Online: On marginal feature attributions of tree-based models

    Filom, Khashayar / Miroshnikov, Alexey / Kotsiopoulos, Konstandinos / Kannan, Arjun Ravi

    2023  

    Abstract Due to their power and ease of use, tree-based machine learning models, such as random forests and gradient-boosted tree ensembles, have become very popular. To interpret them, local feature attributions based on marginal expectations, e.g. marginal (interventional) Shapley, Owen or Banzhaf values, may be employed. Such methods are true to the model and implementation invariant, i.e. dependent only on the input-output function of the model. We contrast this with the popular TreeSHAP algorithm by presenting two (statistically similar) decision trees that compute the exact same function for which the "path-dependent" TreeSHAP yields different rankings of features, whereas the marginal Shapley values coincide. Furthermore, we discuss how the internal structure of tree-based models may be leveraged to help with computing their marginal feature attributions according to a linear game value. One important observation is that these are simple (piecewise-constant) functions with respect to a certain grid partition of the input space determined by the trained model. Another crucial observation, showcased by experiments with XGBoost, LightGBM and CatBoost libraries, is that only a portion of all features appears in a tree from the ensemble. Thus, the complexity of computing marginal Shapley (or Owen or Banzhaf) feature attributions may be reduced. This remains valid for a broader class of game values which we shall axiomatically characterize. A prime example is the case of CatBoost models where the trees are oblivious (symmetric) and the number of features in each of them is no larger than the depth. We exploit the symmetry to derive an explicit formula, with improved complexity and only in terms of the internal model parameters, for marginal Shapley (and Banzhaf and Owen) values of CatBoost models. This results in a fast, accurate algorithm for estimating these feature attributions.

    Comment: Major revision. Notation is simplified, technical details are moved to appendix, Algorithm 3.12 is rewritten, the complexity ...
    Keywords Computer Science - Machine Learning ; Computer Science - Computer Science and Game Theory
    Subject code 519
    Publishing date 2023-02-16
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
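
    Illustration (not part of the catalogue record): the abstract above refers to marginal (interventional) Shapley values, which depend only on the model's input-output function. The sketch below is a brute-force computation for a tiny hypothetical model, assuming a small background sample stands in for the feature distribution. The names `toy_tree`, `marginal_value`, and `marginal_shapley` are invented for this example; this is not the authors' algorithm and does not use the tree structure they exploit.

```python
from itertools import combinations
from math import factorial


def marginal_value(model, x, background, coalition):
    """Marginal game v(S): mean model output when the features in
    `coalition` are fixed to x and the rest come from background rows."""
    total = 0.0
    for row in background:
        mixed = [x[i] if i in coalition else row[i] for i in range(len(x))]
        total += model(mixed)
    return total / len(background)


def marginal_shapley(model, x, background):
    """Exact Shapley values of the marginal game (exponential in len(x))."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (marginal_value(model, x, background, s | {i})
                        - marginal_value(model, x, background, s))
                phi[i] += weight * gain
    return phi


def toy_tree(z):
    """Hypothetical depth-2 decision tree used only for illustration."""
    if z[0] <= 0.5:
        return 1.0 if z[1] <= 0.5 else 3.0
    return 2.0


# Assumed background sample: the four corners of the unit square.
background = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
x = [0.0, 1.0]
phi = marginal_shapley(toy_tree, x, background)
mean_output = sum(toy_tree(row) for row in background) / len(background)
```

    By the efficiency property, the attributions in `phi` sum to `toy_tree(x) - mean_output`. The brute-force loop is only practical for a handful of features, which is the cost the abstract says tree structure can help reduce.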

  2. Article ; Online: Computing the joint distribution of the total tree length across loci in populations with variable size.

    Miroshnikov, Alexey / Steinrücken, Matthias

    Theoretical population biology

    2017  Volume 118, Page(s) 1–19

    Abstract In recent years, a number of methods have been developed to infer complex demographic histories, especially historical population size changes, from genomic sequence data. Coalescent Hidden Markov Models have proven to be particularly useful for this type of inference. Due to the Markovian structure of these models, an essential building block is the joint distribution of local genealogical trees, or statistics of these genealogies, at two neighboring loci in populations of variable size. Here, we present a novel method to compute the marginal and the joint distribution of the total length of the genealogical trees at two loci separated by at most one recombination event for samples of arbitrary size. To our knowledge, no method to compute these distributions has been presented in the literature to date. We show that they can be obtained from the solution of certain hyperbolic systems of partial differential equations. We present a numerical algorithm, based on the method of characteristics, that can be used to efficiently and accurately solve these systems and compute the marginal and the joint distributions. We demonstrate its utility to study the properties of the joint distribution. Our flexible method can be straightforwardly extended to handle an arbitrary fixed number of recombination events, to include the distributions of other statistics of the genealogies as well, and can also be applied in structured populations.
    MeSH term(s) Humans ; Markov Chains ; Pedigree ; Population Density ; Recombination, Genetic
    Language English
    Publishing date 2017-09-21
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Extramural
    ZDB-ID 3948-2
    ISSN (online) 1096-0325
    ISSN 0040-5809
    DOI 10.1016/j.tpb.2017.09.002
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Book ; Online: Mutual information-based group explainers with coalition structure for machine learning model explanations

    Miroshnikov, Alexey / Kotsiopoulos, Konstandinos / Kannan, Arjun Ravi

    2021  

    Abstract In this article, we propose and investigate ML group explainers in a general game-theoretic setting with the focus on coalitional game values and games based on the conditional and marginal expectation of an ML model. The conditional game takes into account the joint distribution of the predictors, while the marginal game depends on the structure of the model. The objective of the article is to unify the two points of view under predictor dependencies and to reduce the complexity of group explanations. To achieve this, we propose a feature grouping technique that employs an information-theoretic measure of dependence and design appropriate group explainers. Furthermore, in the context of coalitional game values with a two-step formulation, we introduce a theoretical scheme that generates recursive coalitional game values under a partition tree structure and investigate the properties of the corresponding group explainers.

    Comment: 51 pages, 69 figures
    Keywords Computer Science - Computer Science and Game Theory ; Mathematics - Probability
    Publishing date 2021-02-22
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  4. Article ; Online: Radioecological and geochemical peculiarities of cryoconite on Novaya Zemlya glaciers.

    Miroshnikov, Alexey / Flint, Mikhail / Asadulin, Enver / Aliev, Ramiz / Shiryaev, Andrei / Kudikov, Arsenii / Khvostikov, Vladimir

    Scientific reports

    2021  Volume 11, Issue 1, Page(s) 23103

    Abstract In recent years, cryoconite has received growing attention from a radioecological point of view, since several studies have shown that this material is extremely efficient in accumulating natural and anthropogenic radionuclides. The Novaya Zemlya Archipelago (Russian Arctic) hosts the second largest glacial system in the Arctic. From 1957 to 1962, numerous atmospheric nuclear explosions were conducted at Novaya Zemlya, but to date, very little is known about the radioecology of its ice cap. Analysis of radionuclides and other chemical elements in cryoconite holes on Nalli Glacier reveals the presence of two main zones at different altitudes that present different radiological features. The first zone is 130-210 m above sea level (a.s.l.), has low radioactivity, high concentrations of lithophile elements and a chalcophile content close to that of upper continental crust clarkes. The second zone (220-370 m a.s.l.) is characterized by high activity levels of radionuclides and "inversion" of geochemical behaviour with lower concentrations of lithophiles and higher chalcophiles. In the upper part of this zone (350-370 m a.s.l.),
    Language English
    Publishing date 2021-11-29
    Publishing country England
    Document type Journal Article
    ZDB-ID 2615211-3
    ISSN (online) 2045-2322
    ISSN 2045-2322
    DOI 10.1038/s41598-021-02601-8
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Article ; Online: parallelMCMCcombine: an R package for Bayesian methods for big data and analytics.

    Miroshnikov, Alexey / Conlon, Erin M

    PloS one

    2014  Volume 9, Issue 9, Page(s) e108425

    Abstract Recent advances in big data and analytics research have provided a wealth of large data sets that are too big to be analyzed in their entirety, due to restrictions on computer memory or storage size. New Bayesian methods have been developed for data sets that are large only due to large sample sizes. These methods partition big data sets into subsets and perform independent Bayesian Markov chain Monte Carlo analyses on the subsets. The methods then combine the independent subset posterior samples to estimate a posterior density given the full data set. These approaches were shown to be effective for Bayesian models including logistic regression models, Gaussian mixture models and hierarchical models. Here, we introduce the R package parallelMCMCcombine which carries out four of these techniques for combining independent subset posterior samples. We illustrate each of the methods using a Bayesian logistic regression model for simulation data and a Bayesian Gamma model for real data; we also demonstrate features and capabilities of the R package. The package assumes the user has carried out the Bayesian analysis and has produced the independent subposterior samples outside of the package. The methods are primarily suited to models with unknown parameters of fixed dimension that exist in continuous parameter spaces. We envision this tool will allow researchers to explore the various methods for their specific applications and will assist future progress in this rapidly developing field.
    MeSH term(s) Bayes Theorem ; Data Interpretation, Statistical ; Markov Chains ; Models, Statistical ; Monte Carlo Method ; Software
    Language English
    Publishing date 2014-09-26
    Publishing country United States
    Document type Journal Article
    ISSN 1932-6203
    ISSN (online) 1932-6203
    DOI 10.1371/journal.pone.0108425
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Book ; Online: Wasserstein-based fairness interpretability framework for machine learning models

    Miroshnikov, Alexey / Kotsiopoulos, Konstandinos / Franks, Ryan / Kannan, Arjun Ravi

    2020  

    Abstract The objective of this article is to introduce a fairness interpretability framework for measuring and explaining the bias in classification and regression models at the level of a distribution. In our work, we measure the model bias across sub-population distributions in the model output using the Wasserstein metric. To properly quantify the contributions of predictors, we take into account the favorability of both the model and predictors with respect to the non-protected class. The quantification is accomplished by the use of transport theory, which gives rise to the decomposition of the model bias and bias explanations to positive and negative contributions. To gain more insight into the role of favorability and allow for additivity of bias explanations, we adapt techniques from cooperative game theory.

    Comment: 39 pages. (submitted for publication)
    Keywords Computer Science - Machine Learning ; Mathematics - Probability ; 49Q22 ; 91A12 ; 68T01 ; 90C08
    Subject code 310
    Publishing date 2020-11-05
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
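
    Illustration (not part of the catalogue record): the abstract above measures model bias as a Wasserstein distance between sub-population distributions of the model output. The sketch below computes the 1-Wasserstein distance between two equal-sized score samples, assuming one-dimensional outputs. The function name `wasserstein_1d` and the score lists are hypothetical; this is not the authors' framework and omits their decomposition into positive and negative contributions.

```python
def wasserstein_1d(sample_a, sample_b):
    """1-Wasserstein distance between two equal-sized 1-D empirical samples.

    For equal sample sizes the optimal transport plan matches sorted values,
    so the distance is the mean absolute gap between order statistics.
    """
    if len(sample_a) != len(sample_b) or not sample_a:
        raise ValueError("samples must be non-empty and of equal size")
    a, b = sorted(sample_a), sorted(sample_b)
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)


# Hypothetical model scores for two sub-populations (assumed data).
scores_protected = [0.2, 0.4, 0.3, 0.5]
scores_non_protected = [0.4, 0.6, 0.5, 0.7]

bias = wasserstein_1d(scores_protected, scores_non_protected)
```

    Here every sorted score in the second group lies 0.2 above its counterpart in the first, so `bias` is about 0.2. Unequal sample sizes would need a quantile-based or library implementation instead.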

  7. Book ; Online: Cellulose Biodegradation Models; An Example of Cooperative Interactions in Structured Populations

    Jabin, Pierre-Emmanuel / Miroshnikov, Alexey / Young, Robin

    2014  

    Abstract We introduce various models for cellulose bio-degradation by micro-organisms. Those models rely on complex chemical mechanisms, involve the structure of the cellulose chains and are allowed to depend on the phenotypical traits of the population of micro-organisms. We then use the corresponding models in the context of multiple-trait populations. This leads to classical, logistic type, reproduction rates limiting the growth of large populations but also, and more surprisingly, limiting the growth of populations which are too small in a manner similar to the effects seen in populations requiring cooperative interactions (or sexual reproduction). This study hence offers a striking example of how some mechanisms resembling cooperation can occur in structured biological populations, even in the absence of any actual cooperation.

    Comment: 37 pages, accepted to ESAIM: Mathematical Modelling and Numerical Analysis (2017)
    Keywords Mathematics - Dynamical Systems ; 92B
    Subject code 612
    Publishing date 2014-11-27
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  8. Article ; Online: Motile Geobacter dechlorinators migrate into a model source zone of trichloroethene dense non-aqueous phase liquid: experimental evaluation and modeling.

    Philips, Jo / Miroshnikov, Alexey / Haest, Pieter Jan / Springael, Dirk / Smolders, Erik

    Journal of contaminant hydrology

    2014  Volume 170, Page(s) 28–38

    Abstract Microbial migration towards a trichloroethene (TCE) dense non-aqueous phase liquid (DNAPL) could facilitate the bioaugmentation of TCE DNAPL source zones. This study characterized the motility of the Geobacter dechlorinators in a TCE to cis-dichloroethene dechlorinating KB-1™ subculture. No chemotaxis towards or away from TCE was found using an agarose in-plug bridge method. A second experiment placed an inoculated aqueous layer on top of a sterile sand layer and showed that Geobacter migrated several centimeters in the sand layer in just 7 days. A random motility coefficient for Geobacter in water of 0.24 ± 0.02 cm²·day⁻¹ was fitted. A third experiment used a diffusion-cell setup with a 5.5 cm central sand layer separating a DNAPL from an aqueous top layer as a model source zone to examine the effect of random motility on TCE DNAPL dissolution. With top layer inoculation, Geobacter quickly colonized the sand layer, thereby enhancing the initial TCE DNAPL dissolution flux. After 19 days, the DNAPL dissolution enhancement was only 24% lower than with a homogeneous inoculation of the sand layer. A diffusion-motility model was developed to describe dechlorination and migration in the diffusion-cells. This model suggested that the fast colonization of the sand layer by Geobacter was due to the combination of random motility and growth on TCE.
    MeSH term(s) Biodegradation, Environmental ; Chemotaxis ; Diffusion ; Geobacter/physiology ; Halogenation ; Models, Theoretical ; Trichloroethylene/metabolism ; Water Pollutants, Chemical/metabolism
    Chemical Substances Water Pollutants, Chemical ; Trichloroethylene (290YE8AR51)
    Language English
    Publishing date 2014-12-01
    Publishing country Netherlands
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 1494766-3
    ISSN (online) 1873-6009
    ISSN 0169-7722
    DOI 10.1016/j.jconhyd.2014.09.010
    Database MEDical Literature Analysis and Retrieval System OnLINE
