LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 10 of 67 in total

  1. Book ; Online: The Relational Data Borg is Learning

    Olteanu, Dan

    2020  

    Abstract This paper overviews an approach that addresses machine learning over relational data as a database problem. This is justified by two observations. First, the input to the learning task is commonly the result of a feature extraction query over the relational data. Second, the learning task requires the computation of group-by aggregates. This approach has been already investigated for a number of supervised and unsupervised learning tasks, including: ridge linear regression, factorisation machines, support vector machines, decision trees, principal component analysis, and k-means; and also for linear algebra over data matrices. The main message of this work is that the runtime performance of machine learning can be dramatically boosted by a toolbox of techniques that exploit the knowledge of the underlying data. This includes theoretical development on the algebraic, combinatorial, and statistical structure of relational data processing and systems development on code specialisation, low-level computation sharing, and parallelisation. These techniques aim at lowering both the complexity and the constant factors of the learning time. This work is the outcome of extensive collaboration of the author with colleagues from RelationalAI, in particular Mahmoud Abo Khamis, Molham Aref, Hung Ngo, and XuanLong Nguyen, and from the FDB research project, in particular Ahmet Kara, Milos Nikolic, Maximilian Schleich, Amir Shaikhha, Jakub Zavodny, and Haozhe Zhang. The author would also like to thank the members of the FDB project for the figures and examples used in this paper. The author is grateful for support from industry: Amazon Web Services, Google, Infor, LogicBlox, Microsoft Azure, RelationalAI; and from the funding agencies EPSRC and ERC. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 682588.

    Comment: 14 pages, 11 figures, VLDB 2020 keynote
    Keywords Computer Science - Databases ; Computer Science - Machine Learning
    Subject code 004
    Publishing date 2020-08-18
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  2. Book ; Online: From Shapley Value to Model Counting and Back

    Kara, Ahmet / Olteanu, Dan / Suciu, Dan

    2023  

    Abstract In this paper we investigate the problem of quantifying the contribution of each variable to the satisfying assignments of a Boolean function based on the Shapley value. Our main result is a polynomial-time equivalence between computing Shapley values and model counting for any class of Boolean functions that are closed under substitutions of variables with disjunctions of fresh variables. This result settles an open problem raised in prior work, which sought to connect the Shapley value computation to probabilistic query evaluation. We show two applications of our result. First, the Shapley values can be computed in polynomial time over deterministic and decomposable circuits, since they are closed under OR-substitutions. Second, there is a polynomial-time equivalence between computing the Shapley value for the tuples contributing to the answer of a Boolean conjunctive query and counting the models in the lineage of the query. This equivalence allows us to immediately recover the dichotomy for Shapley value computation in case of self-join-free Boolean conjunctive queries; in particular, the hardness for non-hierarchical queries can now be shown using a simple reduction from the #P-hard problem of model counting for lineage in positive bipartite disjunctive normal form.

    Comment: 22 pages
    Keywords Computer Science - Databases ; Computer Science - Computational Complexity ; Computer Science - Logic in Computer Science ; F.4.1 ; F.2 ; H.2
    Subject code 511
    Publishing date 2023-06-25
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  3. Article: Clostridium Difficile Infection in Rectal Cancer Patients after Diverted Loop Ileostomy Closure.

    Ghioldiş, Andrei Cristian / Sârbu, Vasile / Pundiche, Mihaela / Dan, Cristina / Butelchin, Cristina / Olteanu, Cornelia / Popescu, Răzvan Cătălin

    Chirurgia (Bucharest, Romania : 1990)

    2024  Volume 119, Issue 1, Page(s) 36–43

    Abstract Aim: Clostridium difficile infection is a cause of increased morbidity and mortality in hospitals, particularly in patients with cancer pathology. There are several factors favouring the development of Clostridium difficile infection among cancer patients, including age, exposure to antibiotic and proton pump inhibitor therapy, and chemotherapy. This study was conducted to observe the prevalence of Clostridium difficile infection after reversal of the loop ileostomy in patients who had undergone rectal cancer surgery, performed initially either open or laparoscopically.
    Method: A retrospective study was performed on patients operated on for rectal cancer by a single surgical team who received a diverted loop ileostomy over a 4-year period.
    MeSH term(s) Humans ; Ileostomy/adverse effects ; Retrospective Studies ; Clostridioides difficile ; Proton Pump Inhibitors ; Treatment Outcome ; Clostridium Infections/epidemiology ; Clostridium Infections/etiology ; Rectal Neoplasms/surgery ; Rectal Neoplasms/complications ; Anti-Bacterial Agents/therapeutic use
    Chemical Substances Proton Pump Inhibitors ; Anti-Bacterial Agents
    Language English
    Publishing date 2024-03-12
    Publishing country Romania
    Document type Journal Article
    ZDB-ID 419244-8
    ISSN (online) 1842-368X
    ISSN 1221-9118 ; 0009-4730 ; 0377-5003
    DOI 10.21614/chirurgia.2024.v.119.i.1.p.36
    Database MEDical Literature Analysis and Retrieval System OnLINE

  4. Book ; Online: Insert-Only versus Insert-Delete in Dynamic Query Evaluation

    Khamis, Mahmoud Abo / Kara, Ahmet / Olteanu, Dan / Suciu, Dan

    2023  

    Abstract We study the dynamic query evaluation problem: Given a join query Q and a stream of updates, we would like to construct a data structure that supports constant-delay enumeration of the query output after each update. We show that a stream of N insert-only updates (to an initially empty database) can be executed in total time O(N^{w(Q)}), where w(Q) is the fractional hypertree width of Q. This matches the complexity of the static query evaluation problem for Q and a database of size N. One corollary is that the average time per single-tuple insert is constant for acyclic joins. In contrast, we show that a stream of N insert-and-delete updates to Q can be executed in total time O(N^{w(Q')}), where Q' is obtained from Q by extending every relational atom with extra variables that represent the "lifespans" of tuples in Q. We show that this reduction is optimal in the sense that the static evaluation runtime of Q' provides a lower bound on the total update time of Q. Our approach recovers the optimal single-tuple update time for known queries such as the hierarchical and Loomis-Whitney join queries.
    Keywords Computer Science - Databases
    Subject code 005
    Publishing date 2023-12-14
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  5. Book ; Online: Join Size Bounds using Lp-Norms on Degree Sequences

    Khamis, Mahmoud Abo / Nakos, Vasileios / Olteanu, Dan / Suciu, Dan

    2023  

    Abstract Estimating the output size of a query is a fundamental yet longstanding problem in database query processing. Traditional cardinality estimators used by database systems can routinely underestimate the true output size by orders of magnitude, which leads to a significant system performance penalty. Recently, upper bounds have been proposed that are based on information inequalities and incorporate sizes and max-degrees from input relations, yet their main benefit is limited to cyclic queries, because they degenerate to rather trivial formulas on acyclic queries. We introduce a significant extension of the upper bounds, by incorporating $\ell_p$-norms of the degree sequences of join attributes. Our bounds are significantly lower than previously known bounds, even when applied to acyclic queries. These bounds are also based on information theory, they come with a matching query evaluation algorithm, are computable in exponential time in the query size, and are provably tight when all degrees are "simple".
    Keywords Computer Science - Databases ; Computer Science - Information Theory
    Subject code 005
    Publishing date 2023-06-24
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  6. Book ; Online: F-IVM: Analytics over Relational Databases under Updates

    Kara, Ahmet / Nikolic, Milos / Olteanu, Dan / Zhang, Haozhe

    2023  

    Abstract This article describes F-IVM, a unified approach for maintaining analytics over changing relational data. We exemplify its versatility in four disciplines: processing queries with group-by aggregates and joins; learning linear regression models using the covariance matrix of the input features; building Chow-Liu trees using pairwise mutual information of the input features; and matrix chain multiplication. F-IVM has three main ingredients: higher-order incremental view maintenance; factorized computation; and ring abstraction. F-IVM reduces the maintenance of a task to that of a hierarchy of simple views. Such views are functions mapping keys, which are tuples of input values, to payloads, which are elements from a ring. F-IVM also supports efficient factorized computation over keys, payloads, and updates. Finally, F-IVM treats uniformly seemingly disparate tasks. In the key space, all tasks require joins and variable marginalization. In the payload space, tasks differ in the definition of the sum and product ring operations. We implemented F-IVM on top of DBToaster and show that it can outperform classical first-order and fully recursive higher-order incremental view maintenance by orders of magnitude while using less memory.
    Keywords Computer Science - Databases
    Subject code 005
    Publishing date 2023-03-15
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  7. Article ; Online: The dataset for the chronology of the sedimentation in the Danube abyssal fan which records the major episodes of the late-Holocene Black Sea evolution.

    Ilie, Maria / Sava, Tiberiu / Cristea, Gabriela / Ion, Gabriel / Olteanu, Dan / Mănăilescu, Cristian / Sava, Gabriela

    Data in brief

    2022  Volume 43, Page(s) 108444

    Abstract Anoxic marine sediments at the confluence with large rivers are key archives for monitoring the anthropogenic impact on the environment and assessing the carbon sink character of oxygen-deprived waters. This data article describes the analysis methodology and the results of the deep-sea sediments sampled from the NW part of the Black Sea, using the
    Language English
    Publishing date 2022-07-07
    Publishing country Netherlands
    Document type Journal Article
    ZDB-ID 2786545-9
    ISSN (online) 2352-3409
    ISSN 2352-3409
    DOI 10.1016/j.dib.2022.108444
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Book ; Online: CHORUS: Foundation Models for Unified Data Discovery and Exploration

    Kayali, Moe / Lykov, Anton / Fountalis, Ilias / Vasiloglou, Nikolaos / Olteanu, Dan / Suciu, Dan

    2023  

    Abstract We apply foundation models to data discovery and exploration tasks. Foundation models are large language models (LLMs) that show promising performance on a range of diverse tasks unrelated to their training. We show that these models are highly applicable to the data discovery and data exploration domain. When carefully used, they have superior capability on three representative tasks: table-class detection, column-type annotation and join-column prediction. On all three tasks, we show that a foundation-model-based approach outperforms the task-specific models and so the state of the art. Further, our approach often surpasses human-expert task performance. We investigate the fundamental characteristics of this approach including generalizability to several foundation models, impact of non-determinism on the outputs and syntactic/semantic signals. All in all, this suggests a future direction in which disparate data management tasks can be unified under foundation models.
    Keywords Computer Science - Databases ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2023-06-15
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  9. Book ; Online: Givens Rotations for QR Decomposition, SVD and PCA over Database Joins

    Olteanu, Dan / Vortmeier, Nils / Živanović, Đorđe

    2022  

    Abstract This article introduces Figaro, an algorithm for computing the upper-triangular matrix in the QR decomposition of the matrix defined by the natural join over relational data. Figaro's main novelty is that it pushes the QR decomposition past the join. This leads to several desirable properties. For acyclic joins, it takes time linear in the database size and independent of the join size. Its execution is equivalent to the application of a sequence of Givens rotations proportional to the join size. Its number of rounding errors relative to the classical QR decomposition algorithms is on par with the database size relative to the join output size. The QR decomposition lies at the core of many linear algebra computations including the singular value decomposition (SVD) and the principal component analysis (PCA). We show how Figaro can be used to compute the orthogonal matrix in the QR decomposition, the SVD and the PCA of the join output without the need to materialize the join output. A suite of experiments validates that Figaro can outperform the LAPACK library Intel MKL in both runtime performance and numerical accuracy by a factor proportional to the gap between the sizes of the join output and input.
    Keywords Computer Science - Databases
    Publishing date 2022-04-01
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  10. Book ; Online: Conjunctive Queries with Free Access Patterns under Updates

    Kara, Ahmet / Nikolic, Milos / Olteanu, Dan / Zhang, Haozhe

    2022  

    Abstract We study the problem of answering conjunctive queries with free access patterns under updates. A free access pattern is a partition of the free variables of the query into input and output. The query returns tuples over the output variables given a tuple of values over the input variables. We introduce a fully dynamic evaluation approach for such queries. We also give a syntactic characterisation of those queries that admit constant time per single-tuple update and whose output tuples can be enumerated with constant delay given an input tuple. Finally, we chart the complexity trade-off between the preprocessing time, update time and enumeration delay for such queries. For a class of queries, our approach achieves optimal, albeit non-constant, update time and delay. Their optimality is predicated on the Online Matrix-Vector Multiplication conjecture. Our results recover prior work on the dynamic evaluation of conjunctive queries without access patterns.

    Comment: Extended and polished version. Title changed. Section 4 on the evaluation of arbitrary conjunctive queries with free access patterns is new
    Keywords Computer Science - Databases ; H.2.4
    Subject code 005
    Publishing date 2022-06-17
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
