LIVIVO - The Search Portal for Life Sciences

zur deutschen Oberfläche wechseln
Advanced search

Search results

Result 1 - 10 of total 26

Search options

  1. Article ; Online: Supervised Relevance-Redundancy assessments for feature selection in omics-based classification scenarios.

    Cascianelli, Silvia / Galzerano, Arianna / Masseroli, Marco

    Journal of biomedical informatics

    2023  Volume 144, Page(s) 104457

    Abstract: Background and objective: Many classification tasks in translational bioinformatics and genomics are characterized by the high dimensionality of potential features and unbalanced sample distribution among classes. This can affect classifier robustness ... ...

    Abstract Background and objective: Many classification tasks in translational bioinformatics and genomics are characterized by the high dimensionality of potential features and unbalanced sample distribution among classes. This can affect classifier robustness and increase the risk of overfitting, curse of dimensionality and generalization leaks; furthermore and most importantly, this can prevent obtaining adequate patient stratification required for precision medicine in facing complex diseases, like cancer. Setting up a feature selection strategy able to extract only proper predictive features by removing irrelevant, redundant, and noisy ones is crucial to achieving valuable results on the desired task.
    Methods: We propose a new feature selection approach, called ReRa, based on supervised Relevance-Redundancy assessments. ReRa consists of a customized step of relevance-based filtering, to identify a reduced subset of meaningful features, followed by a supervised similarity-based procedure to minimize redundancy. This latter step innovatively uses a combination of global and class-specific similarity assessments to remove redundant features while preserving those differentiated across classes, even when these classes are strongly unbalanced.
    Results: We compared ReRa with several existing feature selection methods to obtain feature spaces on which performing breast cancer patient subtyping using several classifiers: we considered two use cases based on gene or transcript isoform expression. In the vast majority of the assessed scenarios, when using ReRa-selected feature spaces, the performances were significantly increased compared to simple feature filtering, LASSO regularization, or even MRmr - another Relevance-Redundancy method. The two use cases represent an insightful example of translational application, taking advantage of ReRa capabilities to investigate and enhance a clinically-relevant patient stratification task, which could be easily applied also to other cancer types and diseases.
    Conclusions: ReRa approach has the potential to improve the performance of machine learning models used in an unbalanced classification scenario. Compared to another Relevance-Redundancy approach like MRmr, ReRa does not require tuning the number of preserved features, ensures efficiency and scalability over huge initial dimensionalities and allows re-evaluation of all previously selected features at each iteration of the redundancy assessment, to ultimately preserve only the most relevant and class-differentiated features.
    MeSH term(s) Humans ; Female ; Algorithms ; Computational Biology/methods ; Genomics ; Proteomics ; Breast Neoplasms/diagnosis ; Breast Neoplasms/genetics
    Language English
    Publishing date 2023-07-23
    Publishing country United States
    Document type Journal Article
    ZDB-ID 2057141-0
    ISSN 1532-0480 ; 1532-0464
    ISSN (online) 1532-0480
    ISSN 1532-0464
    DOI 10.1016/j.jbi.2023.104457
    Database MEDical Literature Analysis and Retrieval System OnLINE

    More links

    Kategorien

  2. Article ; Online: Identification of transcription factor high accumulation DNA zones.

    Cascianelli, Silvia / Ceddia, Gaia / Marchesi, Alberto / Masseroli, Marco

    BMC bioinformatics

    2023  Volume 24, Issue 1, Page(s) 395

    Abstract: Background: Transcription factors (TF) play a crucial role in the regulation of gene transcription; alterations of their activity and binding to DNA areas are strongly involved in cancer and other disease onset and development. For proper biomedical ... ...

    Abstract Background: Transcription factors (TF) play a crucial role in the regulation of gene transcription; alterations of their activity and binding to DNA areas are strongly involved in cancer and other disease onset and development. For proper biomedical investigation, it is hence essential to correctly trace TF dense DNA areas, having multiple bindings of distinct factors, and select DNA high occupancy target (HOT) zones, showing the highest accumulation of such bindings. Indeed, systematic and replicable analysis of HOT zones in a large variety of cells and tissues would allow further understanding of their characteristics and could clarify their functional role.
    Results: Here, we propose, thoroughly explain and discuss a full computational procedure to study in-depth DNA dense areas of transcription factor accumulation and identify HOT zones. This methodology, developed as a computationally efficient parametric algorithm implemented in an R/Bioconductor package, uses a systematic approach with two alternative methods to examine transcription factor bindings and provide comparative and fully-reproducible assessments. It offers different resolutions by introducing three distinct types of accumulation, which can analyze DNA from single-base to region-oriented levels, and a moving window, which can estimate the influence of the neighborhood for each DNA base under exam.
    Conclusions: We quantitatively assessed the full procedure by using our implemented software package, named TFHAZ, in two example applications of biological interest, proving its full reliability and relevance.
    MeSH term(s) Transcription Factors/metabolism ; Reproducibility of Results ; Gene Expression Regulation ; DNA/genetics ; Protein Binding ; Binding Sites/genetics
    Chemical Substances Transcription Factors ; DNA (9007-49-2)
    Language English
    Publishing date 2023-10-20
    Publishing country England
    Document type Journal Article
    ZDB-ID 2041484-5
    ISSN 1471-2105 ; 1471-2105
    ISSN (online) 1471-2105
    ISSN 1471-2105
    DOI 10.1186/s12859-023-05528-1
    Database MEDical Literature Analysis and Retrieval System OnLINE

    More links

    Kategorien

  3. Article ; Online: Scenarios for the Integration of Microarray Gene Expression Profiles in COVID-19-Related Studies.

    Bernasconi, Anna / Cascianelli, Silvia

    Methods in molecular biology (Clifton, N.J.)

    2021  Volume 2401, Page(s) 195–215

    Abstract: The COVID-19 pandemic has hit heavily many aspects of our lives. At this time, genomic research is concerned with exploiting available datasets and knowledge to fuel discovery on this novel disease. Studies that can precisely characterize the gene ... ...

    Abstract The COVID-19 pandemic has hit heavily many aspects of our lives. At this time, genomic research is concerned with exploiting available datasets and knowledge to fuel discovery on this novel disease. Studies that can precisely characterize the gene expression profiles of human hosts infected by SARS-CoV-2 are of significant relevance. However, not many such experiments have yet been produced to date, nor made publicly available online. Thus, it is of paramount importance that data analysts explore all possibilities to integrate information coming from similar viruses and related diseases; interestingly, microarray gene profile experiments become extremely valuable for this purpose. This chapter reviews the aspects that should be considered when integrating transcriptomics data, considering mainly samples infected by different viruses and combining together various data types and also the extracted knowledge. It describes a series of scenarios from studies performed in literature and it suggests possible other directions of noteworthy integration.
    MeSH term(s) COVID-19/genetics ; Gene Expression Profiling ; Genomics ; Humans ; Pandemics ; Transcriptome
    Language English
    Publishing date 2021-12-13
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ISSN 1940-6029
    ISSN (online) 1940-6029
    DOI 10.1007/978-1-0716-1839-4_13
    Database MEDical Literature Analysis and Retrieval System OnLINE

    More links

    Kategorien

  4. Article ; Online: RGMQL: scalable and interoperable computing of heterogeneous omics big data and metadata in R/Bioconductor.

    Pallotta, Simone / Cascianelli, Silvia / Masseroli, Marco

    BMC bioinformatics

    2022  Volume 23, Issue 1, Page(s) 123

    Abstract: Background: Heterogeneous omics data, increasingly collected through high-throughput technologies, can contain hidden answers to very important and still unsolved biomedical questions. Their integration and processing are crucial mostly for tertiary ... ...

    Abstract Background: Heterogeneous omics data, increasingly collected through high-throughput technologies, can contain hidden answers to very important and still unsolved biomedical questions. Their integration and processing are crucial mostly for tertiary analysis of Next Generation Sequencing data, although suitable big data strategies still address mainly primary and secondary analysis. Hence, there is a pressing need for algorithms specifically designed to explore big omics datasets, capable of ensuring scalability and interoperability, possibly relying on high-performance computing infrastructures.
    Results: We propose RGMQL, a R/Bioconductor package conceived to provide a set of specialized functions to extract, combine, process and compare omics datasets and their metadata from different and differently localized sources. RGMQL is built over the GenoMetric Query Language (GMQL) data management and computational engine, and can leverage its open curated repository as well as its cloud-based resources, with the possibility of outsourcing computational tasks to GMQL remote services. Furthermore, it overcomes the limits of the GMQL declarative syntax, by guaranteeing a procedural approach in dealing with omics data within the R/Bioconductor environment. But mostly, it provides full interoperability with other packages of the R/Bioconductor framework and extensibility over the most used genomic data structures and processing functions.
    Conclusions: RGMQL is able to combine the query expressiveness and computational efficiency of GMQL with a complete processing flow in the R environment, being a fully integrated extension of the R/Bioconductor framework. Here we provide three fully reproducible example use cases of biological relevance that are particularly explanatory of its flexibility of use and interoperability with other R/Bioconductor packages. They show how RGMQL can easily scale up from local to parallel and cloud computing while it combines and analyzes heterogeneous omics data from local or remote datasets, both public and private, in a completely transparent way to the user.
    MeSH term(s) Big Data ; Cloud Computing ; Genomics ; Metadata ; Software
    Language English
    Publishing date 2022-04-07
    Publishing country England
    Document type Journal Article
    ZDB-ID 2041484-5
    ISSN 1471-2105 ; 1471-2105
    ISSN (online) 1471-2105
    ISSN 1471-2105
    DOI 10.1186/s12859-022-04648-4
    Database MEDical Literature Analysis and Retrieval System OnLINE

    More links

    Kategorien

  5. Book ; Online: HWD

    Pippi, Vittorio / Quattrini, Fabio / Cascianelli, Silvia / Cucchiara, Rita

    A Novel Evaluation Score for Styled Handwritten Text Generation

    2023  

    Abstract: Styled Handwritten Text Generation (Styled HTG) is an important task in document analysis, aiming to generate text images with the handwriting of given reference images. In recent years, there has been significant progress in the development of deep ... ...

    Abstract Styled Handwritten Text Generation (Styled HTG) is an important task in document analysis, aiming to generate text images with the handwriting of given reference images. In recent years, there has been significant progress in the development of deep learning models for tackling this task. Being able to measure the performance of HTG models via a meaningful and representative criterion is key for fostering the development of this research topic. However, despite the current adoption of scores for natural image generation evaluation, assessing the quality of generated handwriting remains challenging. In light of this, we devise the Handwriting Distance (HWD), tailored for HTG evaluation. In particular, it works in the feature space of a network specifically trained to extract handwriting style features from the variable-lenght input images and exploits a perceptual distance to compare the subtle geometric features of handwriting. Through extensive experimental evaluation on different word-level and line-level datasets of handwritten text images, we demonstrate the suitability of the proposed HWD as a score for Styled HTG. The pretrained model used as backbone will be released to ease the adoption of the score, aiming to provide a valuable tool for evaluating HTG models and thus contributing to advancing this important research area.

    Comment: Accepted at BMVC2023
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Digital Libraries
    Subject code 004
    Publishing date 2023-10-31
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  6. Book ; Online: How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning

    Pippi, Vittorio / Cascianelli, Silvia / Kermorvant, Christopher / Cucchiara, Rita

    2023  

    Abstract: Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on both modern and historical manuscripts in large benchmark datasets. Nonetheless, those models struggle to obtain the same ... ...

    Abstract Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on both modern and historical manuscripts in large benchmark datasets. Nonetheless, those models struggle to obtain the same performance when applied to manuscripts with peculiar characteristics, such as language, paper support, ink, and author handwriting. This issue is very relevant for valuable but small collections of documents preserved in historical archives, for which obtaining sufficient annotated training data is costly or, in some cases, unfeasible. To overcome this challenge, a possible solution is to pretrain HTR models on large datasets and then fine-tune them on small single-author collections. In this paper, we take into account large, real benchmark datasets and synthetic ones obtained with a styled Handwritten Text Generation model. Through extensive experimental analysis, also considering the amount of fine-tuning lines, we give a quantitative indication of the most relevant characteristics of such data for obtaining an HTR model able to effectively transcribe manuscripts in small collections with as little as five real fine-tuning lines.

    Comment: Accepted at ICDAR2023
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Digital Libraries
    Subject code 006
    Publishing date 2023-05-04
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  7. Book ; Online: Evaluating Synthetic Pre-Training for Handwriting Processing Tasks

    Pippi, Vittorio / Cascianelli, Silvia / Baraldi, Lorenzo / Cucchiara, Rita

    2023  

    Abstract: In this work, we explore massive pre-training on synthetic word images for enhancing the performance on four benchmark downstream handwriting analysis tasks. To this end, we build a large synthetic dataset of word images rendered in several handwriting ... ...

    Abstract In this work, we explore massive pre-training on synthetic word images for enhancing the performance on four benchmark downstream handwriting analysis tasks. To this end, we build a large synthetic dataset of word images rendered in several handwriting fonts, which offers a complete supervision signal. We use it to train a simple convolutional neural network (ConvNet) with a fully supervised objective. The vector representations of the images obtained from the pre-trained ConvNet can then be considered as encodings of the handwriting style. We exploit such representations for Writer Retrieval, Writer Identification, Writer Verification, and Writer Classification and demonstrate that our pre-training strategy allows extracting rich representations of the writers' style that enable the aforementioned tasks with competitive results with respect to task-specific State-of-the-Art approaches.
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 006
    Publishing date 2023-04-04
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  8. Book ; Online: Handwritten Text Generation from Visual Archetypes

    Pippi, Vittorio / Cascianelli, Silvia / Cucchiara, Rita

    2023  

    Abstract: Generating synthetic images of handwritten text in a writer-specific style is a challenging task, especially in the case of unseen styles and new words, and even more when these latter contain characters that are rarely encountered during training. While ...

    Abstract Generating synthetic images of handwritten text in a writer-specific style is a challenging task, especially in the case of unseen styles and new words, and even more when these latter contain characters that are rarely encountered during training. While emulating a writer's style has been recently addressed by generative models, the generalization towards rare characters has been disregarded. In this work, we devise a Transformer-based model for Few-Shot styled handwritten text generation and focus on obtaining a robust and informative representation of both the text and the style. In particular, we propose a novel representation of the textual content as a sequence of dense vectors obtained from images of symbols written as standard GNU Unifont glyphs, which can be considered their visual archetypes. This strategy is more suitable for generating characters that, despite having been seen rarely during training, possibly share visual details with the frequently observed ones. As for the style, we obtain a robust representation of unseen writers' calligraphy by exploiting specific pre-training on a large synthetic dataset. Quantitative and qualitative results demonstrate the effectiveness of our proposal in generating words in unseen styles and with rare characters more faithfully than existing approaches relying on independent one-hot encodings of the characters.

    Comment: Accepted at CVPR2023
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 410
    Publishing date 2023-03-27
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  9. Book ; Online: Embodied Agents for Efficient Exploration and Smart Scene Description

    Bigazzi, Roberto / Cornia, Marcella / Cascianelli, Silvia / Baraldi, Lorenzo / Cucchiara, Rita

    2023  

    Abstract: The development of embodied agents that can communicate with humans in natural language has gained increasing interest over the last years, as it facilitates the diffusion of robotic platforms in human-populated environments. As a step towards this ... ...

    Abstract The development of embodied agents that can communicate with humans in natural language has gained increasing interest over the last years, as it facilitates the diffusion of robotic platforms in human-populated environments. As a step towards this objective, in this work, we tackle a setting for visual navigation in which an autonomous agent needs to explore and map an unseen indoor environment while portraying interesting scenes with natural language descriptions. To this end, we propose and evaluate an approach that combines recent advances in visual robotic exploration and image captioning on images generated through agent-environment interaction. Our approach can generate smart scene descriptions that maximize semantic knowledge of the environment and avoid repetitions. Further, such descriptions offer user-understandable insights into the robot's representation of the environment by highlighting the prominent objects and the correlation between them as encountered during the exploration. To quantitatively assess the performance of the proposed approach, we also devise a specific score that takes into account both exploration and description skills. The experiments carried out on both photorealistic simulated environments and real-world ones demonstrate that our approach can effectively describe the robot's point of view during exploration, improving the human-friendly interpretability of its observations.

    Comment: Accepted by IEEE International Conference on Robotics and Automation (ICRA 2023)
    Keywords Computer Science - Robotics ; Computer Science - Computer Vision and Pattern Recognition
    Subject code 629
    Publishing date 2023-01-17
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  10. Book ; Online: Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions

    Cascianelli, Silvia / Cornia, Marcella / Baraldi, Lorenzo / Cucchiara, Rita

    2022  

    Abstract: Handwritten Text Recognition (HTR) in free-layout pages is a challenging image understanding task that can provide a relevant boost to the digitization of handwritten documents and reuse of their content. The task becomes even more challenging when ... ...

    Abstract Handwritten Text Recognition (HTR) in free-layout pages is a challenging image understanding task that can provide a relevant boost to the digitization of handwritten documents and reuse of their content. The task becomes even more challenging when dealing with historical documents due to the variability of the writing style and degradation of the page quality. State-of-the-art HTR approaches typically couple recurrent structures for sequence modeling with Convolutional Neural Networks for visual feature extraction. Since convolutional kernels are defined on fixed grids and focus on all input pixels independently while moving over the input image, this strategy disregards the fact that handwritten characters can vary in shape, scale, and orientation even within the same document and that the ink pixels are more relevant than the background ones. To cope with these specific HTR difficulties, we propose to adopt deformable convolutions, which can deform depending on the input at hand and better adapt to the geometric variations of the text. We design two deformable architectures and conduct extensive experiments on both modern and historical datasets. Experimental results confirm the suitability of deformable convolutions for the HTR task.
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2022-08-17
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

To top