LIVIVO - The Search Portal for Life Sciences

Search results

Result 1 - 10 of total 44

  1. Article ; Online: The use of class imbalanced learning methods on ULSAM data to predict the case–control status in genome-wide association studies

    R. Onur Öztornaci / Hamzah Syed / Andrew P. Morris / Bahar Taşdelen

    Journal of Big Data, Vol 10, Iss 1, Pp 1-

    2023  Volume 28

    Abstract Machine learning (ML) methods for uncovering single nucleotide polymorphisms (SNPs) in genome-wide association study (GWAS) data that can be used to predict disease outcomes are becoming increasingly used in genetic research. Two issues with the use of ML models are finding the correct method for dealing with imbalanced data and data training. This article compares three ML models to identify SNPs that predict type 2 diabetes (T2D) status using the Support vector machine SMOTE (SVM SMOTE), The Adaptive Synthetic Sampling Approach (ADASYN), Random under sampling (RUS) on GWAS data from elderly male participants (165 cases and 951 controls) from the Uppsala Longitudinal Study of Adult Men (ULSAM). It was also applied to SNPs selected by the SMOTE, SVM SMOTE, ADASYN, and RUS clumping method. The analysis was performed using three different ML models: (i) support vector machine (SVM), (ii) multilayer perceptron (MLP) and (iii) random forests (RF). The accuracy of the case–control classification was compared between these three methods. The best classification algorithm was a combination of MLP and SMOTE (97% accuracy). Both RF and SVM achieved good accuracy results of over 90%. Overall, methods used against unbalanced data, all three ML algorithms were found to improve prediction accuracy.
    Keywords Machine learning ; Class imbalanced methods ; GWAS ; ULSAM study ; Computer engineering. Computer hardware ; TK7885-7895 ; Information technology ; T58.5-58.64 ; Electronic computers. Computer science ; QA75.5-76.95
    Subject code 006
    Language English
    Publishing date 2023-11-01T00:00:00Z
    Publisher SpringerOpen
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  2. Article ; Online: snpQT: flexible, reproducible, and comprehensive quality control and imputation of genomic data [version 2; peer review: 2 approved, 1 approved with reservations]

    Benjamin Wingfield / Christina Vasilopoulou / William Duddy / Andrew P. Morris

    F1000Research, Vol

    2021  Volume 10

    Abstract Quality control of genomic data is an essential but complicated multi-step procedure, often requiring separate installation and expert familiarity with a combination of different bioinformatics tools. Software incompatibilities, and inconsistencies across computing environments, are recurrent challenges, leading to poor reproducibility. Existing semi-automated or automated solutions lack comprehensive quality checks, flexible workflow architecture, and user control. To address these challenges, we have developed snpQT: a scalable, stand-alone software pipeline using nextflow and BioContainers, for comprehensive, reproducible and interactive quality control of human genomic data. snpQT offers some 36 discrete quality filters or correction steps in a complete standardised pipeline, producing graphical reports to demonstrate the state of data before and after each quality control procedure. This includes human genome build conversion, population stratification against data from the 1,000 Genomes Project, automated population outlier removal, and built-in imputation with its own pre- and post- quality controls. Common input formats are used, and a synthetic dataset and comprehensive online tutorial are provided for testing, educational purposes, and demonstration. The snpQT pipeline is designed to run with minimal user input and coding experience; quality control steps are implemented with numerous user-modifiable thresholds, and workflows can be flexibly combined in custom combinations. snpQT is open source and freely available at https://github.com/nebfield/snpQT. A comprehensive online tutorial and installation guide is provided through to GWAS (https://snpqt.readthedocs.io/en/latest/), introducing snpQT using a synthetic demonstration dataset and a real-world Amyotrophic Lateral Sclerosis SNP-array dataset.
    Keywords GWAS ; Quality Control ; GWAS pipeline ; Nextflow ; Imputation ; SNPs ; eng ; Medicine ; R ; Science ; Q
    Language English
    Publishing date 2021-11-01T00:00:00Z
    Publisher F1000 Research Ltd
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  3. Article ; Online: Leveraging information between multiple population groups and traits improves fine-mapping resolution

    Feng Zhou / Opeyemi Soremekun / Tinashe Chikowore / Segun Fatumo / Inês Barroso / Andrew P. Morris / Jennifer L. Asimit

    Nature Communications, Vol 14, Iss 1, Pp 1-

    2023  Volume 12

    Abstract Statistical fine-mapping helps to pinpoint likely causal variants underlying genetic association signals. Its resolution can be improved by (i) leveraging information between traits; and (ii) exploiting differences in linkage disequilibrium structure between diverse population groups. Using association summary statistics, MGflashfm jointly fine-maps signals from multiple traits and population groups; MGfm uses an analogous framework to analyse each trait separately. We also provide a practical approach to fine-mapping with out-of-sample reference panels. In simulation studies we show that MGflashfm and MGfm are well-calibrated and that the mean proportion of causal variants with PP > 0.80 is above 0.75 (MGflashfm) and 0.70 (MGfm). In our analysis of four lipids traits across five population groups, MGflashfm gives a median 99% credible set reduction of 10.5% over MGfm. MGflashfm and MGfm only require summary level data, making them very useful fine-mapping tools in consortia efforts where individual-level data cannot be shared.
    Keywords Science ; Q
    Language English
    Publishing date 2023-11-01T00:00:00Z
    Publisher Nature Portfolio
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  4. Article ; Online: Multi-ancestry meta-analysis of host genetic susceptibility to tuberculosis identifies shared genetic architecture

    Haiko Schurz / Vivek Naranbhai / Tom A Yates / James J Gilchrist / Tom Parks / Peter J Dodd / Marlo Möller / Eileen G Hoal / Andrew P Morris / Adrian VS Hill

    eLife, Vol

    2024  Volume 13

    Abstract The heritability of susceptibility to tuberculosis (TB) disease has been well recognized. Over 100 genes have been studied as candidates for TB susceptibility, and several variants were identified by genome-wide association studies (GWAS), but few replicate. We established the International Tuberculosis Host Genetics Consortium to perform a multi-ancestry meta-analysis of GWAS, including 14,153 cases and 19,536 controls of African, Asian, and European ancestry. Our analyses demonstrate a substantial degree of heritability (pooled polygenic h² = 26.3%, 95% CI 23.7–29.0%) for susceptibility to TB that is shared across ancestries, highlighting an important host genetic influence on disease. We identified one global host genetic correlate for TB at genome-wide significance (p<5 × 10⁻⁸) in the human leukocyte antigen (HLA)-II region (rs28383206, p-value=5.2 × 10⁻⁹) but failed to replicate variants previously associated with TB susceptibility. These data demonstrate the complex shared genetic architecture of susceptibility to TB and the importance of large-scale GWAS analysis across multiple ancestries experiencing different levels of infection pressure.
    Keywords tuberculosis ; GWAS ; multi-ancestry ; meta-analysis ; HLA ; Medicine ; R ; Science ; Q ; Biology (General) ; QH301-705.5
    Subject code 572
    Language English
    Publishing date 2024-01-01T00:00:00Z
    Publisher eLife Sciences Publications Ltd
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  5. Article ; Online: SurvivalGWAS_SV: software for the analysis of genome-wide association studies of imputed genotypes with “time-to-event” outcomes

    Hamzah Syed / Andrea L. Jorgensen / Andrew P. Morris

    BMC Bioinformatics, Vol 18, Iss 1, Pp 1-

    2017  Volume 6

    Abstract Background: Analysis of genome-wide association studies (GWAS) with “time to event” outcomes have become increasingly popular, predominantly in the context of pharmacogenetics, where the survival endpoint could be death, disease remission or the occurrence of an adverse drug reaction. However, methodology and software that can efficiently handle the scale and complexity of genetic data from GWAS with time to event outcomes has not been extensively developed. Results: SurvivalGWAS_SV is an easy to use software implemented using C# and run on Linux, Mac OS X & Windows operating systems. SurvivalGWAS_SV is able to handle large scale genome-wide data, allowing for imputed genotypes by modelling time to event outcomes under a dosage model. Either a Cox proportional hazards or Weibull regression model is used for analysis. The software can adjust for multiple covariates and incorporate SNP-covariate interaction effects. Conclusions: We introduce a new console application analysis tool for the analysis of GWAS with time to event outcomes. SurvivalGWAS_SV is compatible with high performance parallel computing clusters, thereby allowing efficient and effective analysis of large scale GWAS datasets, without incurring memory issues. With its particular relevance to pharmacogenetic GWAS, SurvivalGWAS_SV will aid in the identification of genetic biomarkers of patient response to treatment, with the ultimate goal of personalising therapeutic intervention for an array of diseases.
    Keywords Genome-wide association study ; Pharmacogenetics ; Time to event ; Cox proportional hazards ; Weibull ; Survival analysis ; Computer applications to medicine. Medical informatics ; R858-859.7 ; Biology (General) ; QH301-705.5
    Subject code 004
    Language English
    Publishing date 2017-05-01T00:00:00Z
    Publisher BMC
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  6. Article ; Online: What Can Machine Learning Approaches in Genomics Tell Us about the Molecular Basis of Amyotrophic Lateral Sclerosis?

    Christina Vasilopoulou / Andrew P. Morris / George Giannakopoulos / Stephanie Duguez / William Duddy

    Journal of Personalized Medicine, Vol 10, Iss 247, p

    2020  Volume 247

    Abstract Amyotrophic Lateral Sclerosis (ALS) is the most common late-onset motor neuron disorder, but our current knowledge of the molecular mechanisms and pathways underlying this disease remain elusive. This review (1) systematically identifies machine learning studies aimed at the understanding of the genetic architecture of ALS, (2) outlines the main challenges faced and compares the different approaches that have been used to confront them, and (3) compares the experimental designs and results produced by those approaches and describes their reproducibility in terms of biological results and the performances of the machine learning models. The majority of the collected studies incorporated prior knowledge of ALS into their feature selection approaches, and trained their machine learning models using genomic data combined with other types of mined knowledge including functional associations, protein-protein interactions, disease/tissue-specific information, epigenetic data, and known ALS phenotype-genotype associations. The importance of incorporating gene-gene interactions and cis-regulatory elements into the experimental design of future ALS machine learning studies is highlighted. Lastly, it is suggested that future advances in the genomic and machine learning fields will bring about a better understanding of ALS genetic architecture, and enable improved personalized approaches to this and other devastating and complex diseases.
    Keywords Amyotrophic Lateral Sclerosis ; machine learning ; genome-wide association studies ; GWAS ; genomics ; ALS pathology ; Medicine ; R
    Subject code 006
    Language English
    Publishing date 2020-11-01T00:00:00Z
    Publisher MDPI AG
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  7. Article ; Online: Exploiting horizontal pleiotropy to search for causal pathways within a Mendelian randomization framework

    Yoonsu Cho / Philip C. Haycock / Eleanor Sanderson / Tom R. Gaunt / Jie Zheng / Andrew P. Morris / George Davey Smith / Gibran Hemani

    Nature Communications, Vol 11, Iss 1, Pp 1-

    2020  Volume 13

    Abstract In Mendelian randomization (MR) studies, one typically selects SNPs as instrumental variables that do not directly affect the outcome to avoid violation of MR assumptions. Here, Cho et al. present a framework, MR-TRYX, that leverages knowledge of such outliers of horizontal pleiotropy to identify putative causal relationships between exposure and outcome.
    Keywords Science ; Q
    Language English
    Publishing date 2020-02-01T00:00:00Z
    Publisher Nature Publishing Group
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  8. Article ; Online: Exploiting horizontal pleiotropy to search for causal pathways within a Mendelian randomization framework

    Yoonsu Cho / Philip C. Haycock / Eleanor Sanderson / Tom R. Gaunt / Jie Zheng / Andrew P. Morris / George Davey Smith / Gibran Hemani

    Nature Communications, Vol 11, Iss 1, Pp 1-

    2020  Volume 13

    Abstract In Mendelian randomization (MR) studies, one typically selects SNPs as instrumental variables that do not directly affect the outcome to avoid violation of MR assumptions. Here, Cho et al. present a framework, MR-TRYX, that leverages knowledge of such outliers of horizontal pleiotropy to identify putative causal relationships between exposure and outcome.
    Keywords Science ; Q
    Language English
    Publishing date 2020-02-01T00:00:00Z
    Publisher Nature Portfolio
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  9. Article ; Online: Genome-Wide association between EYA1 and Aspirin-induced peptic ulceration

    Stephane Bourgeois / Daniel F. Carr / Crispin O. Musumba / Alexander Penrose / Celestine Esume / Andrew P. Morris / Andrea L. Jorgensen / J. Eunice Zhang / D. Mark Pritchard / Panos Deloukas / Munir Pirmohamed

    EBioMedicine, Vol 74, Iss , Pp 103728- (2021)

    2021  

    Abstract Background: Low-dose aspirin can cause gastric and duodenal ulceration, hereafter called peptic ulcer disease (PUD). Predisposition is thought to be related to clinical and genetic factors; our aim was to identify genetic risk factors associated with aspirin-induced PUD. Methods: Patients (n=1478) were recruited from 15 UK hospitals. Cases (n=505) were defined as patients with endoscopically confirmed PUD within 2 weeks of using aspirin and non-aspirin Non-Steroidal Anti-Inflammatory Drugs (NSAIDs). They were compared to two control groups: patients with endoscopically confirmed PUD without any history of NSAID use within 3 months of diagnosis (n=495), and patients with no PUD on endoscopy (n=478). A genome-wide association study (GWAS) of aspirin-induced cases (n=247) was compared to 476 controls. The results were validated by replication in another 84 cases and 162 controls. Findings: The GWAS identified one variant, rs12678747 (p=1·65×10⁻⁷) located in the last intron of EYA1 on chromosome 8. The association was replicated in another sample of 84 PUD patients receiving aspirin (p=0·002). Meta-analysis of discovery and replication cohort data for rs12678747, yielded a genome-wide significant association (p=3·12×10⁻¹¹; OR=2·03; 95% CI 1·65-2·50). Expression of EYA1 was lower at the gastric ulcer edge when compared with the antrum. Interpretation: Genetic variation in an intron of the EYA1 gene increases the risk of endoscopically confirmed aspirin-induced PUD. Reduced EYA1 expression in the upper gastrointestinal epithelium may modulate risk, but the functional basis of this association will need mechanistic evaluation. Funding: Department of Health Chair in Pharmacogenetics, MRC Centre for Drug Safety Science and the Barts Cardiovascular NIHR Biomedical Research Centre, British Heart Foundation (BHF)
    Keywords NSAID ; ulcer ; Aspirin ; GWAS ; Medicine ; R ; Medicine (General) ; R5-920
    Subject code 616
    Language English
    Publishing date 2021-12-01T00:00:00Z
    Publisher Elsevier
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  10. Article ; Online: Analysis of chromatin organization and gene expression in T cells identifies functional genes for rheumatoid arthritis

    Jing Yang / Amanda McGovern / Paul Martin / Kate Duffus / Xiangyu Ge / Peyman Zarrineh / Andrew P. Morris / Antony Adamson / Peter Fraser / Magnus Rattray / Stephen Eyre

    Nature Communications, Vol 11, Iss 1, Pp 1-

    2020  Volume 13

    Abstract Although genome-wide association studies have identified genetic variation contributing to disease risk, assigning causal genes is challenging. Here, the authors generate ATAC-seq, Hi-C, Capture Hi-C and RNA-seq data in stimulated CD4+ T cells to identify functional enhancers and demonstrate interactions of expression quantitative trait loci with target genes in rheumatoid arthritis.
    Keywords Science ; Q
    Language English
    Publishing date 2020-09-01T00:00:00Z
    Publisher Nature Publishing Group
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
