LIVIVO - The Search Portal for Life Sciences


Search results

Result 1 - 10 of total 35

  1. Article ; Online: Unsupervised AI reveals insect species-specific genome signatures.

    Sawada, Yui / Minei, Ryuhei / Tabata, Hiromasa / Ikemura, Toshimichi / Wada, Kennosuke / Wada, Yoshiko / Nagata, Hiroshi / Iwasaki, Yuki

    PeerJ

    2024  Volume 12, Page(s) e17025

    Abstract Insects are a highly diverse phylogeny and possess a wide variety of traits, including the presence or absence of wings and metamorphosis. These diverse traits are of great interest for studying genome evolution, and numerous comparative genomic studies have examined a wide phylogenetic range of insects. Here, we analyzed 22 insects belonging to a wide phylogenetic range (Endopterygota, Paraneoptera, Polyneoptera, Palaeoptera, and other insects) by using a batch-learning self-organizing map (BLSOM) for oligonucleotide compositions in their genomic fragments (100-kb or 1-Mb sequences), which is an unsupervised machine learning algorithm that can extract species-specific characteristics of the oligonucleotide compositions (genome signatures). The genome signature is of particular interest in terms of the mechanisms and biological significance that have caused the species-specific difference, and can be used as a powerful search needle to explore the various roles of genome sequences other than protein coding, and can be used to unveil mysteries hidden in the genome sequence. Since BLSOM is an unsupervised clustering method, the clustering of sequences was performed based on the oligonucleotide composition alone, without providing information about the species from which each fragment sequence was derived. Therefore, not only the interspecies separation, but also the intraspecies separation can be achieved. Here, we have revealed the specific genomic regions with oligonucleotide compositions distinct from the usual sequences of each insect genome,
    MeSH term(s) Animals ; Humans ; Phylogeny ; Genome, Human ; Genome, Insect/genetics ; Oligonucleotides/genetics ; Artificial Intelligence
    Chemical Substances Oligonucleotides
    Language English
    Publishing date 2024-03-06
    Publishing country United States
    Document type Journal Article
    ZDB-ID 2703241-3
    ISSN (online) 2167-8359
    DOI 10.7717/peerj.17025
    Database MEDical Literature Analysis and Retrieval System OnLINE
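
The oligonucleotide composition that this abstract describes as the BLSOM input can be illustrated with a short Python sketch. It is not the authors' code: the function names `kmer_composition` and `fragments`, the use of non-overlapping windows, and the exclusion of k-mers containing ambiguous bases are assumptions made for illustration, and the BLSOM clustering step itself is not shown.

```python
from collections import Counter
from itertools import product


def kmer_composition(seq, k=4):
    """Return the relative frequency of every A/C/G/T k-mer in `seq`.

    k-mers containing other symbols (e.g. N) are ignored (an assumption).
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def fragments(seq, size):
    """Yield non-overlapping windows of length `size`; a short tail is dropped."""
    for start in range(0, len(seq) - size + 1, size):
        yield seq[start:start + size]


# One composition vector per fragment (hypothetical toy sequence and window).
vectors = [kmer_composition(f, k=2) for f in fragments("ACGTACGTAC", 4)]
```

With `size=100_000` and `k=4`, each 100-kb fragment would yield a 256-dimensional frequency vector that could be passed to a clustering method without any species label.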


  2. Article ; Online: AI-based search for convergently expanding, advantageous mutations in SARS-CoV-2 by focusing on oligonucleotide frequencies.

    Toshimichi Ikemura / Yuki Iwasaki / Kennosuke Wada / Yoshiko Wada / Takashi Abe

    PLoS ONE

    2022  Volume 17, Issue 8, Page(s) e0273860

    Abstract Among mutations that occur in SARS-CoV-2, efficient identification of mutations advantageous for viral replication and transmission is important to characterize and defeat this rampant virus. Mutations rapidly expanding frequency in a viral population are candidates for advantageous mutations, but neutral mutations hitchhiking with advantageous mutations are also likely to be included. To distinguish these, we focus on mutations that appear to occur independently in different lineages and expand in frequency in a convergent evolutionary manner. Batch-learning SOM (BLSOM) can separate SARS-CoV-2 genome sequences according by lineage from only providing the oligonucleotide composition. Focusing on remarkably expanding 20-mers, each of which is only represented by one copy in the viral genome, allows us to correlate the expanding 20-mers to mutations. Using visualization functions in BLSOM, we can efficiently identify mutations that have expanded remarkably both in the Omicron lineage, which is phylogenetically distinct from other lineages, and in other lineages. Most of these mutations involved changes in amino acids, but there were a few that did not, such as an intergenic mutation.
    Keywords Medicine ; R ; Science ; Q
    Subject code 570
    Language English
    Publishing date 2022-01-01T00:00:00Z
    Publisher Public Library of Science (PLoS)
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  3. Article ; Online: Unsupervised explainable AI for molecular evolutionary study of forty thousand SARS-CoV-2 genomes.

    Iwasaki, Yuki / Abe, Takashi / Wada, Kennosuke / Wada, Yoshiko / Ikemura, Toshimichi

    BMC microbiology

    2022  Volume 22, Issue 1, Page(s) 73

    Abstract Background: Unsupervised AI (artificial intelligence) can obtain novel knowledge from big data without particular models or prior knowledge and is highly desirable for unveiling hidden features in big data. SARS-CoV-2 poses a serious threat to public health and one important issue in characterizing this fast-evolving virus is to elucidate various aspects of their genome sequence changes. We previously established unsupervised AI, a BLSOM (batch-learning SOM), which can analyze five million genomic sequences simultaneously. The present study applied the BLSOM to the oligonucleotide compositions of forty thousand SARS-CoV-2 genomes.
    Results: While only the oligonucleotide composition was given, the obtained clusters of genomes corresponded primarily to known main clades and internal divisions in the main clades. Since the BLSOM is explainable AI, it reveals which features of the oligonucleotide composition are responsible for clade clustering. Additionally, BLSOM also provided information concerning the special genomic region possibly undergoing RNA modifications.
    Conclusions: The BLSOM has powerful image display capabilities and enables efficient knowledge discovery about viral evolutionary processes, and it can complement phylogenetic methods based on sequence alignment.
    MeSH term(s) Artificial Intelligence ; COVID-19 ; Evolution, Molecular ; Humans ; Phylogeny ; SARS-CoV-2/genetics
    Language English
    Publishing date 2022-03-10
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 2041505-9
    ISSN (online) 1471-2180
    DOI 10.1186/s12866-022-02484-3
    Database MEDical Literature Analysis and Retrieval System OnLINE


  4. Article ; Online: AI-based search for convergently expanding, advantageous mutations in SARS-CoV-2 by focusing on oligonucleotide frequencies.

    Ikemura, Toshimichi / Iwasaki, Yuki / Wada, Kennosuke / Wada, Yoshiko / Abe, Takashi

    PloS one

    2022  Volume 17, Issue 8, Page(s) e0273860

    Abstract Among mutations that occur in SARS-CoV-2, efficient identification of mutations advantageous for viral replication and transmission is important to characterize and defeat this rampant virus. Mutations rapidly expanding frequency in a viral population are candidates for advantageous mutations, but neutral mutations hitchhiking with advantageous mutations are also likely to be included. To distinguish these, we focus on mutations that appear to occur independently in different lineages and expand in frequency in a convergent evolutionary manner. Batch-learning SOM (BLSOM) can separate SARS-CoV-2 genome sequences according by lineage from only providing the oligonucleotide composition. Focusing on remarkably expanding 20-mers, each of which is only represented by one copy in the viral genome, allows us to correlate the expanding 20-mers to mutations. Using visualization functions in BLSOM, we can efficiently identify mutations that have expanded remarkably both in the Omicron lineage, which is phylogenetically distinct from other lineages, and in other lineages. Most of these mutations involved changes in amino acids, but there were a few that did not, such as an intergenic mutation.
    MeSH term(s) Artificial Intelligence ; COVID-19/genetics ; Genome, Viral ; Humans ; Machine Learning ; Mutation ; Oligonucleotides/genetics ; Phylogeny ; SARS-CoV-2/genetics ; Spike Glycoprotein, Coronavirus/genetics
    Chemical Substances Oligonucleotides ; Spike Glycoprotein, Coronavirus ; spike protein, SARS-CoV-2
    Language English
    Publishing date 2022-08-31
    Publishing country United States
    Document type Journal Article
    ZDB-ID 2267670-3
    ISSN (online) 1932-6203
    DOI 10.1371/journal.pone.0273860
    Database MEDical Literature Analysis and Retrieval System OnLINE
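
The idea in this abstract of linking single-copy 20-mers to mutations can be sketched in Python as follows. This is an illustrative assumption, not the published pipeline: the function names `single_copy_kmers` and `novel_kmers` are hypothetical, the comparison against one reference genome is a simplification, and the toy sequences use `k=3` only to keep the example readable.

```python
from collections import Counter


def single_copy_kmers(genome, k=20):
    """Return the set of k-mers that occur exactly once in `genome`."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    return {kmer for kmer, c in counts.items() if c == 1}


def novel_kmers(reference, variant, k=20):
    """Single-copy k-mers of `variant` that are absent from `reference`.

    Each such k-mer overlaps a position where the variant differs from
    the reference, so it can be traced back to a mutation.
    """
    ref = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return single_copy_kmers(variant, k) - ref


# Toy example: one substitution (T -> A) in the middle of the sequence.
new = novel_kmers("ACGTTGCA", "ACGTAGCA", k=3)
```

A single substitution away from the sequence ends produces up to `k` novel k-mers, all overlapping the changed position; counting how often those k-mers appear across sampled genomes over time would give the expansion signal the abstract refers to.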


  5. Article ; Online: Comparative genomic analysis of the human genome and six bat genomes using unsupervised machine learning: Mb-level CpG and TFBS islands.

    Iwasaki, Yuki / Ikemura, Toshimichi / Wada, Kennosuke / Wada, Yoshiko / Abe, Takashi

    BMC genomics

    2022  Volume 23, Issue 1, Page(s) 497

    Abstract Background: Emerging infectious disease-causing RNA viruses, such as the SARS-CoV-2 and Ebola viruses, are thought to rely on bats as natural reservoir hosts. Since these zoonotic viruses pose a great threat to humans, it is important to characterize the bat genome from multiple perspectives. Unsupervised machine learning methods for extracting novel information from big sequence data without prior knowledge or particular models are highly desirable for obtaining unexpected insights. We previously established a batch-learning self-organizing map (BLSOM) of the oligonucleotide composition that reveals novel genome characteristics from big sequence data.
    Results: In this study, using the oligonucleotide BLSOM, we conducted a comparative genomic study of humans and six bat species. BLSOM is an explainable-type machine learning algorithm that reveals the diagnostic oligonucleotides contributing to sequence clustering (self-organization). When unsupervised machine learning reveals unexpected and/or characteristic features, these features can be studied in more detail via the much simpler and more direct standard distribution map method. Based on this combined strategy, we identified the Mb-level enrichment of CG dinucleotide (Mb-level CpG islands) around the termini of bat long-scaffold sequences. In addition, a class of CG-containing oligonucleotides were enriched in the centromeric and pericentromeric regions of human chromosomes. Oligonucleotides longer than tetranucleotides often represent binding motifs for a wide variety of proteins (e.g., transcription factor binding sequences (TFBSs)). By analyzing the penta- and hexanucleotide composition, we observed the evident enrichment of a wide range of hexanucleotide TFBSs in centromeric and pericentromeric heterochromatin regions on all human chromosomes.
    Conclusion: Function of transcription factors (TFs) beyond their known regulation of gene expression (e.g., TF-mediated looping interactions between two different genomic regions) has received wide attention. The Mb-level TFBS and CpG islands are thought to be involved in the large-scale nuclear organization, such as centromere and telomere clustering. TFBSs, which are enriched in centromeric and pericentromeric heterochromatin regions, are thought to play an important role in the formation of nuclear 3D structures. Our machine learning-based analysis will help us to understand the differential features of nuclear 3D structures in the human and bat genomes.
    MeSH term(s) Animals ; COVID-19/transmission ; Chiroptera/genetics ; Chiroptera/virology ; CpG Islands ; Genome, Human/genetics ; Genomics/methods ; Heterochromatin/chemistry ; Heterochromatin/genetics ; Humans ; Molecular Conformation ; Oligonucleotides/chemistry ; SARS-CoV-2/physiology ; Unsupervised Machine Learning
    Chemical Substances Heterochromatin ; Oligonucleotides
    Language English
    Publishing date 2022-07-08
    Publishing country England
    Document type Comparative Study ; Journal Article
    ZDB-ID 2041499-7
    ISSN (online) 1471-2164
    DOI 10.1186/s12864-022-08664-9
    Database MEDical Literature Analysis and Retrieval System OnLINE


  6. Article ; Online: Time-series analyses of directional sequence changes in SARS-CoV-2 genomes and an efficient search method for candidates for advantageous mutations for growth in human cells.

    Wada, Kennosuke / Wada, Yoshiko / Ikemura, Toshimichi

    Gene

    2020  Volume 763S, Page(s) 100038

    Abstract We first conducted time-series analysis of mono- and dinucleotide composition for over 10,000 SARS-CoV-2 genomes, as well as over 1500 Zaire ebolavirus genomes, and found clear time-series changes in the compositions on a monthly basis, which should reflect viral adaptations for efficient growth in human cells. We next developed a sequence alignment free method that extensively searches for advantageous mutations and rank them in an increase level for their intrapopulation frequency. Time-series analysis of occurrences of oligonucleotides of diverse lengths for SARS-CoV-2 genomes revealed seven distinctive mutations that rapidly expanded their intrapopulation frequency and are thought to be candidates of advantageous mutations for the efficient growth in human cells.
    Language English
    Publishing date 2020-08-06
    Publishing country Netherlands
    Document type Journal Article
    ZDB-ID 391792-7
    ISSN (online) 1879-0038
    ISSN 0378-1119
    DOI 10.1016/j.gene.2020.100038
    Database MEDical Literature Analysis and Retrieval System OnLINE
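
The monthly time-series analysis of mono- and dinucleotide composition mentioned in this abstract can be approximated by the Python sketch below. It is a minimal illustration under stated assumptions: the function name `monthly_composition` is hypothetical, collection dates are assumed to be ISO strings (`YYYY-MM-DD`), and each month's value is the unweighted mean of per-genome frequencies, which may differ from the averaging used in the paper.

```python
from collections import Counter, defaultdict


def monthly_composition(records, k=1):
    """Average k-mer frequencies per collection month.

    `records` is an iterable of (date, sequence) pairs, where `date` is an
    ISO string such as "2020-03-15" (an assumption about the input).
    """
    per_month = defaultdict(list)
    for date, seq in records:
        n = len(seq) - k + 1
        if n <= 0:
            continue  # sequence shorter than k: nothing to count
        counts = Counter(seq[i:i + k] for i in range(n))
        per_month[date[:7]].append({m: c / n for m, c in counts.items()})

    result = {}
    for month in sorted(per_month):
        freqs = per_month[month]
        keys = set().union(*freqs)
        result[month] = {
            m: sum(f.get(m, 0.0) for f in freqs) / len(freqs)
            for m in sorted(keys)
        }
    return result


# Hypothetical toy input: three short "genomes" from two months.
series = monthly_composition(
    [("2020-01-05", "AAAA"), ("2020-01-20", "AACC"), ("2020-02-01", "CCCC")]
)
```

Setting `k=2` gives the dinucleotide version; plotting one k-mer's value across the sorted months would show a directional trend if one exists.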


  7. Article ; Online: Time-series analyses of directional sequence changes in SARS-CoV-2 genomes and an efficient search method for candidates for advantageous mutations for growth in human cells.

    Wada, Kennosuke / Wada, Yoshiko / Ikemura, Toshimichi

    Gene: X

    2020  Volume 5, Page(s) 100038

    Abstract We first conducted time-series analysis of mono- and dinucleotide composition for over 10,000 SARS-CoV-2 genomes, as well as over 1500 Zaire ebolavirus genomes, and found clear time-series changes in the compositions on a monthly basis, which should reflect viral adaptations for efficient growth in human cells. We next developed a sequence alignment free method that extensively searches for advantageous mutations and rank them in an increase level for their intrapopulation frequency. Time-series analysis of occurrences of oligonucleotides of diverse lengths for SARS-CoV-2 genomes revealed seven distinctive mutations that rapidly expanded their intrapopulation frequency and are thought to be candidates of advantageous mutations for the efficient growth in human cells.
    Keywords covid19
    Language English
    Publishing date 2020-08-06
    Publishing country Netherlands
    Document type Journal Article
    ISSN (online) 2590-1583
    DOI 10.1016/j.gene.2020.100038
    Database MEDical Literature Analysis and Retrieval System OnLINE


  8. Article ; Online: Time-series analyses of directional sequence changes in SARS-CoV-2 genomes and an efficient search method for candidates for advantageous mutations for growth in human cells

    Kennosuke Wada / Yoshiko Wada / Toshimichi Ikemura

    Gene: X

    2020  Volume 5, Page(s) 100038

    Abstract We first conducted time-series analysis of mono- and dinucleotide composition for over 10,000 SARS-CoV-2 genomes, as well as over 1500 Zaire ebolavirus genomes, and found clear time-series changes in the compositions on a monthly basis, which should reflect viral adaptations for efficient growth in human cells. We next developed a sequence alignment free method that extensively searches for advantageous mutations and rank them in an increase level for their intrapopulation frequency. Time-series analysis of occurrences of oligonucleotides of diverse lengths for SARS-CoV-2 genomes revealed seven distinctive mutations that rapidly expanded their intrapopulation frequency and are thought to be candidates of advantageous mutations for the efficient growth in human cells.
    Keywords COVID-19 ; PCR primer ; Therapeutic oligonucleotide ; Zoonotic virus ; Ebolavirus ; Oligonucleotides ; Genetics ; QH426-470 ; covid19
    Language English
    Publishing date 2020-12-01T00:00:00Z
    Publisher Elsevier
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  9. Article ; Online: Mb-level CpG and TFBS islands visualized by AI and their roles in the nuclear organization of the human genome.

    Wada, Kennosuke / Wada, Yoshiko / Ikemura, Toshimichi

    Genes & genetic systems

    2020  Volume 95, Issue 1, Page(s) 29–41

    Abstract Unsupervised machine learning that can discover novel knowledge from big sequence data without prior knowledge or particular models is highly desirable for current genome study. We previously established a batch-learning self-organizing map (BLSOM) for oligonucleotide compositions, which can reveal various novel genome characteristics from big sequence data, and found that transcription factor binding sequences (TFBSs) and CpG-containing oligonucleotides are enriched in human centromeric and pericentromeric regions, which support centromere clustering and form the condensed heterochromatin "chromocenter" in interphase nuclei. The number and size of chromocenters, as well as the type of centromeres gathered in individual chromocenters, vary depending on cell type. To study molecular mechanisms of cell type-dependent chromocenter formation, we analyzed distribution patterns of occurrence per Mb of hexa- and heptanucleotide TFBSs, which have been compiled by the SwissRegulon Portal, and of CpG-containing oligonucleotides. We found Mb-level islands enriched for TFBSs and CpG-containing oligonucleotides in centromeric and pericentromeric regions on all human chromosomes except chrY. Considering molecular mechanisms for cell type-dependent centromere clustering, the chromosome-dependent enrichment of a set of TFBSs and CpG-containing oligonucleotides is of particular interest, since the cellular content of TFs and methyl-CpG-binding proteins exhibits cell type-dependent regulation. A newly introduced BLSOM, which analyzed occurrences of a total of 3,946 octanucleotide TFBSs compiled by the SwissRegulon Portal, has self-organized (separated) the sequences that are characteristically enriched in TFBSs and shown that these sequences are derived primarily from centromeric and pericentromeric constitutive heterochromatin regions. Furthermore, the BLSOM identified and visualized characteristic TFBSs that are enriched in these regions. 
    By analyzing Hi-C data for interchromosomal interactions, the present study showed that the chromatin segments supporting the interchromosomal interactions locate primarily in Mb-level TFBS and CpG islands and are thus enriched for a wide variety of TFBSs and CG-containing oligonucleotides.
    MeSH term(s) Artificial Intelligence ; Binding Sites ; Centromere/genetics ; Chromosomes, Human/genetics ; CpG Islands/genetics ; Genome, Human/genetics ; Heterochromatin/genetics ; Humans ; Oligonucleotides/genetics ; Protein Binding ; Transcription Factors/genetics ; Transcription Factors/metabolism
    Chemical Substances Heterochromatin ; Oligonucleotides ; Transcription Factors
    Language English
    Publishing date 2020-03-12
    Publishing country Japan
    Document type Journal Article
    ZDB-ID 1323536-9
    ISSN (online) 1880-5779
    ISSN 1341-7568
    DOI 10.1266/ggs.19-00027
    Database MEDical Literature Analysis and Retrieval System OnLINE


  10. Article ; Online: AI for the collective analysis of a massive number of genome sequences: various examples from the small genome of pandemic SARS-CoV-2 to the human genome.

    Ikemura, Toshimichi / Iwasaki, Yuki / Wada, Kennosuke / Wada, Yoshiko / Abe, Takashi

    Genes & genetic systems

    2021  Volume 96, Issue 4, Page(s) 165–176

    Abstract In genetics and related fields, huge amounts of data, such as genome sequences, are accumulating, and the use of artificial intelligence (AI) suitable for big data analysis has become increasingly important. Unsupervised AI that can reveal novel knowledge from big data without prior knowledge or particular models is highly desirable for analyses of genome sequences, particularly for obtaining unexpected insights. We have developed a batch-learning self-organizing map (BLSOM) for oligonucleotide compositions that can reveal various novel genome characteristics. Here, we explain the data mining by the BLSOM: an unsupervised AI. As a specific target, we first selected SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) because a large number of viral genome sequences have been accumulated via worldwide efforts. We analyzed more than 0.6 million sequences collected primarily in the first year of the pandemic. BLSOMs for short oligonucleotides (e.g., 4-6-mers) allowed separation into known clades, but longer oligonucleotides further increased the separation ability and revealed subgrouping within known clades. In the case of 15-mers, there is mostly one copy in the genome; thus, 15-mers that appeared after the epidemic started could be connected to mutations, and the BLSOM for 15-mers revealed the mutations that contributed to separation into known clades and their subgroups. After introducing the detailed methodological strategies, we explain BLSOMs for various topics, such as the tetranucleotide BLSOM for over 5 million 5-kb fragment sequences derived from almost all microorganisms currently available and its use in metagenome studies. We also explain BLSOMs for various eukaryotes, including fishes, frogs and Drosophila species, and found a high separation ability among closely related species. 
    When analyzing the human genome, we found enrichments in transcription factor-binding sequences in centromeric and pericentromeric heterochromatin regions. The tDNAs (tRNA genes) could be separated according to their corresponding amino acid.
    MeSH term(s) Artificial Intelligence ; Cluster Analysis ; Codon Usage ; Computational Biology/methods ; Genome, Human ; Genome, Viral ; Humans ; Metagenomics/methods ; Mutation ; RNA, Transfer ; SARS-CoV-2/genetics ; Time Factors
    Chemical Substances RNA, Transfer (9014-25-9)
    Language English
    Publishing date 2021-09-27
    Publishing country Japan
    Document type Journal Article ; Review
    ZDB-ID 1323536-9
    ISSN (online) 1880-5779
    ISSN 1341-7568
    DOI 10.1266/ggs.21-00025
    Database MEDical Literature Analysis and Retrieval System OnLINE
