LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 186

  1. Article ; Online: Constructing telomere-to-telomere diploid genome by polishing haploid nanopore-based assembly.

    Darian, Joshua Casey / Kundu, Ritu / Rajaby, Ramesh / Sung, Wing-Kin

    Nature methods

    2024  Volume 21, Issue 4, Page(s) 574–583

    Abstract Draft genomes generated from Oxford Nanopore Technologies (ONT) long reads are known to have a higher error rate. Although existing genome polishers can enhance their quality, the error rate (including mismatches, indels and switching errors between paternal and maternal haplotypes) can be significant. Here, we develop two polishers, hypo-short and hypo-hybrid to address this issue. Hypo-short utilizes Illumina short reads to polish an ONT-based draft assembly, resulting in a high-quality assembly with low error rates and switching errors. Expanding on this, hypo-hybrid incorporates ONT long reads to further refine the assembly into a diploid representation. Leveraging on hypo-hybrid, we have created a diploid genome assembly pipeline called hypo-assembler. Hypo-assembler automates the generation of highly accurate, contiguous and nearly complete diploid assemblies using ONT long reads, Illumina short reads and optionally Hi-C reads. Notably, our solution even allows for the production of telomere-to-telomere diploid genomes with additional manual steps. As a proof of concept, we successfully assembled a fully phased telomere-to-telomere diploid genome of HG00733, achieving a quality value exceeding 50.
    MeSH term(s) Nanopores ; Diploidy ; Haploidy ; High-Throughput Nucleotide Sequencing/methods ; Telomere/genetics ; Sequence Analysis, DNA/methods
    Language English
    Publishing date 2024-03-08
    Publishing country United States
    Document type Journal Article
    ZDB-ID 2169522-2
    ISSN (online) 1548-7105
    ISSN 1548-7091
    DOI 10.1038/s41592-023-02141-1
    Database MEDical Literature Analysis and Retrieval System OnLINE
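
The quality value (QV) quoted in the abstract above is conventionally a Phred-scaled consensus error rate. Assuming that convention (the record itself does not define QV), a QV of 50 corresponds to about one error per 100,000 bases. A minimal sketch of the conversion follows; the function names are illustrative and not taken from hypo-assembler.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    # QV 50 under the Phred convention (assumed, see lead-in).
    print(qv_to_error_rate(50))
    print(error_rate_to_qv(1e-5))
```

Under this convention every 10-point increase in QV means a tenfold lower error rate.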

  2. Article ; Online: SVsearcher: A more accurate structural variation detection method in long read data.

    Zheng, Yan / Shang, Xuequn / Sung, Wing-Kin

    Computers in biology and medicine

    2023  Volume 158, Page(s) 106843

    Abstract Structural variations (SVs) represent genomic rearrangements (such as deletions, insertions, and inversions) whose sizes are larger than 50bp. They play important roles in genetic diseases and evolution mechanism. Due to the advance of long-read sequencing (i.e. PacBio long-read sequencing and Oxford Nanopore (ONT) long-read sequencing), we can call SVs accurately. However, for ONT long reads, we observe that existing long read SV callers miss a lot of true SVs and call a lot of false SVs in repetitive regions and in regions with multi-allelic SVs. Those errors are caused by messy alignments of ONT reads due to their high error rate. Hence, we propose a novel method, SVsearcher, to solve these issues. We run SVsearcher and other callers in three real datasets and find that SVsearcher improves the F1 score by approximately 10% for high coverage (50×) datasets and more than 25% for low coverage (10×) datasets. More importantly, SVsearcher can identify 81.7%-91.8% multi-allelic SVs while existing methods only identify 13.2% (Sniffles)-54.0% (nanoSV) of them. SVsearcher is available at https://github.com/kensung-lab/SVsearcher.
    MeSH term(s) High-Throughput Nucleotide Sequencing/methods ; Genomics/methods ; Genome ; Sequence Analysis, DNA/methods
    Language English
    Publishing date 2023-03-31
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 127557-4
    ISSN (online) 1879-0534
    ISSN 0010-4825
    DOI 10.1016/j.compbiomed.2023.106843
    Database MEDical Literature Analysis and Retrieval System OnLINE
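
The F1 score reported in the abstract above is the harmonic mean of precision and recall. The sketch below shows how it could be computed from counts of true-positive, false-positive and false-negative SV calls. The counts in the example are invented for illustration and are not results from the SVsearcher paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from call counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical caller output: 80 correct calls, 20 false calls, 20 missed SVs.
    print(f1_score(tp=80, fp=20, fn=20))
```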

  3. Article ; Online: Oner, Sung, and Lee: Researchers in digital pathology for the future of modern medicine.

    Oner, Mustafa Umit / Sung, Wing-Kin / Lee, Hwee Kuan

    Patterns (New York, N.Y.)

    2022  Volume 3, Issue 2, Page(s) 100447

    Abstract Oner, an early-career researcher, and Lee and Sung, group leaders, have developed a deep learning model for accurate prediction of the proportion of cancer cells within tumor tissue. This is a necessary step for precision oncology and target therapy in cancer. They talk about their view of data science and the evolution of pathology in the coming years.
    Language English
    Publishing date 2022-02-11
    Publishing country United States
    Document type News
    ISSN (online) 2666-3899
    DOI 10.1016/j.patter.2022.100447
    Database MEDical Literature Analysis and Retrieval System OnLINE

  4. Article ; Online: Pan-omics analysis of biological data.

    Sung, Wing-Kin

    Methods (San Diego, Calif.)

    2016  Volume 102, Page(s) 1–2

    Language English
    Publishing date 2016-06-01
    Publishing country United States
    Document type Editorial
    ZDB-ID 1066584-5
    ISSN (online) 1095-9130
    ISSN 1046-2023
    DOI 10.1016/j.ymeth.2016.05.004
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Article ; Online: SurVIndel: improving CNV calling from high-throughput sequencing data through statistical testing.

    Rajaby, Ramesh / Sung, Wing-Kin

    Bioinformatics (Oxford, England)

    2019  Volume 37, Issue 11, Page(s) 1497–1505

    Abstract Motivation: Structural variations (SVs) are large scale mutations in a genome; although less frequent than point mutations, due to their large size they are responsible for more heritable differences between individuals. Two prominent classes of SVs are deletions and tandem duplications. They play important roles in many devastating genetic diseases, such as Smith-Magenis syndrome, Potocki-Lupski syndrome and Williams-Beuren syndrome. Since paired-end whole genome sequencing data have become widespread and affordable, reliably calling deletions and tandem duplications has been a major target in bioinformatics; unfortunately, the problem is far from being solved, since existing solutions often offer poor results when applied to real data.
    Results: We developed a novel caller, SurVIndel, which focuses on detecting deletions and tandem duplications from paired next-generation sequencing data. SurVIndel uses discordant paired reads, clipped reads as well as statistical methods. We show that SurVIndel outperforms existing methods on both simulated and real biological datasets.
    Availability and implementation: SurVIndel is available at https://github.com/Mesh89/SurVIndel.
    Supplementary information: Supplementary data are available at Bioinformatics online.
    Language English
    Publishing date 2019-04-02
    Publishing country England
    Document type Journal Article
    ZDB-ID 1422668-6
    ISSN (online) 1367-4811
    ISSN 1367-4803
    DOI 10.1093/bioinformatics/btz261
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Article ; Online: Distribution based MIL pooling filters: Experiments on a lymph node metastases dataset.

    Oner, Mustafa Umit / Kye-Jet, Jared Marc Song / Lee, Hwee Kuan / Sung, Wing-Kin

    Medical image analysis

    2023  Volume 87, Page(s) 102813

    Abstract Histopathology is a crucial diagnostic tool in cancer and involves the analysis of gigapixel slides. Multiple instance learning (MIL) promises success in digital histopathology thanks to its ability to handle gigapixel slides and work with weak labels. MIL is a machine learning paradigm that learns the mapping between bags of instances and bag labels. It represents a slide as a bag of patches and uses the slide's weak label as the bag's label. This paper introduces distribution-based pooling filters that obtain a bag-level representation by estimating marginal distributions of instance features. We formally prove that the distribution-based pooling filters are more expressive than the classical point estimate-based counterparts, like 'max' and 'mean' pooling, in terms of the amount of information captured while obtaining bag-level representations. Moreover, we empirically show that models with distribution-based pooling filters perform equal to or better than those with point estimate-based pooling filters on distinct real-world MIL tasks defined on the CAMELYON16 lymph node metastases dataset. Our model with a distribution pooling filter achieves an area under the receiver operating characteristics curve value of 0.9325 (95% confidence interval: 0.8798 - 0.9743) in the tumor vs. normal slide classification task.
    MeSH term(s) Humans ; Algorithms ; Lymphatic Metastasis ; Machine Learning ; ROC Curve
    Language English
    Publishing date 2023-04-20
    Publishing country Netherlands
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 1356436-5
    ISSN (online) 1361-8423 ; 1361-8431
    ISSN 1361-8415
    DOI 10.1016/j.media.2023.102813
    Database MEDical Literature Analysis and Retrieval System OnLINE
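
To illustrate the abstract's claim that distribution-based pooling captures more information than 'max' or 'mean' pooling, the toy sketch below pools two bags of one-dimensional instance features. It uses a normalized histogram as a stand-in for the estimated marginal distribution. That choice, the bin count and the feature values are assumptions made for illustration and may differ from the paper's estimator.

```python
from typing import List


def mean_pool(bag: List[float]) -> float:
    """Point-estimate pooling: average of the instance features."""
    return sum(bag) / len(bag)


def max_pool(bag: List[float]) -> float:
    """Point-estimate pooling: largest instance feature."""
    return max(bag)


def distribution_pool(bag: List[float], n_bins: int = 4) -> List[float]:
    """Histogram estimate of the feature distribution; features assumed in [0, 1]."""
    counts = [0] * n_bins
    for x in bag:
        # Clamp so that x == 1.0 falls into the last bin.
        idx = min(int(x * n_bins), n_bins - 1)
        counts[idx] += 1
    return [c / len(bag) for c in counts]


if __name__ == "__main__":
    bag_a = [0.0, 0.5, 1.0]    # hypothetical patch features of slide A
    bag_b = [0.25, 0.25, 1.0]  # hypothetical patch features of slide B
    print(mean_pool(bag_a), mean_pool(bag_b))
    print(max_pool(bag_a), max_pool(bag_b))
    print(distribution_pool(bag_a), distribution_pool(bag_b))
```

The two bags share the same mean and the same maximum, so the point estimates cannot tell them apart, while the histogram representations differ.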

  7. Article ; Online: TranSurVeyor: an improved database-free algorithm for finding non-reference transpositions in high-throughput sequencing data.

    Rajaby, Ramesh / Sung, Wing-Kin

    Nucleic acids research

    2018  Volume 46, Issue 20, Page(s) e122

    Abstract Transpositions transfer DNA segments between different loci within a genome; in particular, when a transposition is found in a sample but not in a reference genome, it is called a non-reference transposition. They are important structural variations that have clinical impact. Transpositions can be called by analyzing second generation high-throughput sequencing datasets. Current methods follow either a database-based or a database-free approach. Database-based methods require a database of transposable elements. Some of them have good specificity; however this approach cannot detect novel transpositions, and it requires a good database of transposable elements, which is not yet available for many species. Database-free methods perform de novo calling of transpositions, but their accuracy is low. We observe that this is due to the misalignment of the reads; since reads are short and the human genome has many repeats, false alignments create false positive predictions while missing alignments reduce the true positive rate. This paper proposes new techniques to improve database-free non-reference transposition calling: first, we propose a realignment strategy called one-end remapping that corrects the alignments of reads in interspersed repeats; second, we propose a SNV-aware filter that removes some incorrectly aligned reads. By combining these two techniques and other techniques like clustering and positive-to-negative ratio filter, our proposed transposition caller TranSurVeyor shows at least 3.1-fold improvement in terms of F1-score over existing database-free methods. More importantly, even though TranSurVeyor does not use databases of prior information, its performance is at least as good as existing database-based methods such as MELT, Mobster and Retroseq. We also illustrate that TranSurVeyor can discover transpositions that are not known in the current database.
    MeSH term(s) Algorithms ; Computational Biology/methods ; DNA Transposable Elements/genetics ; Databases, Factual ; Genome, Human/genetics ; Genomics/methods ; High-Throughput Nucleotide Sequencing/methods ; Humans ; Mutagenesis, Insertional ; Reproducibility of Results
    Chemical Substances DNA Transposable Elements
    Language English
    Publishing date 2018-08-23
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 186809-3
    ISSN (online) 1362-4962 ; 1362-4954
    ISSN 0301-5610 ; 0305-1048
    DOI 10.1093/nar/gky685
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Article: Deep Learning-Based Segmentation of Peach Diseases Using Convolutional Neural Network.

    Yao, Na / Ni, Fuchuan / Wu, Minghao / Wang, Haiyan / Li, Guoliang / Sung, Wing-Kin

    Frontiers in plant science

    2022  Volume 13, Page(s) 876357

    Abstract Peach diseases seriously affect peach yield and people's health. The precise identification of peach diseases and the segmentation of the diseased areas can provide the basis for disease control and treatment. However, the complex background and imbalanced samples bring certain challenges to the segmentation and recognition of lesion area, and the hard samples and imbalance samples can lead to a decline in classification of foreground class and background class. In this paper we applied deep network models (Mask R-CNN and Mask Scoring R-CNN) for segmentation and recognition of peach diseases. Mask R-CNN and Mask Scoring R-CNN are classic instance segmentation models. Using instance segmentation model can obtain the disease names, disease location and disease segmentation, and the foreground area is the basic feature for next segmentation. Focal Loss can solve the problems caused by difficult samples and imbalance samples, and was used for this dataset to improve segmentation accuracy. Experimental results show that Mask Scoring R-CNN with Focal Loss function can improve recognition rate and segmentation accuracy comparing to Mask Scoring R-CNN with CE loss or comparing to Mask R-CNN. When ResNet50 is used as the backbone network based on Mask R-CNN, the segmentation accuracy of segm_mAP_50 increased from 0.236 to 0.254. When ResNetx101 is used as the backbone network, the segmentation accuracy of segm_mAP_50 increased from 0.452 to 0.463. In summary, this paper used Focal Loss on Mask R-CNN and Mask Scoring R-CNN to generate better mAP of segmentation and output more detailed information about peach diseases.
    Language English
    Publishing date 2022-05-25
    Publishing country Switzerland
    Document type Journal Article
    ZDB-ID 2613694-6
    ISSN 1664-462X
    DOI 10.3389/fpls.2022.876357
    Database MEDical Literature Analysis and Retrieval System OnLINE
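
The abstract above credits Focal Loss with handling hard and imbalanced samples. As a rough sketch of why, the code below implements the commonly cited binary form, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The default values of gamma and alpha are assumptions for illustration, since the record does not state the settings used in the paper.

```python
import math


def cross_entropy(p_t: float) -> float:
    """Cross-entropy loss for the predicted probability of the true class."""
    return -math.log(p_t)


def focal_loss(p_t: float, gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss: cross-entropy scaled by (1 - p_t)^gamma.

    gamma and alpha defaults are illustrative, not the paper's settings.
    """
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    for p_t in (0.9, 0.1):  # an easy and a hard example
        print(p_t, cross_entropy(p_t), focal_loss(p_t))
```

With gamma set to 0 the expression reduces to plain cross-entropy. With larger gamma, well-classified examples contribute far less to the total loss than hard ones.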

  9. Article ; Online: BATVI: Fast, sensitive and accurate detection of virus integrations.

    Tennakoon, Chandana / Sung, Wing Kin

    BMC bioinformatics

    2017  Volume 18, Issue Suppl 3, Page(s) 71

    Abstract Background: The study of virus integrations in human genome is important since virus integrations were shown to be associated with diseases. In the literature, few methods have been proposed that predict virus integrations using next generation sequencing datasets. Although they work, they are slow and are not very sensitive.
    Results and discussion: This paper introduces a new method BatVI to predict viral integrations. Our method uses a fast screening method to filter out chimeric reads containing possible viral integrations. Next, sensitive alignments of these candidate chimeric reads are called by BLAST. Chimeric reads that are co-localized in the human genome are clustered. Finally, by assembling the chimeric reads in each cluster, high confident virus integration sites are extracted.
    Conclusion: We compared the performance of BatVI with existing methods VirusFinder and VirusSeq using both simulated and real-life datasets of liver cancer patients. BatVI ran an order of magnitude faster and was able to predict almost twice the number of true positives compared to other methods while maintaining a false positive rate less than 1%. For the liver cancer datasets, BatVI uncovered novel integrations to two important genes TERT and MLL4, which were missed by previous studies. Through gene expression data, we verified the correctness of these additional integrations. BatVI can be downloaded from http://biogpu.ddns.comp.nus.edu.sg/~ksung/batvi/index.html .
    MeSH term(s) Algorithms ; Cluster Analysis ; DNA, Viral/genetics ; DNA-Binding Proteins/genetics ; DNA-Binding Proteins/metabolism ; Genome, Human ; High-Throughput Nucleotide Sequencing ; Host-Pathogen Interactions/genetics ; Humans ; Liver Neoplasms/diagnosis ; Liver Neoplasms/virology ; Models, Theoretical ; Sequence Analysis, DNA ; Software ; Telomerase/genetics ; Telomerase/metabolism ; Virus Integration
    Chemical Substances DNA, Viral ; DNA-Binding Proteins ; MLL4 protein, human (EC 2.1.1.43) ; TERT protein, human (EC 2.7.7.49) ; Telomerase (EC 2.7.7.49)
    Language English
    Publishing date 2017-03-14
    Publishing country England
    Document type Journal Article
    ZDB-ID 2041484-5
    ISSN (online) 1471-2105
    DOI 10.1186/s12859-017-1470-x
    Database MEDical Literature Analysis and Retrieval System OnLINE
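
One step in the abstract above is clustering chimeric reads that are co-localized in the human genome. The sketch below shows one simple way such a step could work: sort candidate hits by chromosome and position, then start a new cluster whenever the gap to the previous hit exceeds a threshold. This is not BatVI's actual implementation. The function name, the 500 bp threshold and the coordinates are all hypothetical.

```python
from typing import List, Tuple

Hit = Tuple[str, int]  # (chromosome, human-genome position of a chimeric read)


def cluster_hits(hits: List[Hit], max_gap: int = 500) -> List[List[Hit]]:
    """Group hits on the same chromosome whose neighbours lie within max_gap bp."""
    clusters: List[List[Hit]] = []
    for chrom, pos in sorted(hits):
        if clusters:
            last_chrom, last_pos = clusters[-1][-1]
            if last_chrom == chrom and pos - last_pos <= max_gap:
                clusters[-1].append((chrom, pos))
                continue
        # Different chromosome or gap too large: open a new cluster.
        clusters.append([(chrom, pos)])
    return clusters


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    hits = [("chr5", 1295100), ("chr5", 1295300), ("chr19", 36210000), ("chr5", 1400000)]
    for cluster in cluster_hits(hits):
        print(cluster)
```

In a pipeline like the one described, each resulting cluster would then be passed on to the assembly step.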

  10. Article ; Online: INSurVeyor: improving insertion calling from short read sequencing data.

    Rajaby, Ramesh / Liu, Dong-Xu / Au, Chun Hang / Cheung, Yuen-Ting / Lau, Amy Yuet Ting / Yang, Qing-Yong / Sung, Wing-Kin

    Nature communications

    2023  Volume 14, Issue 1, Page(s) 3243

    Abstract Insertions are one of the major types of structural variations and are defined as the addition of 50 nucleotides or more into a DNA sequence. Several methods exist to detect insertions from next-generation sequencing short read data, but they generally have low sensitivity. Our contribution is two-fold. First, we introduce INSurVeyor, a fast, sensitive and precise method that detects insertions from next-generation sequencing paired-end data. Using publicly available benchmark datasets (both human and non-human), we show that INSurVeyor is not only more sensitive than any individual caller we tested, but also more sensitive than all of them combined. Furthermore, for most types of insertions, INSurVeyor is almost as sensitive as long reads callers. Second, we provide state-of-the-art catalogues of insertions for 1047 Arabidopsis Thaliana genomes from the 1001 Genomes Project and 3202 human genomes from the 1000 Genomes Project, both generated with INSurVeyor. We show that they are more complete and precise than existing resources, and important insertions are missed by existing methods.
    MeSH term(s) High-Throughput Nucleotide Sequencing/methods ; Sequence Analysis, DNA/methods
    Language English
    Publishing date 2023-06-05
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 2553671-0
    ISSN (online) 2041-1723
    DOI 10.1038/s41467-023-38870-2
    Database MEDical Literature Analysis and Retrieval System OnLINE
