Article ; Online: Biclustering sparse binary genomic data.
Journal of computational biology : a journal of computational molecular cell biology
2008 Volume 15, Issue 10, Page(s) 1329–1345
Abstract | Genomic datasets often consist of large, binary, sparse data matrices. In such a dataset, one is often interested in finding contiguous blocks that (mostly) contain ones. This is a biclustering problem, and while many algorithms have been proposed to deal with gene expression data, only two algorithms have been proposed that specifically deal with binary matrices. None of the gene expression biclustering algorithms can handle the large number of zeros in sparse binary matrices. The two proposed binary algorithms failed to produce meaningful results. In this article, we present a new algorithm that is able to extract biclusters from sparse, binary datasets. A powerful feature is that biclusters with different numbers of rows and columns can be detected, ranging from many rows and few columns to few rows and many columns. The algorithm also allows the user to guide the search towards biclusters of specific dimensions. When applying our algorithm to an input matrix derived from TRANSFAC, we find transcription factors with distinctly dissimilar binding motifs, but a clear set of common targets that are significantly enriched for GO categories. |
---|---|
MeSH term(s) | Algorithms ; Cluster Analysis ; Computational Biology/methods ; Databases, Genetic ; Genome ; Models, Genetic ; Oligonucleotide Array Sequence Analysis/methods ; Software ; Transcription Factors/genetics ; Transcription Factors/metabolism |
Chemical Substances | Transcription Factors |
Language | English |
Publishing date | 2008-12 |
Publishing country | United States |
Document type | Journal Article |
ZDB-ID | 2030900-4 |
ISSN (online) | 1557-8666 |
ISSN | 1066-5277 |
DOI | 10.1089/cmb.2008.0066 |
Database | MEDLINE (Medical Literature Analysis and Retrieval System Online) |
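To make the problem stated in the abstract concrete, here is a minimal Python sketch of a naive greedy heuristic for finding one dense bicluster (a row subset times a column subset that mostly contains ones) in a sparse binary matrix. It is not the algorithm presented in this article, which the record does not describe. The functions `density` and `greedy_bicluster`, the `min_density` threshold, and the toy matrix are hypothetical, chosen only for illustration. The matrix is stored sparsely as one set of column indices per row, so the zeros are never materialized.

```python
from typing import List, Set, Tuple


def density(matrix: List[Set[int]], rows: Set[int], cols: Set[int]) -> float:
    """Fraction of ones inside the submatrix rows x cols (hypothetical helper)."""
    if not rows or not cols:
        return 0.0
    ones = sum(len(matrix[r] & cols) for r in rows)
    return ones / (len(rows) * len(cols))


def greedy_bicluster(
    matrix: List[Set[int]], seed_row: int, min_density: float = 0.8
) -> Tuple[Set[int], Set[int]]:
    """Naive greedy sketch (not the article's method).

    Start from one seed row, take its ones as the column set, then try adding
    other rows in order of overlap, keeping a row only if the bicluster's
    density stays at or above min_density.
    """
    rows = {seed_row}
    cols = set(matrix[seed_row])
    candidates = sorted(
        (r for r in range(len(matrix)) if r != seed_row),
        key=lambda r: len(matrix[r] & cols),
        reverse=True,
    )
    for r in candidates:
        trial = rows | {r}
        if density(matrix, trial, cols) >= min_density:
            rows = trial
    return rows, cols


# Toy sparse binary matrix: each row lists the columns that contain a one.
matrix = [{0, 1, 2}, {0, 1, 2, 5}, {0, 1}, {7}, {0, 1, 2}]
rows, cols = greedy_bicluster(matrix, seed_row=0, min_density=0.8)
print(sorted(rows), sorted(cols), round(density(matrix, rows, cols), 3))
```

On the toy matrix this keeps rows 0, 1, 2 and 4 over columns 0 to 2 (11 ones in a 4 x 3 block) and rejects row 3, which would push the density below 0.8. A fixed seed and a single density threshold are the simplest possible choices here; they say nothing about how the article controls bicluster dimensions.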