Article ; Online: A tool for feature extraction from biological sequences.
2022 Volume 23, Issue 3
Abstract | Advances in sequencing technologies now produce huge amounts of biological data. Analyzing data at this scale is beyond human ability, which creates a strong opportunity for machine learning methods. These methods, however, are practical only when the sequences are converted into feature vectors. Many tools target this task, including iLearnPlus, a Python-based tool that supports a rich set of features. In this paper, we propose a holistic tool that extracts features from biological sequences (i.e. DNA, RNA and protein). These features are the inputs to machine learning models that predict properties, structures or functions of the input sequences. Our tool supports not only all features in iLearnPlus but also 30 additional features from the literature. Moreover, our tool is written in R, which gives bioinformaticians an alternative for transforming sequences into feature vectors. We compared the conversion time of our tool with that of iLearnPlus and found that ours transforms sequences much faster: a median speedup of 2.8X for small nucleotide sequences and 6.3X for large ones. For amino acid sequences, the median speedup is 23.9X. |
---|---|
MeSH term(s) | DNA/genetics ; Humans ; Machine Learning ; Proteins/chemistry ; RNA/genetics ; Sequence Analysis/methods |
Chemical Substances | Proteins ; RNA (63231-63-0) ; DNA (9007-49-2) |
Language | English |
Publishing date | 2022-04-05 |
Publishing country | England |
Document type | Journal Article |
ZDB-ID | 2068142-2 |
ISSN (online) | 1477-4054
ISSN (print) | 1467-5463
DOI | 10.1093/bib/bbac108 |
Database | MEDical Literature Analysis and Retrieval System OnLINE |
In stock of ZB MED Cologne/Königswinter
Zs.A 6262: Show issues | Location: Depending on availability (see holdings information); up to vol. 2021: order articles via the online order form; from vol. 2022: reading room (ground floor) |