Article ; Online: ARA: a flexible pipeline for automated exploration of NCBI SRA datasets.
2023 Volume 12
Abstract | Background: One of the most effective and useful methods to explore the content of biological databases is searching with nucleotide or protein sequences as a query. However, especially in the case of nucleic acids, the large volume of data generated by next-generation sequencing (NGS) technologies often makes this approach unavailable. The hierarchical organization of the NGS records is primarily designed for browsing or text-based searches of the information provided in metadata-related keywords, limiting the efficiency of database exploration. Findings: We developed an automated pipeline that incorporates well-established NGS data-processing tools and procedures to allow easy and effective sampling of the NCBI SRA database records. Given a file with query nucleotide sequences, our tool estimates the matching content of SRA accessions by probing only a user-defined fraction of a record's sequences. Based on the selected parameters, it can then perform a full mapping experiment on the records that meet the required criteria. The pipeline is designed to be easy to operate: it offers a fully automatic setup procedure and relies on a fixed set of tested supporting tools. The modular design and implemented usage modes allow a user to scale analyses up to complex computational infrastructure. Conclusions: We present an easy-to-operate and automated tool that expands the way a user can access and explore the information contained within the records deposited in the NCBI SRA database. |
---|---|
MeSH term(s) | Amino Acid Sequence ; Databases, Factual ; High-Throughput Nucleotide Sequencing ; Metadata ; Nucleotides |
Chemical Substances | Nucleotides |
Language | English |
Publishing date | 2023-08-17 |
Publishing country | United States |
Document type | Journal Article ; Research Support, Non-U.S. Gov't |
ZDB-ID | 2708999-X |
ISSN | 2047-217X |
ISSN (online) | 2047-217X |
DOI | 10.1093/gigascience/giad067 |
Database | MEDical Literature Analysis and Retrieval System OnLINE |