BASiNET—BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification

With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data, in particular de novo sequencing, was rapidly produced at relatively low costs. In this context, computational tools are increasingly important to assist in the identification of relevant information...

Bibliographic Details
Main authors: Ito, Eric Augusto, Katahira, Isaque, Vicente, Fábio Fernandes da Rocha, Pereira, Luiz Filipe Protasio, Lopes, Fabrício Martins
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6144827/
https://www.ncbi.nlm.nih.gov/pubmed/29873784
http://dx.doi.org/10.1093/nar/gky462
author Ito, Eric Augusto
Katahira, Isaque
Vicente, Fábio Fernandes da Rocha
Pereira, Luiz Filipe Protasio
Lopes, Fabrício Martins
collection PubMed
description With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data, in particular de novo sequencing, was rapidly produced at relatively low costs. In this context, computational tools are increasingly important to assist in the identification of relevant information to understand the functioning of organisms. This work introduces BASiNET, an alignment-free tool for classifying biological sequences based on feature extraction from complex network measurements. The method first transforms the sequences and represents them as complex networks. It then extracts topological measures and constructs a feature vector that is used to classify the sequences. The method was evaluated in the classification of coding and non-coding RNAs of 13 species and compared to the CNCI, PLEK and CPC2 methods. BASiNET outperformed all compared methods in all adopted organisms and datasets. BASiNET classified sequences in all organisms with high accuracy and low standard deviation, showing that the method is robust and not biased by the organism. The proposed methodology is implemented as open source in the R language and is freely available for download at https://cran.r-project.org/package=BASiNET.
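The pipeline summarized in the description (sequence to complex network, network to topological measures, measures to feature vector) can be illustrated with a minimal Python sketch. This is not the BASiNET implementation, which is an R package; the function names, the word size of 3, the step of 1, the rule that consecutive words are joined by an edge, and the particular measures chosen are all assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean, pstdev


def sequence_to_network(seq, word=3, step=1):
    """Build an undirected graph whose nodes are the words (k-mers) of `seq`.

    Assumption for illustration: each pair of consecutive windows is joined
    by an edge. The word size and step are hypothetical defaults.
    """
    seq = seq.upper()
    words = [seq[i:i + word] for i in range(0, len(seq) - word + 1, step)]
    if not words:
        raise ValueError("sequence is shorter than the word size")
    adj = defaultdict(set)
    for w in words:
        adj[w]  # register every word as a node, even if it gets no edge
    for a, b in zip(words, words[1:]):
        if a != b:  # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def clustering(adj, node):
    """Local clustering coefficient: fraction of neighbour pairs that are linked."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def feature_vector(adj):
    """Collect a few topological measures into a fixed-length feature vector.

    Order: nodes, edges, mean degree, max degree, min degree,
    degree standard deviation, mean clustering coefficient.
    """
    degrees = [len(nbrs) for nbrs in adj.values()]
    return [
        len(adj),
        sum(degrees) // 2,
        mean(degrees),
        max(degrees),
        min(degrees),
        pstdev(degrees),
        mean([clustering(adj, v) for v in adj]),
    ]


if __name__ == "__main__":
    # Toy sequence, not taken from the paper's datasets.
    print(feature_vector(sequence_to_network("AUGGCUAUGGCA")))
```

Vectors of this kind, computed for labelled coding and non-coding sequences, could then be passed to any supervised classifier; the description does not say which classifier or which measures the authors use, so none is assumed here.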
format Online
Article
Text
id pubmed-6144827
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6144827 2018-09-25 Nucleic Acids Res Methods Online Oxford University Press 2018-09-19 2018-06-05 /pmc/articles/PMC6144827/ /pubmed/29873784 http://dx.doi.org/10.1093/nar/gky462 Text en © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title BASiNET—BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification
topic Methods Online