BASiNET—BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification
With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data, in particular from de novo sequencing, has been rapidly produced at relatively low cost. In this context, computational tools are increasingly important to assist in the identification of relevant information...
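Main Authors: | Ito, Eric Augusto; Katahira, Isaque; Vicente, Fábio Fernandes da Rocha; Pereira, Luiz Filipe Protasio; Lopes, Fabrício Martins |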
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2018 |
Subjects: | Methods Online |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6144827/ https://www.ncbi.nlm.nih.gov/pubmed/29873784 http://dx.doi.org/10.1093/nar/gky462 |
_version_ | 1783356150644736000 |
---|---|
author | Ito, Eric Augusto; Katahira, Isaque; Vicente, Fábio Fernandes da Rocha; Pereira, Luiz Filipe Protasio; Lopes, Fabrício Martins |
author_facet | Ito, Eric Augusto; Katahira, Isaque; Vicente, Fábio Fernandes da Rocha; Pereira, Luiz Filipe Protasio; Lopes, Fabrício Martins |
author_sort | Ito, Eric Augusto |
collection | PubMed |
description | With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data, in particular from de novo sequencing, has been rapidly produced at relatively low cost. In this context, computational tools are increasingly important to assist in the identification of relevant information to understand the functioning of organisms. This work introduces BASiNET, an alignment-free tool for classifying biological sequences based on feature extraction from complex network measurements. The method first transforms the sequences and represents them as complex networks. It then extracts topological measures and constructs a feature vector that is used to classify the sequences. The method was evaluated in the classification of coding and non-coding RNAs of 13 species and compared to the CNCI, PLEK and CPC2 methods. BASiNET outperformed all compared methods in all adopted organisms and datasets. BASiNET classified sequences in all organisms with high accuracy and low standard deviation, showing that the method is robust and not biased by the organism. The proposed methodology is implemented as open source in the R language and is freely available for download at https://cran.r-project.org/package=BASiNET. |
format | Online Article Text |
id | pubmed-6144827 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6144827 2018-09-25 BASiNET—BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification Ito, Eric Augusto; Katahira, Isaque; Vicente, Fábio Fernandes da Rocha; Pereira, Luiz Filipe Protasio; Lopes, Fabrício Martins Nucleic Acids Res Methods Online With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data, in particular from de novo sequencing, has been rapidly produced at relatively low cost. In this context, computational tools are increasingly important to assist in the identification of relevant information to understand the functioning of organisms. This work introduces BASiNET, an alignment-free tool for classifying biological sequences based on feature extraction from complex network measurements. The method first transforms the sequences and represents them as complex networks. It then extracts topological measures and constructs a feature vector that is used to classify the sequences. The method was evaluated in the classification of coding and non-coding RNAs of 13 species and compared to the CNCI, PLEK and CPC2 methods. BASiNET outperformed all compared methods in all adopted organisms and datasets. BASiNET classified sequences in all organisms with high accuracy and low standard deviation, showing that the method is robust and not biased by the organism. The proposed methodology is implemented as open source in the R language and is freely available for download at https://cran.r-project.org/package=BASiNET. Oxford University Press 2018-09-19 2018-06-05 /pmc/articles/PMC6144827/ /pubmed/29873784 http://dx.doi.org/10.1093/nar/gky462 Text en © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research. 
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Methods Online Ito, Eric Augusto Katahira, Isaque Vicente, Fábio Fernandes da Rocha Pereira, Luiz Filipe Protasio Lopes, Fabrício Martins BASiNET—BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification |
title | BASiNET—BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification |
title_full | BASiNET—BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification |
title_fullStr | BASiNET—BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification |
title_full_unstemmed | BASiNET—BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification |
title_short | BASiNET—BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification |
title_sort | basinet—biological sequences network: a case study on coding and non-coding rnas identification |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6144827/ https://www.ncbi.nlm.nih.gov/pubmed/29873784 http://dx.doi.org/10.1093/nar/gky462 |
work_keys_str_mv | AT itoericaugusto basinetbiologicalsequencesnetworkacasestudyoncodingandnoncodingrnasidentification AT katahiraisaque basinetbiologicalsequencesnetworkacasestudyoncodingandnoncodingrnasidentification AT vicentefabiofernandesdarocha basinetbiologicalsequencesnetworkacasestudyoncodingandnoncodingrnasidentification AT pereiraluizfilipeprotasio basinetbiologicalsequencesnetworkacasestudyoncodingandnoncodingrnasidentification AT lopesfabriciomartins basinetbiologicalsequencesnetworkacasestudyoncodingandnoncodingrnasidentification |
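
The description in this record outlines a pipeline: map each sequence to a complex network, extract topological measures, and assemble them into a feature vector for a classifier. The following is a minimal Python sketch of that general idea only. It is not the BASiNET implementation (which is an R package) and does not reproduce its API. The function names `sequence_to_network` and `topological_features`, the word size of 3, the step of 1, and the particular measures chosen are all assumptions made for illustration.

```python
def sequence_to_network(seq, word=3, step=1):
    """Build an undirected graph from a sequence (illustrative assumption).

    Nodes are words of length `word`; an edge joins two words that occur
    consecutively when sliding along the sequence by `step`.
    Returns an adjacency mapping: word -> set of neighbouring words.
    """
    seq = seq.upper()
    words = [seq[i:i + word] for i in range(0, len(seq) - word + 1, step)]
    adj = {w: set() for w in words}
    for a, b in zip(words, words[1:]):
        if a != b:  # skip self-loops
            adj[a].add(b)
            adj[b].add(a)
    return adj


def topological_features(adj):
    """Return a small feature vector of basic graph measures.

    Measures (chosen for illustration, not taken from the paper):
    node count, edge count, average degree, maximum degree, density.
    """
    n = len(adj)
    degrees = [len(neighbours) for neighbours in adj.values()]
    m = sum(degrees) // 2  # each undirected edge is counted twice
    avg_degree = sum(degrees) / n if n else 0.0
    max_degree = max(degrees, default=0)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return [n, m, avg_degree, max_degree, density]


if __name__ == "__main__":
    network = sequence_to_network("ACGTACGT")
    print(topological_features(network))
```

In a full pipeline, one such vector per sequence would be passed to a supervised classifier trained on labelled coding and non-coding examples; that step is omitted here because the record does not specify it.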