MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification
BACKGROUND: Continuous improvements in next generation sequencing technologies led to ever-increasing collections of genomic sequences, which have not been easily characterized by biologists, and whose analysis requires huge computational effort. The classification of species emerged as one of the m...
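Main Authors: | Fiscon, Giulia; Weitschek, Emanuel; Cella, Eleonora; Lo Presti, Alessandra; Giovanetti, Marta; Babakir-Mina, Muhammed; Ciotti, Marco; Ciccozzi, Massimo; Pierangeli, Alessandra; Bertolazzi, Paola; Felici, Giovanni |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5139023/ https://www.ncbi.nlm.nih.gov/pubmed/27980679 http://dx.doi.org/10.1186/s13040-016-0116-2 |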
_version_ | 1782472167945928704 |
---|---|
author | Fiscon, Giulia Weitschek, Emanuel Cella, Eleonora Lo Presti, Alessandra Giovanetti, Marta Babakir-Mina, Muhammed Ciotti, Marco Ciccozzi, Massimo Pierangeli, Alessandra Bertolazzi, Paola Felici, Giovanni |
author_sort | Fiscon, Giulia |
collection | PubMed |
description | BACKGROUND: Continuous improvements in next generation sequencing technologies led to ever-increasing collections of genomic sequences, which have not been easily characterized by biologists, and whose analysis requires huge computational effort. The classification of species emerged as one of the main applications of DNA analysis and has been addressed with several approaches, e.g., multiple alignments-, phylogenetic trees-, statistical- and character-based methods. RESULTS: We propose a supervised method based on a genetic algorithm to identify small genomic subsequences that discriminate among different species. The method identifies multiple subsequences of bounded length with the same information power in a given genomic region. The algorithm has been successfully evaluated through its integration into a rule-based classification framework and applied to three different biological data sets: Influenza, Polyoma, and Rhino virus sequences. CONCLUSIONS: We discover a large number of small subsequences that can be used to identify each virus type with high accuracy and low computational time, and moreover help to characterize different genomic regions. Bounding their length to 20, our method found 1164 characterizing subsequences for all the Influenza virus subtypes, 194 for all the Polyoma viruses, and 11 for Rhino viruses. The abundance of small separating subsequences extracted for each genomic region may be an important support for quick and robust virus identification. Finally, useful biological information can be derived by the relative location and abundance of such subsequences along the different regions. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-016-0116-2) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5139023 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5139023 2016-12-15 MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification Fiscon, Giulia Weitschek, Emanuel Cella, Eleonora Lo Presti, Alessandra Giovanetti, Marta Babakir-Mina, Muhammed Ciotti, Marco Ciccozzi, Massimo Pierangeli, Alessandra Bertolazzi, Paola Felici, Giovanni BioData Min Research BACKGROUND: Continuous improvements in next generation sequencing technologies led to ever-increasing collections of genomic sequences, which have not been easily characterized by biologists, and whose analysis requires huge computational effort. The classification of species emerged as one of the main applications of DNA analysis and has been addressed with several approaches, e.g., multiple alignments-, phylogenetic trees-, statistical- and character-based methods. RESULTS: We propose a supervised method based on a genetic algorithm to identify small genomic subsequences that discriminate among different species. The method identifies multiple subsequences of bounded length with the same information power in a given genomic region. The algorithm has been successfully evaluated through its integration into a rule-based classification framework and applied to three different biological data sets: Influenza, Polyoma, and Rhino virus sequences. CONCLUSIONS: We discover a large number of small subsequences that can be used to identify each virus type with high accuracy and low computational time, and moreover help to characterize different genomic regions. Bounding their length to 20, our method found 1164 characterizing subsequences for all the Influenza virus subtypes, 194 for all the Polyoma viruses, and 11 for Rhino viruses. The abundance of small separating subsequences extracted for each genomic region may be an important support for quick and robust virus identification. Finally, useful biological information can be derived by the relative location and abundance of such subsequences along the different regions. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-016-0116-2) contains supplementary material, which is available to authorized users. BioMed Central 2016-12-06 /pmc/articles/PMC5139023/ /pubmed/27980679 http://dx.doi.org/10.1186/s13040-016-0116-2 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5139023/ https://www.ncbi.nlm.nih.gov/pubmed/27980679 http://dx.doi.org/10.1186/s13040-016-0116-2 |
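
The abstract describes finding short subsequences of bounded length that separate one species from the others. As a rough illustration of that idea only, the sketch below enumerates fixed-length substrings exhaustively and keeps those present in every sequence of one label and absent from every sequence of the other labels. This is not the genetic algorithm or the rule-based classification framework used by MISSEL; the function names, the labels, and the toy sequences are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def specific_subsequences(labeled_seqs, k):
    """Map each label to the k-mers that occur in every sequence with that
    label and in no sequence with any other label.

    Exhaustive enumeration, assumed here only for illustration; the record
    above describes a genetic algorithm instead.
    """
    by_label = defaultdict(list)
    for label, seq in labeled_seqs:
        by_label[label].append(kmers(seq, k))

    result = {}
    for label, sets in by_label.items():
        # k-mers shared by all sequences of this label
        shared = set.intersection(*sets)
        # k-mers seen anywhere in sequences of the other labels
        others = set()
        for other_label, other_sets in by_label.items():
            if other_label != label:
                others.update(*other_sets)
        result[label] = shared - others
    return result


# Hypothetical toy data: two labels, two short sequences each.
toy = [
    ("A", "ACGTAC"),
    ("A", "TACGTA"),
    ("B", "GGCATT"),
    ("B", "CATTGG"),
]
found = specific_subsequences(toy, 4)
print(found)
```

On real genomes an exhaustive scan like this grows quickly with sequence length and number of samples, which is one reason a heuristic search may be preferred; the bound of 20 mentioned in the abstract would correspond to limiting `k` in this sketch.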