A singular value decomposition approach for improved taxonomic classification of biological sequences

BACKGROUND: Singular value decomposition (SVD) is a powerful technique for information retrieval; it helps uncover relationships between elements that are not prima facie related. SVD was initially developed to reduce the time needed for information retrieval and analysis of very large data sets in...

Bibliographic Details
Main Authors: Santos, Anderson R, Santos, Marcos A, Baumbach, Jan, McCulloch, John A, Oliveira, Guilherme C, Silva, Artur, Miyoshi, Anderson, Azevedo, Vasco
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects: Proceedings
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287580/
https://www.ncbi.nlm.nih.gov/pubmed/22369633
http://dx.doi.org/10.1186/1471-2164-12-S4-S11
author Santos, Anderson R
Santos, Marcos A
Baumbach, Jan
McCulloch, John A
Oliveira, Guilherme C
Silva, Artur
Miyoshi, Anderson
Azevedo, Vasco
collection PubMed
description BACKGROUND: Singular value decomposition (SVD) is a powerful technique for information retrieval; it helps uncover relationships between elements that are not prima facie related. SVD was initially developed to reduce the time needed for information retrieval and analysis of very large data sets in the complex internet environment. Since information retrieval from large-scale genome and proteome data sets has a similar level of complexity, SVD-based methods could also facilitate data analysis in this research area. RESULTS: We found that SVD applied to amino acid sequences demonstrates relationships and provides a basis for producing clusters and cladograms, demonstrating evolutionary relatedness of species that correlates well with Linnaean taxonomy. The choice of a reasonable number of singular values is crucial for SVD-based studies. We found that fewer singular values are needed to produce biologically significant clusters when SVD is employed. Subsequently, we developed a method to determine the lowest number of singular values and fewest clusters needed to guarantee biological significance; this system was developed and validated by comparison with Linnaean taxonomic classification. CONCLUSIONS: By using SVD, we can reduce uncertainty concerning the appropriate rank value necessary to perform accurate information retrieval analyses. In tests, clusters that we developed with SVD perfectly matched what was expected based on Linnaean taxonomy.
format Online
Article
Text
id pubmed-3287580
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3287580 2012-02-28 BMC Genomics Proceedings BioMed Central 2011-12-22 /pmc/articles/PMC3287580/ /pubmed/22369633 http://dx.doi.org/10.1186/1471-2164-12-S4-S11 Text en Copyright ©2011 Santos et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A singular value decomposition approach for improved taxonomic classification of biological sequences
topic Proceedings