A singular value decomposition approach for improved taxonomic classification of biological sequences
BACKGROUND: Singular value decomposition (SVD) is a powerful technique for information retrieval; it helps uncover relationships between elements that are not prima facie related. SVD was initially developed to reduce the time needed for information retrieval and analysis of very large data sets in...
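Main authors: | Santos, Anderson R; Santos, Marcos A; Baumbach, Jan; McCulloch, John A; Oliveira, Guilherme C; Silva, Artur; Miyoshi, Anderson; Azevedo, Vasco |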
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287580/ https://www.ncbi.nlm.nih.gov/pubmed/22369633 http://dx.doi.org/10.1186/1471-2164-12-S4-S11 |
_version_ | 1782224695882416128 |
---|---|
author | Santos, Anderson R Santos, Marcos A Baumbach, Jan McCulloch, John A Oliveira, Guilherme C Silva, Artur Miyoshi, Anderson Azevedo, Vasco |
author_facet | Santos, Anderson R Santos, Marcos A Baumbach, Jan McCulloch, John A Oliveira, Guilherme C Silva, Artur Miyoshi, Anderson Azevedo, Vasco |
author_sort | Santos, Anderson R |
collection | PubMed |
description | BACKGROUND: Singular value decomposition (SVD) is a powerful technique for information retrieval; it helps uncover relationships between elements that are not prima facie related. SVD was initially developed to reduce the time needed for information retrieval and analysis of very large data sets in the complex internet environment. Since information retrieval from large-scale genome and proteome data sets has a similar level of complexity, SVD-based methods could also facilitate data analysis in this research area. RESULTS: We found that SVD applied to amino acid sequences demonstrates relationships and provides a basis for producing clusters and cladograms, demonstrating evolutionary relatedness of species that correlates well with Linnaean taxonomy. The choice of a reasonable number of singular values is crucial for SVD-based studies. We found that fewer singular values are needed to produce biologically significant clusters when SVD is employed. Subsequently, we developed a method to determine the lowest number of singular values and fewest clusters needed to guarantee biological significance; this system was developed and validated by comparison with Linnaean taxonomic classification. CONCLUSIONS: By using SVD, we can reduce uncertainty concerning the appropriate rank value necessary to perform accurate information retrieval analyses. In tests, clusters that we developed with SVD perfectly matched what was expected based on Linnaean taxonomy. |
format | Online Article Text |
id | pubmed-3287580 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3287580 2012-02-28 A singular value decomposition approach for improved taxonomic classification of biological sequences Santos, Anderson R Santos, Marcos A Baumbach, Jan McCulloch, John A Oliveira, Guilherme C Silva, Artur Miyoshi, Anderson Azevedo, Vasco BMC Genomics Proceedings BACKGROUND: Singular value decomposition (SVD) is a powerful technique for information retrieval; it helps uncover relationships between elements that are not prima facie related. SVD was initially developed to reduce the time needed for information retrieval and analysis of very large data sets in the complex internet environment. Since information retrieval from large-scale genome and proteome data sets has a similar level of complexity, SVD-based methods could also facilitate data analysis in this research area. RESULTS: We found that SVD applied to amino acid sequences demonstrates relationships and provides a basis for producing clusters and cladograms, demonstrating evolutionary relatedness of species that correlates well with Linnaean taxonomy. The choice of a reasonable number of singular values is crucial for SVD-based studies. We found that fewer singular values are needed to produce biologically significant clusters when SVD is employed. Subsequently, we developed a method to determine the lowest number of singular values and fewest clusters needed to guarantee biological significance; this system was developed and validated by comparison with Linnaean taxonomic classification. CONCLUSIONS: By using SVD, we can reduce uncertainty concerning the appropriate rank value necessary to perform accurate information retrieval analyses. In tests, clusters that we developed with SVD perfectly matched what was expected based on Linnaean taxonomy. BioMed Central 2011-12-22 /pmc/articles/PMC3287580/ /pubmed/22369633 http://dx.doi.org/10.1186/1471-2164-12-S4-S11 Text en Copyright ©2011 Santos et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Santos, Anderson R Santos, Marcos A Baumbach, Jan McCulloch, John A Oliveira, Guilherme C Silva, Artur Miyoshi, Anderson Azevedo, Vasco A singular value decomposition approach for improved taxonomic classification of biological sequences |
title | A singular value decomposition approach for improved taxonomic classification of biological sequences |
title_full | A singular value decomposition approach for improved taxonomic classification of biological sequences |
title_fullStr | A singular value decomposition approach for improved taxonomic classification of biological sequences |
title_full_unstemmed | A singular value decomposition approach for improved taxonomic classification of biological sequences |
title_short | A singular value decomposition approach for improved taxonomic classification of biological sequences |
title_sort | singular value decomposition approach for improved taxonomic classification of biological sequences |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287580/ https://www.ncbi.nlm.nih.gov/pubmed/22369633 http://dx.doi.org/10.1186/1471-2164-12-S4-S11 |