
Core set approach to reduce uncertainty of gene trees

BACKGROUND: A genealogy based on gene sequences within a species plays an essential role in the estimation of the character, structure, and evolutionary history of that species. Because intraspecific sequences are more closely related than interspecific ones, detailed information on the evolutionary...


Bibliographic Details
Main authors: Okabayashi, Takahisa; Kitazoe, Yasuhiro; Kishino, Hirohisa; Watabe, Teruaki; Nakajima, Noriaki; Okuhara, Yoshiyasu; O'Loughlin, Samantha; Walton, Catherine
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1508163/
https://www.ncbi.nlm.nih.gov/pubmed/16712735
http://dx.doi.org/10.1186/1471-2148-6-41
author Okabayashi, Takahisa
Kitazoe, Yasuhiro
Kishino, Hirohisa
Watabe, Teruaki
Nakajima, Noriaki
Okuhara, Yoshiyasu
O'Loughlin, Samantha
Walton, Catherine
author_sort Okabayashi, Takahisa
collection PubMed
description BACKGROUND: A genealogy based on gene sequences within a species plays an essential role in estimating the character, structure, and evolutionary history of that species. Because intraspecific sequences are more closely related than interspecific ones, determining all the node sequences of a tree may yield detailed information on the evolutionary process and provide insight into functional constraints and adaptations. However, strong evolutionary correlations on a few lineages make this determination difficult for the tree as a whole, and the maximum parsimony (MP) method frequently allows several topologies with the same total branch length. RESULTS: Kitazoe et al. developed a multidimensional vector-space representation of phylogeny. It converts additivity of evolutionary distances into orthogonality among the vectors expressing branches, and provides a unified index to measure deviations from orthogonality. In this paper, this index is used to detect and exclude sequences with large deviations from orthogonality and then to select a maximum subset ("core set") of sequences for which MP generates a single solution. Once the core set tree is formed, with all of its node sequences determined, each excluded sequence is found to have basically two phylogenetic positions on this tree. Fortunately, since multiple substitutions are rare in intra-species sequences, the variance of nucleotide transitions is confined to a small range. By applying the core set approach to 38 partial env sequences of HIV-1 from a single patient and to 198 mitochondrial COI and COII DNA sequences of Anopheles dirus, we demonstrate how consistently this approach constructs the tree. CONCLUSION: In the HIV dataset, we confirmed that the obtained core set tree is the unique maximum set for which MP proposes a single tree. In the mosquito dataset, the fluctuation of nucleotide transitions caused by the sequences excluded from the core set was very small.
We reproduced this core-set tree by simulation based on a random process and applied our approach to many sets of the resulting endpoint sequences. Consequently, ninety percent of the endpoint sequences were identified as the core sets, and the obtained node sequences were perfectly identical to the true ones.
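The link between additivity and orthogonality mentioned in the abstract can be illustrated with a small sketch. It is not the index defined by Kitazoe et al., whose exact form is not given in this record. It assumes that each evolutionary distance is treated as a squared Euclidean length, so that for a node B lying between A and C the additive relation d_AC = d_AB + d_BC is the Pythagorean relation, and the law of cosines gives the cosine of the angle at B as a measure of deviation. The function name `orthogonality_deviation` and the example distances are hypothetical.

```python
import math


def orthogonality_deviation(d_ab: float, d_bc: float, d_ac: float) -> float:
    """Cosine of the angle at B between branch vectors B->A and B->C.

    Assumption (not taken from the record): each distance is the squared
    Euclidean length of the corresponding vector. By the law of cosines,
    d_ac = d_ab + d_bc - 2 * sqrt(d_ab * d_bc) * cos(theta),
    so cos(theta) is 0 exactly when the distances are additive.
    """
    if d_ab <= 0 or d_bc <= 0:
        raise ValueError("d_ab and d_bc must be positive")
    return (d_ab + d_bc - d_ac) / (2.0 * math.sqrt(d_ab * d_bc))


# Additive case: 3 + 4 == 7, so the two branches are orthogonal.
additive = orthogonality_deviation(3.0, 4.0, 7.0)

# Non-additive case: the A-C distance is shorter than the sum of the branches.
non_additive = orthogonality_deviation(3.0, 4.0, 5.0)

print(additive, non_additive)
```

A value near zero indicates that the three distances behave additively. Larger absolute values indicate the kind of deviation that, per the abstract, is used to flag sequences for exclusion from the core set.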
format Text
id pubmed-1508163
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1508163 2006-07-15 Core set approach to reduce uncertainty of gene trees BMC Evol Biol Methodology Article BioMed Central 2006-05-20 /pmc/articles/PMC1508163/ /pubmed/16712735 http://dx.doi.org/10.1186/1471-2148-6-41 Text en Copyright © 2006 Okabayashi et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Core set approach to reduce uncertainty of gene trees
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1508163/
https://www.ncbi.nlm.nih.gov/pubmed/16712735
http://dx.doi.org/10.1186/1471-2148-6-41