
HIV-Specific Probabilistic Models of Protein Evolution

Comparative sequence analyses, including such fundamental bioinformatics techniques as similarity searching, sequence alignment and phylogenetic inference, have become a mainstay for researchers studying type 1 Human Immunodeficiency Virus (HIV-1) genome structure and evolution. Implicit in comparat...


Bibliographic Details
Main Authors: Nickle, David C., Heath, Laura, Jensen, Mark A., Gilbert, Peter B., Mullins, James I., Kosakovsky Pond, Sergei L.
Format: Text
Language: English
Published: Public Library of Science 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1876811/
https://www.ncbi.nlm.nih.gov/pubmed/17551583
http://dx.doi.org/10.1371/journal.pone.0000503
author Nickle, David C.
Heath, Laura
Jensen, Mark A.
Gilbert, Peter B.
Mullins, James I.
Kosakovsky Pond, Sergei L.
collection PubMed
description Comparative sequence analyses, including such fundamental bioinformatics techniques as similarity searching, sequence alignment and phylogenetic inference, have become a mainstay for researchers studying type 1 Human Immunodeficiency Virus (HIV-1) genome structure and evolution. Implicit in comparative analyses is an underlying model of evolution, and the chosen model can significantly affect the results. In general, evolutionary models describe the probabilities of replacing one amino acid character with another over a period of time. Most widely used evolutionary models for protein sequences have been derived from curated alignments of hundreds of proteins, usually based on mammalian genomes. It is unclear to what extent these empirical models are generalizable to a very different organism, such as HIV-1, the most extensively sequenced organism in existence. We developed a maximum likelihood model-fitting procedure, applied it to a collection of HIV-1 alignments sampled from different viral genes, and inferred two empirical substitution models, suitable for describing between-host and within-host evolution, respectively. Our procedure pools the information from multiple sequence alignments, and the provided software implementation can be run efficiently in parallel on a computer cluster. We describe how the inferred substitution models can be used to generate scoring matrices suitable for alignment and similarity searches. Our models had a consistently superior fit relative to the best existing models and to parameter-rich data-driven models when benchmarked on independent HIV-1 alignments, demonstrating evolutionary biases in amino-acid substitution that are unique to HIV and are not captured by the existing models. The scoring matrices derived from the models showed a marked difference from common amino-acid scoring matrices. The use of an appropriate evolutionary model recovered a known viral transmission history, whereas a poorly chosen model introduced phylogenetic error. We argue that our model derivation procedure is immediately applicable to other organisms with extensive sequence data available, such as Hepatitis C and Influenza A viruses.
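The abstract's two technical ideas (a model gives the probability of replacing one amino acid with another over time, and such a model can be turned into a scoring matrix) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: a toy 3-letter alphabet instead of 20 amino acids, made-up exchangeabilities and frequencies, and hypothetical function names. It is not the authors' software or their fitted models. It builds a reversible rate matrix Q, computes P(t) = exp(Qt), and forms log-odds scores log2(P_ij(t) / pi_j).

```python
import math

def mat_mul(a, b):
    """Plain square-matrix product on nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, squarings=10, terms=12):
    """exp(m) via scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scaled = [[x / 2 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def rate_matrix(exch, freqs):
    """Reversible rate matrix: Q_ij = exch_ij * freq_j for i != j."""
    n = len(freqs)
    q = [[exch[i][j] * freqs[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])  # each row of Q sums to zero
    # Rescale so that one time unit equals one expected substitution per site.
    scale = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / scale for x in row] for row in q]

def log_odds_scores(exch, freqs, t):
    """Return P(t) and log-odds scores log2(P_ij(t) / freq_j)."""
    q = rate_matrix(exch, freqs)
    p = mat_exp([[x * t for x in row] for row in q])
    n = len(freqs)
    scores = [[math.log2(p[i][j] / freqs[j]) for j in range(n)]
              for i in range(n)]
    return p, scores

# Made-up toy values (NOT estimates from any HIV-1 alignment).
FREQS = [0.5, 0.3, 0.2]
EXCH = [[0.0, 2.0, 1.0],
        [2.0, 0.0, 0.5],
        [1.0, 0.5, 0.0]]

P, SCORES = log_odds_scores(EXCH, FREQS, t=0.5)
for row in SCORES:
    print(["%+.2f" % s for s in row])
```

Because the toy model is reversible, the resulting score matrix is symmetric, and identical-residue pairs score above zero. Alignment tools typically use scaled, rounded integer versions of such scores; that step is omitted here.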
format Text
id pubmed-1876811
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-1876811 2007-06-06 HIV-Specific Probabilistic Models of Protein Evolution Nickle, David C. Heath, Laura Jensen, Mark A. Gilbert, Peter B. Mullins, James I. Kosakovsky Pond, Sergei L. PLoS One Research Article Public Library of Science 2007-06-06 /pmc/articles/PMC1876811/ /pubmed/17551583 http://dx.doi.org/10.1371/journal.pone.0000503 Text en Nickle et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title HIV-Specific Probabilistic Models of Protein Evolution
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1876811/
https://www.ncbi.nlm.nih.gov/pubmed/17551583
http://dx.doi.org/10.1371/journal.pone.0000503