Predicting the Functional Effect of Amino Acid Substitutions and Indels

As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions on the functional effects of sequence variations and narrow the search for causal variants for disease phenotypes. Different...

Bibliographic Details
Main authors: Choi, Yongwook, Sims, Gregory E., Murphy, Sean, Miller, Jason R., Chan, Agnes P.
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3466303/
https://www.ncbi.nlm.nih.gov/pubmed/23056405
http://dx.doi.org/10.1371/journal.pone.0046688
author Choi, Yongwook
Sims, Gregory E.
Murphy, Sean
Miller, Jason R.
Chan, Agnes P.
collection PubMed
description As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions on the functional effects of sequence variations and narrow the search for causal variants for disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and nonsense mutations. Frameshifts and nonsense mutations are likely to have a negative effect on protein function. Existing prediction tools primarily focus on the deleterious effects of single amino acid substitutions by examining amino acid conservation at the position of interest among related sequences, an approach that is not directly applicable to insertions or deletions. Here, we introduce a versatile alignment-based score as a new metric to predict the damaging effects of variations, covering not only single amino acid substitutions but also in-frame insertions, deletions, and multiple amino acid substitutions. This alignment-based score measures the change in sequence similarity of a query sequence to a protein sequence homolog before and after the introduction of an amino acid variation to the query sequence. Our results showed that the scoring scheme performs well in separating disease-associated variants (n = 21,662) from common polymorphisms (n = 37,022) for UniProt human protein variations, and also in separating deleterious variants (n = 15,179) from neutral variants (n = 17,891) for UniProt non-human protein variations. In our approach, the area under the receiver operating characteristic curve (AUC) for the human and non-human protein variation datasets is ∼0.85. We also observed that the alignment-based score correlates with the deleteriousness of a sequence variation. In summary, we have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), which provides a generalized approach to predict the functional effects of protein sequence variations, including single or multiple amino acid substitutions and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org.
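The alignment-based score described in the abstract (the change in similarity between a query and a homolog before and after a variation is introduced) can be illustrated with a minimal sketch. This is not the PROVEAN implementation: the match, mismatch, and gap values, the plain Needleman-Wunsch global alignment, the helper names, and the toy sequences are all assumptions chosen for illustration, and the record does not state which substitution matrix, gap penalties, or homolog selection the tool actually uses.

```python
def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not PROVEAN's parameters.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def substitute(seq, pos, residue):
    """Replace the residue at 1-based position pos."""
    return seq[:pos - 1] + residue + seq[pos:]


def delete(seq, start, end):
    """Delete residues from 1-based start to end, inclusive (in-frame deletion)."""
    return seq[:start - 1] + seq[end:]


def insert(seq, pos, residues):
    """Insert residues after 1-based position pos (in-frame insertion)."""
    return seq[:pos] + residues + seq[pos:]


def delta_score(query, variant, homolog):
    """Similarity to the homolog after the variation minus similarity before it.

    A strongly negative value means the variant looks less like the homolog.
    """
    return (global_alignment_score(variant, homolog)
            - global_alignment_score(query, homolog))


if __name__ == "__main__":
    # Hypothetical toy sequences; the homolog is identical to the query here.
    query = "MKTAYIAKQR"
    homolog = "MKTAYIAKQR"
    for label, variant in [
        ("substitution Y5W", substitute(query, 5, "W")),
        ("deletion 5-6", delete(query, 5, 6)),
        ("insertion GG after 5", insert(query, 5, "GG")),
    ]:
        print(label, delta_score(query, variant, homolog))
```

Because the same function scores substitutions, deletions, and insertions, the sketch shows why a whole-sequence alignment score extends to indels where per-position conservation measures do not. In practice the score would be computed against many homologs and summarized; how that is done is not described in this record.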
format Online
Article
Text
id pubmed-3466303
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3466303 2012-10-10 Predicting the Functional Effect of Amino Acid Substitutions and Indels Choi, Yongwook; Sims, Gregory E.; Murphy, Sean; Miller, Jason R.; Chan, Agnes P. PLoS One Research Article Public Library of Science 2012-10-08 /pmc/articles/PMC3466303/ /pubmed/23056405 http://dx.doi.org/10.1371/journal.pone.0046688 Text en © 2012 Choi et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Predicting the Functional Effect of Amino Acid Substitutions and Indels
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3466303/
https://www.ncbi.nlm.nih.gov/pubmed/23056405
http://dx.doi.org/10.1371/journal.pone.0046688