Predicting the Functional Effect of Amino Acid Substitutions and Indels
As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions on the functional effects of sequence variations and narrow down the search for causal variants for disease phenotypes. Different...
Main authors: | Choi, Yongwook; Sims, Gregory E.; Murphy, Sean; Miller, Jason R.; Chan, Agnes P. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2012 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3466303/ https://www.ncbi.nlm.nih.gov/pubmed/23056405 http://dx.doi.org/10.1371/journal.pone.0046688 |
_version_ | 1782245671138492416 |
---|---|
author | Choi, Yongwook; Sims, Gregory E.; Murphy, Sean; Miller, Jason R.; Chan, Agnes P. |
author_facet | Choi, Yongwook; Sims, Gregory E.; Murphy, Sean; Miller, Jason R.; Chan, Agnes P. |
author_sort | Choi, Yongwook |
collection | PubMed |
description | As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions on the functional effects of sequence variations and narrow down the search for causal variants for disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and nonsense mutations. Frameshifts and nonsense mutations are likely to cause a negative effect on protein function. Existing prediction tools primarily focus on studying the deleterious effects of single amino acid substitutions through examining amino acid conservation at the position of interest among related sequences, an approach that is not directly applicable to insertions or deletions. Here, we introduce a versatile alignment-based score as a new metric to predict the damaging effects of variations not limited to single amino acid substitutions but also in-frame insertions, deletions, and multiple amino acid substitutions. This alignment-based score measures the change in sequence similarity of a query sequence to a protein sequence homolog before and after the introduction of an amino acid variation to the query sequence. Our results showed that the scoring scheme performs well in separating disease-associated variants (n = 21,662) from common polymorphisms (n = 37,022) for UniProt human protein variations, and also in separating deleterious variants (n = 15,179) from neutral variants (n = 17,891) for UniProt non-human protein variations. In our approach, the area under the receiver operating characteristic curve (AUC) for the human and non-human protein variation datasets is ∼0.85. We also observed that the alignment-based score correlates with the deleteriousness of a sequence variation. In summary, we have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org. |
format | Online Article Text |
id | pubmed-3466303 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-3466303 2012-10-10 Predicting the Functional Effect of Amino Acid Substitutions and Indels Choi, Yongwook Sims, Gregory E. Murphy, Sean Miller, Jason R. Chan, Agnes P. PLoS One Research Article Public Library of Science 2012-10-08 /pmc/articles/PMC3466303/ /pubmed/23056405 http://dx.doi.org/10.1371/journal.pone.0046688 Text en © 2012 Choi et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Choi, Yongwook Sims, Gregory E. Murphy, Sean Miller, Jason R. Chan, Agnes P. Predicting the Functional Effect of Amino Acid Substitutions and Indels |
title | Predicting the Functional Effect of Amino Acid Substitutions and Indels |
title_full | Predicting the Functional Effect of Amino Acid Substitutions and Indels |
title_fullStr | Predicting the Functional Effect of Amino Acid Substitutions and Indels |
title_full_unstemmed | Predicting the Functional Effect of Amino Acid Substitutions and Indels |
title_short | Predicting the Functional Effect of Amino Acid Substitutions and Indels |
title_sort | predicting the functional effect of amino acid substitutions and indels |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3466303/ https://www.ncbi.nlm.nih.gov/pubmed/23056405 http://dx.doi.org/10.1371/journal.pone.0046688 |
work_keys_str_mv | AT choiyongwook predictingthefunctionaleffectofaminoacidsubstitutionsandindels AT simsgregorye predictingthefunctionaleffectofaminoacidsubstitutionsandindels AT murphysean predictingthefunctionaleffectofaminoacidsubstitutionsandindels AT millerjasonr predictingthefunctionaleffectofaminoacidsubstitutionsandindels AT chanagnesp predictingthefunctionaleffectofaminoacidsubstitutionsandindels |
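
The description field above characterizes the alignment-based score as the change in similarity between a query sequence and a homolog before and after a variation is introduced. The following Python sketch illustrates that idea only. It is not the PROVEAN implementation: the record does not state the scoring matrix, gap penalties, or how multiple homologs are combined, so the match/mismatch/gap values, the single-homolog setup, the function names, and the example sequences are all assumptions chosen for illustration.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are toy assumptions, not those used by PROVEAN.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def delta_score(query: str, variant: str, homolog: str) -> int:
    """Change in similarity to the homolog after the variation is introduced.

    Negative values mean the variant is less similar to the homolog
    than the original query was.
    """
    return (global_alignment_score(variant, homolog)
            - global_alignment_score(query, homolog))


# Hypothetical sequences, invented for this example.
query = "MKTAYIAKQR"
homolog = "MKTAYIAKQR"
substitution = "MKTAYIPKQR"  # A7P, single amino acid substitution
deletion = "MKTAYIKQR"       # in-frame deletion of A7

print("substitution delta:", delta_score(query, substitution, homolog))
print("deletion delta:", delta_score(query, deletion, homolog))
```

Because the score is computed on whole-sequence alignments rather than on conservation at a single column, the same function handles substitutions, insertions, and deletions without special cases, which is the property the abstract emphasizes.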