A comprehensive study of small non-frameshift insertions/deletions in proteins and prediction of their phenotypic effects by a machine learning method (KD4i)
BACKGROUND: Small insertion and deletion polymorphisms (Indels) are the second most common mutations in the human genome, after Single Nucleotide Polymorphisms (SNPs). Recent studies have shown that they have significant influence on genetic variation by altering human traits and can cause multiple...
Main Authors: | Bermejo-Das-Neves, Carlos; Nguyen, Hoan-Ngoc; Poch, Olivier; Thompson, Julie D |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4021375/ https://www.ncbi.nlm.nih.gov/pubmed/24742296 http://dx.doi.org/10.1186/1471-2105-15-111 |
_version_ | 1782316229973770240 |
---|---|
author | Bermejo-Das-Neves, Carlos Nguyen, Hoan-Ngoc Poch, Olivier Thompson, Julie D |
collection | PubMed |
description | BACKGROUND: Small insertion and deletion polymorphisms (Indels) are the second most common mutations in the human genome, after Single Nucleotide Polymorphisms (SNPs). Recent studies have shown that they have significant influence on genetic variation by altering human traits and can cause multiple human diseases. In particular, many Indels that occur in protein coding regions are known to impact the structure or function of the protein. A major challenge is to predict the effects of these Indels and to distinguish between deleterious and neutral variants. When an Indel occurs within a coding region, it can be either frameshifting (FS) or non-frameshifting (NFS). FS-Indels either modify the complete C-terminal region of the protein or result in premature termination of translation. NFS-Indels insert/delete multiples of three nucleotides leading to the insertion/deletion of one or more amino acids. RESULTS: In order to study the relationships between NFS-Indels and Mendelian diseases, we characterized NFS-Indels according to numerous structural, functional and evolutionary parameters. We then used these parameters to identify specific characteristics of disease-causing and neutral NFS-Indels. Finally, we developed a new machine learning approach, KD4i, that can be used to predict the phenotypic effects of NFS-Indels. CONCLUSIONS: We demonstrate in a large-scale evaluation that the accuracy of KD4i is comparable to existing state-of-the-art methods. However, a major advantage of our approach is that we also provide the reasons for the predictions, in the form of a set of rules. The rules are interpretable by non-expert humans and they thus represent new knowledge about the relationships between the genotype and phenotypes of NFS-Indels and the causative molecular perturbations that result in the disease. |
format | Online Article Text |
id | pubmed-4021375 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4021375 2014-05-28 BMC Bioinformatics Research Article BioMed Central 2014-04-17 /pmc/articles/PMC4021375/ /pubmed/24742296 http://dx.doi.org/10.1186/1471-2105-15-111 Text en Copyright © 2014 Bermejo-Das-Neves et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A comprehensive study of small non-frameshift insertions/deletions in proteins and prediction of their phenotypic effects by a machine learning method (KD4i) |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4021375/ https://www.ncbi.nlm.nih.gov/pubmed/24742296 http://dx.doi.org/10.1186/1471-2105-15-111 |