A comprehensive study of small non-frameshift insertions/deletions in proteins and prediction of their phenotypic effects by a machine learning method (KD4i)

BACKGROUND: Small insertion and deletion polymorphisms (Indels) are the second most common mutations in the human genome, after Single Nucleotide Polymorphisms (SNPs). Recent studies have shown that they have significant influence on genetic variation by altering human traits and can cause multiple...

Bibliographic Details
Main Authors: Bermejo-Das-Neves, Carlos; Nguyen, Hoan-Ngoc; Poch, Olivier; Thompson, Julie D
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4021375/
https://www.ncbi.nlm.nih.gov/pubmed/24742296
http://dx.doi.org/10.1186/1471-2105-15-111
author Bermejo-Das-Neves, Carlos
Nguyen, Hoan-Ngoc
Poch, Olivier
Thompson, Julie D
collection PubMed
description BACKGROUND: Small insertion and deletion polymorphisms (Indels) are the second most common mutations in the human genome, after Single Nucleotide Polymorphisms (SNPs). Recent studies have shown that they have significant influence on genetic variation by altering human traits and can cause multiple human diseases. In particular, many Indels that occur in protein coding regions are known to impact the structure or function of the protein. A major challenge is to predict the effects of these Indels and to distinguish between deleterious and neutral variants. When an Indel occurs within a coding region, it can be either frameshifting (FS) or non-frameshifting (NFS). FS-Indels either modify the complete C-terminal region of the protein or result in premature termination of translation. NFS-Indels insert/delete multiples of three nucleotides leading to the insertion/deletion of one or more amino acids. RESULTS: In order to study the relationships between NFS-Indels and Mendelian diseases, we characterized NFS-Indels according to numerous structural, functional and evolutionary parameters. We then used these parameters to identify specific characteristics of disease-causing and neutral NFS-Indels. Finally, we developed a new machine learning approach, KD4i, that can be used to predict the phenotypic effects of NFS-Indels. CONCLUSIONS: We demonstrate in a large-scale evaluation that the accuracy of KD4i is comparable to existing state-of-the-art methods. However, a major advantage of our approach is that we also provide the reasons for the predictions, in the form of a set of rules. The rules are interpretable by non-expert humans and they thus represent new knowledge about the relationships between the genotype and phenotypes of NFS-Indels and the causative molecular perturbations that result in the disease.
format Online
Article
Text
id pubmed-4021375
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4021375 2014-05-28 BMC Bioinformatics Research Article BioMed Central 2014-04-17 /pmc/articles/PMC4021375/ /pubmed/24742296 http://dx.doi.org/10.1186/1471-2105-15-111 Text en Copyright © 2014 Bermejo-Das-Neves et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A comprehensive study of small non-frameshift insertions/deletions in proteins and prediction of their phenotypic effects by a machine learning method (KD4i)
topic Research Article