Predicting genome-wide redundancy using machine learning

BACKGROUND: Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods to increase sensitivity in identifying genetic redundancy can improve the efficiency of reverse genetics and lend insights into the evolutionary outcomes of gene duplic...

Bibliographic Details
Main Authors:	Chen, Huang-Wen; Bandyopadhyay, Sunayan; Shasha, Dennis E; Birnbaum, Kenneth D
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2998534/ https://www.ncbi.nlm.nih.gov/pubmed/21087504 http://dx.doi.org/10.1186/1471-2148-10-357

author	Chen, Huang-Wen Bandyopadhyay, Sunayan Shasha, Dennis E Birnbaum, Kenneth D
collection	PubMed
description	BACKGROUND: Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods to increase sensitivity in identifying genetic redundancy can improve the efficiency of reverse genetics and lend insights into the evolutionary outcomes of gene duplication. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data are available, such as Arabidopsis thaliana, the test case used here. RESULTS: Machine learning techniques that combine multiple attributes led to a dramatic improvement in predicting genetic redundancy over single-trait classifiers alone, such as BLAST E-values or expression correlation. In withholding analysis, one of the methods used here, Support Vector Machines, was two-fold more precise than single-attribute classifiers, reaching a level where the majority of redundant calls were correctly labeled. Using this higher confidence in identifying redundancy, machine learning predicts that about half of all genes in Arabidopsis show the signature of predicted redundancy with at least one but typically fewer than three other family members. Interestingly, a large proportion of predicted redundant gene pairs were relatively old duplications (e.g., Ks > 1), suggesting that redundancy is stable over long evolutionary periods. CONCLUSIONS: Machine learning predicts that most genes will have a functionally redundant paralog but will exhibit redundancy with relatively few genes within a family. The predictions and gene pair attributes for Arabidopsis provide a new resource for research in genetics and genome evolution. These techniques can now be applied to other organisms.
format	Text
id	pubmed-2998534
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2998534 2010-12-08 Predicting genome-wide redundancy using machine learning Chen, Huang-Wen Bandyopadhyay, Sunayan Shasha, Dennis E Birnbaum, Kenneth D BMC Evol Biol Research Article BioMed Central 2010-11-18 /pmc/articles/PMC2998534/ /pubmed/21087504 http://dx.doi.org/10.1186/1471-2148-10-357 Text en Copyright ©2010 Chen et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Predicting genome-wide redundancy using machine learning
topic	Research Article
