Performance of random forests and logic regression methods using mini-exome sequence data

Machine learning approaches are an attractive option for analyzing large-scale data to detect genetic variants that contribute to variation of a quantitative trait, without requiring specific distributional assumptions. We evaluate two machine learning methods, random forests and logic regression, a...

Bibliographic Details
Main Authors: Kim, Yoonhee, Li, Qing, Cropp, Cheryl D, Sung, Heejong, Cai, Juanliang, Simpson, Claire L, Perry, Brian, Dasgupta, Abhijit, Malley, James D, Wilson, Alexander F, Bailey-Wilson, Joan E
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287827/
https://www.ncbi.nlm.nih.gov/pubmed/22373484
http://dx.doi.org/10.1186/1753-6561-5-S9-S104
author Kim, Yoonhee
Li, Qing
Cropp, Cheryl D
Sung, Heejong
Cai, Juanliang
Simpson, Claire L
Perry, Brian
Dasgupta, Abhijit
Malley, James D
Wilson, Alexander F
Bailey-Wilson, Joan E
author_sort Kim, Yoonhee
collection PubMed
description Machine learning approaches are an attractive option for analyzing large-scale data to detect genetic variants that contribute to variation of a quantitative trait, without requiring specific distributional assumptions. We evaluate two machine learning methods, random forests and logic regression, and compare them to standard simple univariate linear regression, using the Genetic Analysis Workshop 17 mini-exome data. We also apply these methods after collapsing multiple rare variants within genes and within gene pathways. Linear regression and the random forest method performed better when rare variants were collapsed based on genes or gene pathways than when each variant was analyzed separately. Logic regression performed better when rare variants were collapsed based on genes rather than on pathways.
format Online
Article
Text
id pubmed-3287827
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3287827; 2012-02-28; BMC Proc; Proceedings; BioMed Central; 2011-11-29; Text; en; Copyright ©2011 Kim et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Performance of random forests and logic regression methods using mini-exome sequence data
topic Proceedings