In silico phenotyping via co-training for improved phenotype prediction from genotype

Motivation: Predicting disease phenotypes from genotypes is a key challenge in medical applications in the postgenomic era. Large training datasets of patients that have been both genotyped and phenotyped are the key requisite when aiming for high prediction accuracy. With current genotyping project...

Bibliographic Details
Main authors: Roqueiro, Damian; Witteveen, Menno J.; Anttila, Verneri; Terwindt, Gisela M.; van den Maagdenberg, Arn M.J.M.; Borgwardt, Karsten
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects: Ismb/Eccb 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4765855/
https://www.ncbi.nlm.nih.gov/pubmed/26072497
http://dx.doi.org/10.1093/bioinformatics/btv254
author Roqueiro, Damian
Witteveen, Menno J.
Anttila, Verneri
Terwindt, Gisela M.
van den Maagdenberg, Arn M.J.M.
Borgwardt, Karsten
collection PubMed
description Motivation: Predicting disease phenotypes from genotypes is a key challenge in medical applications in the postgenomic era. Large training datasets of patients that have been both genotyped and phenotyped are the key requisite when aiming for high prediction accuracy. With current genotyping projects producing genetic data for hundreds of thousands of patients, large-scale phenotyping has become the bottleneck in disease phenotype prediction. Results: Here we present an approach for imputing missing disease phenotypes given the genotype of a patient. Our approach is based on co-training, which predicts the phenotype of unlabeled patients based on a second class of information, e.g. clinical health record information. Augmenting training datasets by this type of in silico phenotyping can lead to significant improvements in prediction accuracy. We demonstrate this on a dataset of patients with two diagnostic types of migraine, termed migraine with aura and migraine without aura, from the International Headache Genetics Consortium. Conclusions: Imputing missing disease phenotypes for patients via co-training leads to larger training datasets and improved prediction accuracy in phenotype prediction. Availability and implementation: The code can be obtained at: http://www.bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-biology/co-training.html Contact: karsten.borgwardt@bsse.ethz.ch or menno.witteveen@bsse.ethz.ch Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-4765855
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4765855 2016-03-04 In silico phenotyping via co-training for improved phenotype prediction from genotype Roqueiro, Damian; Witteveen, Menno J.; Anttila, Verneri; Terwindt, Gisela M.; van den Maagdenberg, Arn M.J.M.; Borgwardt, Karsten Bioinformatics Ismb/Eccb 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland
Oxford University Press 2015-06-15 2015-06-10 /pmc/articles/PMC4765855/ /pubmed/26072497 http://dx.doi.org/10.1093/bioinformatics/btv254 Text en © The Author 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title In silico phenotyping via co-training for improved phenotype prediction from genotype
topic Ismb/Eccb 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland