In silico phenotyping via co-training for improved phenotype prediction from genotype
Motivation: Predicting disease phenotypes from genotypes is a key challenge in medical applications in the postgenomic era. Large training datasets of patients that have been both genotyped and phenotyped are the key requisite when aiming for high prediction accuracy. With current genotyping project...
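Main authors: | Roqueiro, Damian; Witteveen, Menno J.; Anttila, Verneri; Terwindt, Gisela M.; van den Maagdenberg, Arn M.J.M.; Borgwardt, Karsten |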
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4765855/ https://www.ncbi.nlm.nih.gov/pubmed/26072497 http://dx.doi.org/10.1093/bioinformatics/btv254 |
_version_ | 1782417583116386304 |
---|---|
author | Roqueiro, Damian Witteveen, Menno J. Anttila, Verneri Terwindt, Gisela M. van den Maagdenberg, Arn M.J.M. Borgwardt, Karsten |
author_facet | Roqueiro, Damian Witteveen, Menno J. Anttila, Verneri Terwindt, Gisela M. van den Maagdenberg, Arn M.J.M. Borgwardt, Karsten |
author_sort | Roqueiro, Damian |
collection | PubMed |
description | Motivation: Predicting disease phenotypes from genotypes is a key challenge in medical applications in the postgenomic era. Large training datasets of patients that have been both genotyped and phenotyped are the key requisite when aiming for high prediction accuracy. With current genotyping projects producing genetic data for hundreds of thousands of patients, large-scale phenotyping has become the bottleneck in disease phenotype prediction. Results: Here we present an approach for imputing missing disease phenotypes given the genotype of a patient. Our approach is based on co-training, which predicts the phenotype of unlabeled patients based on a second class of information, e.g. clinical health record information. Augmenting training datasets by this type of in silico phenotyping can lead to significant improvements in prediction accuracy. We demonstrate this on a dataset of patients with two diagnostic types of migraine, termed migraine with aura and migraine without aura, from the International Headache Genetics Consortium. Conclusions: Imputing missing disease phenotypes for patients via co-training leads to larger training datasets and improved prediction accuracy in phenotype prediction. Availability and implementation: The code can be obtained at: http://www.bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-biology/co-training.html Contact: karsten.borgwardt@bsse.ethz.ch or menno.witteveen@bsse.ethz.ch Supplementary information: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-4765855 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4765855 2016-03-04 In silico phenotyping via co-training for improved phenotype prediction from genotype Roqueiro, Damian Witteveen, Menno J. Anttila, Verneri Terwindt, Gisela M. van den Maagdenberg, Arn M.J.M. Borgwardt, Karsten Bioinformatics ISMB/ECCB 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland Motivation: Predicting disease phenotypes from genotypes is a key challenge in medical applications in the postgenomic era. Large training datasets of patients that have been both genotyped and phenotyped are the key requisite when aiming for high prediction accuracy. With current genotyping projects producing genetic data for hundreds of thousands of patients, large-scale phenotyping has become the bottleneck in disease phenotype prediction. Results: Here we present an approach for imputing missing disease phenotypes given the genotype of a patient. Our approach is based on co-training, which predicts the phenotype of unlabeled patients based on a second class of information, e.g. clinical health record information. Augmenting training datasets by this type of in silico phenotyping can lead to significant improvements in prediction accuracy. We demonstrate this on a dataset of patients with two diagnostic types of migraine, termed migraine with aura and migraine without aura, from the International Headache Genetics Consortium. Conclusions: Imputing missing disease phenotypes for patients via co-training leads to larger training datasets and improved prediction accuracy in phenotype prediction. Availability and implementation: The code can be obtained at: http://www.bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-biology/co-training.html Contact: karsten.borgwardt@bsse.ethz.ch or menno.witteveen@bsse.ethz.ch Supplementary information: Supplementary data are available at Bioinformatics online. 
Oxford University Press 2015-06-15 2015-06-10 /pmc/articles/PMC4765855/ /pubmed/26072497 http://dx.doi.org/10.1093/bioinformatics/btv254 Text en © The Author 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | ISMB/ECCB 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland Roqueiro, Damian Witteveen, Menno J. Anttila, Verneri Terwindt, Gisela M. van den Maagdenberg, Arn M.J.M. Borgwardt, Karsten In silico phenotyping via co-training for improved phenotype prediction from genotype |
title | In silico phenotyping via co-training for improved phenotype prediction from genotype |
title_full | In silico phenotyping via co-training for improved phenotype prediction from genotype |
title_fullStr | In silico phenotyping via co-training for improved phenotype prediction from genotype |
title_full_unstemmed | In silico phenotyping via co-training for improved phenotype prediction from genotype |
title_short | In silico phenotyping via co-training for improved phenotype prediction from genotype |
title_sort | in silico phenotyping via co-training for improved phenotype prediction from genotype |
topic | ISMB/ECCB 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4765855/ https://www.ncbi.nlm.nih.gov/pubmed/26072497 http://dx.doi.org/10.1093/bioinformatics/btv254 |
work_keys_str_mv | AT roqueirodamian insilicophenotypingviacotrainingforimprovedphenotypepredictionfromgenotype AT witteveenmennoj insilicophenotypingviacotrainingforimprovedphenotypepredictionfromgenotype AT anttilaverneri insilicophenotypingviacotrainingforimprovedphenotypepredictionfromgenotype AT terwindtgiselam insilicophenotypingviacotrainingforimprovedphenotypepredictionfromgenotype AT vandenmaagdenbergarnmjm insilicophenotypingviacotrainingforimprovedphenotypepredictionfromgenotype AT borgwardtkarsten insilicophenotypingviacotrainingforimprovedphenotypepredictionfromgenotype |
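
The abstract above describes imputing missing phenotypes from a second class of information and then enlarging the genotype training set with the imputed labels. The following is a minimal sketch of that idea only, not the authors' implementation (their code is at the URL given in the record). It assumes a simplified, one-directional form of co-training, a nearest-centroid classifier, and a confidence margin threshold; the function names, the `min_margin` parameter, and the toy data are all hypothetical and invented for illustration.

```python
from statistics import mean


def centroids(X, y):
    """Return the per-class mean vector for feature rows X with labels y."""
    out = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        out[label] = [mean(col) for col in zip(*rows)]
    return out


def predict(cents, x):
    """Return (label, margin): the nearest centroid and the squared-distance
    gap between it and the runner-up class."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)), label)
        for label, c in cents.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def in_silico_phenotype(clin_lab, y_lab, clin_unlab, min_margin=1.0):
    """Impute labels for unlabeled patients from the clinical view only.

    Returns {index into clin_unlab: imputed label}, keeping only predictions
    whose margin is at least min_margin (a hypothetical confidence rule).
    """
    cents = centroids(clin_lab, y_lab)
    imputed = {}
    for i, x in enumerate(clin_unlab):
        label, margin = predict(cents, x)
        if margin >= min_margin:
            imputed[i] = label
    return imputed


# Toy data, invented for illustration: each patient has a genotype view and
# a clinical view. "MA" = migraine with aura, "MO" = migraine without aura.
geno_lab = [[0.0, 1.0], [2.0, 1.0]]
clin_lab = [[0.0], [10.0]]
y_lab = ["MO", "MA"]
geno_unlab = [[0.2, 0.9], [1.9, 1.1], [1.0, 1.0]]
clin_unlab = [[1.0], [9.0], [5.0]]

# Step 1: impute phenotypes for unlabeled patients from the clinical view.
imputed = in_silico_phenotype(clin_lab, y_lab, clin_unlab)

# Step 2: augment the genotype training set with the confident imputations.
geno_train = geno_lab + [geno_unlab[i] for i in imputed]
y_train = y_lab + [imputed[i] for i in imputed]

# Step 3: fit the genotype-based predictor on the enlarged training set.
geno_model = centroids(geno_train, y_train)
```

In this toy run the third unlabeled patient lies equally far from both clinical centroids, so it is left unlabeled rather than added to the training set. Full co-training in the usual sense lets each view label examples for the other over several rounds; the one-pass version here is kept short to mirror the description in the abstract.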