Machine learning approach for pooled DNA sample calibration

BACKGROUND: Despite ongoing reduction in genotyping costs, genomic studies involving large numbers of species with low economic value (such as Black Tiger prawns) remain cost prohibitive. In this scenario DNA pooling is an attractive option to reduce genotyping costs. However, genotyping of pooled s...

Bibliographic Details
Main Authors: Hellicar, Andrew D, Rahman, Ashfaqur, Smith, Daniel V, Henshall, John M
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4495942/
https://www.ncbi.nlm.nih.gov/pubmed/26156142
http://dx.doi.org/10.1186/s12859-015-0593-1
author Hellicar, Andrew D
Rahman, Ashfaqur
Smith, Daniel V
Henshall, John M
collection PubMed
description BACKGROUND: Despite ongoing reduction in genotyping costs, genomic studies involving large numbers of species with low economic value (such as Black Tiger prawns) remain cost-prohibitive. In this scenario, DNA pooling is an attractive option to reduce genotyping costs. However, genotyping of pooled samples comprising DNA from many individuals is challenging due to the presence of errors that exceed the allele frequency quantisation size and therefore cannot be simply corrected by clustering techniques. The solution to the calibration problem is a correction to the allele frequency to mitigate errors incurred in the measurement process. We highlight the limitations of the existing calibration solutions, such as the fact that they impose assumptions on the variation between allele frequencies 0, 0.5, and 1.0, and address a limited set of error types. We propose a novel machine learning method to address the limitations identified. RESULTS: The approach is tested on SNPs genotyped with the Sequenom iPLEX platform and compared to existing state-of-the-art calibration methods. The new method is capable of reducing the mean square error in allele frequency to half that achievable with existing approaches. Furthermore, for the first time, we demonstrate the importance of carefully considering the choice of training data when using calibration approaches built from pooled data. CONCLUSION: This paper demonstrates that improvements in pooled allele frequency estimates result if the genotyping platform is characterised at allele frequencies other than the homozygous and heterozygous cases. Techniques capable of incorporating such information are described along with aspects of implementation.
format Online
Article
Text
id pubmed-4495942
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4495942 2015-07-09 Machine learning approach for pooled DNA sample calibration Hellicar, Andrew D Rahman, Ashfaqur Smith, Daniel V Henshall, John M BMC Bioinformatics Methodology Article
BioMed Central 2015-07-09 /pmc/articles/PMC4495942/ /pubmed/26156142 http://dx.doi.org/10.1186/s12859-015-0593-1 Text en © Hellicar et al. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Machine learning approach for pooled DNA sample calibration
topic Methodology Article