Using False Discovery Rates to Benchmark SNP-callers in next-generation sequencing projects

Sequence alignments form the basis for many comparative and population genomic studies. Alignment tools provide a range of accuracies dependent on the divergence between the sequences and the alignment methods. Despite widespread use, there is no standard method for assessing the accuracy of a datas...

Bibliographic Details
Main Authors: Farrer, Rhys A., Henk, Daniel A., MacLean, Dan, Studholme, David J., Fisher, Matthew C.
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3604800/
https://www.ncbi.nlm.nih.gov/pubmed/23518929
http://dx.doi.org/10.1038/srep01512
_version_ 1782263787751997440
author Farrer, Rhys A.
Henk, Daniel A.
MacLean, Dan
Studholme, David J.
Fisher, Matthew C.
author_facet Farrer, Rhys A.
Henk, Daniel A.
MacLean, Dan
Studholme, David J.
Fisher, Matthew C.
author_sort Farrer, Rhys A.
collection PubMed
description Sequence alignments form the basis for many comparative and population genomic studies. Alignment tools provide a range of accuracies dependent on the divergence between the sequences and the alignment methods. Despite widespread use, there is no standard method for assessing the accuracy of a dataset and alignment strategy after resequencing. We present a framework and tool for determining the overall accuracies of an input read dataset, alignment and SNP-calling method providing an isolate in that dataset has a corresponding, or closely related reference sequence available. In addition to this tool for comparing False Discovery Rates (FDR), we include a method for determining homozygous and heterozygous positions from an alignment using binomial probabilities for an expected error rate. We benchmark this method against other SNP callers using our FDR method with three fungal genomes, finding that it was able to achieve a high level of accuracy. These tools are available at http://cfdr.sourceforge.net/.
format Online
Article
Text
id pubmed-3604800
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-36048002013-03-21 Using False Discovery Rates to Benchmark SNP-callers in next-generation sequencing projects Farrer, Rhys A. Henk, Daniel A. MacLean, Dan Studholme, David J. Fisher, Matthew C. Sci Rep Article Sequence alignments form the basis for many comparative and population genomic studies. Alignment tools provide a range of accuracies dependent on the divergence between the sequences and the alignment methods. Despite widespread use, there is no standard method for assessing the accuracy of a dataset and alignment strategy after resequencing. We present a framework and tool for determining the overall accuracies of an input read dataset, alignment and SNP-calling method providing an isolate in that dataset has a corresponding, or closely related reference sequence available. In addition to this tool for comparing False Discovery Rates (FDR), we include a method for determining homozygous and heterozygous positions from an alignment using binomial probabilities for an expected error rate. We benchmark this method against other SNP callers using our FDR method with three fungal genomes, finding that it was able to achieve a high level of accuracy. These tools are available at http://cfdr.sourceforge.net/. Nature Publishing Group 2013-03-21 /pmc/articles/PMC3604800/ /pubmed/23518929 http://dx.doi.org/10.1038/srep01512 Text en Copyright © 2013, Macmillan Publishers Limited. All rights reserved http://creativecommons.org/licenses/by-nc-nd/3.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/
spellingShingle Article
Farrer, Rhys A.
Henk, Daniel A.
MacLean, Dan
Studholme, David J.
Fisher, Matthew C.
Using False Discovery Rates to Benchmark SNP-callers in next-generation sequencing projects
title Using False Discovery Rates to Benchmark SNP-callers in next-generation sequencing projects
title_full Using False Discovery Rates to Benchmark SNP-callers in next-generation sequencing projects
title_fullStr Using False Discovery Rates to Benchmark SNP-callers in next-generation sequencing projects
title_full_unstemmed Using False Discovery Rates to Benchmark SNP-callers in next-generation sequencing projects
title_short Using False Discovery Rates to Benchmark SNP-callers in next-generation sequencing projects
title_sort using false discovery rates to benchmark snp-callers in next-generation sequencing projects
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3604800/
https://www.ncbi.nlm.nih.gov/pubmed/23518929
http://dx.doi.org/10.1038/srep01512