Optimized Next-Generation Sequencing Genotype-Haplotype Calling for Genome Variability Analysis

Bibliographic Details
Main authors:	Navarro, Javier; Nevado, Bruno; Hernández, Porfidio; Vera, Gonzalo; Ramos-Onsins, Sebastián E
Format:	Online Article Text
Language:	English
Published:	SAGE Publications 2017
Subjects:	Original Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5582667/ https://www.ncbi.nlm.nih.gov/pubmed/28894353 http://dx.doi.org/10.1177/1176934317723884

author	Navarro, Javier; Nevado, Bruno; Hernández, Porfidio; Vera, Gonzalo; Ramos-Onsins, Sebastián E
collection	PubMed
description	The accurate estimation of nucleotide variability from next-generation sequencing data is challenged by the high number of sequencing errors produced by new sequencing technologies. This is especially true for nonmodel species, where reference sequences may not be available and read depth may be low because of limited budgets. The most popular single-nucleotide polymorphism (SNP) callers are designed to achieve high SNP recovery and a low false discovery rate, but not to account appropriately for the frequency of the variants. In contrast, algorithms designed to account for SNP frequencies give precise estimates of the levels and patterns of variability. These algorithms focus on the unbiased estimation of variability rather than on high SNP recovery. Here, we implemented a fast, optimized parallel algorithm that includes the method developed by Roesti et al. and Lynch. This method estimates the genotype of each individual at each site and allows both bases of the genotype, only one, or neither to be called. The algorithm does not consider the reference sequence and is therefore independent of biases related to the specified reference nucleotide. The pipeline starts from a BAM file converted to pileup or mpileup format, and the software outputs a FASTA file. The new program reduces running times. Thanks to its improved use of resources, it can also run on both small computers and large parallel computers, which extends its benefits to a wider range of researchers. The output file can be analyzed with population genetics software such as the R library PopGenome, the software VariScan, and the program mstatspop, which supports analyses that take positions with missing data into account.
format	Online Article Text
id	pubmed-5582667
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	SAGE Publications
record_format	MEDLINE/PubMed
spelling	pubmed-5582667 2017-09-11 Optimized Next-Generation Sequencing Genotype-Haplotype Calling for Genome Variability Analysis. Navarro, Javier; Nevado, Bruno; Hernández, Porfidio; Vera, Gonzalo; Ramos-Onsins, Sebastián E. Evol Bioinform Online. Original Research. SAGE Publications 2017-08-24 /pmc/articles/PMC5582667/ /pubmed/28894353 http://dx.doi.org/10.1177/1176934317723884 Text en © The Author(s) 2017 http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title	Optimized Next-Generation Sequencing Genotype-Haplotype Calling for Genome Variability Analysis
topic	Original Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5582667/ https://www.ncbi.nlm.nih.gov/pubmed/28894353 http://dx.doi.org/10.1177/1176934317723884
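The description above outlines a per-site genotype-calling step. Each individual at each site gets both bases called, only one, or neither. The input is pileup/mpileup and the output is FASTA.

The Python sketch below illustrates that idea. It is not the authors' parallel program and does not reproduce the Roesti et al. or Lynch method exactly. The following are all hypothetical choices made for illustration:

- the per-read error model;
- the thresholds `min_depth` and `min_log_lr`;
- the rule that calls a single base when the two most likely genotypes share an allele;
- the function names;
- the `_1`/`_2` FASTA record names.

Base-column decoding follows the samtools mpileup symbol conventions. The reference base is used only to decode `.` and `,` symbols, never as a prior, in keeping with the reference-independent design described above.

```python
import math
from itertools import combinations_with_replacement

BASES = ("A", "C", "G", "T")


def parse_pileup_bases(ref, bases):
    """Decode one sample's mpileup base column into a list of read bases.

    '.'/',' stand for the reference base (forward/reverse strand), '^' starts a
    read and is followed by one mapping-quality character, '$' ends a read, and
    '+N'/'-N' introduce an indel of N bases that is skipped. Any other symbol
    ('*', '#', '>', '<', 'N') is ignored.
    """
    reads, i = [], 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            i += 2  # skip '^' and its mapping-quality character
        elif c == "$":
            i += 1
        elif c in "+-":
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])  # skip the indel length and sequence
        else:
            base = ref.upper() if c in ".," else c.upper()
            if base in BASES:
                reads.append(base)
            i += 1
    return reads


def genotype_log_likelihoods(reads, error=0.01):
    """Log-likelihood of each unordered diploid genotype, keyed like 'AG'.

    Simplified, hypothetical model: a read shows its true allele with
    probability 1 - error and each other base with probability error / 3;
    under genotype a/b each read comes from a or b with probability 1/2.
    """
    def p_read(read, allele):
        return 1.0 - error if read == allele else error / 3.0

    return {
        a + b: sum(math.log(0.5 * p_read(r, a) + 0.5 * p_read(r, b)) for r in reads)
        for a, b in combinations_with_replacement(BASES, 2)
    }


def call_genotype(reads, error=0.01, min_depth=2, min_log_lr=math.log(100)):
    """Return two alleles: both called, one called plus 'N', or ('N', 'N')."""
    if len(reads) < min_depth:
        return "N", "N"  # too few reads: call neither base
    ranked = sorted(genotype_log_likelihoods(reads, error).items(),
                    key=lambda kv: kv[1], reverse=True)
    (best, best_ll), (second, second_ll) = ranked[0], ranked[1]
    if best_ll - second_ll >= min_log_lr:
        return best[0], best[1]  # confident genotype: call both bases
    shared = set(best) & set(second)
    if shared:
        # Hypothetical rule: call only the allele common to the two best genotypes.
        return shared.pop(), "N"
    return "N", "N"


def call_haplotypes(sites, **kwargs):
    """Build two unphased haplotype strings from (ref, bases) pileup columns."""
    h1, h2 = [], []
    for ref, bases in sites:
        a, b = call_genotype(parse_pileup_bases(ref, bases), **kwargs)
        h1.append(a)
        h2.append(b)
    return "".join(h1), "".join(h2)


def to_fasta(name, h1, h2):
    """Format one individual as two FASTA records (hypothetical '_1'/'_2' names)."""
    return f">{name}_1\n{h1}\n>{name}_2\n{h2}\n"


if __name__ == "__main__":
    # Columns: (reference base, one sample's mpileup base string).
    sites = [("A", "........."), ("C", "..GG.G.G.G"), ("T", ".."), ("G", "")]
    print(to_fasta("ind1", *call_haplotypes(sites)), end="")
    # >ind1_1
    # ACTN
    # >ind1_2
    # AGNN
```

The sketch does not phase heterozygous sites, so the two records for an individual are unphased. An "N" marks an allele that was not called, which is the kind of missing-data position that mstatspop is described as handling.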
