
Phred-Phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies

BACKGROUND: Targeted re-sequencing is one of the most powerful and widely used strategies for population genetics studies because it allows an unbiased screening for variation that is suitable for a wide variety of organisms. Examples of studies that require re-sequencing data are evolutionary infer...


Bibliographic Details
Main authors: Machado, Moara, Magalhães, Wagner CS, Sene, Allan, Araújo, Bruno, Faria-Campos, Alessandra C, Chanock, Stephen J, Scott, Leandro, Oliveira, Guilherme, Tarazona-Santos, Eduardo, Rodrigues, Maira R
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041995/
https://www.ncbi.nlm.nih.gov/pubmed/21284835
http://dx.doi.org/10.1186/2041-2223-2-3
_version_ 1782198500916723712
author Machado, Moara
Magalhães, Wagner CS
Sene, Allan
Araújo, Bruno
Faria-Campos, Alessandra C
Chanock, Stephen J
Scott, Leandro
Oliveira, Guilherme
Tarazona-Santos, Eduardo
Rodrigues, Maira R
author_facet Machado, Moara
Magalhães, Wagner CS
Sene, Allan
Araújo, Bruno
Faria-Campos, Alessandra C
Chanock, Stephen J
Scott, Leandro
Oliveira, Guilherme
Tarazona-Santos, Eduardo
Rodrigues, Maira R
author_sort Machado, Moara
collection PubMed
description BACKGROUND: Targeted re-sequencing is one of the most powerful and widely used strategies for population genetics studies because it allows unbiased screening for variation and is suitable for a wide variety of organisms. Examples of studies that require re-sequencing data are evolutionary inferences, epidemiological studies designed to capture rare polymorphisms responsible for complex traits, and screenings for mutations in families and small populations with high incidences of specific genetic diseases. Despite the advent of next-generation sequencing technologies, Sanger sequencing is still the most popular approach in population genetics studies because of the widespread availability of automated sequencers based on capillary electrophoresis and because it is still less prone to sequencing errors, which is critical in population genetics studies. Two popular software applications for re-sequencing studies are Phred-Phrap-Consed-Polyphred, which performs base calling, alignment, graphical editing and genotype calling, and DNAsp, which performs a set of population genetics analyses. These independent tools are the start and end points of basic analyses. Between them lies a set of basic but error-prone tasks that must be performed on re-sequencing data. RESULTS: To assist with these intermediate tasks, we developed a pipeline that facilitates the data handling typical of re-sequencing studies.
Our pipeline: (1) consolidates different outputs produced by distinct Phred-Phrap-Consed contigs sharing a reference sequence; (2) checks for genotyping inconsistencies; (3) reformats genotyping data produced by Polyphred into a matrix of genotypes with individuals as rows and segregating sites as columns; (4) prepares input files for haplotype inferences using the popular software PHASE; and (5) processes PHASE output files, which contain only polymorphic sites, to reconstruct the inferred haplotypes with both polymorphic and monomorphic sites, as required by population genetics software for re-sequencing data such as DNAsp. CONCLUSION: We tested the pipeline in re-sequencing studies of haploid and diploid data in humans, plants, animals and microorganisms and observed that it substantially decreased the time required for sequencing analyses and provided a more controlled process that eliminates several classes of error that may occur when handling datasets. The pipeline is also useful for investigators using other tools for sequencing and population genetics analyses.
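Steps (3) and (5) of the pipeline described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `(individual, position, genotype)` tuple layout, the `"??"` placeholder for missing calls, and the 1-based position convention are all assumptions made for illustration, and no real Polyphred or PHASE file format is parsed here.

```python
from typing import Dict, List, Sequence, Tuple

Call = Tuple[str, int, str]  # (individual, position, genotype); hypothetical layout


def genotype_matrix(calls: Sequence[Call]) -> Tuple[List[int], Dict[str, List[str]]]:
    """Step (3): individuals as rows, segregating sites as columns.

    A site is treated as segregating when more than one distinct allele
    is observed across all individuals (an assumption of this sketch).
    """
    by_site: Dict[int, Dict[str, str]] = {}
    for individual, position, genotype in calls:
        by_site.setdefault(position, {})[individual] = genotype

    sites = sorted(
        position
        for position, genotypes in by_site.items()
        if len({allele for gt in genotypes.values() for allele in gt}) > 1
    )
    individuals = sorted({individual for individual, _, _ in calls})
    # "??" marks a missing call; this placeholder is an assumption.
    rows = {
        individual: [by_site[position].get(individual, "??") for position in sites]
        for individual in individuals
    }
    return sites, rows


def expand_haplotype(reference: str, positions: Sequence[int], alleles: str) -> str:
    """Step (5): rebuild a full-length haplotype from polymorphic sites only.

    Monomorphic sites are copied from the reference sequence; the phased
    alleles are written at their 1-based positions.
    """
    if len(positions) != len(alleles):
        raise ValueError("one allele is required per polymorphic position")
    sequence = list(reference)
    for position, allele in zip(positions, alleles):
        sequence[position - 1] = allele
    return "".join(sequence)


# Toy data: two individuals, three genotyped sites, one of them monomorphic.
reference = "ACGTACGT"
calls = [
    ("ind1", 2, "CC"), ("ind2", 2, "CT"),
    ("ind1", 5, "AA"), ("ind2", 5, "AA"),
    ("ind1", 7, "GG"), ("ind2", 7, "AG"),
]
sites, rows = genotype_matrix(calls)
full_haplotype = expand_haplotype(reference, sites, "TA")
```

In this toy example only positions 2 and 7 are kept as columns, because position 5 shows a single allele. The haplotype carrying `T` and `A` at those two sites is then expanded back to the full reference length, which is the form a downstream population genetics tool would need.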
format Text
id pubmed-3041995
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-30419952011-02-20 Phred-Phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies Machado, Moara Magalhães, Wagner CS Sene, Allan Araújo, Bruno Faria-Campos, Alessandra C Chanock, Stephen J Scott, Leandro Oliveira, Guilherme Tarazona-Santos, Eduardo Rodrigues, Maira R Investig Genet Research BACKGROUND: Targeted re-sequencing is one of the most powerful and widely used strategies for population genetics studies because it allows an unbiased screening for variation that is suitable for a wide variety of organisms. Examples of studies that require re-sequencing data are evolutionary inferences, epidemiological studies designed to capture rare polymorphisms responsible for complex traits and screenings for mutations in families and small populations with high incidences of specific genetic diseases. Despite the advent of next-generation sequencing technologies, Sanger sequencing is still the most popular approach in population genetics studies because of the widespread availability of automatic sequencers based on capillary electrophoresis and because it is still less prone to sequencing errors, which is critical in population genetics studies. Two popular software applications for re-sequencing studies are Phred-Phrap-Consed-Polyphred, which performs base calling, alignment, graphical edition and genotype calling and DNAsp, which performs a set of population genetics analyses. These independent tools are the start and end points of basic analyses. In between the use of these tools, there is a set of basic but error-prone tasks to be performed with re-sequencing data. RESULTS: In order to assist with these intermediate tasks, we developed a pipeline that facilitates data handling typical of re-sequencing studies. 
Our pipeline: (1) consolidates different outputs produced by distinct Phred-Phrap-Consed contigs sharing a reference sequence; (2) checks for genotyping inconsistencies; (3) reformats genotyping data produced by Polyphred into a matrix of genotypes with individuals as rows and segregating sites as columns; (4) prepares input files for haplotype inferences using the popular software PHASE; and (5) handles PHASE output files that contain only polymorphic sites to reconstruct the inferred haplotypes including polymorphic and monomorphic sites as required by population genetics software for re-sequencing data such as DNAsp. CONCLUSION: We tested the pipeline in re-sequencing studies of haploid and diploid data in humans, plants, animals and microorganisms and observed that it allowed a substantial decrease in the time required for sequencing analyses, as well as being a more controlled process that eliminates several classes of error that may occur when handling datasets. The pipeline is also useful for investigators using other tools for sequencing and population genetics analyses. BioMed Central 2011-02-01 /pmc/articles/PMC3041995/ /pubmed/21284835 http://dx.doi.org/10.1186/2041-2223-2-3 Text en Copyright ©2011 Machado et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Machado, Moara
Magalhães, Wagner CS
Sene, Allan
Araújo, Bruno
Faria-Campos, Alessandra C
Chanock, Stephen J
Scott, Leandro
Oliveira, Guilherme
Tarazona-Santos, Eduardo
Rodrigues, Maira R
Phred-Phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies
title Phred-Phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies
title_full Phred-Phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies
title_fullStr Phred-Phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies
title_full_unstemmed Phred-Phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies
title_short Phred-Phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies
title_sort phred-phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041995/
https://www.ncbi.nlm.nih.gov/pubmed/21284835
http://dx.doi.org/10.1186/2041-2223-2-3