
4Pipe4 – A 454 data analysis pipeline for SNP detection in datasets with no reference sequence or strain information

BACKGROUND: Next-generation sequencing datasets are becoming more frequent, and their use in population studies is becoming widespread. For non-model species, without a reference genome, it is possible from a panel of individuals to identify a set of SNPs that can be used for further population geno...


Bibliographic Details
Main authors: Pina-Martins, Francisco; Vieira, Bruno M.; Seabra, Sofia G.; Batista, Dora; Paulo, Octávio S.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4719533/
https://www.ncbi.nlm.nih.gov/pubmed/26787189
http://dx.doi.org/10.1186/s12859-016-0892-1
author Pina-Martins, Francisco
Vieira, Bruno M.
Seabra, Sofia G.
Batista, Dora
Paulo, Octávio S.
collection PubMed
description BACKGROUND: Next-generation sequencing datasets are becoming more frequent, and their use in population studies is becoming widespread. For non-model species, without a reference genome, it is possible from a panel of individuals to identify a set of SNPs that can be used for further population genotyping. However, the lack of a reference genome to which the sequenced data could be compared makes the finding of SNPs more troublesome. Additionally, when the data sources (strains) are not identified (e.g. in datasets of pooled individuals), the problem of finding reliable variation in these datasets can become much more difficult due to the lack of specialized software for this specific task. RESULTS: Here we describe 4Pipe4, a 454 data analysis pipeline particularly focused on SNP detection when no reference or strain information is available. It uses a command line interface to automatically call other programs, parse their outputs and summarize the results. The variation detection routine is built into the program itself. Despite being optimized for SNP mining in 454 EST data, it is flexible enough to automate the analysis of genomic data or even data from other NGS technologies. 4Pipe4 will output several HTML formatted reports with metrics on many of the most common assembly values, as well as on all the variation found. There is also a module available for finding putative SSRs in the analysed datasets. CONCLUSIONS: This program can be especially useful for researchers who have 454 datasets of a panel of pooled individuals and want to discover and characterize SNPs for subsequent individual genotyping with customized genotyping arrays. In comparison with other SNP detection approaches, 4Pipe4 showed the best validation ratio, retrieving a smaller number of SNPs but with a considerably lower false positive rate than other methods. 4Pipe4’s source code is available at https://github.com/StuntsPT/4Pipe4.
format Online
Article
Text
id pubmed-4719533
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4719533 2016-01-21 4Pipe4 – A 454 data analysis pipeline for SNP detection in datasets with no reference sequence or strain information. Pina-Martins, Francisco; Vieira, Bruno M.; Seabra, Sofia G.; Batista, Dora; Paulo, Octávio S. BMC Bioinformatics. Software. BioMed Central 2016-01-19 /pmc/articles/PMC4719533/ /pubmed/26787189 http://dx.doi.org/10.1186/s12859-016-0892-1 Text en © Pina-Martins et al. 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title 4Pipe4 – A 454 data analysis pipeline for SNP detection in datasets with no reference sequence or strain information
topic Software