
4Pipe4 – A 454 data analysis pipeline for SNP detection in datasets with no reference sequence or strain information

BACKGROUND: Next-generation sequencing datasets are becoming more frequent, and their use in population studies is becoming widespread. For non-model species, without a reference genome, it is possible from a panel of individuals to identify a set of SNPs that can be used for further population geno...


Bibliographic Details
Main authors:	Pina-Martins, Francisco; Vieira, Bruno M.; Seabra, Sofia G.; Batista, Dora; Paulo, Octávio S.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4719533/ https://www.ncbi.nlm.nih.gov/pubmed/26787189 http://dx.doi.org/10.1186/s12859-016-0892-1

author	Pina-Martins, Francisco; Vieira, Bruno M.; Seabra, Sofia G.; Batista, Dora; Paulo, Octávio S.
author_sort	Pina-Martins, Francisco
collection	PubMed
description	BACKGROUND: Next-generation sequencing datasets are becoming more frequent, and their use in population studies is becoming widespread. For non-model species without a reference genome, a panel of individuals can be used to identify a set of SNPs for further population genotyping. However, the lack of a reference genome against which the sequenced data could be compared makes finding SNPs more difficult. Additionally, when the data sources (strains) are not identified (e.g. in datasets of pooled individuals), finding reliable variation in these datasets can become much more difficult because of the lack of specialized software for this specific task. RESULTS: Here we describe 4Pipe4, a 454 data analysis pipeline focused on SNP detection when no reference or strain information is available. It uses a command-line interface to automatically call other programs, parse their outputs and summarize the results. The variation detection routine is built into the program itself. Although optimized for SNP mining in 454 EST data, it is flexible enough to automate the analysis of genomic data or even data from other NGS technologies. 4Pipe4 outputs several HTML-formatted reports with metrics on many of the most common assembly values, as well as on all the variation found. A module for finding putative SSRs in the analysed datasets is also available. CONCLUSIONS: This program can be especially useful for researchers who have 454 datasets from a panel of pooled individuals and want to discover and characterize SNPs for subsequent individual genotyping with customized genotyping arrays. In comparison with other SNP detection approaches, 4Pipe4 showed the best validation ratio, retrieving fewer SNPs but with a considerably lower false positive rate than other methods. 4Pipe4’s source code is available at https://github.com/StuntsPT/4Pipe4.
format	Online Article Text
id	pubmed-4719533
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4719533 2016-01-21 4Pipe4 – A 454 data analysis pipeline for SNP detection in datasets with no reference sequence or strain information Pina-Martins, Francisco; Vieira, Bruno M.; Seabra, Sofia G.; Batista, Dora; Paulo, Octávio S. BMC Bioinformatics Software BioMed Central 2016-01-19 /pmc/articles/PMC4719533/ /pubmed/26787189 http://dx.doi.org/10.1186/s12859-016-0892-1 Text en © Pina-Martins et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	4Pipe4 – A 454 data analysis pipeline for SNP detection in datasets with no reference sequence or strain information
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4719533/ https://www.ncbi.nlm.nih.gov/pubmed/26787189 http://dx.doi.org/10.1186/s12859-016-0892-1
