4Pipe4 – A 454 data analysis pipeline for SNP detection in datasets with no reference sequence or strain information
BACKGROUND: Next-generation sequencing datasets are becoming more frequent, and their use in population studies is becoming widespread. For non-model species, without a reference genome, it is possible from a panel of individuals to identify a set of SNPs that can be used for further population geno...
Main authors: | Pina-Martins, Francisco; Vieira, Bruno M.; Seabra, Sofia G.; Batista, Dora; Paulo, Octávio S. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4719533/ https://www.ncbi.nlm.nih.gov/pubmed/26787189 http://dx.doi.org/10.1186/s12859-016-0892-1 |
_version_ | 1782410948558979072 |
---|---|
author | Pina-Martins, Francisco; Vieira, Bruno M.; Seabra, Sofia G.; Batista, Dora; Paulo, Octávio S. |
author_facet | Pina-Martins, Francisco; Vieira, Bruno M.; Seabra, Sofia G.; Batista, Dora; Paulo, Octávio S. |
author_sort | Pina-Martins, Francisco |
collection | PubMed |
description | BACKGROUND: Next-generation sequencing datasets are becoming more frequent, and their use in population studies is becoming widespread. For non-model species, without a reference genome, it is possible from a panel of individuals to identify a set of SNPs that can be used for further population genotyping. However, the lack of a reference genome to which the sequenced data could be compared makes the finding of SNPs more troublesome. Additionally, when the data sources (strains) are not identified (e.g. in datasets of pooled individuals), the problem of finding reliable variation in these datasets can become much more difficult due to the lack of specialized software for this specific task. RESULTS: Here we describe 4Pipe4, a 454 data analysis pipeline particularly focused on SNP detection when no reference or strain information is available. It uses a command line interface to automatically call other programs, parse their outputs and summarize the results. The variation detection routine is built into the program itself. Despite being optimized for SNP mining in 454 EST data, it is flexible enough to automate the analysis of genomic data or even data from other NGS technologies. 4Pipe4 will output several HTML-formatted reports with metrics on many of the most common assembly values, as well as on all the variation found. There is also a module available for finding putative SSRs in the analysed datasets. CONCLUSIONS: This program can be especially useful for researchers who have 454 datasets of a panel of pooled individuals and want to discover and characterize SNPs for subsequent individual genotyping with customized genotyping arrays. In comparison with other SNP detection approaches, 4Pipe4 showed the best validation ratio, retrieving a smaller number of SNPs but with a considerably lower false positive rate than other methods. 4Pipe4’s source code is available at https://github.com/StuntsPT/4Pipe4. |
format | Online Article Text |
id | pubmed-4719533 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4719533 2016-01-21 4Pipe4 – A 454 data analysis pipeline for SNP detection in datasets with no reference sequence or strain information Pina-Martins, Francisco; Vieira, Bruno M.; Seabra, Sofia G.; Batista, Dora; Paulo, Octávio S. BMC Bioinformatics Software BACKGROUND: Next-generation sequencing datasets are becoming more frequent, and their use in population studies is becoming widespread. For non-model species, without a reference genome, it is possible from a panel of individuals to identify a set of SNPs that can be used for further population genotyping. However, the lack of a reference genome to which the sequenced data could be compared makes the finding of SNPs more troublesome. Additionally, when the data sources (strains) are not identified (e.g. in datasets of pooled individuals), the problem of finding reliable variation in these datasets can become much more difficult due to the lack of specialized software for this specific task. RESULTS: Here we describe 4Pipe4, a 454 data analysis pipeline particularly focused on SNP detection when no reference or strain information is available. It uses a command line interface to automatically call other programs, parse their outputs and summarize the results. The variation detection routine is built into the program itself. Despite being optimized for SNP mining in 454 EST data, it is flexible enough to automate the analysis of genomic data or even data from other NGS technologies. 4Pipe4 will output several HTML-formatted reports with metrics on many of the most common assembly values, as well as on all the variation found. There is also a module available for finding putative SSRs in the analysed datasets. CONCLUSIONS: This program can be especially useful for researchers who have 454 datasets of a panel of pooled individuals and want to discover and characterize SNPs for subsequent individual genotyping with customized genotyping arrays. In comparison with other SNP detection approaches, 4Pipe4 showed the best validation ratio, retrieving a smaller number of SNPs but with a considerably lower false positive rate than other methods. 4Pipe4’s source code is available at https://github.com/StuntsPT/4Pipe4. BioMed Central 2016-01-19 /pmc/articles/PMC4719533/ /pubmed/26787189 http://dx.doi.org/10.1186/s12859-016-0892-1 Text en © Pina-Martins et al. 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Pina-Martins, Francisco; Vieira, Bruno M.; Seabra, Sofia G.; Batista, Dora; Paulo, Octávio S. 4Pipe4 – A 454 data analysis pipeline for SNP detection in datasets with no reference sequence or strain information |
title | 4Pipe4 – A 454 data analysis pipeline for SNP detection in datasets with no reference sequence or strain information |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4719533/ https://www.ncbi.nlm.nih.gov/pubmed/26787189 http://dx.doi.org/10.1186/s12859-016-0892-1 |