
QualitySNP: a pipeline for detecting single nucleotide polymorphisms and insertions/deletions in EST data from diploid and polyploid species

Bibliographic Details
Main authors:	Tang, Jifeng; Vosman, Ben; Voorrips, Roeland E; van der Linden, C Gerard; Leunissen, Jack AM
Format:	Text
Language:	English
Published:	BioMed Central, 2006
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1618865/ https://www.ncbi.nlm.nih.gov/pubmed/17029635 http://dx.doi.org/10.1186/1471-2105-7-438

Collection:	PubMed
Abstract:	BACKGROUND: Single nucleotide polymorphisms (SNPs) are important tools in studying complex genetic traits and genome evolution. Computational strategies for SNP discovery make use of the large number of sequences present in public databases (in most cases as expressed sequence tags (ESTs)) and are considered to be faster and more cost-effective than experimental procedures. A major challenge in computational SNP discovery is distinguishing allelic variation from sequence variation between paralogous sequences, in addition to recognizing sequencing errors. For the majority of the public EST sequences, trace or quality files are lacking, which makes detection of reliable SNPs even more difficult because it must then rely on sequence comparisons alone. RESULTS: We have developed a new algorithm to detect reliable SNPs and insertions/deletions (indels) in EST data, both with and without quality files. Implemented in a pipeline called QualitySNP, it uses three filters for the identification of reliable SNPs. Filter 1 screens for all potential SNPs and identifies variation between or within genotypes. Filter 2 is the core filter that uses a haplotype-based strategy to detect reliable SNPs. Clusters with potential paralogs as well as false SNPs caused by sequencing errors are identified. Filter 3 screens SNPs by calculating a confidence score based on sequence redundancy and quality. Non-synonymous SNPs are subsequently identified by detecting open reading frames of consensus sequences (contigs) with SNPs. The pipeline includes a data storage and retrieval system for haplotypes, SNPs and alignments. QualitySNP's versatility is demonstrated by the identification of SNPs in EST datasets from potato, chicken and humans. CONCLUSION: QualitySNP is an efficient tool for SNP detection, storage and retrieval in diploid as well as polyploid species. It is available for running on Linux or UNIX systems.
The program, test data, and user manual are available at and as Additional files.
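The haplotype idea summarized for Filters 1 and 2 can be illustrated with a minimal Python sketch. It is not QualitySNP's code and does not reproduce its published rules: the alignment encoding ('.' for uncovered positions, 'N' for ambiguous calls, '-' for gaps), the function names candidate_sites, haplotypes and likely_paralogs, and the min_count threshold are all hypothetical. The sketch scans alignment columns for variants supported by at least two reads (a rough analogue of screening for potential SNPs and indels), groups reads that cover every candidate site into haplotypes, and, assuming all reads come from one genotype, flags the cluster when there are more well-supported haplotypes than the ploidy level allows, one plausible sign of paralogous sequences.

```python
from collections import Counter
from typing import List

# Hypothetical alignment encoding (an assumption, not QualitySNP's format):
# every read is one row of a multiple alignment and all rows have equal length.
#   A, C, G, T = called bases
#   '-'        = alignment gap, so insertions/deletions count as alleles too
#   '.'        = position not covered by this read
#   'N'        = ambiguous base call
IGNORED = {".", "N"}


def candidate_sites(reads: List[str], min_count: int = 2) -> List[int]:
    """Return alignment columns where at least two alleles are each seen in
    at least `min_count` reads; variants seen fewer times are skipped as
    possible sequencing errors."""
    if not reads:
        return []
    sites = []
    for col in range(len(reads[0])):
        counts = Counter(r[col] for r in reads if r[col] not in IGNORED)
        supported = [allele for allele, n in counts.items() if n >= min_count]
        if len(supported) >= 2:
            sites.append(col)
    return sites


def haplotypes(reads: List[str], sites: List[int]) -> Counter:
    """Group reads by the alleles they carry at the candidate sites.
    For simplicity, only reads that cover every site are counted."""
    groups: Counter = Counter()
    for r in reads:
        alleles = tuple(r[s] for s in sites)
        if not any(a in IGNORED for a in alleles):
            groups[alleles] += 1
    return groups


def likely_paralogs(groups: Counter, ploidy: int, min_count: int = 2) -> bool:
    """Heuristic for reads from a single genotype: more well-supported
    haplotypes than the ploidy level suggests paralogous sequences."""
    supported = sum(1 for n in groups.values() if n >= min_count)
    return supported > ploidy


if __name__ == "__main__":
    diploid = ["ACGTA", "ACGTA", "ATGCA", "ATGCA"]
    sites = candidate_sites(diploid)                      # columns 1 and 3
    print(sites, likely_paralogs(haplotypes(diploid, sites), ploidy=2))
    mixed = diploid + ["ACGCA", "ACGCA"]                  # a third haplotype
    sites = candidate_sites(mixed)
    print(sites, likely_paralogs(haplotypes(mixed, sites), ploidy=2))
```

In the usage example, the four diploid reads yield two haplotypes at columns 1 and 3, so no paralog flag is raised; adding two ACGCA reads creates a third well-supported haplotype, which exceeds the diploid limit, while the same cluster would still be acceptable with ploidy=4 (as in tetraploid potato).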
ID:	pubmed-1618865
Institution:	National Center for Biotechnology Information
Record format:	MEDLINE/PubMed
Journal:	BMC Bioinformatics (BioMed Central), published 2006-10-09
Record date:	2006-10-24
Rights:	Copyright © 2006 Tang et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
