SNP markers retrieval for a non-model species: a practical approach

BACKGROUND: SNP (Single Nucleotide Polymorphism) markers are rapidly becoming the markers of choice for applications in breeding because of next generation sequencing technology developments. For SNP development by NGS technologies, correct assembly of the huge amounts of sequence data generated is...

Bibliographic Details
Main authors: Shahin, Arwa; van Gurp, Thomas; Peters, Sander A; Visser, Richard GF; van Tuyl, Jaap M; Arens, Paul
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3298514/
https://www.ncbi.nlm.nih.gov/pubmed/22284269
http://dx.doi.org/10.1186/1756-0500-5-79
author Shahin, Arwa
van Gurp, Thomas
Peters, Sander A
Visser, Richard GF
van Tuyl, Jaap M
Arens, Paul
collection PubMed
description BACKGROUND: SNP (single nucleotide polymorphism) markers are rapidly becoming the markers of choice for breeding applications, driven by developments in next-generation sequencing (NGS) technology. SNP development with NGS depends on correct assembly of the large volumes of sequence data generated. Little is known about assembler performance, especially for highly heterogeneous species with high genome complexity, or about how differences between assemblies affect SNP retrieval. This study tested two assemblers (CAP3 and CLC) on 454 data from four lily genotypes and compared the results with respect to SNP retrieval. RESULTS: Compared with CLC, the CAP3 assembly produced more contigs, fewer reads per contig, and shorter average read lengths. BLAST comparisons showed that CAP3 contigs were highly redundant. In contrast, CLC in rare cases combined paralogs into one contig. Redundant and chimeric contigs may lead to erroneous SNPs. Redundancy can be filtered out by running BLAST with the selected SNP markers against the contigs and discarding every SNP marker that shows more than one BLAST hit. As for chimeric contigs, only four of the 2,421 SNP markers were selected from them. CONCLUSION: In practice, CLC assembles highly heterogeneous genome sequences better than CAP3, and SNP retrieval is consequently more efficient. Additionally, a simple flow scheme is suggested for SNP marker retrieval that can be valid for all non-model species.
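The redundancy filter described in the abstract (discard every SNP marker with more than one BLAST hit against the contigs) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes BLAST tabular output in which the first three tab-separated columns are query ID, subject ID, and percent identity; it counts a "hit" as a distinct contig rather than a distinct alignment row; and the function name `unique_hit_markers`, the `min_identity` parameter, and the example IDs are all hypothetical.

```python
from collections import defaultdict


def unique_hit_markers(blast_rows, min_identity=0.0):
    """Return the sorted marker IDs that hit exactly one contig.

    blast_rows: iterable of tab-separated lines whose first three columns
    are query (marker) ID, subject (contig) ID, and percent identity.
    min_identity: hypothetical cut-off; rows below it are ignored.
    """
    contigs_per_marker = defaultdict(set)
    for row in blast_rows:
        row = row.strip()
        if not row or row.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = row.split("\t")
        marker_id, contig_id, identity = fields[0], fields[1], float(fields[2])
        if identity >= min_identity:
            contigs_per_marker[marker_id].add(contig_id)
    # A marker that matches more than one contig is treated as redundant.
    return sorted(
        marker
        for marker, contigs in contigs_per_marker.items()
        if len(contigs) == 1
    )


# Made-up rows for illustration only; snp_002 hits two different contigs.
example = [
    "snp_001\tcontig_10\t100.0",
    "snp_002\tcontig_11\t99.0",
    "snp_002\tcontig_57\t98.5",
    "snp_003\tcontig_12\t100.0",
]
print(unique_hit_markers(example))  # ['snp_001', 'snp_003']
```

Counting distinct contigs instead of rows keeps a marker that aligns to the same contig in several segments. Whether that matches the paper's definition of "more than one blast hit" is an assumption.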
format Online
Article
Text
id pubmed-3298514
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3298514 2012-03-10 SNP markers retrieval for a non-model species: a practical approach Shahin, Arwa; van Gurp, Thomas; Peters, Sander A; Visser, Richard GF; van Tuyl, Jaap M; Arens, Paul BMC Res Notes Research Article BioMed Central 2012-01-29 /pmc/articles/PMC3298514/ /pubmed/22284269 http://dx.doi.org/10.1186/1756-0500-5-79 Text en Copyright ©2012 Shahin et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title SNP markers retrieval for a non-model species: a practical approach
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3298514/
https://www.ncbi.nlm.nih.gov/pubmed/22284269
http://dx.doi.org/10.1186/1756-0500-5-79