Cargando…

PSR: polymorphic SSR retrieval

BACKGROUND: With the advent of high-throughput sequencing technologies large-scale identification of microsatellites became affordable and was especially directed to non-model species. By contrast, few efforts have been published toward the automatic identification of polymorphic microsatellites by...

Descripción completa

Detalles Bibliográficos
Autores principales:	Cantarella, Concita, D’Agostino, Nunzio
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2015
Materias:	Technical Note
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4591729/ https://www.ncbi.nlm.nih.gov/pubmed/26428628 http://dx.doi.org/10.1186/s13104-015-1474-4

_version_	1782393128762736640
author	Cantarella, Concita D’Agostino, Nunzio
author_facet	Cantarella, Concita D’Agostino, Nunzio
author_sort	Cantarella, Concita
collection	PubMed
description	BACKGROUND: With the advent of high-throughput sequencing technologies large-scale identification of microsatellites became affordable and was especially directed to non-model species. By contrast, few efforts have been published toward the automatic identification of polymorphic microsatellites by exploiting sequence redundancy. Few tools for genotyping microsatellite repeats have been implemented so far that are able to manage huge amount of sequence data and handle the SAM/BAM file format. Most of them have been developed for and tested on human or model organisms with high quality reference genomes. RESULTS: In this note we describe polymorphic SSR retrieval (PSR), a read counter and simple sequence repeat (SSR) length polymorphism detection tool. It is written in Perl and was developed to identify length polymorphisms in perfect microsatellites exploiting next generation sequencing (NGS) data. PSR has been developed bearing in mind plant non-model species for which de novo transcriptome assembly is generally the first sequence resource available to be used for SSR-mining. PSR is divided into two modules: the read-counting module (PSR_read_retrieval) identifies all the reads that cover the full-length of perfect microsatellites; the comparative module (PSR_poly_finder) detects both heterozygous and homozygous alleles at each microsatellite locus across all genotypes under investigation. Two threshold values to call a length polymorphism and reduce the number of false positives can be defined by the user: the minimum number of reads overlapping the repetitive stretch and the minimum read depth. The first parameter determines if the microsatellite-containing sequence must be processed or not, while the second one is decisive for the identification of minor alleles. PSR was tested on two different case studies. The first study aims at the identification of polymorphic SSRs in a set of de novo assembled transcripts defined by RNA-sequencing of two different plant genotypes. The second research activity aims to investigate sequence variations within a collection of newly sequenced chloroplast genomes. In both the cases PSR results are in agreement with those obtained by capillary gel separation. CONCLUSION: PSR has been specifically developed from the need to automate the gene-based and genome-wide identification of polymorphic microsatellites from NGS data. It overcomes the limits related to the existing and time-consuming efforts based on tools developed in the pre-NGS era.
format	Online Article Text
id	pubmed-4591729
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-45917292015-10-03 PSR: polymorphic SSR retrieval Cantarella, Concita D’Agostino, Nunzio BMC Res Notes Technical Note BACKGROUND: With the advent of high-throughput sequencing technologies large-scale identification of microsatellites became affordable and was especially directed to non-model species. By contrast, few efforts have been published toward the automatic identification of polymorphic microsatellites by exploiting sequence redundancy. Few tools for genotyping microsatellite repeats have been implemented so far that are able to manage huge amount of sequence data and handle the SAM/BAM file format. Most of them have been developed for and tested on human or model organisms with high quality reference genomes. RESULTS: In this note we describe polymorphic SSR retrieval (PSR), a read counter and simple sequence repeat (SSR) length polymorphism detection tool. It is written in Perl and was developed to identify length polymorphisms in perfect microsatellites exploiting next generation sequencing (NGS) data. PSR has been developed bearing in mind plant non-model species for which de novo transcriptome assembly is generally the first sequence resource available to be used for SSR-mining. PSR is divided into two modules: the read-counting module (PSR_read_retrieval) identifies all the reads that cover the full-length of perfect microsatellites; the comparative module (PSR_poly_finder) detects both heterozygous and homozygous alleles at each microsatellite locus across all genotypes under investigation. Two threshold values to call a length polymorphism and reduce the number of false positives can be defined by the user: the minimum number of reads overlapping the repetitive stretch and the minimum read depth. The first parameter determines if the microsatellite-containing sequence must be processed or not, while the second one is decisive for the identification of minor alleles. PSR was tested on two different case studies. The first study aims at the identification of polymorphic SSRs in a set of de novo assembled transcripts defined by RNA-sequencing of two different plant genotypes. The second research activity aims to investigate sequence variations within a collection of newly sequenced chloroplast genomes. In both the cases PSR results are in agreement with those obtained by capillary gel separation. CONCLUSION: PSR has been specifically developed from the need to automate the gene-based and genome-wide identification of polymorphic microsatellites from NGS data. It overcomes the limits related to the existing and time-consuming efforts based on tools developed in the pre-NGS era. BioMed Central 2015-10-01 /pmc/articles/PMC4591729/ /pubmed/26428628 http://dx.doi.org/10.1186/s13104-015-1474-4 Text en © Cantarella and D'Agostino. 2015 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Technical Note Cantarella, Concita D’Agostino, Nunzio PSR: polymorphic SSR retrieval
title	PSR: polymorphic SSR retrieval
title_full	PSR: polymorphic SSR retrieval
title_fullStr	PSR: polymorphic SSR retrieval
title_full_unstemmed	PSR: polymorphic SSR retrieval
title_short	PSR: polymorphic SSR retrieval
title_sort	psr: polymorphic ssr retrieval
topic	Technical Note
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4591729/ https://www.ncbi.nlm.nih.gov/pubmed/26428628 http://dx.doi.org/10.1186/s13104-015-1474-4
work_keys_str_mv	AT cantarellaconcita psrpolymorphicssrretrieval AT dagostinonunzio psrpolymorphicssrretrieval

PSR: polymorphic SSR retrieval

Ejemplares similares