
PSR: polymorphic SSR retrieval

BACKGROUND: With the advent of high-throughput sequencing technologies, large-scale identification of microsatellites became affordable and was directed especially at non-model species. By contrast, few efforts have been published toward the automatic identification of polymorphic microsatellites by...


Bibliographic Details
Main authors: Cantarella, Concita; D’Agostino, Nunzio
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4591729/
https://www.ncbi.nlm.nih.gov/pubmed/26428628
http://dx.doi.org/10.1186/s13104-015-1474-4
_version_ 1782393128762736640
author Cantarella, Concita
D’Agostino, Nunzio
author_facet Cantarella, Concita
D’Agostino, Nunzio
author_sort Cantarella, Concita
collection PubMed
description BACKGROUND: With the advent of high-throughput sequencing technologies, large-scale identification of microsatellites became affordable and was directed especially at non-model species. By contrast, few efforts have been published toward the automatic identification of polymorphic microsatellites by exploiting sequence redundancy. Few of the tools implemented so far for genotyping microsatellite repeats can manage huge amounts of sequence data and handle the SAM/BAM file format. Most of them have been developed for and tested on human or model organisms with high-quality reference genomes. RESULTS: In this note we describe polymorphic SSR retrieval (PSR), a read counter and simple sequence repeat (SSR) length polymorphism detection tool. It is written in Perl and was developed to identify length polymorphisms in perfect microsatellites by exploiting next-generation sequencing (NGS) data. PSR was developed with plant non-model species in mind, for which a de novo transcriptome assembly is generally the first sequence resource available for SSR mining. PSR is divided into two modules: the read-counting module (PSR_read_retrieval) identifies all the reads that cover the full length of perfect microsatellites; the comparative module (PSR_poly_finder) detects both heterozygous and homozygous alleles at each microsatellite locus across all genotypes under investigation. The user can define two threshold values to call a length polymorphism and reduce the number of false positives: the minimum number of reads overlapping the repetitive stretch and the minimum read depth. The first parameter determines whether the microsatellite-containing sequence is processed at all, while the second is decisive for the identification of minor alleles. PSR was tested on two different case studies.
The first study aims at identifying polymorphic SSRs in a set of de novo assembled transcripts derived from RNA sequencing of two different plant genotypes. The second aims to investigate sequence variation within a collection of newly sequenced chloroplast genomes. In both cases, PSR results agree with those obtained by capillary gel separation. CONCLUSION: PSR was developed specifically to meet the need to automate the gene-based and genome-wide identification of polymorphic microsatellites from NGS data. It overcomes the limits of existing, time-consuming approaches based on tools developed in the pre-NGS era.
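The two-module logic described above can be illustrated with a minimal Python sketch. This is not PSR's Perl code: the function names, the flank-based matching on plain read strings (instead of SAM/BAM records), and the reading of "minimum read depth" as a per-allele read count are all assumptions made for illustration only.

```python
import re
from collections import Counter


def spanning_repeat_count(read, left_flank, motif, right_flank):
    """Return the number of motif copies if the read covers the whole
    perfect repeat plus both flanks; return None otherwise.
    (Hypothetical helper; real data would come from alignments.)"""
    pattern = (
        re.escape(left_flank)
        + "((?:" + re.escape(motif) + ")+)"
        + re.escape(right_flank)
    )
    match = re.search(pattern, read)
    if match is None:
        return None
    return len(match.group(1)) // len(motif)


def call_alleles(reads, left_flank, motif, right_flank, min_reads, min_depth):
    """Count spanning reads per repeat length, then apply two thresholds.

    min_reads: minimum number of spanning reads for the locus to be
               processed at all (assumed meaning of the first threshold).
    min_depth: minimum number of reads supporting an allele for it to be
               reported (assumed meaning of the second threshold).
    Returns a sorted list of repeat counts, or None if the locus is skipped.
    """
    counts = Counter()
    for read in reads:
        n = spanning_repeat_count(read, left_flank, motif, right_flank)
        if n is not None:
            counts[n] += 1
    if sum(counts.values()) < min_reads:
        return None
    return sorted(n for n, depth in counts.items() if depth >= min_depth)


def is_polymorphic(calls_by_genotype):
    """A locus is polymorphic if called genotypes differ in their allele sets."""
    called = [tuple(alleles) for alleles in calls_by_genotype.values() if alleles]
    return len(set(called)) > 1


# Toy data: an (AG)n locus with made-up flanks in two hypothetical genotypes.
LEFT, MOTIF, RIGHT = "ACGT", "AG", "TTCA"
reads_a = [LEFT + MOTIF * 5 + RIGHT] * 4
reads_b = [LEFT + MOTIF * 7 + RIGHT] * 3 + [LEFT + MOTIF * 5 + RIGHT]

calls = {
    "genotype_A": call_alleles(reads_a, LEFT, MOTIF, RIGHT, min_reads=3, min_depth=2),
    "genotype_B": call_alleles(reads_b, LEFT, MOTIF, RIGHT, min_reads=3, min_depth=2),
}
print(calls, is_polymorphic(calls))
```

In this toy example the single 5-repeat read in genotype_B falls below `min_depth` and is not reported as a minor allele, which is the kind of false positive the second threshold is meant to suppress. A genotype with enough reads for two different lengths would be reported with both alleles, i.e. as heterozygous.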
format Online
Article
Text
id pubmed-4591729
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4591729 2015-10-03 PSR: polymorphic SSR retrieval Cantarella, Concita D’Agostino, Nunzio BMC Res Notes Technical Note BioMed Central 2015-10-01 /pmc/articles/PMC4591729/ /pubmed/26428628 http://dx.doi.org/10.1186/s13104-015-1474-4 Text en © Cantarella and D'Agostino. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Technical Note
Cantarella, Concita
D’Agostino, Nunzio
PSR: polymorphic SSR retrieval
title PSR: polymorphic SSR retrieval
title_full PSR: polymorphic SSR retrieval
title_fullStr PSR: polymorphic SSR retrieval
title_full_unstemmed PSR: polymorphic SSR retrieval
title_short PSR: polymorphic SSR retrieval
title_sort psr: polymorphic ssr retrieval
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4591729/
https://www.ncbi.nlm.nih.gov/pubmed/26428628
http://dx.doi.org/10.1186/s13104-015-1474-4
work_keys_str_mv AT cantarellaconcita psrpolymorphicssrretrieval
AT dagostinonunzio psrpolymorphicssrretrieval