
CandiSSR: An Efficient Pipeline used for Identifying Candidate Polymorphic SSRs Based on Multiple Assembled Sequences

Simple sequence repeats (SSRs), also known as microsatellites, are ubiquitous short tandem duplications commonly found in genomes and/or transcriptomes of diverse organisms. They represent one of the most powerful molecular markers for genetic analysis and breeding programs because of their high mut...


Bibliographic Details
Main authors: Xia, En-Hua; Yao, Qiu-Yang; Zhang, Hai-Bin; Jiang, Jian-Jun; Zhang, Li-Ping; Gao, Li-Zhi
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4703815/
https://www.ncbi.nlm.nih.gov/pubmed/26779212
http://dx.doi.org/10.3389/fpls.2015.01171
author Xia, En-Hua
Yao, Qiu-Yang
Zhang, Hai-Bin
Jiang, Jian-Jun
Zhang, Li-Ping
Gao, Li-Zhi
author_sort Xia, En-Hua
collection PubMed
description Simple sequence repeats (SSRs), also known as microsatellites, are ubiquitous short tandem duplications commonly found in genomes and/or transcriptomes of diverse organisms. They represent one of the most powerful molecular markers for genetic analysis and breeding programs because of their high mutation rate and neutral evolution. However, traditional experimental screening of SSR polymorphic status and subsequent applicability to genetic studies is extremely labor-intensive and time-consuming. The recently decreased costs of next-generation sequencing and the increasing availability of large genome and/or transcriptome sequences provide an excellent opportunity and resource for large-scale mining of this type of molecular marker. However, current tools are limited. We therefore developed a new pipeline, CandiSSR, to identify candidate polymorphic SSRs (PolySSRs) from multiple assembled sequences. The pipeline allows users to identify putative PolySSRs not only from transcriptome datasets but also from multiple assembled genome sequences. In addition, two confidence metrics, the standard deviation and the missing rate of the SSR repetitions, are provided to systematically assess the feasibility of the detected PolySSRs for subsequent application to genetic characterization. Primer pairs for each identified PolySSR are also automatically designed and further evaluated by the global sequence similarities of the primer-binding regions, helping to ensure successful marker development. Screening rice genomes with CandiSSR and subsequent experimental validation showed an accuracy rate of over 90%. In addition, CandiSSR has successfully identified a large number of PolySSRs in Arabidopsis genomes and Camellia transcriptomes. CandiSSR and the PolySSR marker sources are publicly available at: http://www.plantkingdomgdb.com/CandiSSR/index.html.
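The two confidence metrics named in the description (standard deviation and missing rate of the SSR repetitions) can be illustrated with a minimal sketch. This is not CandiSSR's own code: the function names `count_repeats` and `polymorphism_metrics` are hypothetical, the use of the population standard deviation is an assumption, and the step that matches the same locus across samples is omitted.

```python
import re
from statistics import pstdev


def count_repeats(sequence, motif):
    """Return the number of motif copies in the longest tandem run,
    or None if the motif does not occur in the sequence (hypothetical helper)."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence.upper())
    if not runs:
        return None
    return max(len(run) for run in runs) // len(motif)


def polymorphism_metrics(repeat_counts):
    """Return (standard deviation, missing rate) for one SSR locus.

    repeat_counts holds one entry per sample; None marks a sample in
    which the SSR was not found. Population SD is an assumption here.
    """
    if not repeat_counts:
        raise ValueError("at least one sample is required")
    observed = [c for c in repeat_counts if c is not None]
    missing_rate = 1 - len(observed) / len(repeat_counts)
    sd = pstdev(observed) if len(observed) > 1 else 0.0
    return sd, missing_rate


# Toy input: the same (GA)n locus in four made-up samples.
samples = {
    "sample1": "TT" + "GA" * 5 + "CC",
    "sample2": "TT" + "GA" * 3 + "CC",
    "sample3": "TTCCAATT",  # SSR absent, counted as missing
    "sample4": "TT" + "GA" * 7 + "CC",
}
counts = [count_repeats(seq, "GA") for seq in samples.values()]
sd, missing = polymorphism_metrics(counts)
print(f"repeat counts: {counts}, SD: {sd:.2f}, missing rate: {missing:.2f}")
```

In this sketch a higher standard deviation means more variation in repeat number across samples, and a lower missing rate means the locus was recovered in more of them.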
format Online
Article
Text
id pubmed-4703815
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-4703815 2016-01-15 CandiSSR: An Efficient Pipeline used for Identifying Candidate Polymorphic SSRs Based on Multiple Assembled Sequences Xia, En-Hua; Yao, Qiu-Yang; Zhang, Hai-Bin; Jiang, Jian-Jun; Zhang, Li-Ping; Gao, Li-Zhi Front Plant Sci Plant Science
Frontiers Media S.A. 2016-01-07 /pmc/articles/PMC4703815/ /pubmed/26779212 http://dx.doi.org/10.3389/fpls.2015.01171 Text en
Copyright © 2016 Xia, Yao, Zhang, Jiang, Zhang and Gao. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title CandiSSR: An Efficient Pipeline used for Identifying Candidate Polymorphic SSRs Based on Multiple Assembled Sequences
topic Plant Science