Mining microsatellite markers from public expressed sequence tags databases for the study of threatened plants

Bibliographic Details
Main Authors: Lopez, Lua; Barreiro, Rodolfo; Fischer, Markus; Koch, Marcus A.
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4603344/
https://www.ncbi.nlm.nih.gov/pubmed/26463180
http://dx.doi.org/10.1186/s12864-015-2031-1
_version_ 1782394900632829952
author Lopez, Lua
Barreiro, Rodolfo
Fischer, Markus
Koch, Marcus A.
author_facet Lopez, Lua
Barreiro, Rodolfo
Fischer, Markus
Koch, Marcus A.
author_sort Lopez, Lua
collection PubMed
description BACKGROUND: Simple Sequence Repeats (SSRs) are widely used in population genetic studies, but their classical development is costly and time-consuming. The ever-increasing DNA datasets generated by high-throughput techniques offer an inexpensive alternative for SSR discovery. Expressed Sequence Tags (ESTs) have been widely used as an SSR source for plants of economic relevance, but their application to non-model species is still modest. METHODS: Here, we explored the use of publicly available ESTs (GenBank at the National Center for Biotechnology Information, NCBI) for SSR development in non-model plants, focusing on genera listed by the International Union for the Conservation of Nature (IUCN). We also searched two model genera with fully annotated genomes for EST-SSRs, Arabidopsis and Oryza, and used them as controls for genome distribution analyses. Overall, we downloaded 16 031 555 sequences for 258 plant genera, which were mined for SSRs and their primers with the help of QDD1. Genome distribution analyses in Oryza and Arabidopsis were done by blasting the SSR-containing sequences against the Oryza sativa and Arabidopsis thaliana reference genomes implemented in the Basic Local Alignment Search Tool (BLAST) of the NCBI website. Finally, we performed an empirical test to determine the performance of our EST-SSRs in a few individuals from four species of two eudicot genera, Trifolium and Centaurea. RESULTS: We explored a total of 14 498 726 EST sequences from the dbEST database (NCBI) in 257 plant genera from the IUCN Red List. We identified a very large number (17 102) of ready-to-test EST-SSRs in most plant genera (193) at no cost. Overall, dinucleotide and trinucleotide repeats were the prevalent types, but the abundance of the various repeat types differed between taxonomic groups. Control genomes revealed that trinucleotide repeats were mostly located in coding regions, while dinucleotide repeats were largely associated with untranslated regions. Our results from the empirical test revealed considerable amplification success and transferability between congenerics. CONCLUSIONS: The present work represents the first large-scale study developing SSRs from publicly accessible EST databases in threatened plants. Here we provide a very large number of ready-to-test EST-SSRs (17 102) for 193 genera. The cross-species transferability suggests that the number of possible target species would be large. Since trinucleotide repeats are abundant and mainly linked to exons, they might be useful in evolutionary and conservation studies. Altogether, our study strongly supports the use of EST databases as an extremely affordable and fast alternative for SSR development in threatened plants. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-2031-1) contains supplementary material, which is available to authorized users.
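The abstract describes mining EST sequences for dinucleotide and trinucleotide repeats. As a rough illustration of what such a scan involves, the following minimal Python sketch finds perfect tandem repeats with a regular expression. It is not QDD, the tool the authors used, and it does no primer design. The function name `find_ssrs` and the minimum copy numbers in `MIN_REPEATS` are hypothetical choices made for this example, not values taken from the study.

```python
import re

# Hypothetical thresholds (not from the paper): minimum number of tandem
# copies required for each motif length.
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_n in sorted(min_repeats.items()):
        # Capture one motif, then require at least (min_n - 1) further
        # copies of it via a backreference.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            copies = len(m.group(0)) // motif_len
            hits.append((m.start(), motif, copies))
    return sorted(hits)


# Toy sequence: an (AG)8 repeat and a (GAA)6 repeat with short flanks.
example = "TTGC" + "AG" * 8 + "CCGT" + "GAA" * 6 + "TCA"
for start, motif, copies in find_ssrs(example):
    print(start, motif, copies)
```

A real pipeline would also handle imperfect and compound repeats, remove redundant ESTs, and check that the flanking sequence is long enough for primers. None of that is attempted here.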
format Online
Article
Text
id pubmed-4603344
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4603344 2015-10-14 Mining microsatellite markers from public expressed sequence tags databases for the study of threatened plants Lopez, Lua Barreiro, Rodolfo Fischer, Markus Koch, Marcus A. BMC Genomics Research Article BioMed Central 2015-10-13 /pmc/articles/PMC4603344/ /pubmed/26463180 http://dx.doi.org/10.1186/s12864-015-2031-1 Text en © Lopez et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Lopez, Lua
Barreiro, Rodolfo
Fischer, Markus
Koch, Marcus A.
Mining microsatellite markers from public expressed sequence tags databases for the study of threatened plants
title Mining microsatellite markers from public expressed sequence tags databases for the study of threatened plants
title_full Mining microsatellite markers from public expressed sequence tags databases for the study of threatened plants
title_fullStr Mining microsatellite markers from public expressed sequence tags databases for the study of threatened plants
title_full_unstemmed Mining microsatellite markers from public expressed sequence tags databases for the study of threatened plants
title_short Mining microsatellite markers from public expressed sequence tags databases for the study of threatened plants
title_sort mining microsatellite markers from public expressed sequence tags databases for the study of threatened plants
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4603344/
https://www.ncbi.nlm.nih.gov/pubmed/26463180
http://dx.doi.org/10.1186/s12864-015-2031-1
work_keys_str_mv AT lopezlua miningmicrosatellitemarkersfrompublicexpressedsequencetagsdatabasesforthestudyofthreatenedplants
AT barreirorodolfo miningmicrosatellitemarkersfrompublicexpressedsequencetagsdatabasesforthestudyofthreatenedplants
AT fischermarkus miningmicrosatellitemarkersfrompublicexpressedsequencetagsdatabasesforthestudyofthreatenedplants
AT kochmarcusa miningmicrosatellitemarkersfrompublicexpressedsequencetagsdatabasesforthestudyofthreatenedplants