A pipeline for effectively developing highly polymorphic simple sequence repeats markers based on multi‐sample genomic data

Simple sequence repeats (SSRs) are widely used genetic markers in ecology, evolution, and conservation even in the genomics era, while a general limitation to their application is the difficulty of developing polymorphic SSR markers. Next‐generation sequencing (NGS) offers the opportunity for the ra...

Bibliographic Details
Main authors: Wang, Hui; Gao, Shenghan; Liu, Yu; Wang, Pengcheng; Zhang, Zhengwang; Chen, De
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8928897/
https://www.ncbi.nlm.nih.gov/pubmed/35342577
http://dx.doi.org/10.1002/ece3.8705
author Wang, Hui
Gao, Shenghan
Liu, Yu
Wang, Pengcheng
Zhang, Zhengwang
Chen, De
collection PubMed
description Simple sequence repeats (SSRs) remain widely used genetic markers in ecology, evolution, and conservation even in the genomics era, but a general limitation to their application is the difficulty of developing polymorphic SSR markers. Next‐generation sequencing (NGS) offers the opportunity for the rapid development of SSRs; however, previous studies that developed SSRs from the genomic data of only one individual needed redundant experiments to test whether the SSRs were polymorphic. In this study, we designed a pipeline for the rapid development of polymorphic SSR markers from multi‐sample genomic data. We used bioinformatic software to genotype multiple individuals from resequencing data and detected highly polymorphic SSRs prior to experimental validation, which significantly improved efficiency and reduced the experimental effort. The pipeline was successfully applied to a globally threatened species, the brown eared‐pheasant (Crossoptilon mantchuricum), which showed very low genomic diversity. The 20 newly developed SSR markers were highly polymorphic; their average number of alleles was much higher than the genomic average. We also evaluated the effect of the number of individuals and the sequencing depth on the SSR mining results, and we found that 10 individuals and ~10X sequencing data were enough to obtain a sufficient number of polymorphic SSRs, even for species with low genetic diversity. Furthermore, a genome assembled from the NGS data of the optimal number of individuals at the optimal sequencing depth can be used as an alternative reference genome if a high‐quality genome is not available. Our pipeline provides a paradigm for the application of NGS technology to mining and developing molecular markers for ecological and evolutionary studies.
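The abstract describes genotyping several individuals and keeping only the highly polymorphic SSR loci before any laboratory validation. The record does not name the software or thresholds used, so the following is only a minimal Python sketch of that filtering idea under stated assumptions: the function name `rank_polymorphic_loci`, the input layout (a locus name mapped to one diploid genotype per individual, written as repeat counts, with `None` for a missing call), and the `min_called` and `min_alleles` cutoffs are all hypothetical and are not taken from the article.

```python
from collections import Counter


def rank_polymorphic_loci(genotypes, min_called=8, min_alleles=3):
    """Rank SSR loci by allele count and expected heterozygosity.

    genotypes: dict mapping a locus name to a list with one entry per
    individual; each entry is a (allele_1, allele_2) tuple of repeat
    counts, or None when the individual could not be genotyped.
    The input layout and both thresholds are illustrative assumptions.
    """
    ranked = []
    for locus, calls in genotypes.items():
        # Ignore individuals without a genotype call at this locus.
        called = [pair for pair in calls if pair is not None]
        if len(called) < min_called:
            continue

        # Count every allele copy observed across the called individuals.
        counts = Counter(allele for pair in called for allele in pair)
        if len(counts) < min_alleles:
            continue

        # Expected heterozygosity: 1 minus the sum of squared allele frequencies.
        total = sum(counts.values())
        he = 1.0 - sum((n / total) ** 2 for n in counts.values())
        ranked.append((locus, len(counts), round(he, 3)))

    # Most alleles first, then highest heterozygosity, then locus name.
    ranked.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return ranked


if __name__ == "__main__":
    # Toy data for three individuals; thresholds lowered to match.
    example = {
        "ssr1": [(10, 10), (10, 12), (12, 14)],
        "ssr2": [(8, 8), (8, 8), (8, 8)],
        "ssr3": [(5, 6), None, None],
    }
    for row in rank_polymorphic_loci(example, min_called=3, min_alleles=2):
        print(row)
```

In this sketch a monomorphic locus (`ssr2`) and a locus with too few genotyped individuals (`ssr3`) are dropped, so only loci that already look variable across the sample would go forward to primer design and experimental testing.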
format Online
Article
Text
id pubmed-8928897
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-8928897 2022-03-24 A pipeline for effectively developing highly polymorphic simple sequence repeats markers based on multi‐sample genomic data. Wang, Hui; Gao, Shenghan; Liu, Yu; Wang, Pengcheng; Zhang, Zhengwang; Chen, De. Ecol Evol. Research Articles. John Wiley and Sons Inc. 2022-03-06. /pmc/articles/PMC8928897/ /pubmed/35342577 http://dx.doi.org/10.1002/ece3.8705 Text en. © 2022 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title A pipeline for effectively developing highly polymorphic simple sequence repeats markers based on multi‐sample genomic data
topic Research Articles