Distinguishing HapMap Accessions Through Recursive Set Partitioning in Hierarchical Decision Trees
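Main authors: | Zhang, Wenchao; Kang, Yun; Cheng, Xiaofei; Wen, Jiangqi; Zhang, Hongying; Torres-Jerez, Ivone; Krom, Nick; Udvardi, Michael K.; Scheible, Wolf-Rüdiger; Zhao, Patrick Xuechun |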
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2021 |
Subjects: | Plant Science |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7886675/ https://www.ncbi.nlm.nih.gov/pubmed/33613609 http://dx.doi.org/10.3389/fpls.2021.628421 |
_version_ | 1783651845833490432 |
---|---|
author | Zhang, Wenchao; Kang, Yun; Cheng, Xiaofei; Wen, Jiangqi; Zhang, Hongying; Torres-Jerez, Ivone; Krom, Nick; Udvardi, Michael K.; Scheible, Wolf-Rüdiger; Zhao, Patrick Xuechun |
collection | PubMed |
description | The HapMap (haplotype map) projects have produced valuable genetic resources for the life science research community, allowing researchers to investigate sequence variations and conduct genome-wide association study (GWAS) analyses. A typical HapMap project may require sequencing hundreds, even thousands, of individual lines or accessions within a species. Owing to limitations in current sequencing technology, the genotype values for some accessions cannot be clearly called. Additionally, allelic heterozygosity can be very high in some lines, causing genetic and sometimes phenotypic segregation in their descendants. Such segregation degrades the original accession’s specificity and makes it difficult to distinguish one accession from another. Therefore, it is vitally important to determine and validate HapMap accessions before conducting a GWAS analysis. However, to the best of our knowledge, no prior methodologies or tools can readily distinguish or validate multiple accessions in a HapMap population. We devised a bioinformatics approach to distinguish multiple HapMap accessions using a minimal number of genetic markers. First, we assign each candidate marker a distinguishing score (DS), which measures its ability to distinguish accessions. The DS prioritizes markers with higher percentages of homozygous genotypes (allele combinations), as these can be stably passed on to offspring. Next, we apply the “set-partitioning” concept to select optimal markers by recursively partitioning accession sets. Subsequently, we build a hierarchical decision tree in which each path represents the selected markers and the homozygous genotypes that distinguish one accession from the others in the HapMap population. Based on these algorithms, we developed a web tool named MAD-HiDTree (Multiple Accession Distinguishment-Hierarchical Decision Tree), which analyzes a user-input genotype matrix and constructs a hierarchical decision tree. Using genetic marker data extracted from the Medicago truncatula HapMap population, we constructed hierarchical decision trees by which the original 262 M. truncatula accessions could be efficiently distinguished. PCR experiments verified the proposed method, confirming that MAD-HiDTree can be used to identify a specific accession. MAD-HiDTree was developed in C/C++ on Linux. Both the source code and test data are publicly available at https://bioinfo.noble.org/MAD-HiDTree/. |
format | Online Article Text |
id | pubmed-7886675 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7886675 2021-02-18 Front Plant Sci Plant Science Frontiers Media S.A. 2021-02-03 /pmc/articles/PMC7886675/ /pubmed/33613609 http://dx.doi.org/10.3389/fpls.2021.628421 Text en Copyright © 2021 Zhang, Kang, Cheng, Wen, Zhang, Torres-Jerez, Krom, Udvardi, Scheible and Zhao. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Distinguishing HapMap Accessions Through Recursive Set Partitioning in Hierarchical Decision Trees |
topic | Plant Science |
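The abstract describes the MAD-HiDTree approach only at a high level: score each candidate marker (favoring homozygous genotypes), recursively partition the accession set on the best marker, and read the distinguishing marker/genotype path for each accession off the resulting tree. The following Python sketch illustrates that idea on a toy genotype matrix. It is not the authors' C/C++ implementation, and several pieces are assumptions. The scoring function is hypothetical: it multiplies the fraction of homozygous calls by their Gini diversity and is not the published DS. Genotypes use a two-letter encoding with `N` marking an uncalled allele. A `?` bucket collects heterozygous or missing calls. The function names and accession names are illustrative only.

```python
from collections import Counter, defaultdict

MISSING = "N"  # hypothetical code for an uncalled allele


def is_homozygous(gt):
    """True for calls such as 'AA' or 'GG'; False for 'AG' or 'NN'."""
    return len(gt) == 2 and gt[0] == gt[1] and gt[0] != MISSING


def score(matrix, accessions, m):
    """Hypothetical distinguishing score for marker m on a set of accessions.

    Fraction of homozygous calls, weighted by the Gini diversity of those
    calls. An illustrative stand-in, not the published DS formula.
    """
    calls = [matrix[a][m] for a in accessions if is_homozygous(matrix[a][m])]
    if not calls:
        return 0.0
    n = len(calls)
    diversity = 1.0 - sum((c / n) ** 2 for c in Counter(calls).values())
    return (n / len(accessions)) * diversity


def partition(matrix, accessions, m):
    """Group accessions by homozygous genotype at marker m; the rest go to '?'."""
    groups = defaultdict(list)
    for a in accessions:
        gt = matrix[a][m]
        groups[gt if is_homozygous(gt) else "?"].append(a)
    return dict(groups)


def build_tree(matrix, accessions, used=frozenset()):
    """Recursively partition an accession set into a decision tree.

    Returns either a leaf (sorted list of accessions) or a tuple
    (marker_index, {genotype: subtree}).
    """
    if len(accessions) <= 1:
        return sorted(accessions)
    n_markers = len(next(iter(matrix.values())))
    # Only unused markers that actually split this set are candidates.
    candidates = [m for m in range(n_markers)
                  if m not in used and len(partition(matrix, accessions, m)) > 1]
    if not candidates:
        return sorted(accessions)  # not separable with the remaining markers
    # Greedy choice: highest-scoring marker (first one wins on ties).
    best = max(candidates, key=lambda m: score(matrix, accessions, m))
    groups = partition(matrix, accessions, best)
    return (best, {gt: build_tree(matrix, groups[gt], used | {best})
                   for gt in sorted(groups)})


def paths(tree, prefix=()):
    """Yield (leaf_accessions, [(marker_index, genotype), ...]) per leaf."""
    if isinstance(tree, list):
        yield tree, list(prefix)
        return
    marker, children = tree
    for gt, subtree in children.items():
        yield from paths(subtree, prefix + ((marker, gt),))


# Toy genotype matrix: accession -> one two-letter call per marker (made-up data).
matrix = {
    "acc1": ["AA", "GG", "CC"],
    "acc2": ["TT", "GG", "CT"],
    "acc3": ["AA", "CC", "CC"],
    "acc4": ["TT", "CC", "NN"],
}

tree = build_tree(matrix, list(matrix))
for accs, path in paths(tree):
    print(accs, path)
```

Picking the highest-scoring marker greedily at each node is a simplification and does not guarantee a globally minimal marker set. A leaf that still holds more than one accession means the remaining markers cannot separate those accessions under this sketch's rules.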