
Distinguishing HapMap Accessions Through Recursive Set Partitioning in Hierarchical Decision Trees

The HapMap (haplotype map) projects have produced valuable genetic resources in life science research communities, allowing researchers to investigate sequence variations and conduct genome-wide association study (GWAS) analyses. A typical HapMap project may require sequencing hundreds, even thousan...


Bibliographic Details
Main authors: Zhang, Wenchao, Kang, Yun, Cheng, Xiaofei, Wen, Jiangqi, Zhang, Hongying, Torres-Jerez, Ivone, Krom, Nick, Udvardi, Michael K., Scheible, Wolf-Rüdiger, Zhao, Patrick Xuechun
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7886675/
https://www.ncbi.nlm.nih.gov/pubmed/33613609
http://dx.doi.org/10.3389/fpls.2021.628421
author Zhang, Wenchao
Kang, Yun
Cheng, Xiaofei
Wen, Jiangqi
Zhang, Hongying
Torres-Jerez, Ivone
Krom, Nick
Udvardi, Michael K.
Scheible, Wolf-Rüdiger
Zhao, Patrick Xuechun
collection PubMed
description HapMap (haplotype map) projects have produced valuable genetic resources for the life science research community, allowing researchers to investigate sequence variation and conduct genome-wide association studies (GWAS). A typical HapMap project may require sequencing hundreds or even thousands of individual lines or accessions within a species. Because of limitations in current sequencing technology, genotypes cannot be clearly called for some accessions. In addition, allelic heterozygosity can be very high in some lines, causing genetic and sometimes phenotypic segregation in their descendants. This segregation degrades the original accession’s specificity and makes it difficult to distinguish one accession from another. It is therefore important to determine and validate HapMap accessions before conducting a GWAS analysis. However, to the best of our knowledge, no prior methodology or tool can readily distinguish or validate multiple accessions in a HapMap population. We devised a bioinformatics approach that distinguishes multiple HapMap accessions using a minimal number of genetic markers. First, we assign each candidate marker a distinguishing score (DS), which measures its ability to distinguish accessions. The DS prioritizes markers with higher percentages of homozygous genotypes (allele combinations), as these can be stably passed on to offspring. Next, we apply the “set-partitioning” concept to select optimal markers by recursively partitioning accession sets. We then build a hierarchical decision tree in which a specific path represents the selected markers and the homogenous genotypes that distinguish one accession from the others in the HapMap population. Based on these algorithms, we developed a web tool named MAD-HiDTree (Multiple Accession Distinguishment-Hierarchical Decision Tree), which analyzes a user-input genotype matrix and constructs a hierarchical decision tree. Using genetic marker data extracted from the Medicago truncatula HapMap population, we constructed hierarchical decision trees by which the original 262 M. truncatula accessions could be efficiently distinguished. PCR experiments verified the proposed method, confirming that MAD-HiDTree can be used to identify a specific accession. MAD-HiDTree was developed in C/C++ on Linux. Both the source code and test data are publicly available at https://bioinfo.noble.org/MAD-HiDTree/.
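The three steps named in the description (score the markers, partition the accession set recursively, read each accession's path off the tree) can be illustrated with a minimal Python sketch. This is not the MAD-HiDTree implementation, which the record says is written in C/C++. The scoring formula here (number of accession pairs a marker separates, multiplied by the fraction of homozygous calls) is an assumption chosen only to mimic the stated preference for homozygous genotypes; the record does not give the actual DS formula. The function names, the nested-dict tree layout, and the toy genotype matrix are likewise hypothetical.

```python
from itertools import combinations


def is_homozygous(genotype):
    """True for a two-allele call whose alleles match, e.g. 'AA'."""
    return bool(genotype) and len(genotype) == 2 and genotype[0] == genotype[1]


def distinguishing_score(calls, accessions):
    """ASSUMED score, not the published DS: separated pairs x homozygous fraction."""
    called = [calls[a] for a in accessions if calls.get(a)]
    if not called:
        return 0.0
    split_pairs = sum(1 for x, y in combinations(called, 2) if x != y)
    homozygous_fraction = sum(is_homozygous(g) for g in called) / len(called)
    return split_pairs * homozygous_fraction


def build_tree(matrix, accessions):
    """Recursively partition `accessions` by the best-scoring marker."""
    accessions = sorted(accessions)
    if len(accessions) <= 1:
        return {"leaf": accessions}
    best = max(sorted(matrix), key=lambda m: distinguishing_score(matrix[m], accessions))
    if distinguishing_score(matrix[best], accessions) == 0:
        return {"leaf": accessions}  # no marker separates the remaining accessions
    groups = {}
    for accession in accessions:
        groups.setdefault(matrix[best].get(accession), []).append(accession)
    children = {gt: build_tree(matrix, group) for gt, group in groups.items()}
    return {"marker": best, "children": children}


def paths(tree, prefix=()):
    """Yield (accession, [(marker, genotype), ...]) for every leaf accession."""
    if "leaf" in tree:
        for accession in tree["leaf"]:
            yield accession, list(prefix)
        return
    for genotype, child in tree["children"].items():
        yield from paths(child, prefix + ((tree["marker"], genotype),))


# Toy genotype matrix (hypothetical data): marker -> accession -> genotype call.
matrix = {
    "m1": {"A1": "AA", "A2": "AA", "A3": "TT", "A4": "TT"},
    "m2": {"A1": "CC", "A2": "GG", "A3": "CC", "A4": "CG"},
    "m3": {"A1": "AA", "A2": "AA", "A3": "AA", "A4": "AA"},
}
result = dict(paths(build_tree(matrix, matrix["m1"].keys())))
```

In this toy input the fully homozygous marker `m1` is chosen at the root, and `m2` then separates the accessions within each branch, so every accession ends with its own marker and genotype path. The uninformative marker `m3` is never selected.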
format Online
Article
Text
id pubmed-7886675
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7886675 2021-02-18 Front Plant Sci Plant Science Frontiers Media S.A. 2021-02-03 /pmc/articles/PMC7886675/ /pubmed/33613609 http://dx.doi.org/10.3389/fpls.2021.628421 Text en Copyright © 2021 Zhang, Kang, Cheng, Wen, Zhang, Torres-Jerez, Krom, Udvardi, Scheible and Zhao. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Distinguishing HapMap Accessions Through Recursive Set Partitioning in Hierarchical Decision Trees
topic Plant Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7886675/
https://www.ncbi.nlm.nih.gov/pubmed/33613609
http://dx.doi.org/10.3389/fpls.2021.628421