
seGMM: A New Tool for Gender Determination From Massively Parallel Sequencing Data

In clinical genetic testing, checking the concordance between self-reported gender and genotype-inferred gender from genomic data is a significant quality control measure because mismatched gender due to sex chromosomal abnormalities or misregistration of clinical information can significantly affec...


Bibliographic Details
Main authors: Liu, Sihan; Zeng, Yuanyuan; Wang, Chao; Zhang, Qian; Chen, Meilin; Wang, Xiaolu; Wang, Lanchen; Lu, Yu; Guo, Hui; Bu, Fengxiao
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8930203/
https://www.ncbi.nlm.nih.gov/pubmed/35309142
http://dx.doi.org/10.3389/fgene.2022.850804
author Liu, Sihan
Zeng, Yuanyuan
Wang, Chao
Zhang, Qian
Chen, Meilin
Wang, Xiaolu
Wang, Lanchen
Lu, Yu
Guo, Hui
Bu, Fengxiao
author_sort Liu, Sihan
collection PubMed
description In clinical genetic testing, checking the concordance between self-reported gender and the gender inferred from genomic data is an important quality control measure, because a mismatch caused by sex chromosomal abnormalities or by misregistration of clinical information can significantly affect molecular diagnosis and treatment decisions. Targeted gene sequencing (TGS) is widely recommended as a first-tier diagnostic step in clinical genetic testing. However, existing gender-inference tools are optimized for whole genome and whole exome data and are neither adequate nor accurate for analyzing TGS data. In this study, we validated a new gender-inference tool, seGMM, which uses unsupervised clustering (a Gaussian mixture model) to determine the gender of a sample. The seGMM tool can also identify sex chromosomal abnormalities in samples by aligning the sequencing reads from the genotype data. The seGMM tool consistently demonstrated >99% gender-inference accuracy in a publicly available 1,000-gene panel dataset from the 1000 Genomes Project, an in-house 785 hearing loss gene panel dataset of 16,387 samples, and a 187 autism risk gene panel dataset from the Autism Clinical and Genetic Resources in China (ACGC) database. The performance and accuracy of seGMM were significantly higher on the TGS, whole exome sequencing (WES), and whole genome sequencing (WGS) datasets than those of existing gender-inference tools such as PLINK, seXY, and XYalign. The results of seGMM were confirmed by short tandem repeat analysis of the sex chromosome marker gene, amelogenin. Furthermore, our data showed that seGMM accurately identified sex chromosomal abnormalities in the samples. In conclusion, the seGMM tool shows great potential in clinical genetics by determining the sex chromosomal karyotypes of samples from massively parallel sequencing data with high accuracy.
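The clustering idea in the abstract can be illustrated with a minimal sketch. It is not the seGMM implementation: the single feature used here (a per-sample Y-to-X read-depth ratio), the toy values, the function names, and the "XX"/"XY" labels are all assumptions made for illustration, since this record does not state which features seGMM uses. The sketch fits a two-component, one-dimensional Gaussian mixture by expectation-maximization and assigns each sample to the component with the higher posterior probability.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_two_component_gmm(values, n_iter=100, var_floor=1e-6):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances). Initialisation is deterministic:
    the two means start at the smallest and largest observed values.
    """
    lo, hi = min(values), max(values)
    means = [lo, hi]
    init_var = max(((hi - lo) / 4) ** 2, var_floor)
    variances = [init_var, init_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2
                      for r, x in zip(resp, values)) / nk
            variances[k] = max(var, var_floor)  # floor avoids a collapsed component
    return weights, means, variances

def classify(x, params):
    """Return (label, posterior) for one value of the hypothetical feature.

    Assumption: the component with the larger mean Y-to-X ratio is the XY cluster.
    """
    weights, means, variances = params
    dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(dens)
    if total == 0:
        return "ambiguous", 0.0
    post = [d / total for d in dens]
    high = 0 if means[0] > means[1] else 1
    label = "XY" if post[high] >= 0.5 else "XX"
    return label, max(post)

# Toy, made-up Y-to-X depth ratios: six low values, then six high values.
ratios = [0.004, 0.006, 0.003, 0.008, 0.005, 0.007,
          0.48, 0.52, 0.50, 0.47, 0.53, 0.51]
params = fit_two_component_gmm(ratios)
labels = [classify(r, params)[0] for r in ratios]
```

A real pipeline would likely combine several features and treat samples with a low posterior, or samples far from both clusters, as candidates for manual review rather than forcing a call.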
format Online
Article
Text
id pubmed-8930203
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8930203 2022-03-18 seGMM: A New Tool for Gender Determination From Massively Parallel Sequencing Data Liu, Sihan Zeng, Yuanyuan Wang, Chao Zhang, Qian Chen, Meilin Wang, Xiaolu Wang, Lanchen Lu, Yu Guo, Hui Bu, Fengxiao Front Genet Genetics Frontiers Media S.A. 2022-03-03 /pmc/articles/PMC8930203/ /pubmed/35309142 http://dx.doi.org/10.3389/fgene.2022.850804 Text en
Copyright © 2022 Liu, Zeng, Wang, Zhang, Chen, Wang, Wang, Lu, Guo and Bu. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title seGMM: A New Tool for Gender Determination From Massively Parallel Sequencing Data
topic Genetics