Quartet DNA reference materials and datasets for comprehensively evaluating germline variant calling performance

BACKGROUND: Genomic DNA reference materials are widely recognized as essential for ensuring data quality in omics research. However, relying solely on reference datasets to evaluate the accuracy of variant calling results is incomplete, as they are limited to benchmark regions. Therefore, it is impo...

Bibliographic Details
Main authors: Ren, Luyao; Duan, Xiaoke; Dong, Lianhua; Zhang, Rui; Yang, Jingcheng; Gao, Yuechen; Peng, Rongxue; Hou, Wanwan; Liu, Yaqing; Li, Jingjing; Yu, Ying; Zhang, Naixin; Shang, Jun; Liang, Fan; Wang, Depeng; Chen, Hui; Sun, Lele; Hao, Lingtong; Scherer, Andreas; Nordlund, Jessica; Xiao, Wenming; Xu, Joshua; Tong, Weida; Hu, Xin; Jia, Peng; Ye, Kai; Li, Jinming; Jin, Li; Hong, Huixiao; Wang, Jing; Fan, Shaohua; Fang, Xiang; Zheng, Yuanting; Shi, Leming
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10680274/
https://www.ncbi.nlm.nih.gov/pubmed/38012772
http://dx.doi.org/10.1186/s13059-023-03109-2
_version_ 1785150692222042112
author Ren, Luyao
Duan, Xiaoke
Dong, Lianhua
Zhang, Rui
Yang, Jingcheng
Gao, Yuechen
Peng, Rongxue
Hou, Wanwan
Liu, Yaqing
Li, Jingjing
Yu, Ying
Zhang, Naixin
Shang, Jun
Liang, Fan
Wang, Depeng
Chen, Hui
Sun, Lele
Hao, Lingtong
Scherer, Andreas
Nordlund, Jessica
Xiao, Wenming
Xu, Joshua
Tong, Weida
Hu, Xin
Jia, Peng
Ye, Kai
Li, Jinming
Jin, Li
Hong, Huixiao
Wang, Jing
Fan, Shaohua
Fang, Xiang
Zheng, Yuanting
Shi, Leming
collection PubMed
description BACKGROUND: Genomic DNA reference materials are widely recognized as essential for ensuring data quality in omics research. However, relying solely on reference datasets to evaluate the accuracy of variant calling results is incomplete, as they are limited to benchmark regions. Therefore, it is important to develop DNA reference materials that enable the assessment of variant detection performance across the entire genome. RESULTS: We established a DNA reference material suite from four immortalized cell lines derived from a family of parents and monozygotic twins. Comprehensive reference datasets of 4.2 million small variants and 15,000 structural variants were integrated and certified for evaluating the reliability of germline variant calls inside the benchmark regions. Importantly, the genetic built-in-truth of the Quartet family design enables estimation of the precision of variant calls outside the benchmark regions. Using the Quartet reference materials along with study samples, batch effects are objectively monitored and alleviated by training a machine learning model with the Quartet reference datasets to remove potential artifact calls. Moreover, the matched RNA and protein reference materials and datasets from the Quartet project enable cross-omics validation of variant calls from multiomics data. CONCLUSIONS: The Quartet DNA reference materials and reference datasets provide a unique resource for objectively assessing the quality of germline variant calls throughout the whole-genome regions and improving the reliability of large-scale genomic profiling. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-023-03109-2.
format Online
Article
Text
id pubmed-10680274
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10680274 2023-11-27 Quartet DNA reference materials and datasets for comprehensively evaluating germline variant calling performance Ren, Luyao Duan, Xiaoke Dong, Lianhua Zhang, Rui Yang, Jingcheng Gao, Yuechen Peng, Rongxue Hou, Wanwan Liu, Yaqing Li, Jingjing Yu, Ying Zhang, Naixin Shang, Jun Liang, Fan Wang, Depeng Chen, Hui Sun, Lele Hao, Lingtong Scherer, Andreas Nordlund, Jessica Xiao, Wenming Xu, Joshua Tong, Weida Hu, Xin Jia, Peng Ye, Kai Li, Jinming Jin, Li Hong, Huixiao Wang, Jing Fan, Shaohua Fang, Xiang Zheng, Yuanting Shi, Leming Genome Biol Research BACKGROUND: Genomic DNA reference materials are widely recognized as essential for ensuring data quality in omics research. However, relying solely on reference datasets to evaluate the accuracy of variant calling results is incomplete, as they are limited to benchmark regions. Therefore, it is important to develop DNA reference materials that enable the assessment of variant detection performance across the entire genome. RESULTS: We established a DNA reference material suite from four immortalized cell lines derived from a family of parents and monozygotic twins. Comprehensive reference datasets of 4.2 million small variants and 15,000 structural variants were integrated and certified for evaluating the reliability of germline variant calls inside the benchmark regions. Importantly, the genetic built-in-truth of the Quartet family design enables estimation of the precision of variant calls outside the benchmark regions. Using the Quartet reference materials along with study samples, batch effects are objectively monitored and alleviated by training a machine learning model with the Quartet reference datasets to remove potential artifact calls. Moreover, the matched RNA and protein reference materials and datasets from the Quartet project enable cross-omics validation of variant calls from multiomics data.
CONCLUSIONS: The Quartet DNA reference materials and reference datasets provide a unique resource for objectively assessing the quality of germline variant calls throughout the whole-genome regions and improving the reliability of large-scale genomic profiling. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-023-03109-2. BioMed Central 2023-11-27 /pmc/articles/PMC10680274/ /pubmed/38012772 http://dx.doi.org/10.1186/s13059-023-03109-2 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Quartet DNA reference materials and datasets for comprehensively evaluating germline variant calling performance
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10680274/
https://www.ncbi.nlm.nih.gov/pubmed/38012772
http://dx.doi.org/10.1186/s13059-023-03109-2