Quartet DNA reference materials and datasets for comprehensively evaluating germline variant calling performance
BACKGROUND: Genomic DNA reference materials are widely recognized as essential for ensuring data quality in omics research. However, relying solely on reference datasets to evaluate the accuracy of variant calling results is incomplete, as they are limited to benchmark regions. Therefore, it is impo...
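Main authors: | Ren, Luyao; Duan, Xiaoke; Dong, Lianhua; Zhang, Rui; Yang, Jingcheng; Gao, Yuechen; Peng, Rongxue; Hou, Wanwan; Liu, Yaqing; Li, Jingjing; Yu, Ying; Zhang, Naixin; Shang, Jun; Liang, Fan; Wang, Depeng; Chen, Hui; Sun, Lele; Hao, Lingtong; Scherer, Andreas; Nordlund, Jessica; Xiao, Wenming; Xu, Joshua; Tong, Weida; Hu, Xin; Jia, Peng; Ye, Kai; Li, Jinming; Jin, Li; Hong, Huixiao; Wang, Jing; Fan, Shaohua; Fang, Xiang; Zheng, Yuanting; Shi, Leming |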
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10680274/ https://www.ncbi.nlm.nih.gov/pubmed/38012772 http://dx.doi.org/10.1186/s13059-023-03109-2 |
_version_ | 1785150692222042112 |
---|---|
author | Ren, Luyao Duan, Xiaoke Dong, Lianhua Zhang, Rui Yang, Jingcheng Gao, Yuechen Peng, Rongxue Hou, Wanwan Liu, Yaqing Li, Jingjing Yu, Ying Zhang, Naixin Shang, Jun Liang, Fan Wang, Depeng Chen, Hui Sun, Lele Hao, Lingtong Scherer, Andreas Nordlund, Jessica Xiao, Wenming Xu, Joshua Tong, Weida Hu, Xin Jia, Peng Ye, Kai Li, Jinming Jin, Li Hong, Huixiao Wang, Jing Fan, Shaohua Fang, Xiang Zheng, Yuanting Shi, Leming |
author_sort | Ren, Luyao |
collection | PubMed |
description | BACKGROUND: Genomic DNA reference materials are widely recognized as essential for ensuring data quality in omics research. However, relying solely on reference datasets to evaluate the accuracy of variant calling results is incomplete, as they are limited to benchmark regions. Therefore, it is important to develop DNA reference materials that enable the assessment of variant detection performance across the entire genome. RESULTS: We established a DNA reference material suite from four immortalized cell lines derived from a family of parents and monozygotic twins. Comprehensive reference datasets of 4.2 million small variants and 15,000 structural variants were integrated and certified for evaluating the reliability of germline variant calls inside the benchmark regions. Importantly, the genetic built-in-truth of the Quartet family design enables estimation of the precision of variant calls outside the benchmark regions. Using the Quartet reference materials along with study samples, batch effects are objectively monitored and alleviated by training a machine learning model with the Quartet reference datasets to remove potential artifact calls. Moreover, the matched RNA and protein reference materials and datasets from the Quartet project enables cross-omics validation of variant calls from multiomics data. CONCLUSIONS: The Quartet DNA reference materials and reference datasets provide a unique resource for objectively assessing the quality of germline variant calls throughout the whole-genome regions and improving the reliability of large-scale genomic profiling. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-023-03109-2. |
format | Online Article Text |
id | pubmed-10680274 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-106802742023-11-27 Quartet DNA reference materials and datasets for comprehensively evaluating germline variant calling performance Ren, Luyao Duan, Xiaoke Dong, Lianhua Zhang, Rui Yang, Jingcheng Gao, Yuechen Peng, Rongxue Hou, Wanwan Liu, Yaqing Li, Jingjing Yu, Ying Zhang, Naixin Shang, Jun Liang, Fan Wang, Depeng Chen, Hui Sun, Lele Hao, Lingtong Scherer, Andreas Nordlund, Jessica Xiao, Wenming Xu, Joshua Tong, Weida Hu, Xin Jia, Peng Ye, Kai Li, Jinming Jin, Li Hong, Huixiao Wang, Jing Fan, Shaohua Fang, Xiang Zheng, Yuanting Shi, Leming Genome Biol Research BACKGROUND: Genomic DNA reference materials are widely recognized as essential for ensuring data quality in omics research. However, relying solely on reference datasets to evaluate the accuracy of variant calling results is incomplete, as they are limited to benchmark regions. Therefore, it is important to develop DNA reference materials that enable the assessment of variant detection performance across the entire genome. RESULTS: We established a DNA reference material suite from four immortalized cell lines derived from a family of parents and monozygotic twins. Comprehensive reference datasets of 4.2 million small variants and 15,000 structural variants were integrated and certified for evaluating the reliability of germline variant calls inside the benchmark regions. Importantly, the genetic built-in-truth of the Quartet family design enables estimation of the precision of variant calls outside the benchmark regions. Using the Quartet reference materials along with study samples, batch effects are objectively monitored and alleviated by training a machine learning model with the Quartet reference datasets to remove potential artifact calls. Moreover, the matched RNA and protein reference materials and datasets from the Quartet project enables cross-omics validation of variant calls from multiomics data. 
CONCLUSIONS: The Quartet DNA reference materials and reference datasets provide a unique resource for objectively assessing the quality of germline variant calls throughout the whole-genome regions and improving the reliability of large-scale genomic profiling. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-023-03109-2. BioMed Central 2023-11-27 /pmc/articles/PMC10680274/ /pubmed/38012772 http://dx.doi.org/10.1186/s13059-023-03109-2 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Quartet DNA reference materials and datasets for comprehensively evaluating germline variant calling performance |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10680274/ https://www.ncbi.nlm.nih.gov/pubmed/38012772 http://dx.doi.org/10.1186/s13059-023-03109-2 |