Quartet DNA reference materials and datasets for comprehensively evaluating germline variant calling performance

Bibliographic Details
Main authors: Ren, Luyao, Duan, Xiaoke, Dong, Lianhua, Zhang, Rui, Yang, Jingcheng, Gao, Yuechen, Peng, Rongxue, Hou, Wanwan, Liu, Yaqing, Li, Jingjing, Yu, Ying, Zhang, Naixin, Shang, Jun, Liang, Fan, Wang, Depeng, Chen, Hui, Sun, Lele, Hao, Lingtong, Scherer, Andreas, Nordlund, Jessica, Xiao, Wenming, Xu, Joshua, Tong, Weida, Hu, Xin, Jia, Peng, Ye, Kai, Li, Jinming, Jin, Li, Hong, Huixiao, Wang, Jing, Fan, Shaohua, Fang, Xiang, Zheng, Yuanting, Shi, Leming
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10680274/
https://www.ncbi.nlm.nih.gov/pubmed/38012772
http://dx.doi.org/10.1186/s13059-023-03109-2
Description
Summary: BACKGROUND: Genomic DNA reference materials are widely recognized as essential for ensuring data quality in omics research. However, relying solely on reference datasets to evaluate the accuracy of variant calling results is incomplete, as they are limited to benchmark regions. Therefore, it is important to develop DNA reference materials that enable the assessment of variant detection performance across the entire genome. RESULTS: We established a DNA reference material suite from four immortalized cell lines derived from a family of parents and monozygotic twins. Comprehensive reference datasets of 4.2 million small variants and 15,000 structural variants were integrated and certified for evaluating the reliability of germline variant calls inside the benchmark regions. Importantly, the genetic built-in-truth of the Quartet family design enables estimation of the precision of variant calls outside the benchmark regions. Using the Quartet reference materials along with study samples, batch effects are objectively monitored and alleviated by training a machine learning model with the Quartet reference datasets to remove potential artifact calls. Moreover, the matched RNA and protein reference materials and datasets from the Quartet project enable cross-omics validation of variant calls from multiomics data. CONCLUSIONS: The Quartet DNA reference materials and reference datasets provide a unique resource for objectively assessing the quality of germline variant calls across the whole genome and improving the reliability of large-scale genomic profiling. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-023-03109-2.
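
Note: the summary does not say how the family design is turned into a precision estimate. One common reading of "genetic built-in-truth" is a pedigree consistency check: monozygotic twins should share a genotype, and that genotype should be formable from one paternal and one maternal allele. The Python sketch below illustrates that idea only. The function names, the tuple-based genotype encoding, and the restriction to diploid autosomal sites are all assumptions made for illustration, not the authors' published method or metric.

```python
from itertools import product


def normalize(genotype):
    """Return an unordered diploid genotype as a sorted tuple, e.g. (1, 0) -> (0, 1)."""
    return tuple(sorted(genotype))


def is_quartet_consistent(father, mother, twin1, twin2):
    """Check one site for pedigree consistency (hypothetical helper).

    Assumes diploid autosomal genotypes given as pairs of allele indices
    (0 = reference allele, 1 = first alternate allele, and so on).
    """
    # Monozygotic twins are expected to carry the same genotype.
    if normalize(twin1) != normalize(twin2):
        return False
    # The shared genotype must combine one allele from each parent.
    possible = {normalize((a, b)) for a, b in product(father, mother)}
    return normalize(twin1) in possible


def quartet_consistency_rate(sites):
    """Fraction of sites whose four genotypes are pedigree-consistent."""
    sites = list(sites)
    if not sites:
        raise ValueError("at least one site is required")
    consistent = sum(is_quartet_consistent(*site) for site in sites)
    return consistent / len(sites)


# Illustrative, made-up genotypes: (father, mother, twin1, twin2) per site.
example_sites = [
    ((0, 1), (0, 0), (0, 1), (0, 1)),  # consistent
    ((0, 0), (0, 0), (0, 1), (0, 1)),  # twins agree, but allele absent in parents
    ((0, 1), (0, 1), (1, 1), (0, 1)),  # twins disagree
    ((1, 1), (0, 1), (1, 0), (0, 1)),  # consistent after normalizing allele order
]
rate = quartet_consistency_rate(example_sites)
```

A rate of this kind can be computed at any site where all four samples have a call, so it does not depend on a benchmark region. An inconsistent site shows that at least one of the four calls is likely wrong, but not which one.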