An adaptive method of defining negative mutation status for multi-sample comparison using next-generation sequencing

BACKGROUND: Multi-sample comparison is commonly used in cancer genomics studies. By using next-generation sequencing (NGS), a mutation's status in a specific sample can be measured by the number of reads supporting mutant or wildtype alleles. When no mutant reads are detected, it could represen...

Bibliographic Details
Main authors: Hutson, Nicholas, Zhan, Fenglin, Graham, James, Murakami, Mitsuko, Zhang, Han, Ganaparti, Sujana, Hu, Qiang, Yan, Li, Ma, Changxing, Liu, Song, Xie, Jun, Wei, Lei
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8638096/
https://www.ncbi.nlm.nih.gov/pubmed/34856988
http://dx.doi.org/10.1186/s12920-021-00880-8
author Hutson, Nicholas
Zhan, Fenglin
Graham, James
Murakami, Mitsuko
Zhang, Han
Ganaparti, Sujana
Hu, Qiang
Yan, Li
Ma, Changxing
Liu, Song
Xie, Jun
Wei, Lei
author_sort Hutson, Nicholas
collection PubMed
description BACKGROUND: Multi-sample comparison is commonly used in cancer genomics studies. With next-generation sequencing (NGS), a mutation's status in a specific sample can be measured by the number of reads supporting the mutant or wildtype allele. When no mutant reads are detected, this could represent either a true negative mutation status or a false negative caused by an insufficient number of reads (low "coverage"). To minimize the chance of false negatives, the mutation status should be considered "unknown" rather than "negative" when the coverage is inadequate. There is no established method for determining the coverage threshold between negative and unknown statuses. A common solution is to apply a universal minimum coverage (UMC). However, this method relies on an arbitrarily chosen threshold and does not take into account the mutations' relative abundances, which can vary dramatically by mutation type. The result can be misclassification between negative and unknown statuses. METHODS: We propose an adaptive mutation-specific negative (MSN) method to improve the discrimination between negative and unknown mutation statuses. For a specific mutation, a non-positive sample is compared with every known positive sample to test the null hypothesis that they contain the same frequency of mutant reads. The non-positive sample can be claimed as “negative” only when this null hypothesis is rejected against all known positive samples; otherwise, its status is “unknown”. RESULTS: We first compared the performance of the MSN and UMC methods on a simulated dataset containing varying tumor cell fractions. Only the MSN method appropriately assigned negative statuses for samples with both high and low tumor cell fractions. When evaluated on a real dual-platform single-cell sequencing dataset, the MSN method not only provided more accurate assessments of negative statuses but also yielded three times more available data than the UMC method after excluding the “unknown” statuses. CONCLUSIONS: We developed a new adaptive method for distinguishing unknown from negative statuses in multi-sample comparison NGS data. The method can provide more accurate negative statuses than the conventional UMC method and generates a substantially larger amount of available data by reducing unnecessary “unknown” calls.
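The decision rule described under METHODS can be illustrated with a minimal Python sketch. The abstract does not name the statistical test, so the one-sided Fisher's exact test used here is an assumption, as are the function names, the significance level `alpha=0.05`, and the UMC threshold `min_coverage=20`; none of these come from the record, and the authors' software may differ. Each sample is represented as a hypothetical `(mutant_reads, wildtype_reads)` pair.

```python
from math import comb


def fisher_less_pvalue(mut_a, wt_a, mut_b, wt_b):
    """One-sided Fisher's exact p-value (assumed test, not from the source).

    Probability, under the null hypothesis of equal mutant-read frequency,
    of seeing mut_a or fewer mutant reads in sample A given the margins.
    """
    n_a = mut_a + wt_a            # coverage of sample A
    total_mut = mut_a + mut_b     # mutant reads across both samples
    total_wt = wt_a + wt_b        # wildtype reads across both samples
    k_min = max(0, n_a - total_wt)
    favorable = sum(
        comb(total_mut, k) * comb(total_wt, n_a - k)
        for k in range(k_min, mut_a + 1)
    )
    return favorable / comb(total_mut + total_wt, n_a)


def msn_status(sample, positives, alpha=0.05):
    """Mutation-specific rule: 'negative' only if the null hypothesis is
    rejected against every known positive sample, otherwise 'unknown'."""
    if not positives:
        return "unknown"          # nothing to compare against
    mut, wt = sample
    for pos_mut, pos_wt in positives:
        if fisher_less_pvalue(mut, wt, pos_mut, pos_wt) >= alpha:
            return "unknown"      # cannot rule out the same frequency
    return "negative"


def umc_status(sample, min_coverage=20):
    """Universal minimum coverage rule: a single fixed threshold
    (the value 20 is a hypothetical choice for illustration)."""
    mut, wt = sample
    return "negative" if mut + wt >= min_coverage else "unknown"
```

With these assumptions, a sample with 0 mutant reads out of 100 is called negative against a positive sample at 50% mutant frequency, but stays unknown if any positive sample carries the mutation at only 1% (1 of 100 reads), whereas the fixed-threshold rule returns the same call in both cases because it ignores the mutation's abundance.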
format Online
Article
Text
id pubmed-8638096
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8638096 2021-12-02 An adaptive method of defining negative mutation status for multi-sample comparison using next-generation sequencing Hutson, Nicholas Zhan, Fenglin Graham, James Murakami, Mitsuko Zhang, Han Ganaparti, Sujana Hu, Qiang Yan, Li Ma, Changxing Liu, Song Xie, Jun Wei, Lei BMC Med Genomics Software BioMed Central 2021-12-02 /pmc/articles/PMC8638096/ /pubmed/34856988 http://dx.doi.org/10.1186/s12920-021-00880-8 Text en © The Author(s) 2021
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title An adaptive method of defining negative mutation status for multi-sample comparison using next-generation sequencing
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8638096/
https://www.ncbi.nlm.nih.gov/pubmed/34856988
http://dx.doi.org/10.1186/s12920-021-00880-8