Benchmarking of computational error-correction methods for next-generation sequencing data
BACKGROUND: Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the app...
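Main authors: | Mitchell, Keith; Brito, Jaqueline J.; Mandric, Igor; Wu, Qiaozhen; Knyazev, Sergey; Chang, Sei; Martin, Lana S.; Karlsberg, Aaron; Gerasimov, Ekaterina; Littman, Russell; Hill, Brian L.; Wu, Nicholas C.; Yang, Harry Taegyun; Hsieh, Kevin; Chen, Linus; Littman, Eli; Shabani, Taylor; Enik, German; Yao, Douglas; Sun, Ren; Schroeder, Jan; Eskin, Eleazar; Zelikovsky, Alex; Skums, Pavel; Pop, Mihai; Mangul, Serghei |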
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7079412/ https://www.ncbi.nlm.nih.gov/pubmed/32183840 http://dx.doi.org/10.1186/s13059-020-01988-3 |
_version_ | 1783507817641017344 |
---|---|
author | Mitchell, Keith; Brito, Jaqueline J.; Mandric, Igor; Wu, Qiaozhen; Knyazev, Sergey; Chang, Sei; Martin, Lana S.; Karlsberg, Aaron; Gerasimov, Ekaterina; Littman, Russell; Hill, Brian L.; Wu, Nicholas C.; Yang, Harry Taegyun; Hsieh, Kevin; Chen, Linus; Littman, Eli; Shabani, Taylor; Enik, German; Yao, Douglas; Sun, Ren; Schroeder, Jan; Eskin, Eleazar; Zelikovsky, Alex; Skums, Pavel; Pop, Mihai; Mangul, Serghei |
author_sort | Mitchell, Keith |
collection | PubMed |
description | BACKGROUND: Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown. RESULTS: In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods. CONCLUSIONS: In terms of accuracy, we find that method performance varies substantially across different types of datasets with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity. |
format | Online Article Text |
id | pubmed-7079412 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7079412 2020-03-23 Benchmarking of computational error-correction methods for next-generation sequencing data. Genome Biol. Research. BioMed Central 2020-03-17 /pmc/articles/PMC7079412/ /pubmed/32183840 http://dx.doi.org/10.1186/s13059-020-01988-3 Text en © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Benchmarking of computational error-correction methods for next-generation sequencing data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7079412/ https://www.ncbi.nlm.nih.gov/pubmed/32183840 http://dx.doi.org/10.1186/s13059-020-01988-3 |