Benchmarking of computational error-correction methods for next-generation sequencing data

BACKGROUND: Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the app...

Bibliographic Details
Main Authors: Mitchell, Keith, Brito, Jaqueline J., Mandric, Igor, Wu, Qiaozhen, Knyazev, Sergey, Chang, Sei, Martin, Lana S., Karlsberg, Aaron, Gerasimov, Ekaterina, Littman, Russell, Hill, Brian L., Wu, Nicholas C., Yang, Harry Taegyun, Hsieh, Kevin, Chen, Linus, Littman, Eli, Shabani, Taylor, Enik, German, Yao, Douglas, Sun, Ren, Schroeder, Jan, Eskin, Eleazar, Zelikovsky, Alex, Skums, Pavel, Pop, Mihai, Mangul, Serghei
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7079412/
https://www.ncbi.nlm.nih.gov/pubmed/32183840
http://dx.doi.org/10.1186/s13059-020-01988-3
author Mitchell, Keith
Brito, Jaqueline J.
Mandric, Igor
Wu, Qiaozhen
Knyazev, Sergey
Chang, Sei
Martin, Lana S.
Karlsberg, Aaron
Gerasimov, Ekaterina
Littman, Russell
Hill, Brian L.
Wu, Nicholas C.
Yang, Harry Taegyun
Hsieh, Kevin
Chen, Linus
Littman, Eli
Shabani, Taylor
Enik, German
Yao, Douglas
Sun, Ren
Schroeder, Jan
Eskin, Eleazar
Zelikovsky, Alex
Skums, Pavel
Pop, Mihai
Mangul, Serghei
author_sort Mitchell, Keith
collection PubMed
description BACKGROUND: Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown. RESULTS: In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods. CONCLUSIONS: In terms of accuracy, we find that method performance varies substantially across different types of datasets with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity.
format Online
Article
Text
id pubmed-7079412
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-70794122020-03-23 Benchmarking of computational error-correction methods for next-generation sequencing data Mitchell, Keith Brito, Jaqueline J. Mandric, Igor Wu, Qiaozhen Knyazev, Sergey Chang, Sei Martin, Lana S. Karlsberg, Aaron Gerasimov, Ekaterina Littman, Russell Hill, Brian L. Wu, Nicholas C. Yang, Harry Taegyun Hsieh, Kevin Chen, Linus Littman, Eli Shabani, Taylor Enik, German Yao, Douglas Sun, Ren Schroeder, Jan Eskin, Eleazar Zelikovsky, Alex Skums, Pavel Pop, Mihai Mangul, Serghei Genome Biol Research BACKGROUND: Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown. RESULTS: In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods. CONCLUSIONS: In terms of accuracy, we find that method performance varies substantially across different types of datasets with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity. 
BioMed Central 2020-03-17 /pmc/articles/PMC7079412/ /pubmed/32183840 http://dx.doi.org/10.1186/s13059-020-01988-3 Text en © The Author(s) 2020 Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Benchmarking of computational error-correction methods for next-generation sequencing data
topic Research