A Sequence-Based Novel Approach for Quality Evaluation of Third-Generation Sequencing Reads
The advent of third-generation sequencing (TGS) technologies, such as the Pacific Biosciences (PacBio) and Oxford Nanopore machines, provides new possibilities for contig assembly, scaffolding, and high-performance computing in bioinformatics due to its long reads. However, the high error rate and p...
Main authors: | Zhang, Wenjing; Huang, Neng; Zheng, Jiantao; Liao, Xingyu; Wang, Jianxin; Li, Hong-Dong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6356754/ https://www.ncbi.nlm.nih.gov/pubmed/30646604 http://dx.doi.org/10.3390/genes10010044 |
_version_ | 1783391628479692800 |
---|---|
author | Zhang, Wenjing Huang, Neng Zheng, Jiantao Liao, Xingyu Wang, Jianxin Li, Hong-Dong |
author_facet | Zhang, Wenjing Huang, Neng Zheng, Jiantao Liao, Xingyu Wang, Jianxin Li, Hong-Dong |
author_sort | Zhang, Wenjing |
collection | PubMed |
description | The advent of third-generation sequencing (TGS) technologies, such as the Pacific Biosciences (PacBio) and Oxford Nanopore machines, provides new possibilities for contig assembly, scaffolding, and high-performance computing in bioinformatics due to its long reads. However, the high error rate and poor quality of TGS reads provide new challenges for accurate genome assembly and long-read alignment. Efficient processing methods are in need to prioritize high-quality reads for improving the results of error correction and assembly. In this study, we proposed a novel Read Quality Evaluation and Selection Tool (REQUEST) for evaluating the quality of third-generation long reads. REQUEST generates training data of high-quality and low-quality reads which are characterized by their nucleotide combinations. A linear regression model was built to score the quality of reads. The method was tested on three datasets of different species. The results showed that the top-scored reads prioritized by REQUEST achieved higher alignment accuracies. The contig assembly results based on the top-scored reads also outperformed conventional approaches that use all reads. REQUEST is able to distinguish high-quality reads from low-quality ones without using reference genomes, making it a promising alternative sequence-quality evaluation method to alignment-based algorithms. |
format | Online Article Text |
id | pubmed-6356754 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-6356754 2019-02-04 Genes (Basel) Article MDPI 2019-01-14 /pmc/articles/PMC6356754/ /pubmed/30646604 http://dx.doi.org/10.3390/genes10010044 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
title | A Sequence-Based Novel Approach for Quality Evaluation of Third-Generation Sequencing Reads |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6356754/ https://www.ncbi.nlm.nih.gov/pubmed/30646604 http://dx.doi.org/10.3390/genes10010044 |
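
Illustrative sketch (not part of the record): the description above says that reads are characterized by their nucleotide combinations and scored by a linear regression model. The record does not give the combination length, the feature encoding, or the fitting procedure, so the following Python sketch fills those in with assumptions: 2-mer frequencies as features, labels of 1.0 and 0.0 for high-quality and low-quality training reads, and plain batch gradient descent. All function names and the toy reads are hypothetical, and this is not the REQUEST implementation.

```python
from itertools import product


def kmer_frequencies(read, k=2):
    """Return the normalized count of every k-mer over the alphabet ACGT."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in counts:  # skip k-mers containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on short reads
    return [counts[kmer] / total for kmer in kmers]


def fit_linear_model(features, labels, epochs=2000, lr=0.5):
    """Fit weights and a bias by batch gradient descent on squared error."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = bias + sum(w * xi for w, xi in zip(weights, x)) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def score_read(read, weights, bias, k=2):
    """Score a read with the fitted linear model; higher means better."""
    x = kmer_frequencies(read, k)
    return bias + sum(w * xi for w, xi in zip(weights, x))


# Toy training reads, invented for illustration only (assumption).
high_quality = ["ACGTACGTACGT", "CGTACGTACGTA"]
low_quality = ["AAAAAAAAAAAA", "AAAATAAAAAAA"]

train_x = [kmer_frequencies(r) for r in high_quality + low_quality]
train_y = [1.0] * len(high_quality) + [0.0] * len(low_quality)
weights, bias = fit_linear_model(train_x, train_y)

# Rank unseen reads by score and keep the top-scored ones.
candidates = ["GTACGTACGTAC", "AAAAAATAAAAA"]
ranked = sorted(candidates, key=lambda r: score_read(r, weights, bias), reverse=True)
```

In this sketch, `ranked` lists the candidate reads from highest to lowest score, so a downstream step could keep only a leading fraction of the list. How training reads are labeled without a reference genome is not covered here.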