A Sequence-Based Novel Approach for Quality Evaluation of Third-Generation Sequencing Reads

The advent of third-generation sequencing (TGS) technologies, such as the Pacific Biosciences (PacBio) and Oxford Nanopore machines, provides new possibilities for contig assembly, scaffolding, and high-performance computing in bioinformatics owing to their long reads. However, the high error rate and p...

Bibliographic Details
Main Authors: Zhang, Wenjing; Huang, Neng; Zheng, Jiantao; Liao, Xingyu; Wang, Jianxin; Li, Hong-Dong
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6356754/
https://www.ncbi.nlm.nih.gov/pubmed/30646604
http://dx.doi.org/10.3390/genes10010044
_version_ 1783391628479692800
author Zhang, Wenjing
Huang, Neng
Zheng, Jiantao
Liao, Xingyu
Wang, Jianxin
Li, Hong-Dong
collection PubMed
description The advent of third-generation sequencing (TGS) technologies, such as the Pacific Biosciences (PacBio) and Oxford Nanopore machines, provides new possibilities for contig assembly, scaffolding, and high-performance computing in bioinformatics owing to their long reads. However, the high error rate and poor quality of TGS reads pose new challenges for accurate genome assembly and long-read alignment. Efficient processing methods are needed to prioritize high-quality reads and thereby improve the results of error correction and assembly. In this study, we proposed a novel Read Quality Evaluation and Selection Tool (REQUEST) for evaluating the quality of third-generation long reads. REQUEST generates training data of high-quality and low-quality reads, which are characterized by their nucleotide combinations. A linear regression model was built to score the quality of reads. The method was tested on three datasets from different species. The results showed that the top-scored reads prioritized by REQUEST achieved higher alignment accuracies. The contig assembly results based on the top-scored reads also outperformed conventional approaches that use all reads. REQUEST is able to distinguish high-quality reads from low-quality ones without using reference genomes, making it a promising alternative to alignment-based algorithms for sequence-quality evaluation.
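The abstract's idea of scoring reads with a linear model over nucleotide combinations can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the features or the fitting procedure, so the choice of 2-mer frequencies, the gradient-descent fit, the function names (`kmer_frequencies`, `score_read`, `fit_linear_model`), and the toy training labels are all assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(read, k=2):
    """Return the normalized k-mer frequency vector of a read (length 4**k).

    Assumption: "nucleotide combinations" is interpreted here as k-mers.
    """
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    counts = [0.0] * len(index)
    total = 0
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k].upper()
        if kmer in index:  # skip k-mers containing N or other symbols
            counts[index[kmer]] += 1.0
            total += 1
    return [c / total for c in counts] if total else counts


def score_read(read, weights, bias, k=2):
    """Linear score: bias plus the weighted sum of k-mer frequencies."""
    features = kmer_frequencies(read, k)
    return bias + sum(w * f for w, f in zip(weights, features))


def fit_linear_model(reads, labels, k=2, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on mean squared error.

    Hypothetical training routine; labels are 1.0 (high quality) or 0.0 (low).
    """
    X = [kmer_frequencies(r, k) for r in reads]
    n, d = len(X), 4 ** k
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(X, labels):
            err = bias + sum(w * f for w, f in zip(weights, x)) - y
            grad_b += err
            for j, f in enumerate(x):
                grad_w[j] += err * f
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


if __name__ == "__main__":
    # Toy data only; real training reads would come from a labeling step.
    reads = ["ACGTACGTAC", "CGTACGTACG", "AAAAAAAAAA", "AAAAAAAAAT"]
    labels = [1.0, 1.0, 0.0, 0.0]
    w, b = fit_linear_model(reads, labels)
    ranked = sorted(reads, key=lambda r: score_read(r, w, b), reverse=True)
    print(ranked)
```

Ranking reads by `score_read` and keeping the top-scored fraction mirrors the selection step the abstract describes; because the features come from the read sequence alone, no reference genome is involved.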
format Online
Article
Text
id pubmed-6356754
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6356754 2019-02-04 Genes (Basel) Article MDPI 2019-01-14 /pmc/articles/PMC6356754/ /pubmed/30646604 http://dx.doi.org/10.3390/genes10010044 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title A Sequence-Based Novel Approach for Quality Evaluation of Third-Generation Sequencing Reads
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6356754/
https://www.ncbi.nlm.nih.gov/pubmed/30646604
http://dx.doi.org/10.3390/genes10010044