
Comparison of ONT and CCS sequencing technologies on the polyploid genome of a medicinal plant showed that high error rate of ONT reads are not suitable for self-correction

Bibliographic Details
Main authors: Zeng, Peng; Tian, Zunzhe; Han, Yuwei; Zhang, Weixiong; Zhou, Tinggan; Peng, Yingmei; Hu, Hao; Cai, Jing
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9364492/
https://www.ncbi.nlm.nih.gov/pubmed/35945546
http://dx.doi.org/10.1186/s13020-022-00644-1
author Zeng, Peng
Tian, Zunzhe
Han, Yuwei
Zhang, Weixiong
Zhou, Tinggan
Peng, Yingmei
Hu, Hao
Cai, Jing
collection PubMed
description BACKGROUND: Many medicinal plants have complex genomes with high ploidy, high heterozygosity, and abundant repetitive content, which pose severe challenges for sequencing these species. Long reads from Oxford Nanopore Technologies (ONT) or Pacific Biosciences Single Molecule, Real-Time (SMRT) sequencing offer great advantages for de novo genome assembly, especially of complex genomes with high heterozygosity and repetitive content. The genomes of multiple allotetraploid species have now been sequenced with long reads. However, we found that a considerable proportion of these genome assemblies (7.9% on average, 23.7% at most) is not covered by next-generation sequencing (NGS) reads. These regions uncovered by NGS reads (UCRs) could be questionable, low-quality sequence, or genomic regions that NGS cannot sequence because of sequencing bias. The underlying causes of UCRs in genome assemblies, and solutions to the problem, have not been studied.
METHODS: In this study, we sequenced the tetraploid genome of Veratrum dahuricum (Turcz.) O. Loes (VDL), a Chinese medicinal plant, on the ONT platform and assembled it with three strategies in parallel. To explore the cause of the UCRs, we compared the quality, coverage, and heterozygosity of the three ONT assemblies with a previously released assembly of the same individual built from PacBio circular consensus sequencing (CCS) reads.
RESULTS: Mapping the NGS reads against the three ONT assemblies and the CCS assembly showed that NGS coverage of the ONT assemblies ranged from 49.15% to 76.31%, much lower than that of the CCS assembly (99.53%). Alignment between the ONT assemblies and the CCS assembly also showed that most UCRs can be aligned to the CCS assembly. We therefore conclude that the UCRs in the ONT assemblies are low-quality sequences with high error rates that short reads cannot align to, rather than genomic regions that NGS cannot sequence. Further comparison among intermediate versions of the ONT assemblies indicated that these errors most probably arise from a combination of initial sequencing errors in the long reads and artificial errors introduced by “self-correction”. We also found that polishing the ONT assembly with CCS reads corrects these errors efficiently.
CONCLUSIONS: By analyzing genome features and read alignments, we found that the high proportion of UCRs in the ONT assembly of VDL is caused by sequencing errors and by additional errors introduced by self-correction. The high error rate of raw ONT reads makes them unsuitable for self-correction prior to allotetraploid genome assembly, because self-correction introduces artificial errors into > 5% of the UCR sequences. For polyploid genomes, we suggest polishing the assembly with high-precision CCS reads to correct these errors effectively.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13020-022-00644-1.
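The coverage and UCR figures above rest on two interval computations. The first finds assembly positions with no NGS read depth. The second measures how much of those uncovered regions overlaps alignments to the CCS assembly. What follows is a minimal Python sketch under stated assumptions, not the authors' pipeline:

- Per-base depth comes in the three-column text format printed by `samtools depth -a` (contig, 1-based position, depth).
- ONT-to-CCS alignments are in PAF format, with the ONT contigs as queries.
- The function names and the `min_depth` threshold are hypothetical.

```python
from collections import defaultdict


def parse_depth(lines):
    """Yield (contig, 1-based position, depth) from `samtools depth -a`-style text (assumed input)."""
    for line in lines:
        contig, pos, depth = line.rstrip("\n").split("\t")[:3]
        yield contig, int(pos), int(depth)


def uncovered_regions(depth_rows, min_depth=1):
    """Merge runs of positions with depth < min_depth into UCR intervals.

    min_depth is a hypothetical threshold (1 means "no reads at all").
    Assumes rows are sorted by contig and position and include zero-depth sites.
    Returns (ucrs, total_bases, uncovered_bases); each UCR is a 0-based,
    half-open (contig, start, end) tuple.
    """
    ucrs, total, uncovered = [], 0, 0
    current = None  # open UCR as [contig, start, end]
    for contig, pos, depth in depth_rows:
        total += 1
        if depth >= min_depth:
            continue  # covered base: the open UCR (if any) stops growing
        uncovered += 1
        start = pos - 1
        if current and current[0] == contig and current[2] == start:
            current[2] = start + 1  # extend the adjacent UCR
        else:
            if current:
                ucrs.append(tuple(current))
            current = [contig, start, start + 1]
    if current:
        ucrs.append(tuple(current))
    return ucrs, total, uncovered


def parse_paf_query_intervals(lines):
    """Yield (query name, query start, query end) from PAF columns 1, 3 and 4."""
    for line in lines:
        fields = line.split("\t")
        yield fields[0], int(fields[2]), int(fields[3])


def overlap_bases(ucrs, aligned):
    """Count UCR bases inside any aligned interval (aligned intervals are merged first)."""
    by_contig = defaultdict(list)
    for contig, start, end in aligned:
        by_contig[contig].append((start, end))
    merged = {}
    for contig, intervals in by_contig.items():
        out = []
        for start, end in sorted(intervals):
            if out and start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[contig] = out
    covered = 0
    for contig, start, end in ucrs:
        for a, b in merged.get(contig, []):
            covered += max(0, min(end, b) - max(start, a))
    return covered


if __name__ == "__main__":
    # Toy data: one 10-bp ONT contig with two zero-depth runs.
    depths = [3, 3, 0, 0, 0, 2, 2, 0, 0, 4]
    depth_text = [f"ctg1\t{i}\t{d}\n" for i, d in enumerate(depths, start=1)]
    ucrs, total, uncovered = uncovered_regions(parse_depth(depth_text))
    print(ucrs)  # [('ctg1', 2, 5), ('ctg1', 7, 9)]
    print(f"NGS coverage: {100 * (total - uncovered) / total:.2f}%")  # 50.00%

    # Two toy PAF records aligning ctg1 (query) to a hypothetical CCS contig ccsA.
    paf = [
        "ctg1\t10\t0\t4\t+\tccsA\t100\t20\t24\t4\t4\t60\n",
        "ctg1\t10\t3\t8\t+\tccsA\t100\t23\t28\t5\t5\t60\n",
    ]
    aligned = overlap_bases(ucrs, parse_paf_query_intervals(paf))
    print(f"UCR bases aligned to CCS: {100 * aligned / uncovered:.1f}%")  # 80.0%
```

Zero-depth sites must be present in the input for the uncovered fraction to be counted, which is why the sketch assumes `samtools depth -a`. Overlap is measured in ONT (query) coordinates because UCRs are defined on the ONT assembly.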
format Online
Article
Text
id pubmed-9364492
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9364492 2022-08-11 Chin Med Research
BioMed Central 2022-08-09 /pmc/articles/PMC9364492/ /pubmed/35945546 http://dx.doi.org/10.1186/s13020-022-00644-1 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Comparison of ONT and CCS sequencing technologies on the polyploid genome of a medicinal plant showed that high error rate of ONT reads are not suitable for self-correction
topic Research