
Pattern Recognition on Read Positioning in Next Generation Sequencing

The usefulness and the utility of the next generation sequencing (NGS) technology are based on the assumption that the DNA or cDNA cleavage required to generate short sequence reads is random. Several previous reports suggest the existence of sequencing bias of NGS reads. To address this question in...

Bibliographic Details
Main authors: Byeon, Boseon, Kovalchuk, Igor
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4907491/
https://www.ncbi.nlm.nih.gov/pubmed/27299343
http://dx.doi.org/10.1371/journal.pone.0157033
_version_ 1782437550896447488
author Byeon, Boseon
Kovalchuk, Igor
collection PubMed
description The usefulness and the utility of the next generation sequencing (NGS) technology are based on the assumption that the DNA or cDNA cleavage required to generate short sequence reads is random. Several previous reports suggest the existence of sequencing bias of NGS reads. To address this question in greater detail, we analyze NGS data from four organisms with different GC content, Plasmodium falciparum (19.39%), Arabidopsis thaliana (36.03%), Homo sapiens (40.91%) and Streptomyces coelicolor (72.00%). Using machine learning techniques, we recognize the pattern that the NGS read start is positioned in the local region where the nucleotide distribution is dissimilar from the global nucleotide distribution. We also demonstrate that the mono-nucleotide distribution underestimates sequencing bias, and the recognized pattern is explained largely by the distribution of multi-nucleotides (di-, tri-, and tetra- nucleotides) rather than mono-nucleotides. This implies that the correction of sequencing bias needs to be performed on the basis of the multi-nucleotide distribution. Providing companion software to quantify the effect of the recognized pattern on read positioning, we exemplify that the bias correction based on the mono-nucleotide distribution may not be sufficient to clean sequencing bias.
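The pattern described in the abstract, a read start sitting in a local region whose nucleotide distribution differs from the global one, can be illustrated with a minimal sketch. This is not the authors' companion software. The function names, the `flank` window size, the pseudocount smoothing, and the choice of Kullback-Leibler divergence as the dissimilarity measure are all assumptions made for illustration. Setting `k` to 1 gives the mono-nucleotide view, and 2 to 4 give the di-, tri-, and tetra-nucleotide views.

```python
from collections import Counter
from itertools import product
import math


def kmer_distribution(seq, k, pseudocount=1.0):
    """Smoothed frequency of every possible k-mer over ACGT in seq.

    The pseudocount (an assumption of this sketch) keeps every probability
    non-zero so the divergence below is always defined.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) over a shared set of k-mers."""
    return sum(p[m] * math.log(p[m] / q[m]) for m in p)


def local_dissimilarity(genome, read_start, k, flank=10):
    """Dissimilarity between the k-mer distribution in a window around a
    read start and the k-mer distribution of the whole sequence.

    `flank` (hypothetical parameter) is the number of bases taken on each
    side of the read start position.
    """
    lo = max(0, read_start - flank)
    hi = min(len(genome), read_start + flank)
    local = kmer_distribution(genome[lo:hi], k)
    global_ = kmer_distribution(genome, k)
    return kl_divergence(local, global_)


if __name__ == "__main__":
    # Toy sequence: balanced repeats with a homopolymer run in the middle.
    genome = "ACGT" * 50 + "A" * 20 + "ACGT" * 50
    for k in (1, 2):
        typical = local_dissimilarity(genome, read_start=100, k=k)
        unusual = local_dissimilarity(genome, read_start=210, k=k)
        print(f"k={k}: typical region {typical:.3f}, unusual region {unusual:.3f}")
```

Comparing the score at observed read starts against the score at randomly chosen positions, separately for each `k`, is one simple way to check whether larger `k` exposes bias that the mono-nucleotide view misses.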
format Online
Article
Text
id pubmed-4907491
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4907491
2016-07-18
Pattern Recognition on Read Positioning in Next Generation Sequencing
Byeon, Boseon
Kovalchuk, Igor
PLoS One
Research Article
The usefulness and the utility of the next generation sequencing (NGS) technology are based on the assumption that the DNA or cDNA cleavage required to generate short sequence reads is random. Several previous reports suggest the existence of sequencing bias of NGS reads. To address this question in greater detail, we analyze NGS data from four organisms with different GC content, Plasmodium falciparum (19.39%), Arabidopsis thaliana (36.03%), Homo sapiens (40.91%) and Streptomyces coelicolor (72.00%). Using machine learning techniques, we recognize the pattern that the NGS read start is positioned in the local region where the nucleotide distribution is dissimilar from the global nucleotide distribution. We also demonstrate that the mono-nucleotide distribution underestimates sequencing bias, and the recognized pattern is explained largely by the distribution of multi-nucleotides (di-, tri-, and tetra- nucleotides) rather than mono-nucleotides. This implies that the correction of sequencing bias needs to be performed on the basis of the multi-nucleotide distribution. Providing companion software to quantify the effect of the recognized pattern on read positioning, we exemplify that the bias correction based on the mono-nucleotide distribution may not be sufficient to clean sequencing bias.
Public Library of Science
2016-06-14
/pmc/articles/PMC4907491/
/pubmed/27299343
http://dx.doi.org/10.1371/journal.pone.0157033
Text
en
© 2016 Byeon, Kovalchuk
http://creativecommons.org/licenses/by/4.0/
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Pattern Recognition on Read Positioning in Next Generation Sequencing
topic Research Article