Detection and Removal of Biases in the Analysis of Next-Generation Sequencing Reads

Since the emergence of next-generation sequencing (NGS) technologies, great effort has been put into the development of tools for analysis of the short reads. In parallel, knowledge is increasing regarding biases inherent in these technologies. Here we discuss four different biases we encountered wh...

Bibliographic Details
Main authors: Schwartz, Schraga; Oren, Ram; Ast, Gil
Format: Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3031631/
https://www.ncbi.nlm.nih.gov/pubmed/21304912
http://dx.doi.org/10.1371/journal.pone.0016685
author Schwartz, Schraga
Oren, Ram
Ast, Gil
collection PubMed
description Since the emergence of next-generation sequencing (NGS) technologies, great effort has been put into the development of tools for analysis of the short reads. In parallel, knowledge is increasing regarding biases inherent in these technologies. Here we discuss four different biases we encountered while analyzing various Illumina datasets. These biases are due to both biological and statistical effects that in particular affect comparisons between different genomic regions. Specifically, we encountered biases pertaining to the distributions of nucleotides across sequencing cycles, to mappability, to contamination of pre-mRNA with mRNA, and to non-uniform hydrolysis of RNA. Most of these biases are not specific to one analyzed dataset, but are present across a variety of datasets and within a variety of genomic contexts. Importantly, some of these biases correlated in a highly significant manner with biological features, including transcript length, gene expression levels, conservation levels, and exon-intron architecture, misleadingly increasing the credibility of results due to them. We also demonstrate the relevance of these biases in the context of analyzing an NGS dataset mapping transcriptionally engaged RNA polymerase II (RNAPII) in the context of exon-intron architecture, and show that elimination of these biases is crucial for avoiding erroneous interpretation of the data. Collectively, our results highlight several important pitfalls, challenges and approaches in the analysis of NGS reads.
format Text
id pubmed-3031631
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3031631 2011-02-08 Detection and Removal of Biases in the Analysis of Next-Generation Sequencing Reads. Schwartz, Schraga; Oren, Ram; Ast, Gil. PLoS One. Research Article. Public Library of Science, 2011-01-31. /pmc/articles/PMC3031631/ /pubmed/21304912 http://dx.doi.org/10.1371/journal.pone.0016685 Text en Schwartz et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Detection and Removal of Biases in the Analysis of Next-Generation Sequencing Reads
topic Research Article