An Enumerative Combinatorics Model for Fragmentation Patterns in RNA Sequencing Provides Insights into Nonuniformity of the Expected Fragment Starting-Point and Coverage Profile

RNA sequencing (RNA-seq) has emerged as the method of choice for measuring the expression of RNAs in a given cell population. In most RNA-seq technologies, sequencing the full length of RNA molecules requires fragmentation into smaller pieces. Unfortunately, the issue of nonuniform sequencing covera...

Bibliographic Details
Main authors:	Prakash, Celine; Haeseler, Arndt Von
Format:	Online Article Text
Language:	English
Published:	Mary Ann Liebert, Inc. 2017
Subjects:	Research Articles
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5346924/ https://www.ncbi.nlm.nih.gov/pubmed/27661099 http://dx.doi.org/10.1089/cmb.2016.0096

author	Prakash, Celine; Haeseler, Arndt Von
collection	PubMed
description	RNA sequencing (RNA-seq) has emerged as the method of choice for measuring the expression of RNAs in a given cell population. In most RNA-seq technologies, sequencing the full length of RNA molecules requires fragmentation into smaller pieces. Unfortunately, the issue of nonuniform sequencing coverage across a genomic feature has been a concern in RNA-seq and is attributed to biases for certain fragments in RNA-seq library preparation and sequencing. To investigate the expected coverage obtained from fragmentation, we develop a simple fragmentation model that is independent of bias from the experimental method and is not specific to the transcript sequence. Essentially, we enumerate all configurations for maximal placement of a given fragment length, F, on transcript length, T, to represent every possible fragmentation pattern, from which we compute the expected coverage profile across a transcript. We extend this model to incorporate general empirical attributes such as read length, fragment length distribution, and number of molecules of the transcript. We further introduce the fragment starting-point, fragment coverage, and read coverage profiles. We find that the expected profiles are not uniform and that factors such as fragment length to transcript length ratio, read length to fragment length ratio, fragment length distribution, and number of molecules influence the variability of coverage across a transcript. Finally, we explore a potential application of the model where, with simulations, we show that it is possible to correctly estimate the transcript copy number for any transcript in the RNA-seq experiment.
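The enumeration summarized in the description can be illustrated with a minimal sketch. It rests on assumptions that this record does not confirm: "maximal placement" is read here as a set of non-overlapping fragments of length F on a transcript of length T in which every uncovered gap is shorter than F (so no further fragment fits), and every such configuration is weighted equally. The function names `maximal_placements` and `expected_profiles` are hypothetical and are not taken from the article.

```python
from fractions import Fraction


def maximal_placements(T, F):
    """Yield tuples of 0-based fragment start positions.

    Assumed reading of "maximal placement": fragments of length F do not
    overlap, and every uncovered gap (including both ends) is shorter than F.
    Requires F >= 1.
    """
    def extend(pos, starts):
        # pos is the first position after the previously placed fragment.
        for gap in range(F):          # gap before the next fragment must be < F
            s = pos + gap
            if s + F > T:             # fragment would run past the transcript end
                break
            yield from extend(s + F, starts + (s,))
        if T - pos < F:               # remaining tail is too short for a fragment
            yield starts

    yield from extend(0, ())


def expected_profiles(T, F):
    """Return (start_profile, coverage_profile) as lists of Fractions.

    Each entry is the fraction of configurations in which that position is a
    fragment start, or is covered by a fragment, assuming all configurations
    are equally likely (an assumption of this sketch).
    """
    configs = list(maximal_placements(T, F))
    n = len(configs)
    start_counts = [0] * T
    cover_counts = [0] * T
    for starts in configs:
        for s in starts:
            start_counts[s] += 1
            for p in range(s, s + F):
                cover_counts[p] += 1
    start_profile = [Fraction(c, n) for c in start_counts]
    coverage_profile = [Fraction(c, n) for c in cover_counts]
    return start_profile, coverage_profile


if __name__ == "__main__":
    starts, coverage = expected_profiles(5, 2)
    print([str(x) for x in starts])
    print([str(x) for x in coverage])
```

For T = 5 and F = 2 this sketch finds three configurations, with starts (0, 2), (0, 3), and (1, 3). Positions 1 and 3 are covered in all three, while positions 0, 2, and 4 are covered in only two, so even this toy case gives a nonuniform profile under the stated assumptions.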
format	Online Article Text
id	pubmed-5346924
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Mary Ann Liebert, Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-5346924 2017-03-13 Prakash, Celine; Haeseler, Arndt Von J Comput Biol Research Articles Mary Ann Liebert, Inc. 2017-03-01 /pmc/articles/PMC5346924/ /pubmed/27661099 http://dx.doi.org/10.1089/cmb.2016.0096 Text en © Celine Prakash and Arndt Von Haeseler, 2016. Published by Mary Ann Liebert, Inc. This Open Access article is distributed under the terms of the Creative Commons License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
title	An Enumerative Combinatorics Model for Fragmentation Patterns in RNA Sequencing Provides Insights into Nonuniformity of the Expected Fragment Starting-Point and Coverage Profile
topic	Research Articles

Similar Items