Gene capture prediction and overlap estimation in EST sequencing from one or multiple libraries

BACKGROUND: In expressed sequence tag (EST) sequencing, we are often interested in how many genes we can capture in an EST sample of a targeted size. This information provides insight into sequencing efficiency in experimental design, as well as clues to the diversity of expressed genes in the tissue...

Bibliographic Details
Main authors: Wang, Ji-Ping Z; Lindsay, Bruce G; Cui, Liying; Wall, P Kerr; Marion, Josh; Zhang, Jiaxuan; dePamphilis, Claude W
Format: Text
Language: English
Published: BioMed Central 2005
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1369009/
https://www.ncbi.nlm.nih.gov/pubmed/16351717
http://dx.doi.org/10.1186/1471-2105-6-300
author Wang, Ji-Ping Z
Lindsay, Bruce G
Cui, Liying
Wall, P Kerr
Marion, Josh
Zhang, Jiaxuan
dePamphilis, Claude W
collection PubMed
description BACKGROUND: In expressed sequence tag (EST) sequencing, we are often interested in how many genes we can capture in an EST sample of a targeted size. This information provides insight into sequencing efficiency in experimental design, as well as clues to the diversity of expressed genes in the tissue from which the library was constructed. RESULTS: We propose a compound Poisson process model that can accurately predict the gene capture in a future EST sample based on an initial EST sample. It also allows estimation of the number of genes expressed in one cDNA library or co-expressed in two cDNA libraries. The superior performance of the new prediction method over an existing approach is established by a simulation study. Our analysis of four Arabidopsis thaliana EST sets suggests that the number of expressed genes present in the four corresponding cDNA libraries ranges from 9155 (root) to 12005 (silique). An observed fraction of co-expressed genes in two different EST sets as low as 25% can correspond to an actual overlap fraction greater than 65%. CONCLUSION: The proposed method provides a convenient tool for gene capture prediction and cDNA library property diagnosis in EST sequencing.
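The two effects named in the abstract (gene capture growing with sample size, and observed overlap understating actual overlap) can be illustrated with a small deterministic calculation. The sketch below is not the authors' estimator and does not reproduce their results. It only assumes that the EST count of each gene is Poisson with mean rate × sample size, which is the basic ingredient of a compound Poisson model. The Zipf-like rate profile, the 12000-gene universe, the 70% shared-gene construction, the definition of overlap as shared genes divided by genes in either library, and all function names are hypothetical choices made for illustration.

```python
import math


def zipf_rates(n_genes):
    """Hypothetical skewed expression profile: the k-th gene gets a rate
    proportional to 1/(k+1), normalized so that all rates sum to 1."""
    weights = [1.0 / (k + 1) for k in range(n_genes)]
    total = sum(weights)
    return [w / total for w in weights]


def capture_prob(rate, n_ests):
    """P(gene seen at least once) when its EST count is Poisson(rate * n_ests)."""
    return 1.0 - math.exp(-rate * n_ests)


def expected_captured(rates, n_ests):
    """Expected number of distinct genes captured in a sample of about n_ests ESTs."""
    return sum(capture_prob(r, n_ests) for r in rates)


def build_libraries(n_universe=12000):
    """Assumed toy setup: of every 20 genes, 14 are expressed in both
    libraries, 3 only in library A and 3 only in library B."""
    in_a = [i for i in range(n_universe) if i % 20 < 17]
    in_b = [i for i in range(n_universe) if i % 20 < 14 or i % 20 >= 17]
    rates_a = dict(zip(in_a, zipf_rates(len(in_a))))
    rates_b = dict(zip(in_b, zipf_rates(len(in_b))))
    return rates_a, rates_b


def overlap_fractions(rates_a, rates_b, n_ests_a, n_ests_b):
    """Return (actual overlap, observed overlap). The observed value is a
    ratio of expected counts: genes seen in both samples over genes seen
    in at least one, with the two samples drawn independently."""
    genes = set(rates_a) | set(rates_b)
    shared = set(rates_a) & set(rates_b)
    seen_both = 0.0
    seen_either = 0.0
    for g in sorted(genes):
        pa = capture_prob(rates_a.get(g, 0.0), n_ests_a)  # 0 if not expressed in A
        pb = capture_prob(rates_b.get(g, 0.0), n_ests_b)  # 0 if not expressed in B
        seen_both += pa * pb
        seen_either += pa + pb - pa * pb
    return len(shared) / len(genes), seen_both / seen_either


if __name__ == "__main__":
    rates_a, rates_b = build_libraries()
    for n_ests in (5000, 20000, 500000):
        captured = expected_captured(rates_a.values(), n_ests)
        actual, observed = overlap_fractions(rates_a, rates_b, n_ests, n_ests)
        print(f"{n_ests:>7} ESTs: ~{captured:.0f} genes captured in A, "
              f"observed overlap {observed:.2f} vs actual {actual:.2f}")
```

Under these assumed rates, rarely expressed genes are often missed in one or both samples, so the observed overlap sits well below the actual overlap at modest sample sizes and rises toward it as the samples grow.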
format Text
id pubmed-1369009
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1369009 2006-03-23 BMC Bioinformatics Methodology Article BioMed Central 2005-12-13 /pmc/articles/PMC1369009/ /pubmed/16351717 http://dx.doi.org/10.1186/1471-2105-6-300 Text en Copyright © 2005 Wang et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Gene capture prediction and overlap estimation in EST sequencing from one or multiple libraries
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1369009/
https://www.ncbi.nlm.nih.gov/pubmed/16351717
http://dx.doi.org/10.1186/1471-2105-6-300