A Bayesian nonparametric method for prediction in EST analysis

Bibliographic Details
Main authors: Lijoi, Antonio; Mena, Ramsés H; Prünster, Igor
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2220008/
https://www.ncbi.nlm.nih.gov/pubmed/17868445
http://dx.doi.org/10.1186/1471-2105-8-339
_version_ 1782149326693203968
author Lijoi, Antonio
Mena, Ramsés H
Prünster, Igor
author_facet Lijoi, Antonio
Mena, Ramsés H
Prünster, Igor
author_sort Lijoi, Antonio
collection PubMed
description BACKGROUND: Expressed sequence tag (EST) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed with sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. RESULTS: In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt incorporates, in a statistically rigorous way, the available information into prediction. Our proposal has advantages over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. CONCLUSION: The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not suffer from the drawbacks associated with frequentist estimators and are reliable for any size of the additional sample.
format Text
id pubmed-2220008
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2220008 2008-01-31 A Bayesian nonparametric method for prediction in EST analysis Lijoi, Antonio Mena, Ramsés H Prünster, Igor BMC Bioinformatics Methodology Article
BioMed Central 2007-09-14 /pmc/articles/PMC2220008/ /pubmed/17868445 http://dx.doi.org/10.1186/1471-2105-8-339 Text en Copyright © 2007 Lijoi et al; licensee BioMed Central Ltd. https://creativecommons.org/licenses/by/2.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Lijoi, Antonio
Mena, Ramsés H
Prünster, Igor
A Bayesian nonparametric method for prediction in EST analysis
title A Bayesian nonparametric method for prediction in EST analysis
title_full A Bayesian nonparametric method for prediction in EST analysis
title_fullStr A Bayesian nonparametric method for prediction in EST analysis
title_full_unstemmed A Bayesian nonparametric method for prediction in EST analysis
title_short A Bayesian nonparametric method for prediction in EST analysis
title_sort bayesian nonparametric method for prediction in est analysis
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2220008/
https://www.ncbi.nlm.nih.gov/pubmed/17868445
http://dx.doi.org/10.1186/1471-2105-8-339