On reliable discovery of molecular signatures

BACKGROUND: Molecular signatures are sets of genes, proteins, genetic variants or other variables that can be used as markers for a particular phenotype. Reliable signature discovery methods could yield valuable insight into cell biology and mechanisms of human disease. However, it is currently not...


Bibliographic Details
Main authors: Nilsson, Roland, Björkegren, Johan, Tegnér, Jesper
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2646701/
https://www.ncbi.nlm.nih.gov/pubmed/19178740
http://dx.doi.org/10.1186/1471-2105-10-38
collection PubMed
description BACKGROUND: Molecular signatures are sets of genes, proteins, genetic variants or other variables that can be used as markers for a particular phenotype. Reliable signature discovery methods could yield valuable insight into cell biology and mechanisms of human disease. However, it is currently not clear how to control error rates such as the false discovery rate (FDR) in signature discovery. Moreover, signatures for cancer gene expression have been shown to be unstable, that is, difficult to replicate in independent studies, casting doubts on their reliability. RESULTS: We demonstrate that with modern prediction methods, signatures that yield accurate predictions may still have a high FDR. Further, we show that even signatures with low FDR may fail to replicate in independent studies due to limited statistical power. Thus, neither stability nor predictive accuracy are relevant when FDR control is the primary goal. We therefore develop a general statistical hypothesis testing framework that for the first time provides FDR control for signature discovery. Our method is demonstrated to be correct in simulation studies. When applied to five cancer data sets, the method was able to discover molecular signatures with 5% FDR in three cases, while two data sets yielded no significant findings. CONCLUSION: Our approach enables reliable discovery of molecular signatures from genome-wide data with current sample sizes. The statistical framework developed herein is potentially applicable to a wide range of prediction problems in bioinformatics.
format Text
id pubmed-2646701
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2646701 2009-02-24 On reliable discovery of molecular signatures Nilsson, Roland Björkegren, Johan Tegnér, Jesper BMC Bioinformatics Research Article BioMed Central 2009-01-29 /pmc/articles/PMC2646701/ /pubmed/19178740 http://dx.doi.org/10.1186/1471-2105-10-38 Text en Copyright © 2009 Nilsson et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article