Aneuploidy prediction and tumor classification with heterogeneous hidden conditional random fields

Bibliographic Details
Main Authors: Barutcuoglu, Zafer; Airoldi, Edoardo M.; Dumeaux, Vanessa; Schapire, Robert E.; Troyanskaya, Olga G.
Format: Text
Language: English
Published: Oxford University Press 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2677736/
https://www.ncbi.nlm.nih.gov/pubmed/19052061
http://dx.doi.org/10.1093/bioinformatics/btn585
author Barutcuoglu, Zafer
Airoldi, Edoardo M.
Dumeaux, Vanessa
Schapire, Robert E.
Troyanskaya, Olga G.
collection PubMed
description Motivation: The heterogeneity of cancer cannot always be recognized by tumor morphology, but may be reflected by the underlying genetic aberrations. Array comparative genome hybridization (array-CGH) methods provide high-throughput data on genetic copy numbers, but determining the clinically relevant copy number changes remains a challenge. Conventional classification methods for linking recurrent alterations to clinical outcome ignore sequential correlations in selecting relevant features. Conversely, existing sequence classification methods can only model overall copy number instability, without regard to any particular position in the genome. Results: Here, we present the heterogeneous hidden conditional random field, a new integrated array-CGH analysis method for jointly classifying tumors, inferring copy numbers and identifying clinically relevant positions in recurrent alteration regions. By capturing the sequentiality as well as the locality of changes, our integrated model provides better noise reduction, and achieves more relevant gene retrieval and more accurate classification than existing methods. We provide an efficient L1-regularized discriminative training algorithm, which notably selects a small set of candidate genes most likely to be clinically relevant and driving the recurrent amplicons of importance. Our method thus provides unbiased starting points in deciding which genomic regions and which genes in particular to pursue for further examination. Our experiments on synthetic data and real genomic cancer prediction data show that our method is superior, both in prediction accuracy and relevant feature discovery, to existing methods. We also demonstrate that it can be used to generate novel biological hypotheses for breast cancer. Contact: ogt@cs.princeton.edu Supplementary information: Supplementary data are available at Bioinformatics online.
format Text
id pubmed-2677736
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2677736 2009-05-08 Aneuploidy prediction and tumor classification with heterogeneous hidden conditional random fields Barutcuoglu, Zafer Airoldi, Edoardo M. Dumeaux, Vanessa Schapire, Robert E. Troyanskaya, Olga G. Bioinformatics Original Papers Oxford University Press 2009-05-15 2008-12-03 /pmc/articles/PMC2677736/ /pubmed/19052061 http://dx.doi.org/10.1093/bioinformatics/btn585 Text en © 2008 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Aneuploidy prediction and tumor classification with heterogeneous hidden conditional random fields
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2677736/
https://www.ncbi.nlm.nih.gov/pubmed/19052061
http://dx.doi.org/10.1093/bioinformatics/btn585