
Interpretation of personal genome sequencing data in terms of disease ranks based on mutual information

BACKGROUND: The rapid advances in genome sequencing technologies have resulted in an unprecedented number of genome variations being discovered in humans. However, there has been very limited coverage of interpretation of the personal genome sequencing data in terms of diseases. METHODS: In this pap...


Bibliographic Details
Main authors:	Na, Young-Ji; Sohn, Kyung-Ah; Kim, Ju Han
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460593/ https://www.ncbi.nlm.nih.gov/pubmed/26045178 http://dx.doi.org/10.1186/1755-8794-8-S2-S4

author	Na, Young-Ji; Sohn, Kyung-Ah; Kim, Ju Han
collection	PubMed
description	BACKGROUND: The rapid advances in genome sequencing technologies have resulted in an unprecedented number of genome variations being discovered in humans. However, there has been very limited coverage of interpretation of the personal genome sequencing data in terms of diseases. METHODS: In this paper we present the first computational analysis scheme for interpreting personal genome data by simultaneously considering the functional impact of damaging variants and curated disease-gene association data. This method is based on mutual information as a measure of the relative closeness between the personal genome and diseases. We hypothesize that a higher mutual information score implies that the personal genome is more susceptible to a particular disease than other diseases. RESULTS: The method was applied to the sequencing data of 50 acute myeloid leukemia (AML) patients in The Cancer Genome Atlas. The utility of associations between a disease and the personal genome was explored using data of healthy (control) people obtained from the 1000 Genomes Project. The ranks of the disease terms in the AML patient group were compared with those in the healthy control group using "Leukemia, Myeloid, Acute" (C04.557.337.539.550) as the corresponding MeSH disease term. The mutual information rank of the disease term was substantially higher in the AML patient group than in the healthy control group, which demonstrates that the proposed methodology can be successfully applied to infer associations between the personal genome and diseases. CONCLUSIONS: Overall, the area under the receiver operating characteristics curve was significantly larger for the AML patient data than for the healthy controls. This methodology could contribute to consequential discoveries and explanations for mining personal genome sequencing data in terms of diseases, and have versatility with respect to genomic-based knowledge such as drug-gene and environmental-factor-gene interactions.
format	Online Article Text
id	pubmed-4460593
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4460593 2015-06-29 Interpretation of personal genome sequencing data in terms of disease ranks based on mutual information Na, Young-Ji Sohn, Kyung-Ah Kim, Ju Han BMC Med Genomics Research BioMed Central 2015-05-29 /pmc/articles/PMC4460593/ /pubmed/26045178 http://dx.doi.org/10.1186/1755-8794-8-S2-S4 Text en Copyright © 2015 Na et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Interpretation of personal genome sequencing data in terms of disease ranks based on mutual information
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460593/ https://www.ncbi.nlm.nih.gov/pubmed/26045178 http://dx.doi.org/10.1186/1755-8794-8-S2-S4
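
The abstract describes ranking diseases by a mutual information score between a personal genome and each disease, but it does not give the formula. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each gene in a fixed gene universe carries two binary labels (has a damaging variant in this genome; is associated with the disease) and computes the mutual information of those two labels. The function names, the gene identifiers, and the toy disease sets are all hypothetical.

```python
import math


def mutual_information(personal_genes, disease_genes, universe):
    """Mutual information (bits) between two binary gene labels.

    Assumption: label X = gene is in personal_genes, label Y = gene is in
    disease_genes, both evaluated over every gene in `universe`.
    """
    n = len(universe)
    a = personal_genes & universe
    b = disease_genes & universe
    n11 = len(a & b)              # in both sets
    n10 = len(a) - n11            # personal only
    n01 = len(b) - n11            # disease only
    n00 = n - n11 - n10 - n01     # in neither set
    cells = (
        (n11, len(a), len(b)),
        (n10, len(a), n - len(b)),
        (n01, n - len(a), len(b)),
        (n00, n - len(a), n - len(b)),
    )
    mi = 0.0
    for nxy, nx, ny in cells:
        if nxy > 0:  # empty cells contribute nothing (0 * log 0 := 0)
            mi += (nxy / n) * math.log2(nxy * n / (nx * ny))
    return mi


def rank_diseases(personal_genes, disease_to_genes, universe):
    """Return (disease, score) pairs sorted from highest to lowest score."""
    scores = {
        disease: mutual_information(personal_genes, genes, universe)
        for disease, genes in disease_to_genes.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data with placeholder gene IDs; not real disease-gene associations.
    universe = {f"G{i}" for i in range(8)}
    personal = {"G0", "G1", "G2", "G3"}
    diseases = {
        "disease_A": {"G0", "G1", "G2", "G3"},
        "disease_B": {"G0", "G1", "G4", "G5"},
    }
    for disease, score in rank_diseases(personal, diseases, universe):
        print(disease, round(score, 3))
```

Plain mutual information is symmetric in the direction of dependence, so a disease whose gene set is strongly anti-correlated with the damaged genes would also score high in this sketch; a real pipeline would need to handle that case and to weight variants by functional impact, as the abstract indicates.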
