
Identification of recurrent genetic patterns from targeted sequencing panels with advanced data science: a case-study on sporadic and genetic neurodegenerative diseases

BACKGROUND: Targeted Next Generation Sequencing is a common and powerful approach used in both clinical and research settings. However, at present, a large fraction of the acquired genetic information is not used since pathogenicity cannot be assessed for most variants. Further complicating this sce...


Bibliographic Details
Main authors: Tarozzi, M., Bartoletti-Stella, A., Dall’Olio, D., Matteuzzi, T., Baiardi, S., Parchi, P., Castellani, G., Capellari, S.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8830183/
https://www.ncbi.nlm.nih.gov/pubmed/35144616
http://dx.doi.org/10.1186/s12920-022-01173-4
author Tarozzi, M.
Bartoletti-Stella, A.
Dall’Olio, D.
Matteuzzi, T.
Baiardi, S.
Parchi, P.
Castellani, G.
Capellari, S.
collection PubMed
description BACKGROUND: Targeted Next Generation Sequencing is a common and powerful approach used in both clinical and research settings. However, at present, a large fraction of the acquired genetic information is not used, since pathogenicity cannot be assessed for most variants. Further complicating this scenario is the increasingly frequent description of a poly/oligogenic pattern of inheritance, in which multiple variants contribute to increasing disease risk. We present an approach in which the entire genetic information provided by targeted sequencing is transformed into binary data, on which we performed statistical, machine learning, and network analyses to extract all valuable information from the entire genetic profile. To test this approach and explore without bias the presence of recurrent genetic patterns, we studied a cohort of 112 patients affected either by genetic Creutzfeldt–Jakob disease (CJD) caused by two mutations in the PRNP gene (p.E200K and p.V210I) with different penetrance or by sporadic Alzheimer disease (sAD). RESULTS: Unsupervised methods can identify functionally relevant sources of variation in the data, such as haplogroups and polymorphisms that do not follow Hardy–Weinberg equilibrium, for example NOTCH3 rs11670823 (c.3837+21T>A). Supervised classifiers can recognize clinical phenotypes with high accuracy based on the mutational profile of patients. In addition, we found a similar alteration of allele frequencies compared with the European population in sporadic patients and in V210I-CJD, a poorly penetrant PRNP mutation, and sAD, suggesting shared oligogenic patterns in different types of dementia. Pathway enrichment and protein–protein interaction network analyses revealed different altered pathways between the two PRNP mutations. CONCLUSIONS: We propose this workflow as a possible approach to gain deeper insights into the genetic information derived from targeted sequencing, to identify recurrent genetic patterns, and to improve the understanding of complex diseases. This work could also represent a possible starting point for a predictive tool for personalized medicine and advanced diagnostic applications. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12920-022-01173-4.
format Online
Article
Text
id pubmed-8830183
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8830183 2022-02-11 BMC Med Genomics Research BioMed Central 2022-02-10 /pmc/articles/PMC8830183/ /pubmed/35144616 http://dx.doi.org/10.1186/s12920-022-01173-4 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Identification of recurrent genetic patterns from targeted sequencing panels with advanced data science: a case-study on sporadic and genetic neurodegenerative diseases
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8830183/
https://www.ncbi.nlm.nih.gov/pubmed/35144616
http://dx.doi.org/10.1186/s12920-022-01173-4