Tight basis cycle representatives for persistent homology of large biological data sets

Persistent homology (PH) is a popular tool for topological data analysis that has found applications across diverse areas of research. It provides a rigorous method to compute robust topological features in discrete experimental observations that often contain various sources of uncertainties. Altho...

Bibliographic Details
Main authors: Aggarwal, Manu; Periwal, Vipul
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10275456/
https://www.ncbi.nlm.nih.gov/pubmed/37253074
http://dx.doi.org/10.1371/journal.pcbi.1010341
author Aggarwal, Manu
Periwal, Vipul
collection PubMed
description Persistent homology (PH) is a popular tool for topological data analysis that has found applications across diverse areas of research. It provides a rigorous method to compute robust topological features in discrete experimental observations that often contain various sources of uncertainties. Although powerful in theory, PH suffers from high computation cost that precludes its application to large data sets. Additionally, most analyses using PH are limited to computing the existence of nontrivial features. Precise localization of these features is not generally attempted because, by definition, localized representations are not unique and because of even higher computation cost. Such a precise location is a sine qua non for determining functional significance, especially in biological applications. Here, we provide a strategy and algorithms to compute tight representative boundaries around nontrivial robust features in large data sets. To showcase the efficiency of our algorithms and the precision of computed boundaries, we analyze the human genome and protein crystal structures. In the human genome, we found a surprising effect of the impairment of chromatin loop formation on loops through chromosome 13 and the sex chromosomes. We also found loops with long-range interactions between functionally related genes. In protein homologs with significantly different topology, we found voids attributable to ligand-interaction, mutation, and differences between species.
format Online
Article
Text
id pubmed-10275456
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10275456 2023-06-17 Tight basis cycle representatives for persistent homology of large biological data sets Aggarwal, Manu Periwal, Vipul PLoS Comput Biol Research Article
Public Library of Science 2023-05-30 /pmc/articles/PMC10275456/ /pubmed/37253074 http://dx.doi.org/10.1371/journal.pcbi.1010341 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
title Tight basis cycle representatives for persistent homology of large biological data sets
topic Research Article