
HCMB: A stable and efficient algorithm for processing the normalization of highly sparse Hi-C contact data

The high-throughput genome-wide chromosome conformation capture (Hi-C) method has recently become an important tool to study chromosomal interactions where one can extract meaningful biological information including P(s) curve, topologically associated domains, A/B compartments, and other biological...


Bibliographic Details
Main authors: Wu, Honglong; Wang, Xuebin; Chu, Mengtian; Li, Dongfang; Cheng, Lixin; Zhou, Ke
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8120939/
https://www.ncbi.nlm.nih.gov/pubmed/34025950
http://dx.doi.org/10.1016/j.csbj.2021.04.064
author Wu, Honglong
Wang, Xuebin
Chu, Mengtian
Li, Dongfang
Cheng, Lixin
Zhou, Ke
collection PubMed
description The high-throughput genome-wide chromosome conformation capture (Hi-C) method has become an important tool for studying chromosomal interactions, from which one can extract meaningful biological information including the P(s) curve, topologically associated domains, A/B compartments, and other biologically relevant signals. Normalization is a critical pre-processing step for downstream analyses: it eliminates systematic and technical biases in chromatin contact matrices that arise from differences in mappability, GC content, and restriction fragment length. High sparsity in particular poses a major challenge for this correction, indicating an urgent need for a stable and efficient method for Hi-C data normalization. Matrix balancing methods such as the Knight-Ruiz (KR) algorithm have been developed to normalize Hi-C data, but KR fails to normalize contact matrices with high sparsity. Here, we present an algorithm, Hi-C Matrix Balancing (HCMB), based on an iterative solution of equations combined with a linear search and projection strategy, to normalize the original Hi-C interaction data. Both simulated and experimental data demonstrate that HCMB is robust and efficient in normalizing Hi-C data and in preserving biologically relevant Hi-C features even at very high sparsity. HCMB is implemented in Python and is freely accessible to non-commercial users on GitHub: https://github.com/HUST-DataMan/HCMB.
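As a rough illustration of what "matrix balancing" means in the description above, the sketch below rescales a symmetric contact matrix so that every non-empty row sums to 1. It is not the HCMB algorithm and not the KR algorithm; it is a plain damped fixed-point iteration chosen only for brevity. The function name `balance_symmetric`, its parameters, and the toy matrix are hypothetical and do not come from the record or from the HCMB repository.

```python
import numpy as np


def balance_symmetric(contacts, tol=1e-8, max_iter=1000):
    """Illustrative balancing of a symmetric, non-negative matrix.

    Finds a bias vector x such that diag(x) @ A @ diag(x) has row sums of 1
    on all rows that contain at least one contact. Empty rows are skipped
    and receive a bias of 0. This is a generic sketch, NOT HCMB or KR.
    """
    a = np.asarray(contacts, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        raise ValueError("contacts must be a square matrix")

    # Drop bins with no contacts; they cannot be balanced.
    keep = a.sum(axis=1) > 0
    sub = a[np.ix_(keep, keep)]
    x = np.ones(sub.shape[0])
    converged = sub.size == 0  # nothing to balance counts as converged

    for _ in range(max_iter):
        if sub.size == 0:
            break
        row_sums = x * (sub @ x)  # row sums of diag(x) @ sub @ diag(x)
        if np.max(np.abs(row_sums - 1.0)) < tol:
            converged = True
            break
        # Damped update: divide by the square root of the current row sums.
        x = x / np.sqrt(row_sums)

    bias = np.zeros(a.shape[0])
    bias[keep] = x
    balanced = a * np.outer(bias, bias)
    return balanced, bias, converged


# Toy 4-bin matrix (made-up counts); the last bin has no contacts.
toy = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 1.0, 0.0],
    [0.0, 1.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])
balanced, bias, converged = balance_symmetric(toy)
```

Simple iterations like this one can converge slowly or not at all on very sparse matrices, which is the situation the description says HCMB is designed to handle.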
format Online
Article
Text
id pubmed-8120939
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-8120939 2021-05-21 Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2021-04-27 /pmc/articles/PMC8120939/ /pubmed/34025950 http://dx.doi.org/10.1016/j.csbj.2021.04.064 Text en © 2021 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title HCMB: A stable and efficient algorithm for processing the normalization of highly sparse Hi-C contact data
topic Research Article