HCMB: A stable and efficient algorithm for processing the normalization of highly sparse Hi-C contact data
Main authors: | Wu, Honglong; Wang, Xuebin; Chu, Mengtian; Li, Dongfang; Cheng, Lixin; Zhou, Ke |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology, 2021 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8120939/ https://www.ncbi.nlm.nih.gov/pubmed/34025950 http://dx.doi.org/10.1016/j.csbj.2021.04.064 |
Field | Value |
---|---|
author | Wu, Honglong; Wang, Xuebin; Chu, Mengtian; Li, Dongfang; Cheng, Lixin; Zhou, Ke |
collection | PubMed |
description | The high-throughput genome-wide chromosome conformation capture (Hi-C) method has recently become an important tool for studying chromosomal interactions, from which one can extract meaningful biological information, including the P(s) curve, topologically associated domains, A/B compartments, and other biologically relevant signals. Normalization is a critical pre-processing step before downstream analyses: it removes systematic and technical biases from chromatin contact matrices that arise from differences in mappability, GC content, and restriction fragment length. High sparsity in particular makes this correction difficult, creating an urgent need for a stable and efficient Hi-C normalization method. Several matrix balancing methods, such as the Knight-Ruiz (KR) algorithm, have recently been developed to normalize Hi-C data, but KR fails to normalize contact matrices with high sparsity. Here, we present Hi-C Matrix Balancing (HCMB), an algorithm that normalizes raw Hi-C interaction data through the iterative solution of a system of equations, combined with a line search and a projection strategy. Results on both simulated and experimental data demonstrate that HCMB normalizes Hi-C data robustly and efficiently and preserves biologically relevant Hi-C features even at very high sparsity. HCMB is implemented in Python and is freely available to non-commercial users on GitHub: https://github.com/HUST-DataMan/HCMB. |
format | Online Article Text |
id | pubmed-8120939 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Research Network of Computational and Structural Biotechnology |
record_format | MEDLINE/PubMed |
license | Comput Struct Biotechnol J, Research Article, 2021-04-27. © 2021 The Author(s). This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | HCMB: A stable and efficient algorithm for processing the normalization of highly sparse Hi-C contact data |
topic | Research Article |
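To make "matrix balancing" concrete, here is a minimal Python sketch that scales a symmetric contact matrix so every retained bin has the same total coverage, after first masking bins that are too sparse to balance. It is a plain symmetric Sinkhorn-Knopp-style iteration with a damped square-root update. It is neither HCMB nor the Knight-Ruiz algorithm: HCMB's equations, line search, and projection steps are described in the paper and implemented in the linked repository. The function name `balance_symmetric`, the `min_nonzero` masking threshold, and the target row sum of 1 are illustrative assumptions.

```python
import numpy as np


def balance_symmetric(A, min_nonzero=1, max_iter=1000, tol=1e-8):
    """Scale a symmetric, non-negative contact matrix so that every retained
    bin (row/column) sums to 1.

    Illustrative sketch only (hypothetical helper, not HCMB or KR).
    Returns (B, bias, converged): B = diag(bias) @ A @ diag(bias) on the
    retained bins (masked bins stay 0), and bias is NaN for masked bins.
    """
    A = np.asarray(A, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1] or not np.allclose(A, A.T):
        raise ValueError("expected a square, symmetric matrix")
    if np.any(A < 0):
        raise ValueError("contact counts must be non-negative")
    if min_nonzero < 1:
        raise ValueError("min_nonzero must be at least 1")

    # Mask sparse bins: a bin with no contacts can never reach a positive
    # row sum. Repeat until stable, because dropping one bin can push
    # another bin below the threshold.
    keep = np.ones(A.shape[0], dtype=bool)
    while True:
        sub = A[np.ix_(keep, keep)]
        new_keep = keep.copy()
        new_keep[keep] = np.count_nonzero(sub, axis=1) >= min_nonzero
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    if not keep.any():
        raise ValueError("every bin was masked; nothing to balance")

    M = A[np.ix_(keep, keep)]
    x = np.ones(M.shape[0])
    converged = False
    for _ in range(max_iter):
        row_sums = x * (M @ x)  # row sums of diag(x) @ M @ diag(x)
        if np.max(np.abs(row_sums - 1.0)) < tol:
            converged = True
            break
        # Damped (square-root) update. The undamped fixed-point step
        # x <- 1 / (M @ x) can oscillate forever on symmetric matrices.
        x = x / np.sqrt(row_sums)

    bias = np.full(A.shape[0], np.nan)
    bias[keep] = x
    B = np.zeros_like(A)
    B[np.ix_(keep, keep)] = x[:, None] * M * x[None, :]
    return B, bias, converged
```

For example, with a 4×4 matrix whose third bin has no contacts (as might happen for an unmappable region), `balance_symmetric(A)` masks that bin, sets `bias[2]` to NaN, and rescales the remaining 3×3 block so that each of its rows sums to 1 within `tol`. The `converged` flag is returned rather than assumed because simple balancing is not guaranteed to converge on every sparsity pattern; the abstract reports that KR fails on highly sparse matrices, which is the setting HCMB targets.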