
Pairwise comparative analysis of six haplotype assembly methods based on users’ experience

BACKGROUND: A haplotype is a set of DNA variants inherited together from one parent or chromosome. Haplotype information is useful for studying genetic variation and disease association. Haplotype assembly (HA) is a process of obtaining haplotypes using DNA sequencing data. Currently, there are many...


Bibliographic Details
Main authors: Sun, Shuying; Cheng, Flora; Han, Daphne; Wei, Sarah; Zhong, Alice; Massoudian, Sherwin; Johnson, Alison B.
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311811/
https://www.ncbi.nlm.nih.gov/pubmed/37386408
http://dx.doi.org/10.1186/s12863-023-01134-5
_version_ 1785066820616585216
author Sun, Shuying
Cheng, Flora
Han, Daphne
Wei, Sarah
Zhong, Alice
Massoudian, Sherwin
Johnson, Alison B.
author_sort Sun, Shuying
collection PubMed
description BACKGROUND: A haplotype is a set of DNA variants inherited together from one parent or chromosome. Haplotype information is useful for studying genetic variation and disease association. Haplotype assembly (HA) is the process of obtaining haplotypes from DNA sequencing data. Many HA methods are currently available, each with its own strengths and weaknesses. This study compared six HA methods or algorithms (HapCUT2, MixSIH, PEATH, WhatsHap, SDhaP, and MAtCHap) using two NA12878 datasets named hg19 and hg38. The 6 HA algorithms were run on chromosome 10 of these two datasets, each at 3 filtering levels based on sequencing depth (DP1, DP15, and DP30), giving 6 datasets in total. Their outputs were then compared.
RESULT: Run time (CPU time) was compared to assess the efficiency of the 6 HA methods. HapCUT2 was the fastest HA method on all 6 datasets, with run time consistently under 2 min. WhatsHap was also relatively fast, with a run time of 21 min or less on all 6 datasets. The run time of the other 4 HA algorithms varied across datasets and coverage levels. To assess accuracy, pairwise comparisons were conducted for each pair of the six packages by generating their disagreement rates for both haplotype blocks and Single Nucleotide Variants (SNVs). The authors also compared them using switch distance (error), i.e., the number of positions where two chromosomes of a certain phase must be switched to match the known haplotype. HapCUT2, PEATH, MixSIH, and MAtCHap generated output files with similar numbers of blocks and SNVs, and they performed relatively similarly. WhatsHap generated a much larger number of SNVs in the hg19 DP1 output, which caused it to have high disagreement percentages with the other methods. For the hg38 data, however, WhatsHap performed similarly to the other 4 algorithms, all except SDhaP. The comparison analysis showed that SDhaP had a much larger disagreement rate when compared with the other algorithms in all 6 datasets.
CONCLUSION: The comparative analysis is important because each algorithm is different. The findings of this study provide a deeper understanding of the performance of currently available HA algorithms and useful input for other users.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12863-023-01134-5.
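The switch distance mentioned in the abstract can be illustrated with a minimal sketch. It assumes each haplotype within one block is given as a string of 0/1 alleles over the same ordered SNVs; the function name switch_distance and the example strings are hypothetical and are not taken from the paper or from any of the six packages.

```python
def switch_distance(predicted: str, truth: str) -> int:
    """Count the switches needed to make `predicted` match `truth`.

    Both arguments are 0/1 allele strings for one haplotype of the same
    block, covering the same SNVs in the same order (an assumption of
    this sketch). A prediction that is the exact complement of the truth
    needs no switches, because it describes the same pair of haplotypes.
    """
    if len(predicted) != len(truth):
        raise ValueError("haplotypes must cover the same SNVs")
    # agree[i] is True when the predicted allele matches the truth at SNV i.
    agree = [p == t for p, t in zip(predicted, truth)]
    # A switch happens wherever agreement flips between adjacent SNVs.
    return sum(1 for a, b in zip(agree, agree[1:]) if a != b)


if __name__ == "__main__":
    print(switch_distance("0011", "0000"))  # one switch, after the second SNV
    print(switch_distance("1111", "0000"))  # complement: no switch needed
```

Comparing each method's output against a known phasing in this way gives one count per block; how the paper aggregates such counts across blocks is not stated in the abstract.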
format Online
Article
Text
id pubmed-10311811
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10311811 2023-07-01 Pairwise comparative analysis of six haplotype assembly methods based on users’ experience Sun, Shuying Cheng, Flora Han, Daphne Wei, Sarah Zhong, Alice Massoudian, Sherwin Johnson, Alison B. BMC Genom Data Research BioMed Central 2023-06-29 /pmc/articles/PMC10311811/ /pubmed/37386408 http://dx.doi.org/10.1186/s12863-023-01134-5 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Pairwise comparative analysis of six haplotype assembly methods based on users’ experience
topic Research