A comparative analysis of current phasing and imputation software

Bibliographic Details
Main authors: De Marino, Adriano; Mahmoud, Abdallah Amr; Bose, Madhuchanda; Bircan, Karatuğ Ozan; Terpolovsky, Andrew; Bamunusinghe, Varuna; Bohn, Sandra; Khan, Umar; Novković, Biljana; Yazdi, Puya G.
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9581364/
https://www.ncbi.nlm.nih.gov/pubmed/36260643
http://dx.doi.org/10.1371/journal.pone.0260177
collection PubMed
description Whole-genome data has become significantly more accessible over the last two decades. This can largely be attributed to both reduced sequencing costs and imputation models that make it possible to obtain nearly whole-genome data from less expensive genotyping methods, such as microarray chips. Although there are many different approaches to imputation, the Hidden Markov Model (HMM) remains the most widely used. In this study, we compared the latest versions of the most popular HMM-based tools for phasing and imputation: Beagle5.4, Eagle2.4.1, ShapeIT4, Impute5 and Minimac4. We benchmarked them on four input datasets with three levels of chip density. We assessed each imputation tool on the basis of accuracy, speed and memory usage, and showed how the choice of imputation accuracy metric can result in different interpretations. The highest average concordance rate was achieved by Beagle5.4, followed by Impute5 and Minimac4, using a reference-based approach during phasing and the highest-density chip. IQS and R² metrics revealed that Impute5 and Minimac4 obtained better results for low-frequency markers, while Beagle5.4 remained more accurate for common markers (MAF > 5%). Computational load, as measured by run time, was lower for Beagle5.4 than for Minimac4 and Impute5, while Minimac4 used the least memory of the imputation tools we compared. ShapeIT4 used the least memory of the phasing tools examined with genotype chip data, while Eagle2.4.1 used the least memory when phasing WGS data. Finally, we determined the combination of phasing software, imputation software, and reference panel best suited to different situations and analysis needs, and created an automated pipeline that lets users create customized chips designed to optimize their imputation results.
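The description's remark that the choice of accuracy metric can change the interpretation can be illustrated with a small sketch. The metric definitions below are assumptions drawn from common usage rather than from this record: concordance as the fraction of matching genotype calls, IQS as a chance-adjusted concordance in the Cohen's kappa form, and R² as the squared Pearson correlation. Hard-call genotypes (0, 1, 2) stand in for dosages, and the toy data are hypothetical.

```python
from collections import Counter


def concordance(truth, imputed):
    """Fraction of samples whose imputed genotype equals the true genotype."""
    return sum(t == i for t, i in zip(truth, imputed)) / len(truth)


def iqs(truth, imputed):
    """Chance-adjusted concordance, assumed here as (Po - Pc) / (1 - Pc)."""
    n = len(truth)
    po = concordance(truth, imputed)
    t_counts, i_counts = Counter(truth), Counter(imputed)
    # Agreement expected by chance from the marginal genotype frequencies.
    pc = sum(t_counts[g] * i_counts[g] for g in (0, 1, 2)) / (n * n)
    return (po - pc) / (1 - pc) if pc < 1 else float("nan")


def r_squared(truth, imputed):
    """Squared Pearson correlation between true and imputed genotypes."""
    n = len(truth)
    mt, mi = sum(truth) / n, sum(imputed) / n
    cov = sum((t - mt) * (i - mi) for t, i in zip(truth, imputed))
    vt = sum((t - mt) ** 2 for t in truth)
    vi = sum((i - mi) ** 2 for i in imputed)
    return cov * cov / (vt * vi) if vt > 0 and vi > 0 else float("nan")


# Hypothetical rare variant: 100 samples, 2 true heterozygous carriers.
truth = [0] * 98 + [1, 1]
# Hypothetical imputation: one false carrier, one carrier found, one missed.
imputed = [1] + [0] * 97 + [1, 0]

print(f"concordance = {concordance(truth, imputed):.3f}")
print(f"IQS         = {iqs(truth, imputed):.3f}")
print(f"R^2         = {r_squared(truth, imputed):.3f}")
```

On this toy marker, concordance is 0.98 while IQS is about 0.49 and R² about 0.24, because most matches come from the common homozygous genotype that would agree by chance. This is one plausible reason rankings can differ between metrics for low-frequency markers; it is not a reproduction of the study's evaluation.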
id pubmed-9581364
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal PLoS One
publishDate 2022-10-19
license © 2022 De Marino et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article