
Converting single nucleotide variants between genome builds: from cautionary tale to solution

Next-generation sequencing studies are dependent on a high-quality reference genome for single nucleotide variant (SNV) calling. Although the two most recent builds of the human genome are widely used, position information is typically not directly comparable between them. Re-alignment gives the mos...


Bibliographic Details
Main authors: Ormond, Cathal; Ryan, Niamh M; Corvin, Aiden; Heron, Elizabeth A
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Problem Solving Protocol
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8425424/
https://www.ncbi.nlm.nih.gov/pubmed/33822888
http://dx.doi.org/10.1093/bib/bbab069
author Ormond, Cathal
Ryan, Niamh M
Corvin, Aiden
Heron, Elizabeth A
collection PubMed
description Next-generation sequencing studies depend on a high-quality reference genome for single nucleotide variant (SNV) calling. Although the two most recent builds of the human genome are widely used, position information is typically not directly comparable between them. Re-alignment gives the most accurate position information, but this procedure is often computationally expensive, so tools such as liftOver and CrossMap are used to convert data from one build to another. However, the positions of converted SNVs do not always match SNVs derived from aligned data, and in some instances, SNVs are known to change chromosome when converted. This is a significant problem when compiling sequencing resources or comparing results across studies. Here, we describe a novel algorithm to identify positions that are unstable when converting between human genome reference builds. These positions are detected independently of the conversion tools and are determined by the chain files, which provide a mapping of contiguous positions from one build to another. We also provide the list of unstable positions for converting between the two most commonly used builds, GRCh37 and GRCh38. Pre-excluding SNVs at these positions, prior to conversion, results in SNVs that are stable to conversion. This simple procedure gives the same final list of stable SNVs as applying the algorithm and subsequently removing variants at unstable positions. This work highlights the care that must be taken when converting SNVs between genome builds and provides a simple method for ensuring higher-confidence converted data. Unstable positions and algorithm code are available at https://github.com/cathaloruaidh/genomeBuildConversion
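The pre-exclusion step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the unstable positions are supplied as BED-style intervals (chromosome, 0-based start, exclusive end) and that SNVs are given as (chromosome, 1-based position) pairs; the function names `load_bed_intervals`, `is_unstable` and `pre_exclude` are hypothetical, and the actual file layout in the repository may differ.

```python
from bisect import bisect_right
from collections import defaultdict


def load_bed_intervals(lines):
    """Parse BED-style lines into sorted, merged intervals per chromosome.

    Assumption: each line is 'chrom start end', 0-based start, exclusive end.
    """
    raw = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:  # overlapping or touching: merge
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def is_unstable(intervals, chrom, pos):
    """Return True if the 1-based position falls inside an unstable interval."""
    chrom_intervals = intervals.get(chrom)
    if not chrom_intervals:
        return False
    p0 = pos - 1  # convert to the 0-based coordinate used by BED
    # Index of the last interval whose start is <= p0.
    i = bisect_right(chrom_intervals, (p0, float("inf"))) - 1
    return i >= 0 and chrom_intervals[i][0] <= p0 < chrom_intervals[i][1]


def pre_exclude(snvs, intervals):
    """Keep only SNVs that do not lie at an unstable position."""
    return [(c, p) for c, p in snvs if not is_unstable(intervals, c, p)]


# Toy data, invented for illustration only.
bed_lines = ["chr1\t100\t105", "chr1\t103\t110", "chr2\t0\t1"]
snvs = [("chr1", 100), ("chr1", 101), ("chr1", 110),
        ("chr1", 111), ("chr2", 1), ("chr2", 2)]
unstable = load_bed_intervals(bed_lines)
stable_snvs = pre_exclude(snvs, unstable)
```

Only the SNVs in `stable_snvs` would then be passed to a conversion tool such as liftOver or CrossMap. Merging the intervals first keeps each lookup to a single binary search per variant.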
format Online
Article
Text
id pubmed-8425424
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8425424 2021-09-09 Converting single nucleotide variants between genome builds: from cautionary tale to solution. Ormond, Cathal; Ryan, Niamh M; Corvin, Aiden; Heron, Elizabeth A. Brief Bioinform, Problem Solving Protocol. Oxford University Press 2021-04-05 /pmc/articles/PMC8425424/ /pubmed/33822888 http://dx.doi.org/10.1093/bib/bbab069 Text en © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Converting single nucleotide variants between genome builds: from cautionary tale to solution
topic Problem Solving Protocol