Converting single nucleotide variants between genome builds: from cautionary tale to solution
Next-generation sequencing studies are dependent on a high-quality reference genome for single nucleotide variant (SNV) calling. Although the two most recent builds of the human genome are widely used, position information is typically not directly comparable between them. Re-alignment gives the mos...
Main authors: | Ormond, Cathal; Ryan, Niamh M; Corvin, Aiden; Heron, Elizabeth A |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | Problem Solving Protocol |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8425424/ https://www.ncbi.nlm.nih.gov/pubmed/33822888 http://dx.doi.org/10.1093/bib/bbab069 |
_version_ | 1783749847275274240 |
---|---|
author | Ormond, Cathal Ryan, Niamh M Corvin, Aiden Heron, Elizabeth A |
author_sort | Ormond, Cathal |
collection | PubMed |
description | Next-generation sequencing studies are dependent on a high-quality reference genome for single nucleotide variant (SNV) calling. Although the two most recent builds of the human genome are widely used, position information is typically not directly comparable between them. Re-alignment gives the most accurate position information, but this procedure is often computationally expensive, and therefore, tools such as liftOver and CrossMap are used to convert data from one build to another. However, the positions of converted SNVs do not always match SNVs derived from aligned data, and in some instances, SNVs are known to change chromosome when converted. This is a significant problem when compiling sequencing resources or comparing results across studies. Here, we describe a novel algorithm to identify positions that are unstable when converting between human genome reference builds. These positions are detected independently of the conversion tools and are determined by the chain files, which provide a mapping of contiguous positions from one build to another. We also provide the list of unstable positions for converting between the two most commonly used builds, GRCh37 and GRCh38. Pre-excluding SNVs at these positions, prior to conversion, results in SNVs that are stable to conversion. This simple procedure gives the same final list of stable SNVs as applying the algorithm and subsequently removing variants at unstable positions. This work highlights the care that must be taken when converting SNVs between genome builds and provides a simple method for ensuring higher-confidence converted data. Unstable positions and algorithm code are available at https://github.com/cathaloruaidh/genomeBuildConversion |
format | Online Article Text |
id | pubmed-8425424 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8425424 2021-09-09 Converting single nucleotide variants between genome builds: from cautionary tale to solution Ormond, Cathal Ryan, Niamh M Corvin, Aiden Heron, Elizabeth A Brief Bioinform Problem Solving Protocol Oxford University Press 2021-04-05 /pmc/articles/PMC8425424/ /pubmed/33822888 http://dx.doi.org/10.1093/bib/bbab069 Text en © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Converting single nucleotide variants between genome builds: from cautionary tale to solution |
topic | Problem Solving Protocol |
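
The pre-exclusion step described in the abstract (dropping SNVs at unstable positions before running a conversion tool) can be sketched as below. This is a minimal illustration only, not the authors' code: it assumes the unstable positions are supplied as BED-style intervals (chromosome, 0-based start, exclusive end) and that variants carry 1-based positions as in VCF. The function names `load_unstable_intervals`, `is_unstable` and `pre_exclude` are hypothetical, and the actual file layout in the linked repository should be checked before use.

```python
from bisect import bisect_right
from collections import defaultdict


def load_unstable_intervals(bed_lines):
    """Parse BED-style lines into sorted, merged intervals per chromosome.

    Assumption: each line is 'chrom<TAB>start<TAB>end' with a 0-based start
    and an exclusive end. Blank lines and '#' comment lines are skipped.
    """
    by_chrom = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:
                # Overlapping or adjacent: extend the previous interval.
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def is_unstable(intervals, chrom, pos):
    """Return True if the 1-based position falls inside an unstable interval."""
    chrom_intervals = intervals.get(chrom)
    if not chrom_intervals:
        return False
    zero_based = pos - 1
    # Index of the last interval whose start is <= zero_based.
    i = bisect_right(chrom_intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and chrom_intervals[i][0] <= zero_based < chrom_intervals[i][1]


def pre_exclude(variants, intervals):
    """Keep only variants (chrom, pos, ...) that are not at unstable positions."""
    return [v for v in variants if not is_unstable(intervals, v[0], v[1])]


if __name__ == "__main__":
    bed = ["chr1\t100\t200", "chr1\t150\t250", "chr2\t0\t10"]
    snvs = [("chr1", 100, "A", "G"), ("chr1", 101, "C", "T"), ("chr3", 5, "G", "A")]
    for variant in pre_exclude(snvs, load_unstable_intervals(bed)):
        print(variant)
```

The surviving variants would then be passed to a conversion tool such as liftOver or CrossMap. Chromosome naming (for example `chr1` versus `1`) must match between the interval list and the variant file, since this sketch compares names as plain strings.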