MAC: identifying and correcting annotation for multi-nucleotide variations

Bibliographic Details
Main authors: Wei, Lei, Liu, Lu T., Conroy, Jacob R., Hu, Qiang, Conroy, Jeffrey M., Morrison, Carl D., Johnson, Candace S., Wang, Jianmin, Liu, Song
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4521406/
https://www.ncbi.nlm.nih.gov/pubmed/26231518
http://dx.doi.org/10.1186/s12864-015-1779-7
collection PubMed
description BACKGROUND: Next-Generation Sequencing (NGS) technologies have rapidly advanced our understanding of human variation in cancer. To accurately translate raw sequencing data into practical knowledge, annotation tools, algorithms and pipelines must keep pace with this rapidly evolving technology. One current challenge is the accurate annotation of multi-nucleotide variants (MNVs). When these tandem substitutions affect multiple nucleotides within a single protein codon of a gene, the translated amino acid is determined by all of the nucleotides in that codon together. Most existing variant callers report an MNV as individual single-nucleotide variants (SNVs), which often yields multiple triplet codon sequences and incorrect amino acid predictions. To correct potentially misannotated MNVs among reported SNVs, the primary challenge is haplotype phasing: determining whether the neighboring SNVs are co-located on the same chromosome. RESULTS: Here we describe MAC (Multi-Nucleotide Variant Annotation Corrector), an integrative pipeline developed to correct potentially misannotated MNVs. MAC was designed as an application that requires only an SNV file and the matching BAM file as data inputs. Using an example data set containing 3024 SNVs and the corresponding whole-genome sequencing BAM files, we show that MAC identified eight potentially misannotated SNVs and accurately updated the amino acid predictions for seven of the variant calls. CONCLUSIONS: MAC can identify and correct amino acid predictions that result from MNVs affecting multiple nucleotides within a single protein codon, a case that most existing SNV-based variant pipelines cannot handle. The MAC software is freely available and represents a useful tool for the accurate translation of genomic sequence to protein function.
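The abstract's two core ideas can be illustrated with a minimal Python sketch. The first is that translating two substitutions in one codon separately can give the wrong amino acid. The second is that read-level phasing decides whether those substitutions should be merged. This is not MAC's implementation. The function names (`mutate`, `annotate_separately`, `annotate_as_mnv`, `phase_evidence`, `likely_same_haplotype`), the dict-based read representation and the majority-vote phasing rule are all hypothetical assumptions chosen for illustration. In a real pipeline, the read bases would come from the matching BAM file.

```python
from collections import Counter

# Standard genetic code: DNA codon -> one-letter amino acid ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def mutate(ref_codon, subs):
    """Apply {offset_in_codon: alt_base} substitutions to a reference codon."""
    bases = list(ref_codon)
    for offset, alt in subs.items():
        bases[offset] = alt
    return "".join(bases)


def annotate_separately(ref_codon, subs):
    """SNV-by-SNV annotation: each substitution is translated on its own."""
    return {off: CODON_TABLE[mutate(ref_codon, {off: alt})] for off, alt in subs.items()}


def annotate_as_mnv(ref_codon, subs):
    """MNV annotation: all substitutions are applied to the same codon at once."""
    return CODON_TABLE[mutate(ref_codon, subs)]


def phase_evidence(reads, subs):
    """Count reads spanning every variant position by how many alt alleles they carry.

    Each read is a hypothetical {offset_in_codon: observed_base} dict standing
    in for the bases a real pipeline would pull from the matching BAM file.
    """
    counts = Counter()
    for read in reads:
        if not all(off in read for off in subs):
            continue  # read does not cover every variant position
        n_alt = sum(read[off] == alt for off, alt in subs.items())
        if n_alt == len(subs):
            counts["all_alt"] += 1   # suggests the SNVs lie on the same chromosome
        elif n_alt > 0:
            counts["some_alt"] += 1  # suggests the SNVs lie on different copies
        else:
            counts["ref"] += 1       # reference read, uninformative for phasing
    return counts


def likely_same_haplotype(reads, subs):
    """Hypothetical rule: call the SNVs co-located if all-alt reads outnumber partial ones."""
    counts = phase_evidence(reads, subs)
    return counts["all_alt"] > counts["some_alt"]


if __name__ == "__main__":
    ref = "CAT"              # reference codon, His (H)
    subs = {0: "T", 1: "G"}  # two reported SNVs inside this codon
    print(annotate_separately(ref, subs))  # {0: 'Y', 1: 'R'}
    print(annotate_as_mnv(ref, subs))      # C
    reads = [{0: "T", 1: "G"}] * 5 + [{0: "C", 1: "A"}] * 4
    print(likely_same_haplotype(reads, subs))  # True
```

When the substitutions are annotated one SNV at a time, the His codon appears to give His>Tyr and His>Arg. When they are merged into one MNV, the result is His>Cys. This is the kind of correction the abstract describes once phasing supports co-location of the SNVs.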
format Online
Article
Text
id pubmed-4521406
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4521406 2015-08-01 BMC Genomics, Software. BioMed Central 2015-08-01 /pmc/articles/PMC4521406/ /pubmed/26231518 http://dx.doi.org/10.1186/s12864-015-1779-7 Text en © Wei et al. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Software