MAC: identifying and correcting annotation for multi-nucleotide variations
BACKGROUND: Next-Generation Sequencing (NGS) technologies have rapidly advanced our understanding of human variation in cancer. To accurately translate the raw sequencing data into practical knowledge, annotation tools, algorithms and pipelines must be developed that keep pace with the rapidly evolv...
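Main authors: | Wei, Lei; Liu, Lu T.; Conroy, Jacob R.; Hu, Qiang; Conroy, Jeffrey M.; Morrison, Carl D.; Johnson, Candace S.; Wang, Jianmin; Liu, Song |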
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4521406/ https://www.ncbi.nlm.nih.gov/pubmed/26231518 http://dx.doi.org/10.1186/s12864-015-1779-7 |
_version_ | 1782383809204846592 |
---|---|
author | Wei, Lei Liu, Lu T. Conroy, Jacob R. Hu, Qiang Conroy, Jeffrey M. Morrison, Carl D. Johnson, Candace S. Wang, Jianmin Liu, Song |
author_facet | Wei, Lei Liu, Lu T. Conroy, Jacob R. Hu, Qiang Conroy, Jeffrey M. Morrison, Carl D. Johnson, Candace S. Wang, Jianmin Liu, Song |
author_sort | Wei, Lei |
collection | PubMed |
description | BACKGROUND: Next-Generation Sequencing (NGS) technologies have rapidly advanced our understanding of human variation in cancer. To accurately translate the raw sequencing data into practical knowledge, annotation tools, algorithms and pipelines must be developed that keep pace with the rapidly evolving technology. Currently, a challenge exists in accurately annotating multi-nucleotide variants (MNVs). These tandem substitutions, when affecting multiple nucleotides within a single protein codon of a gene, result in a translated amino acid involving all nucleotides in that codon. Most existing variant callers report an MNV as individual single-nucleotide variants (SNVs), often resulting in multiple triplet codon sequences and incorrect amino acid predictions. To correct potentially mis-annotated MNVs among reported SNVs, a primary challenge resides in haplotype phasing, that is, determining whether the neighboring SNVs are co-located on the same chromosome. RESULTS: Here we describe MAC (Multi-Nucleotide Variant Annotation Corrector), an integrative pipeline developed to correct potentially mis-annotated MNVs. MAC was designed as an application that requires only an SNV file and the matching BAM file as data inputs. Using an example data set containing 3024 SNVs and the corresponding whole-genome sequencing BAM files, we show that MAC identified eight potentially mis-annotated SNVs and accurately updated the amino acid predictions for seven of the variant calls. CONCLUSIONS: MAC can identify and correct amino acid predictions that result from MNVs affecting multiple nucleotides within a single protein codon, which cannot be handled by most existing SNV-based variant pipelines. The MAC software is freely available and represents a useful tool for the accurate translation of genomic sequence to protein function. |
format | Online Article Text |
id | pubmed-4521406 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4521406 2015-08-01 MAC: identifying and correcting annotation for multi-nucleotide variations Wei, Lei Liu, Lu T. Conroy, Jacob R. Hu, Qiang Conroy, Jeffrey M. Morrison, Carl D. Johnson, Candace S. Wang, Jianmin Liu, Song BMC Genomics Software BioMed Central 2015-08-01 /pmc/articles/PMC4521406/ /pubmed/26231518 http://dx.doi.org/10.1186/s12864-015-1779-7 Text en © Wei et al. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Wei, Lei Liu, Lu T. Conroy, Jacob R. Hu, Qiang Conroy, Jeffrey M. Morrison, Carl D. Johnson, Candace S. Wang, Jianmin Liu, Song MAC: identifying and correcting annotation for multi-nucleotide variations |
title | MAC: identifying and correcting annotation for multi-nucleotide variations |
title_full | MAC: identifying and correcting annotation for multi-nucleotide variations |
title_fullStr | MAC: identifying and correcting annotation for multi-nucleotide variations |
title_full_unstemmed | MAC: identifying and correcting annotation for multi-nucleotide variations |
title_short | MAC: identifying and correcting annotation for multi-nucleotide variations |
title_sort | mac: identifying and correcting annotation for multi-nucleotide variations |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4521406/ https://www.ncbi.nlm.nih.gov/pubmed/26231518 http://dx.doi.org/10.1186/s12864-015-1779-7 |
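
The annotation problem described in the abstract can be illustrated with a minimal Python sketch. This is not MAC's implementation: it skips haplotype phasing from BAM reads entirely and simply assumes that two hypothetical SNVs fall in the same codon on the same chromosome copy. The reference codon `GGC`, the two substitutions, and the helper names `apply_snvs` and `translate` are all illustrative assumptions. The sketch shows that annotating each SNV on its own gives two amino acid predictions, neither of which matches the amino acid encoded by the combined codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_snvs(codon, snvs):
    """Return the codon after applying (offset, alt_base) substitutions."""
    bases = list(codon)
    for offset, alt in snvs:
        bases[offset] = alt
    return "".join(bases)


def translate(codon):
    """Translate one DNA codon to a one-letter amino acid code."""
    return CODON_TABLE[codon]


# Hypothetical example: reference codon GGC (glycine) with two
# neighbouring SNVs assumed to sit on the same haplotype.
ref = "GGC"
snv_a = (0, "A")  # first codon position, G>A
snv_b = (1, "A")  # second codon position, G>A

# SNV-by-SNV annotation: each variant is translated in isolation.
separate = [translate(apply_snvs(ref, [snv])) for snv in (snv_a, snv_b)]

# MNV-aware annotation: both substitutions applied to the same codon.
combined = translate(apply_snvs(ref, [snv_a, snv_b]))

print("reference:", translate(ref))
print("separate SNV predictions:", separate)
print("combined MNV prediction:", combined)
```

Whether the combined prediction is the right one depends on the two SNVs actually being on the same chromosome, which is the phasing question the abstract identifies as the primary challenge.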