
nanotatoR: a tool for enhanced annotation of genomic structural variants

BACKGROUND: Whole genome sequencing is effective at identification of small variants, but because it is based on short reads, assessment of structural variants (SVs) is limited. The advent of Optical Genome Mapping (OGM), which utilizes long fluorescently labeled DNA molecules for de novo genome ass...


Bibliographic Details
Main Authors: Bhattacharya, Surajit, Barseghyan, Hayk, Délot, Emmanuèle C., Vilain, Eric
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7789800/
https://www.ncbi.nlm.nih.gov/pubmed/33407088
http://dx.doi.org/10.1186/s12864-020-07182-w
_version_ 1783633319973355520
author Bhattacharya, Surajit
Barseghyan, Hayk
Délot, Emmanuèle C.
Vilain, Eric
author_facet Bhattacharya, Surajit
Barseghyan, Hayk
Délot, Emmanuèle C.
Vilain, Eric
author_sort Bhattacharya, Surajit
collection PubMed
description BACKGROUND: Whole genome sequencing is effective at identification of small variants, but because it is based on short reads, assessment of structural variants (SVs) is limited. The advent of Optical Genome Mapping (OGM), which utilizes long fluorescently labeled DNA molecules for de novo genome assembly and SV calling, has allowed for increased sensitivity and specificity in SV detection. However, compared to small variant annotation tools, OGM-based SV annotation software has seen little development, and currently available SV annotation tools do not provide sufficient information for determination of variant pathogenicity. RESULTS: We developed an R-based package, nanotatoR, which provides comprehensive annotation as a tool for SV classification. nanotatoR uses both external (DGV; DECIPHER; Bionano Genomics BNDB) and internal (user-defined) databases to estimate SV frequency. Human genome reference GRCh37/38-based BED files are used to annotate SVs with overlapping, upstream, and downstream genes. Overlap percentages and distances for nearest genes are calculated and can be used for filtration. A primary gene list is extracted from public databases based on the patient’s phenotype and used to filter genes overlapping SVs, providing the analyst with an easy way to prioritize variants. If available, expression of overlapping or nearby genes of interest is extracted (e.g. from an RNA-Seq dataset, allowing the user to assess the effects of SVs on the transcriptome). Most quality-control filtration parameters are customizable by the user. The output is given in an Excel file format, subdivided into multiple sheets based on SV type and inheritance pattern (INDELs, inversions, translocations, de novo, etc.). nanotatoR passed all quality and run time criteria of Bioconductor, where it was accepted in the April 2019 release. 
We evaluated nanotatoR’s annotation capabilities using publicly available reference datasets: the singleton sample NA12878, mapped with two types of enzyme labeling, and the NA24143 trio. nanotatoR was also able to accurately filter the known pathogenic variants in a cohort of patients with Duchenne Muscular Dystrophy for which we had previously demonstrated the diagnostic ability of OGM. CONCLUSIONS: The extensive annotation enables users to rapidly identify potential pathogenic SVs, a critical step toward use of OGM in the clinical setting.
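The overlap percentage and nearest-gene distance mentioned in the description can be illustrated with a minimal Python sketch. This is not nanotatoR's code (the package is written in R) and not its API: the names Interval, overlap_percent and gap_distance are hypothetical, coordinates are assumed to follow the BED convention (0-based start, exclusive end), and the choice to express overlap as a fraction of the gene's length is an assumption made only for illustration.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    """A BED-like interval (hypothetical helper type, not from nanotatoR)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str = ""


def overlap_percent(sv: Interval, gene: Interval) -> float:
    """Percentage of the gene's length covered by the SV (0.0 if none).

    Assumption: overlap is reported relative to the gene, not the SV.
    """
    if sv.chrom != gene.chrom:
        return 0.0
    shared = min(sv.end, gene.end) - max(sv.start, gene.start)
    gene_len = gene.end - gene.start
    if shared <= 0 or gene_len <= 0:
        return 0.0
    return 100.0 * shared / gene_len


def gap_distance(sv: Interval, gene: Interval) -> Optional[int]:
    """Distance in bp between the SV and a gene on the same chromosome.

    Returns 0 when the intervals overlap or touch, and None when they
    lie on different chromosomes.
    """
    if sv.chrom != gene.chrom:
        return None
    if gene.end <= sv.start:   # gene lies before the SV
        return sv.start - gene.end
    if gene.start >= sv.end:   # gene lies after the SV
        return gene.start - sv.end
    return 0


# Made-up coordinates, for illustration only.
sv = Interval("chrX", 1000, 2000, "deletion_1")
gene_a = Interval("chrX", 1500, 2500, "GENE_A")  # overlaps the SV
gene_b = Interval("chrX", 5000, 6000, "GENE_B")  # after the SV
gene_c = Interval("chrX", 100, 400, "GENE_C")    # before the SV

pct_a = overlap_percent(sv, gene_a)
dist_b = gap_distance(sv, gene_b)
dist_c = gap_distance(sv, gene_c)
```

Values computed this way could then serve as filtration thresholds, for example keeping only SVs whose nearest gene lies within a chosen distance.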
format Online
Article
Text
id pubmed-7789800
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-77898002021-01-11 nanotatoR: a tool for enhanced annotation of genomic structural variants Bhattacharya, Surajit Barseghyan, Hayk Délot, Emmanuèle C. Vilain, Eric BMC Genomics Software BACKGROUND: Whole genome sequencing is effective at identification of small variants, but because it is based on short reads, assessment of structural variants (SVs) is limited. The advent of Optical Genome Mapping (OGM), which utilizes long fluorescently labeled DNA molecules for de novo genome assembly and SV calling, has allowed for increased sensitivity and specificity in SV detection. However, compared to small variant annotation tools, OGM-based SV annotation software has seen little development, and currently available SV annotation tools do not provide sufficient information for determination of variant pathogenicity. RESULTS: We developed an R-based package, nanotatoR, which provides comprehensive annotation as a tool for SV classification. nanotatoR uses both external (DGV; DECIPHER; Bionano Genomics BNDB) and internal (user-defined) databases to estimate SV frequency. Human genome reference GRCh37/38-based BED files are used to annotate SVs with overlapping, upstream, and downstream genes. Overlap percentages and distances for nearest genes are calculated and can be used for filtration. A primary gene list is extracted from public databases based on the patient’s phenotype and used to filter genes overlapping SVs, providing the analyst with an easy way to prioritize variants. If available, expression of overlapping or nearby genes of interest is extracted (e.g. from an RNA-Seq dataset, allowing the user to assess the effects of SVs on the transcriptome). Most quality-control filtration parameters are customizable by the user. The output is given in an Excel file format, subdivided into multiple sheets based on SV type and inheritance pattern (INDELs, inversions, translocations, de novo, etc.). 
nanotatoR passed all quality and run time criteria of Bioconductor, where it was accepted in the April 2019 release. We evaluated nanotatoR’s annotation capabilities using publicly available reference datasets: the singleton sample NA12878, mapped with two types of enzyme labeling, and the NA24143 trio. nanotatoR was also able to accurately filter the known pathogenic variants in a cohort of patients with Duchenne Muscular Dystrophy for which we had previously demonstrated the diagnostic ability of OGM. CONCLUSIONS: The extensive annotation enables users to rapidly identify potential pathogenic SVs, a critical step toward use of OGM in the clinical setting. BioMed Central 2021-01-06 /pmc/articles/PMC7789800/ /pubmed/33407088 http://dx.doi.org/10.1186/s12864-020-07182-w Text en © The Author(s) 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Software
Bhattacharya, Surajit
Barseghyan, Hayk
Délot, Emmanuèle C.
Vilain, Eric
nanotatoR: a tool for enhanced annotation of genomic structural variants
title nanotatoR: a tool for enhanced annotation of genomic structural variants
title_full nanotatoR: a tool for enhanced annotation of genomic structural variants
title_fullStr nanotatoR: a tool for enhanced annotation of genomic structural variants
title_full_unstemmed nanotatoR: a tool for enhanced annotation of genomic structural variants
title_short nanotatoR: a tool for enhanced annotation of genomic structural variants
title_sort nanotator: a tool for enhanced annotation of genomic structural variants
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7789800/
https://www.ncbi.nlm.nih.gov/pubmed/33407088
http://dx.doi.org/10.1186/s12864-020-07182-w
work_keys_str_mv AT bhattacharyasurajit nanotatoratoolforenhancedannotationofgenomicstructuralvariants
AT barseghyanhayk nanotatoratoolforenhancedannotationofgenomicstructuralvariants
AT delotemmanuelec nanotatoratoolforenhancedannotationofgenomicstructuralvariants
AT vilaineric nanotatoratoolforenhancedannotationofgenomicstructuralvariants