nanotatoR: a tool for enhanced annotation of genomic structural variants
BACKGROUND: Whole genome sequencing is effective at identification of small variants, but because it is based on short reads, assessment of structural variants (SVs) is limited. The advent of Optical Genome Mapping (OGM), which utilizes long fluorescently labeled DNA molecules for de novo genome ass...
Main Authors: | Bhattacharya, Surajit; Barseghyan, Hayk; Délot, Emmanuèle C.; Vilain, Eric |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7789800/ https://www.ncbi.nlm.nih.gov/pubmed/33407088 http://dx.doi.org/10.1186/s12864-020-07182-w |
_version_ | 1783633319973355520 |
---|---|
author | Bhattacharya, Surajit; Barseghyan, Hayk; Délot, Emmanuèle C.; Vilain, Eric |
author_sort | Bhattacharya, Surajit |
collection | PubMed |
description | BACKGROUND: Whole genome sequencing is effective at identification of small variants, but because it is based on short reads, assessment of structural variants (SVs) is limited. The advent of Optical Genome Mapping (OGM), which utilizes long fluorescently labeled DNA molecules for de novo genome assembly and SV calling, has allowed for increased sensitivity and specificity in SV detection. However, compared to small variant annotation tools, OGM-based SV annotation software has seen little development, and currently available SV annotation tools do not provide sufficient information for determination of variant pathogenicity. RESULTS: We developed an R-based package, nanotatoR, which provides comprehensive annotation as a tool for SV classification. nanotatoR uses both external (DGV; DECIPHER; Bionano Genomics BNDB) and internal (user-defined) databases to estimate SV frequency. Human genome reference GRCh37/38-based BED files are used to annotate SVs with overlapping, upstream, and downstream genes. Overlap percentages and distances for nearest genes are calculated and can be used for filtration. A primary gene list is extracted from public databases based on the patient’s phenotype and used to filter genes overlapping SVs, providing the analyst with an easy way to prioritize variants. If available, expression of overlapping or nearby genes of interest is extracted (e.g. from an RNA-Seq dataset, allowing the user to assess the effects of SVs on the transcriptome). Most quality-control filtration parameters are customizable by the user. The output is given in an Excel file format, subdivided into multiple sheets based on SV type and inheritance pattern (INDELs, inversions, translocations, de novo, etc.). nanotatoR passed all quality and run time criteria of Bioconductor, where it was accepted in the April 2019 release. 
We evaluated nanotatoR’s annotation capabilities using publicly available reference datasets: the singleton sample NA12878, mapped with two types of enzyme labeling, and the NA24143 trio. nanotatoR was also able to accurately filter the known pathogenic variants in a cohort of patients with Duchenne Muscular Dystrophy for which we had previously demonstrated the diagnostic ability of OGM. CONCLUSIONS: The extensive annotation enables users to rapidly identify potential pathogenic SVs, a critical step toward use of OGM in the clinical setting. |
format | Online Article Text |
id | pubmed-7789800 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7789800 2021-01-11 nanotatoR: a tool for enhanced annotation of genomic structural variants Bhattacharya, Surajit; Barseghyan, Hayk; Délot, Emmanuèle C.; Vilain, Eric BMC Genomics Software BioMed Central 2021-01-06 /pmc/articles/PMC7789800/ /pubmed/33407088 http://dx.doi.org/10.1186/s12864-020-07182-w Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Software Bhattacharya, Surajit Barseghyan, Hayk Délot, Emmanuèle C. Vilain, Eric nanotatoR: a tool for enhanced annotation of genomic structural variants |
title | nanotatoR: a tool for enhanced annotation of genomic structural variants |
title_sort | nanotator: a tool for enhanced annotation of genomic structural variants |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7789800/ https://www.ncbi.nlm.nih.gov/pubmed/33407088 http://dx.doi.org/10.1186/s12864-020-07182-w |
work_keys_str_mv | AT bhattacharyasurajit nanotatoratoolforenhancedannotationofgenomicstructuralvariants AT barseghyanhayk nanotatoratoolforenhancedannotationofgenomicstructuralvariants AT delotemmanuelec nanotatoratoolforenhancedannotationofgenomicstructuralvariants AT vilaineric nanotatoratoolforenhancedannotationofgenomicstructuralvariants |
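
The abstract in this record mentions that SVs are annotated with overlapping, upstream, and downstream genes, and that overlap percentages and nearest-gene distances are calculated. The Python sketch below illustrates that idea only; it is not nanotatoR's code (the package itself is written in R). The `Interval` class, the `overlap_percent` and `annotate` functions, the choice to express overlap as a percentage of gene length, and the strand-agnostic meaning of "upstream" (lower coordinate) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Interval:
    """A named half-open interval [start, end) on one chromosome (hypothetical type)."""
    name: str
    chrom: str
    start: int
    end: int


def overlap_percent(sv: Interval, gene: Interval) -> float:
    """Percentage of the gene's length covered by the SV (assumed definition)."""
    if sv.chrom != gene.chrom:
        return 0.0
    shared = min(sv.end, gene.end) - max(sv.start, gene.start)
    if shared <= 0:
        return 0.0
    return 100.0 * shared / (gene.end - gene.start)


def annotate(sv: Interval, genes: List[Interval]) -> Dict[str, object]:
    """Collect overlapping genes and the nearest non-overlapping gene on each side."""
    overlapping: List[Tuple[str, float]] = []
    upstream: Optional[Tuple[str, int]] = None    # nearest gene ending at or before sv.start
    downstream: Optional[Tuple[str, int]] = None  # nearest gene starting at or after sv.end
    for gene in genes:
        if gene.chrom != sv.chrom:
            continue
        pct = overlap_percent(sv, gene)
        if pct > 0:
            overlapping.append((gene.name, pct))
        elif gene.end <= sv.start:
            dist = sv.start - gene.end
            if upstream is None or dist < upstream[1]:
                upstream = (gene.name, dist)
        elif gene.start >= sv.end:
            dist = gene.start - sv.end
            if downstream is None or dist < downstream[1]:
                downstream = (gene.name, dist)
    return {"overlapping": overlapping, "upstream": upstream, "downstream": downstream}


# Made-up coordinates, for illustration only.
sv = Interval("sv1", "chr1", 1000, 2000)
genes = [
    Interval("A", "chr1", 100, 500),
    Interval("B", "chr1", 1500, 2500),
    Interval("C", "chr1", 2600, 3000),
    Interval("D", "chr2", 1500, 1800),
]
result = annotate(sv, genes)
print(result)
```

A filter of the kind the abstract describes could then keep only SVs whose overlap percentage or nearest-gene distance passes a user-chosen threshold; the thresholds themselves are not given in this record.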