SNPAAMapper-Python: A highly efficient genome-wide SNP variant analysis pipeline for Next-Generation Sequencing data

Currently, there are many publicly available Next Generation Sequencing tools developed for variant annotation and classification. However, as modern sequencing technology produces more and more sequencing data, a more efficient analysis program is desired, especially for variant analysis. In this s...

Bibliographic Details
Main authors: Li, Chang; Ma, Kevin; Xu, Nicole; Fu, Chenjian; He, Andrew; Liu, Xiaoming; Bai, Yongsheng
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9510352/
https://www.ncbi.nlm.nih.gov/pubmed/36171799
http://dx.doi.org/10.3389/frai.2022.991733
author Li, Chang
Ma, Kevin
Xu, Nicole
Fu, Chenjian
He, Andrew
Liu, Xiaoming
Bai, Yongsheng
collection PubMed
description Currently, there are many publicly available Next Generation Sequencing tools developed for variant annotation and classification. However, as modern sequencing technology produces more and more sequencing data, a more efficient analysis program is desired, especially for variant analysis. In this study, we updated SNPAAMapper, a variant annotation pipeline, by converting its Perl code to Python, generating annotation output with improved computational efficiency and updated information for broader applicability. The new pipeline written in Python can classify variants by region (Coding Sequence, Untranslated Regions, upstream, downstream, intron), predict amino acid change type (missense, nonsense, etc.), and prioritize mutation effects (e.g., synonymous > non-synonymous) while being faster and more efficient. Our new pipeline works in five steps. First, exon annotation files are generated. Next, the exon annotation files are processed, and gene mapping and feature information files are produced. Afterward, the Python scripts classify the variants based on genomic regions and predict the amino acid change category. Lastly, another Python script prioritizes and ranks the mutation effects of variants to output the result file. The Python version of SNPAAMapper achieved its overall speedup by completing most annotation steps in substantially less time. The Python script can classify variants by region in 53 s compared to 166 s for the Perl script (roughly a threefold speedup) in a test sample run on a Latitude 7480 Desktop computer with 8 GB RAM and an Intel Core i5-6300 CPU @ 2.4 GHz. The steps of predicting amino acid change type and prioritizing mutation effects of variants were executed within 1 s for both pipelines. SNPAAMapper-Python was developed and tested on the ClinVar database, an NCBI database of information on genomic variation and its relationship to human health. We believe our Python version of the SNPAAMapper variant annotation pipeline will benefit the community by elucidating variant consequences and speeding up the discovery of causative genetic variants through whole genome/exome sequencing. Source code, test data files, instructions, and further explanations are available on the web at https://github.com/BaiLab/SNPAAMapper-Python.
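The three annotation tasks named in the abstract (classifying a variant by region, predicting the amino acid change type, and prioritizing effects) can be illustrated with a minimal Python sketch. This is not the SNPAAMapper-Python code: the `Transcript` class, the function names, the 1000 bp flank, the "stop-loss" label, and the order in `REGION_PRIORITY` are all assumptions made for illustration, and the record above does not specify them.

```python
from dataclasses import dataclass

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


@dataclass
class Transcript:
    """Hypothetical, simplified gene model (1-based, inclusive coordinates)."""
    strand: str      # "+" or "-"
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exons: list      # list of (start, end) tuples


def classify_region(pos, tx, flank=1000):
    """Assign a genomic position to a region relative to one transcript."""
    plus = tx.strand == "+"
    if pos < tx.tx_start:
        if tx.tx_start - pos > flank:
            return "intergenic"
        return "upstream" if plus else "downstream"
    if pos > tx.tx_end:
        if pos - tx.tx_end > flank:
            return "intergenic"
        return "downstream" if plus else "upstream"
    if not any(start <= pos <= end for start, end in tx.exons):
        return "intron"
    if pos < tx.cds_start:
        return "5'UTR" if plus else "3'UTR"
    if pos > tx.cds_end:
        return "3'UTR" if plus else "5'UTR"
    return "CDS"


def classify_aa_change(ref_codon, alt_codon):
    """Name the amino acid change type caused by a single-codon substitution."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


# Assumed ranking, most important first; the real pipeline may order differently.
REGION_PRIORITY = ["CDS", "5'UTR", "3'UTR", "intron",
                   "upstream", "downstream", "intergenic"]


def prioritize(regions):
    """Pick the highest-ranked region among several transcript-level calls."""
    return min(regions, key=REGION_PRIORITY.index)
```

A variant that overlaps several transcripts would get one `classify_region` call per transcript, and `prioritize` would then keep the single highest-ranked call for the result file.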
format Online
Article
Text
id pubmed-9510352
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9510352 2022-09-27 Front Artif Intell Artificial Intelligence Frontiers Media S.A. 2022-09-12 /pmc/articles/PMC9510352/ /pubmed/36171799 http://dx.doi.org/10.3389/frai.2022.991733 Text en Copyright © 2022 Li, Ma, Xu, Fu, He, Liu and Bai. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title SNPAAMapper-Python: A highly efficient genome-wide SNP variant analysis pipeline for Next-Generation Sequencing data
topic Artificial Intelligence