Database of Potential Promoter Sequences in the Capsicum annuum Genome

SIMPLE SUMMARY: In this study, we searched for potential promoter sequences (PPS) in the pepper genome. We used a new mathematical method for the multiple alignment of highly divergent sequences. Hence, 20 statistically significant classes of sequences in the range from −499 to +100 nucleotides near...

Bibliographic Details
Main authors: Rudenko, Valentina, Korotkov, Eugene
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9332048/
https://www.ncbi.nlm.nih.gov/pubmed/35892972
http://dx.doi.org/10.3390/biology11081117
author Rudenko, Valentina
Korotkov, Eugene
collection PubMed
description SIMPLE SUMMARY: In this study, we searched for potential promoter sequences (PPSs) in the pepper genome. We used a new mathematical method for the multiple alignment of highly divergent sequences. Using this method, we calculated 20 statistically significant classes of sequences located in the range from −499 to +100 nucleotides near the annotated genes. A profile was constructed for each class and then used as a position–weight matrix to build a two-dimensional alignment. We found 825,136 potential promoter sequences in the pepper genome, with a false positive rate of 0.13%, and merged them into a database. TSSFinder software, applied to the potential promoter sequences, detected transcription start sites in more than half of them. The results show that the pepper genome contains many PPSs. We assume that most of them could be associated with various transposons, dispersed repeats, or viruses.
ABSTRACT: In this study, we used a mathematical method for the multiple alignment of highly divergent sequences (MAHDS) to create a database of potential promoter sequences (PPSs) in the Capsicum annuum genome. To search for PPSs, 20 statistically significant classes of sequences located in the range from −499 to +100 nucleotides near the annotated genes were calculated. For each class, a position–weight matrix (PWM) was computed and then used to identify PPSs in the C. annuum genome. In total, 825,136 PPSs were detected, with a false positive rate of 0.13%. The PPSs obtained with the MAHDS method were tested using TSSFinder, which detects transcription start sites. The databank of the found PPSs provides their chromosomal coordinates, the alignment of each PPS with the PWM, and the level of statistical significance expressed as the argument of the normal distribution; it can be used in genetic engineering and biotechnology.
format Online
Article
Text
id pubmed-9332048
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9332048 2022-07-29 Database of Potential Promoter Sequences in the Capsicum annuum Genome. Rudenko, Valentina; Korotkov, Eugene. Biology (Basel). Article. MDPI 2022-07-26 /pmc/articles/PMC9332048/ /pubmed/35892972 http://dx.doi.org/10.3390/biology11081117 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Database of Potential Promoter Sequences in the Capsicum annuum Genome
topic Article
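
The abstract in this record describes computing a position–weight matrix (PWM) for each class of sequences and using it to identify PPSs. The sketch below illustrates only the general idea of a PWM: it builds a log-odds matrix from a few gap-free example sequences and slides it along a longer sequence. It is not the MAHDS method, which the abstract says builds a two-dimensional alignment; the ungapped scan, the uniform background of 0.25, the pseudocount, the toy sequences, and all function names are assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def build_pwm(aligned, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length, gap-free sequences.

    Assumption: uniform background frequency for all four bases.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so no probability is zero.
        counts = {base: pseudocount for base in ALPHABET}
        for seq in aligned:
            if seq[pos] in counts:  # skip ambiguous symbols such as N
                counts[seq[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2(counts[base] / total / background) for base in ALPHABET}
        )
    return pwm


def score_window(pwm, window):
    """Sum the per-column log-odds; unknown symbols contribute 0."""
    return sum(column.get(base, 0.0) for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring ungapped window."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the PWM")
    return max(
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    # Toy training sequences, not data from the article.
    sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
    pwm = build_pwm(sites)
    score, start = best_hit(pwm, "GGGCTATAATGCC")
    print(f"best window starts at index {start} with score {score:.2f}")
```

A real search over a genome would also need a significance threshold for the scores; the abstract reports significance as the argument of a normal distribution, which this sketch does not attempt to reproduce.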