
Database of Potential Promoter Sequences in the Capsicum annuum Genome

Bibliographic Details
Main authors: Rudenko, Valentina; Korotkov, Eugene
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9332048/
https://www.ncbi.nlm.nih.gov/pubmed/35892972
http://dx.doi.org/10.3390/biology11081117
Description
Summary: SIMPLE SUMMARY: In this study, we searched for potential promoter sequences (PPSs) in the pepper genome using a new mathematical method for the multiple alignment of highly divergent sequences. With this method, we calculated 20 statistically significant classes of sequences in the range from −499 to +100 nucleotides near the annotated genes. A profile was constructed for each class and then used as a position–weight matrix to build a two-dimensional alignment. We found 825,136 PPSs in the pepper genome, with a false positive rate of 0.13%, and merged them into a database. The PPSs were then analyzed with the TSSFinder software, which detected transcription start sites in more than half of our data. The results show that the pepper genome contains many PPSs. We assume that most of them could be associated with various transposons, dispersed repeats, or viruses.

ABSTRACT: In this study, we used a mathematical method for the multiple alignment of highly divergent sequences (MAHDS) to create a database of potential promoter sequences (PPSs) in the Capsicum annuum genome. To search for PPSs, 20 statistically significant classes of sequences located in the range from −499 to +100 nucleotides near the annotated genes were calculated. For each class, a position–weight matrix (PWM) was computed and then used to identify PPSs in the C. annuum genome. In total, 825,136 PPSs were detected, with a false positive rate of 0.13%. The PPSs obtained with the MAHDS method were tested using TSSFinder, which detects transcription start sites. The database of detected PPSs provides their chromosome coordinates, the alignment of each PPS with the PWM, and the level of statistical significance expressed as the argument of a normal distribution, and it can be used in genetic engineering and biotechnology.
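
The general idea of scoring a genome with a position–weight matrix can be illustrated with a small sketch. This is not the MAHDS method: MAHDS aligns each candidate sequence to the PWM with insertions and deletions, whereas the sketch below assumes an ungapped sliding window, a uniform background frequency, and a simple pseudocount. All function names and parameters (`build_pwm`, `score_window`, `scan`, `upper_tail_p`, `pseudocount`, `background`) are hypothetical and chosen only for illustration. The last helper shows one way a significance level reported as a normal distribution argument could be converted to an upper-tail probability, assuming a standard normal distribution.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pwm(aligned, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length, already aligned sequences.

    Assumption: uniform background and an additive pseudocount; the
    source does not specify how its PWMs are normalized.
    """
    length = len(aligned[0])
    total = len(aligned) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned)
        column = {
            base: math.log2((counts[base] + pseudocount) / total / background)
            for base in ALPHABET
        }
        pwm.append(column)
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds; unknown symbols (e.g. N) add 0."""
    return sum(col.get(base, 0.0) for col, base in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Return (start, score) for every ungapped window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        s = score_window(pwm, sequence[start:start + width])
        if s >= threshold:
            hits.append((start, s))
    return hits


def upper_tail_p(z):
    """Upper-tail probability of a standard normal variable exceeding z."""
    return 0.5 * math.erfc(z / math.sqrt(2))
```

In practice the threshold would be chosen from scores obtained on shuffled or random sequences so that the expected share of false positives stays at an acceptable level; the sketch leaves that calibration step out.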