
msRepDB: a comprehensive repetitive sequence database of over 80 000 species

Repeats are prevalent in the genomes of all bacteria, plants and animals, and they cover nearly half of the human genome. They play indispensable roles in evolution, inheritance, variation and genomic instability, and serve as substrates for chromosomal rearrangements that include disease-causi...


Bibliographic Details
Main Authors: Liao, Xingyu, Hu, Kang, Salhi, Adil, Zou, You, Wang, Jianxin, Gao, Xin
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8728181/
https://www.ncbi.nlm.nih.gov/pubmed/34850956
http://dx.doi.org/10.1093/nar/gkab1089
_version_ 1784626678454026240
author Liao, Xingyu
Hu, Kang
Salhi, Adil
Zou, You
Wang, Jianxin
Gao, Xin
author_facet Liao, Xingyu
Hu, Kang
Salhi, Adil
Zou, You
Wang, Jianxin
Gao, Xin
author_sort Liao, Xingyu
collection PubMed
description Repeats are prevalent in the genomes of all bacteria, plants and animals, and they cover nearly half of the human genome. They play indispensable roles in evolution, inheritance, variation and genomic instability, and serve as substrates for chromosomal rearrangements that include disease-causing deletions, inversions, and translocations. Comprehensive identification, classification and annotation of repeats in genomes can provide accurate and targeted solutions towards understanding and diagnosis of complex diseases, optimization of plant properties and development of new drugs. RepBase and Dfam are the two most frequently used repeat databases, but they are not sufficiently complete. Because a comprehensive multi-species repeat database is lacking, current research in this field is far from satisfactory. LongRepMarker is a new framework developed recently by our group for comprehensive identification of genomic repeats. Here we propose msRepDB, a multi-species repeat database based on LongRepMarker; it is currently the most comprehensive such database, covering >80 000 species. Comprehensive evaluations show that msRepDB contains more species, and more complete repeats and families, than the RepBase and Dfam databases (https://msrepdb.cbrc.kaust.edu.sa/pages/msRepDB/index.html).
format Online
Article
Text
id pubmed-8728181
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-87281812022-01-05 msRepDB: a comprehensive repetitive sequence database of over 80 000 species Liao, Xingyu Hu, Kang Salhi, Adil Zou, You Wang, Jianxin Gao, Xin Nucleic Acids Res Database Issue Repeats are prevalent in the genomes of all bacteria, plants and animals, and they cover nearly half of the Human genome, which play indispensable roles in the evolution, inheritance, variation and genomic instability, and serve as substrates for chromosomal rearrangements that include disease-causing deletions, inversions, and translocations. Comprehensive identification, classification and annotation of repeats in genomes can provide accurate and targeted solutions towards understanding and diagnosis of complex diseases, optimization of plant properties and development of new drugs. RepBase and Dfam are two most frequently used repeat databases, but they are not sufficiently complete. Due to the lack of a comprehensive repeat database of multiple species, the current research in this field is far from being satisfactory. LongRepMarker is a new framework developed recently by our group for comprehensive identification of genomic repeats. We here propose msRepDB based on LongRepMarker, which is currently the most comprehensive multi-species repeat database, covering >80 000 species. Comprehensive evaluations show that msRepDB contains more species, and more complete repeats and families than RepBase and Dfam databases. (https://msrepdb.cbrc.kaust.edu.sa/pages/msRepDB/index.html). Oxford University Press 2021-12-01 /pmc/articles/PMC8728181/ /pubmed/34850956 http://dx.doi.org/10.1093/nar/gkab1089 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. 
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Database Issue
Liao, Xingyu
Hu, Kang
Salhi, Adil
Zou, You
Wang, Jianxin
Gao, Xin
msRepDB: a comprehensive repetitive sequence database of over 80 000 species
title msRepDB: a comprehensive repetitive sequence database of over 80 000 species
topic Database Issue
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8728181/
https://www.ncbi.nlm.nih.gov/pubmed/34850956
http://dx.doi.org/10.1093/nar/gkab1089