msRepDB: a comprehensive repetitive sequence database of over 80 000 species
Repeats are prevalent in the genomes of all bacteria, plants and animals, and cover nearly half of the human genome. They play indispensable roles in evolution, inheritance, variation and genomic instability, and serve as substrates for chromosomal rearrangements that include disease-causi...
Main Authors: | Liao, Xingyu; Hu, Kang; Salhi, Adil; Zou, You; Wang, Jianxin; Gao, Xin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | Database Issue |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8728181/ https://www.ncbi.nlm.nih.gov/pubmed/34850956 http://dx.doi.org/10.1093/nar/gkab1089 |
_version_ | 1784626678454026240 |
---|---|
author | Liao, Xingyu Hu, Kang Salhi, Adil Zou, You Wang, Jianxin Gao, Xin |
author_sort | Liao, Xingyu |
collection | PubMed |
description | Repeats are prevalent in the genomes of all bacteria, plants and animals, and cover nearly half of the human genome. They play indispensable roles in evolution, inheritance, variation and genomic instability, and serve as substrates for chromosomal rearrangements that include disease-causing deletions, inversions and translocations. Comprehensive identification, classification and annotation of repeats in genomes can provide accurate and targeted solutions for understanding and diagnosing complex diseases, optimizing plant properties and developing new drugs. RepBase and Dfam are the two most frequently used repeat databases, but they are not sufficiently complete. Because a comprehensive multi-species repeat database is lacking, current research in this field is far from satisfactory. LongRepMarker is a new framework recently developed by our group for comprehensive identification of genomic repeats. Building on LongRepMarker, we here propose msRepDB, currently the most comprehensive multi-species repeat database, covering >80 000 species. Comprehensive evaluations show that msRepDB contains more species, and more complete repeats and families, than the RepBase and Dfam databases (https://msrepdb.cbrc.kaust.edu.sa/pages/msRepDB/index.html). |
format | Online Article Text |
id | pubmed-8728181 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8728181 2022-01-05 Nucleic Acids Res Database Issue Oxford University Press 2021-12-01 /pmc/articles/PMC8728181/ /pubmed/34850956 http://dx.doi.org/10.1093/nar/gkab1089 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | msRepDB: a comprehensive repetitive sequence database of over 80 000 species |
topic | Database Issue |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8728181/ https://www.ncbi.nlm.nih.gov/pubmed/34850956 http://dx.doi.org/10.1093/nar/gkab1089 |