
An improved approach for reconstructing consensus repeats from short sequence reads

BACKGROUND: Repeat elements are important components of most eukaryotic genomes. Most existing tools for repeat analysis rely either on high quality reference genomes or existing repeat libraries. Thus, it is still challenging to do repeat analysis for species with highly repetitive or complex genom...


Bibliographic Details
Main authors:	Chu, Chong; Pei, Jingwen; Wu, Yufeng
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6101065/ https://www.ncbi.nlm.nih.gov/pubmed/30367582 http://dx.doi.org/10.1186/s12864-018-4920-6

author	Chu, Chong; Pei, Jingwen; Wu, Yufeng
collection	PubMed
description	BACKGROUND: Repeat elements are important components of most eukaryotic genomes. Most existing tools for repeat analysis rely either on high quality reference genomes or existing repeat libraries. Thus, it is still challenging to do repeat analysis for species with highly repetitive or complex genomes which often do not have good reference genomes or annotated repeat libraries. Recently we developed a computational method called REPdenovo that constructs consensus repeat sequences directly from short sequence reads and outperforms an existing tool called RepARK. One major issue with REPdenovo is that it does not perform well for repeats with relatively high divergence rates or low copy numbers. In this paper, we present an improved approach for constructing consensus repeats directly from short reads. Compared with the original REPdenovo, the improved approach uses more repeat-related k-mers and improves repeat assembly quality using a consensus-based k-mer processing method. RESULTS: We compare the performance of the new method with REPdenovo and RepARK on human, Arabidopsis thaliana and Drosophila melanogaster short sequencing data. The new method fully constructs more repeats in Repbase than the original REPdenovo and RepARK, especially for repeats with higher divergence rates and lower copy numbers. We also apply our new method to hummingbird data, for which no repeat library is known, and it constructs many repeat elements that can be validated using PacBio long reads. CONCLUSION: We propose an improved method for reconstructing repeat elements directly from short sequence reads. The results show that our new method can assemble more complete repeats than REPdenovo (and also RepARK). Our new approach has been implemented as part of the REPdenovo software package, which is available for download at https://github.com/Reedwarbler/REPdenovo.
format	Online Article Text
id	pubmed-6101065
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6101065 2018-08-27 An improved approach for reconstructing consensus repeats from short sequence reads Chu, Chong; Pei, Jingwen; Wu, Yufeng BMC Genomics Research BioMed Central 2018-08-13 /pmc/articles/PMC6101065/ /pubmed/30367582 http://dx.doi.org/10.1186/s12864-018-4920-6 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	An improved approach for reconstructing consensus repeats from short sequence reads
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6101065/ https://www.ncbi.nlm.nih.gov/pubmed/30367582 http://dx.doi.org/10.1186/s12864-018-4920-6
