Assembly of highly repetitive genomes using short reads: the genome of discrete typing unit III Trypanosoma cruzi strain 231

Next-generation sequencing (NGS) methods are low-cost high-throughput technologies that produce thousands to millions of sequence reads. Despite the high number of raw sequence reads, their short length, relative to Sanger, PacBio or Nanopore reads, complicates the assembly of genomic repeats. Many...

Bibliographic Details
Main authors: Baptista, Rodrigo P., Reis-Cunha, Joao Luis, DeBarry, Jeremy D., Chiari, Egler, Kissinger, Jessica C., Bartholomeu, Daniella C., Macedo, Andrea M.
Format: Online Article Text
Language: English
Published: Microbiology Society 2018
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5989580/
https://www.ncbi.nlm.nih.gov/pubmed/29442617
http://dx.doi.org/10.1099/mgen.0.000156
author Baptista, Rodrigo P.
Reis-Cunha, Joao Luis
DeBarry, Jeremy D.
Chiari, Egler
Kissinger, Jessica C.
Bartholomeu, Daniella C.
Macedo, Andrea M.
collection PubMed
description Next-generation sequencing (NGS) methods are low-cost high-throughput technologies that produce thousands to millions of sequence reads. Despite the high number of raw sequence reads, their short length, relative to Sanger, PacBio or Nanopore reads, complicates the assembly of genomic repeats. Many genome tools are available, but the assembly of highly repetitive genome sequences using only NGS short reads remains challenging. Genome assembly of organisms responsible for important neglected diseases such as Trypanosoma cruzi, the aetiological agent of Chagas disease, is known to be challenging because of their repetitive nature. Only three of six recognized discrete typing units (DTUs) of the parasite have their draft genomes published and therefore genome evolution analyses in the taxon are limited. In this study, we developed a computational workflow to assemble highly repetitive genomes via a combination of de novo and reference-based assembly strategies to better overcome the intrinsic limitations of each, based on Illumina reads. The highly repetitive genome of the human-infecting parasite T. cruzi 231 strain was used as a test subject. The combined-assembly approach shown in this study benefits from the reference-based assembly ability to resolve highly repetitive sequences and from the de novo capacity to assemble genome-specific regions, improving the quality of the assembly. The acceptable confidence obtained by analyzing our results showed that our combined approach is an attractive option to assemble highly repetitive genomes with NGS short reads. Phylogenomic analysis including the 231 strain, the first representative of DTU III whose genome was sequenced, was also performed and provides new insights into T. cruzi genome evolution.
format Online
Article
Text
id pubmed-5989580
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Microbiology Society
record_format MEDLINE/PubMed
spelling pubmed-5989580 2018-06-12
Microb Genom
Research Article
Microbiology Society 2018-02-14
/pmc/articles/PMC5989580/ /pubmed/29442617 http://dx.doi.org/10.1099/mgen.0.000156
Text en
http://creativecommons.org/licenses/by/4.0/ This is an open access article under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.
title Assembly of highly repetitive genomes using short reads: the genome of discrete typing unit III Trypanosoma cruzi strain 231
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5989580/
https://www.ncbi.nlm.nih.gov/pubmed/29442617
http://dx.doi.org/10.1099/mgen.0.000156