Contig-Layout-Authenticator (CLA): A Combinatorial Approach to Ordering and Scaffolding of Bacterial Contigs for Comparative Genomics and Molecular Epidemiology

A wide variety of genome sequencing platforms have emerged in the recent past. High-throughput platforms like Illumina and 454 are essentially adaptations of the shotgun approach generating millions of fragmented single or paired sequencing reads. To reconstruct whole genomes, the reads have to be a...

Bibliographic Details
Main authors: Shaik, Sabiha; Kumar, Narender; Lankapalli, Aditya K.; Tiwari, Sumeet K.; Baddam, Ramani; Ahmed, Niyaz
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4889084/
https://www.ncbi.nlm.nih.gov/pubmed/27248146
http://dx.doi.org/10.1371/journal.pone.0155459
_version_ 1782434943554551808
author Shaik, Sabiha
Kumar, Narender
Lankapalli, Aditya K.
Tiwari, Sumeet K.
Baddam, Ramani
Ahmed, Niyaz
author_sort Shaik, Sabiha
collection PubMed
description A wide variety of genome sequencing platforms have emerged in the recent past. High-throughput platforms like Illumina and 454 are essentially adaptations of the shotgun approach, generating millions of fragmented single or paired sequencing reads. To reconstruct whole genomes, the reads have to be assembled into contigs, which often require further downstream processing. The contigs can be directly ordered according to a reference, scaffolded based on paired read information, or assembled using a combination of the two approaches. While the reference-based approach appears to mask strain-specific information, scaffolding based on paired-end information suffers when repetitive elements longer than the size of the sequencing reads are present in the genome. Sequencing technologies that produce long reads can solve the problems associated with repetitive elements but are not necessarily easily available to researchers. The most common high-throughput technology currently used is the Illumina short-read platform. To improve upon the shortcomings associated with the construction of draft genomes with Illumina paired-end sequencing, we developed Contig-Layout-Authenticator (CLA). The CLA pipeline can scaffold reference-sorted contigs based on paired reads, resulting in better assembled genomes. Moreover, CLA also hints at probable misassemblies and contaminations, for the users to cross-check before constructing the consensus draft. The CLA pipeline was designed and trained extensively on various bacterial genome datasets for the ordering and scaffolding of large repetitive contigs. The tool has been validated and compared favorably with other widely used scaffolding and ordering tools using both simulated and real sequence datasets. CLA is a user-friendly tool that requires a single command-line input to generate ordered scaffolds.
format Online
Article
Text
id pubmed-4889084
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4889084 2016-06-10 PLoS One Research Article Public Library of Science 2016-06-01 /pmc/articles/PMC4889084/ /pubmed/27248146 http://dx.doi.org/10.1371/journal.pone.0155459 Text en © 2016 Shaik et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Contig-Layout-Authenticator (CLA): A Combinatorial Approach to Ordering and Scaffolding of Bacterial Contigs for Comparative Genomics and Molecular Epidemiology
topic Research Article