
Simplifier: a web tool to eliminate redundant NGS contigs

Bibliographic Details
Main authors: Ramos, Rommel Thiago Jucá, Carneiro, Adriana Ribeiro, Azevedo, Vasco, Schneider, Maria Paula, Barh, Debmalya, Silva, Artur
Format: Online Article Text
Language: English
Published: Biomedical Informatics, 2012
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3524941/
https://www.ncbi.nlm.nih.gov/pubmed/23275695
http://dx.doi.org/10.6026/97320630008996
collection PubMed
description Modern genomic sequencing technologies produce large amounts of data at a reduced cost per base; however, these data consist of short reads. This reduction in read length, compared with the reads obtained by earlier methodologies, presents new challenges, including the need for efficient algorithms to assemble genomes from short reads and to resolve repeats. Additionally, after ab initio assembly, curating the hundreds or thousands of contigs generated by assemblers demands considerable time and computational resources. We developed Simplifier, a stand-alone software tool that selectively eliminates redundant sequences from the collection of contigs generated by ab initio genome assembly. Applying Simplifier to the contigs from the assembly of the Corynebacterium pseudotuberculosis strain 258 genome reduced the number of contigs generated by ab initio methods from 8,004 to 5,272, a reduction of 34.14%; in addition, N50 increased from 1 kb to 1.5 kb. Processing the contigs of Escherichia coli DH10B with Simplifier reduced the contig set of the mate-paired library by 17.47% and that of the fragment library by 23.91%. In tests with data from prokaryotic organisms, Simplifier removed redundant sequences from the datasets produced by assemblers, thereby reducing the effort required to finish genome assembly. AVAILABILITY: Simplifier is available at http://www.genoma.ufpa.br/rramos/softwares/simplifier.xhtml. It requires Sun JDK 6 or higher.
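The abstract does not say how Simplifier decides that a contig is redundant. As a minimal sketch, assuming that "redundant" means a contig whose sequence (or its reverse complement) occurs exactly inside another contig of equal or greater length, the filtering step and the N50 statistic mentioned above could be written as follows in Python. The function names reverse_complement, remove_contained_contigs and n50 are hypothetical illustrations, not Simplifier's interface, and the containment criterion is an assumption.

```python
from __future__ import annotations

# Hypothetical illustration only: these helpers are not part of Simplifier.


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def remove_contained_contigs(contigs: list[str]) -> list[str]:
    """Drop every contig found, on either strand, inside an equal or longer kept contig.

    Assumption: "redundant" means exact containment. Simplifier's own
    criterion is not stated in the abstract and may differ.
    """
    kept: list[str] = []
    # Visit the longest contigs first so each potential container is kept
    # before the shorter contigs it might contain (the sort is stable for ties).
    for contig in sorted(contigs, key=len, reverse=True):
        rc = reverse_complement(contig)
        if not any(contig in k or rc in k for k in kept):
            kept.append(contig)
    return kept


def n50(lengths: list[int]) -> int:
    """Return N50: the largest L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    contigs = ["ACGTACGTTT", "CGTACG", "AAACGTACGT", "GGGCCC"]
    kept = remove_contained_contigs(contigs)
    print(kept)  # ['ACGTACGTTT', 'GGGCCC']
    print(n50([len(c) for c in kept]))  # 10
```

In this toy input, "CGTACG" is a substring of "ACGTACGTTT" and "AAACGTACGT" is its reverse complement, so both are dropped. Checking every pair of contigs is quadratic, which is acceptable for a sketch; a tool processing thousands of contigs would typically rely on an index or an aligner instead. Applying n50 to the contig lengths before and after filtering gives the kind of comparison the abstract reports.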
id pubmed-3524941
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3524941 2012-12-28 Bioinformation (Software); Biomedical Informatics, 2012-10-13
© 2012 Biomedical Informatics. This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original author and source are credited.
topic Software