
Identification of Optimum Sequencing Depth Especially for De Novo Genome Assembly of Small Genomes Using Next Generation Sequencing Data

Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Curren...


Bibliographic Details
Main authors: Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3625192/
https://www.ncbi.nlm.nih.gov/pubmed/23593174
http://dx.doi.org/10.1371/journal.pone.0060204
author Desai, Aarti
Marwah, Veer Singh
Yadav, Akshay
Jha, Vineet
Dhaygude, Kishor
Bangar, Ujwala
Kulkarni, Vivek
Jere, Abhay
author_sort Desai, Aarti
collection PubMed
description Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly using these short reads is computationally very intensive. Due to the lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, no report is currently available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage, which are known to impact genome assembly, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly of different-sized genomes using graph-based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 MB), S. kudriavzevii (11.18 MB) and C. elegans (100 MB) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6–40 GB RAM, depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing, which will enable optimum utilization of sequencing as well as analysis resources.
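As a rough illustration of what the depths quoted in the description mean in practice, the sketch below applies the standard coverage relation (depth = number of reads × read length / genome size) to the three genome sizes named above. It is not taken from the article: the 100 bp read length and the function name `reads_needed` are assumptions made for this example, and the sizes given as "MB" are read here as megabases.

```python
import math

# Genome sizes from the description, interpreted as megabases (assumption).
GENOMES_BP = {
    "E. coli": 4_600_000,
    "S. kudriavzevii": 11_180_000,
    "C. elegans": 100_000_000,
}

# Hypothetical read length; the record does not state the actual value.
READ_LENGTH_BP = 100


def reads_needed(genome_size_bp: int, depth: float, read_length_bp: int) -> int:
    """Smallest number of reads N such that N * read_length / genome_size >= depth."""
    return math.ceil(depth * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    for name, size_bp in GENOMES_BP.items():
        n50 = reads_needed(size_bp, 50, READ_LENGTH_BP)
        n100 = reads_needed(size_bp, 100, READ_LENGTH_BP)
        print(f"{name}: {n50:,} reads for 50X, {n100:,} reads for 100X")
```

Under these assumptions the read count scales linearly with both depth and genome size, so the 100X depth reported for Meraculous corresponds to twice as many reads as the 50X depth reported for the other assemblers.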
format Online
Article
Text
id pubmed-3625192
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3625192 2013-04-16 PLoS One Research Article Public Library of Science 2013-04-12 /pmc/articles/PMC3625192/ /pubmed/23593174 http://dx.doi.org/10.1371/journal.pone.0060204 Text en © 2013 Desai et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Identification of Optimum Sequencing Depth Especially for De Novo Genome Assembly of Small Genomes Using Next Generation Sequencing Data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3625192/
https://www.ncbi.nlm.nih.gov/pubmed/23593174
http://dx.doi.org/10.1371/journal.pone.0060204